1. Introduction

In this tutorial, we’ll study one of the most popular metrics for software size estimation: source lines of code (SLOC) or simply lines of code (LOC).

2. Software Size Estimation

Software size estimation is the process through which we estimate the size of the software to be developed. We use it to determine software costs. Hence, tight size estimates are critical for correct cost estimates. Our project can run behind schedule or exceed its budget if they are way off.

So, most companies use several estimation methods simultaneously. One of the most popular ones is LOC.

3. Lines of Code

LOC was developed for line-oriented procedural languages, such as Fortran and Assembly.

We calculate LOC by counting source code lines. In doing so, we skip blank lines, comments, annotations, and other hints about the source code.
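The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name count_physical_loc and its comment_prefix parameter are made up for this example, and the sketch recognizes only whole-line comments (no block comments, docstrings, or annotations).

```python
def count_physical_loc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # whole-line comment
        count += 1
    return count


sample = """\
# compute a square
def square(x):

    return x * x  # a trailing comment still sits on a code line
"""

print(count_physical_loc(sample))  # 2
```

Only the def line and the return line are counted. The comment line and the blank line are skipped.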

As the name says, the unit is a line of code. Its symbol is S_{s}. If our program is very long, we may use the unit KLOC (denoted with S) for a thousand lines.
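As a quick worked conversion between the two units, using the symbols above (the 12,500-line program is an invented figure for illustration):

```latex
S = \frac{S_s}{1000}, \qquad S_s = 12\,500\ \text{LOC} \;\Rightarrow\; S = 12.5\ \text{KLOC}
```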

3.1. Types

There are two subcategories of LOC:

  1. Physical LOC
  2. Logical LOC

Physical LOC counts the actual lines of code, each terminated by an end marker. For example, a statement terminated by a semicolon constitutes a line in C programs. In contrast, logical LOC examines a single physical line of code and counts the standalone statements in it. For instance, x=int(input("Enter your age: ")) is one physical line but two logical lines since it has two statements:

  1. taking user input via input
  2. typecasting via int
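To see the two operations hiding in that single physical line, we can inspect it with Python's ast module. This is only a rough sketch: treating each function call as one logical statement is an assumption made for this example, not a standard definition of logical LOC, and the helper name called_functions is hypothetical.

```python
import ast


def called_functions(line: str) -> list:
    """Return the sorted names of the functions called in a piece of Python code."""
    tree = ast.parse(line)
    return sorted(
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    )


line = 'x = int(input("Enter your age: "))'

print(len(line.splitlines()))  # 1
print(called_functions(line))  # ['input', 'int']
```

The line is one physical line, yet it contains the two calls listed above.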

3.2. Example

Let’s consider the following Python code snippet for checking if an integer is odd or even:

def check_odd(num):
    """ Check odd/even by calculating the remainder of num / 2 using % """
    res = num % 2
    if res == 0:
        return False
    else:
        return True
    

if __name__ == "__main__":
    text = input("Enter an integer number: ")
    num = int(text)
    
    status = check_odd(num)
    if status:
        print(f"{num} is odd")
    else:
        print(f"{num} is even")

In this code, the physical and logical LOC are both 14, since we count neither the docstring nor the blank lines.

Now, let’s consider another version of the same code. In it, we combined the first two lines of __main__, as well as the third and fourth lines. We also made check_odd more concise by removing res and replacing it with num % 2 in the condition of the if statement:

def check_odd(num):
    """ Check odd/even by calculating the remainder of num / 2 using % """
    if num % 2 == 0:
        return False
    else:
        return True
    

if __name__ == "__main__":
    num = int(input("Enter an integer number: "))
    if check_odd(num):
        print(f"{num} is odd")
    else:
        print(f"{num} is even")

In this code, the physical LOC is 11, but the logical LOC remains 14. This is because each collapsed line in __main__ consists of two logical statements. The first takes the input and casts the string to an integer, while the second obtains the result of check_odd and checks whether it's True. The same goes for if num % 2 == 0 in check_odd, which computes the remainder and compares it to zero.
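The physical counts of 14 and 11 can be checked with a short script. The sketch below is one possible way to do it, assuming that docstrings count as comments. The names NON_CODE, docstring_lines, and physical_loc are invented for this example, and the two versions are embedded as strings so that they're only parsed, never executed.

```python
import ast
import io
import tokenize

# Token types that don't make a line count as code.
NON_CODE = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def docstring_lines(source):
    """Return the line numbers occupied by module, class, and function docstrings."""
    lines = set()
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


def physical_loc(source):
    """Count lines holding at least one code token, excluding docstring lines."""
    code = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in NON_CODE:
            code.update(range(tok.start[0], tok.end[0] + 1))
    return len(code - docstring_lines(source))


VERSION_1 = '''
def check_odd(num):
    """ Check odd/even by calculating the remainder of num / 2 using % """
    res = num % 2
    if res == 0:
        return False
    else:
        return True


if __name__ == "__main__":
    text = input("Enter an integer number: ")
    num = int(text)

    status = check_odd(num)
    if status:
        print(f"{num} is odd")
    else:
        print(f"{num} is even")
'''

VERSION_2 = '''
def check_odd(num):
    """ Check odd/even by calculating the remainder of num / 2 using % """
    if num % 2 == 0:
        return False
    else:
        return True


if __name__ == "__main__":
    num = int(input("Enter an integer number: "))
    if check_odd(num):
        print(f"{num} is odd")
    else:
        print(f"{num} is even")
'''

print(physical_loc(VERSION_1))  # 14
print(physical_loc(VERSION_2))  # 11
```

The script checks only the physical counts. The logical count depends on how we define a standalone statement, so it's harder to automate in a generally accepted way.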

3.3. Benefits

First, it’s the most straightforward and widely used metric. It’s been around since the Fortran era, so we have much more LOC data than for other metrics. Therefore, we can use historical LOC data to get tighter estimates (e.g., via machine learning).

Secondly, it’s highly intuitive and easy to visualize and process. Even a beginner-level programmer can understand and calculate it.

Thirdly, we can automate its calculation. There are small utility programs that measure LOC for most programming languages.

3.4. Drawbacks

The biggest problem with LOC is that we can use it only to estimate projects that plan to use one programming language with a fixed syntax and an agreed-upon coding standard. This is so because LOC counts lines of code as per the programming language’s syntax and semantics.

The second problem with LOC is that it skips documentation lines. For example, we often make small changes in the production environment but add extensive comments or annotations to document them. However, comments and other hints aren’t covered by LOC.

Another problem is that LOC doesn’t consider the complexity of the underlying code statements. So, it doesn’t correctly reflect the quality and efficiency of the code since not all lines are equally important, complex, or easy to write. Sometimes, a few lines of code involving complex logic can be harder to write than a very large but straightforward program.

Moreover, LOC can’t easily accommodate languages that aren’t purely procedural, such as the object-oriented C++ and Java or the declarative SQL.

4. Conclusion

In this article, we described the LOC metric for software size estimation. It counts lines of code, either physical or logical, and is easy to understand and compute. However, its counts depend on the programming language and coding style. Therefore, it’s best to use it when we intend to code in a single language and base our software on standard design patterns.
