Estimation
Fallacy 6 To estimate cost and schedule, first estimate lines of code.
Discussion
Estimation, as we mentioned in several of the facts earlier in this book, is a vitally important activity in software. But, as we also saw in those facts, we struggle mightily to find ways to do it well.
Somehow, over the years, we have evolved (as the most popular way of performing estimation) the notion of first estimating the size of the product to be built in lines of code (LOC). From that, according to this idea, we can then convert LOC to cost and schedule (based, presumably, on historical data relating LOC to the cost and schedule needed to build those LOC). The idea behind the idea is that we can estimate LOC by looking at similar products we have previously built and extrapolating that known LOC data to fit the problem at hand.
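To make the mechanics concrete, here is a minimal sketch of the conversion just described, assuming the simplest possible model: a single average productivity figure derived from past projects. The historical numbers and the function names are invented for illustration and do not come from any real project or tool.

```python
# Hypothetical historical records: (lines of code, person-months of effort).
# These figures are invented for illustration only.
HISTORY = [
    (12_000, 30.0),
    (20_000, 50.0),
    (8_000, 20.0),
]


def loc_per_person_month(history):
    """Average productivity across past projects, in LOC per person-month."""
    total_loc = sum(loc for loc, _ in history)
    total_effort = sum(effort for _, effort in history)
    return total_loc / total_effort


def estimate_effort(estimated_loc, history):
    """Convert a guessed LOC count into person-months of effort."""
    return estimated_loc / loc_per_person_month(history)


if __name__ == "__main__":
    # The LOC figure passed in here is itself an estimate.
    print(estimate_effort(10_000, HISTORY))
```

Notice that the arithmetic is trivial. Everything depends on the LOC figure fed into `estimate_effort` and on the assumption that one productivity number fits every project.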
So why is this method, acknowledged to be the most popular in the field, fallacious? Because there is no particular reason why the estimation of LOC is any easier or more reliable than the estimation of cost and schedule. Because it is not obvious that there is a universal conversion technique for LOC to cost and schedule (we already skewered the one-size-fits-all notion in the previous fact). Because one program's LOC may be very different from another program's LOC: Is one line of COBOL code of the same degree of complexity as one line of C++ code? Is one line of a deeply mathematical scientific application comparable to one line of a business system? Is one line of a junior programmer's code equivalent to one line from your best programmer? (See Fact 2 about those individual differences, up to 28 to 1, for an answer to that question.) Is one LOC in a heavily commented program comparable to a LOC in one with no comments? What, in fact, constitutes a LOC?
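That last question is easy to demonstrate. The sketch below counts the "lines of code" in one small, made-up Python fragment under three plausible counting rules. The sample program and the three rule names are my own assumptions, chosen only to show that the answer depends on the rule. The comment test is deliberately naive: it only recognizes whole-line `#` comments.

```python
# A small made-up program to be measured.
SAMPLE = """\
# Compute the total price.

def total(prices):
    # Sum every price in the list.
    result = 0
    for p in prices:
        result += p
    return result
"""


def physical_lines(source):
    """Rule 1: every line in the file counts."""
    return len(source.splitlines())


def non_blank_lines(source):
    """Rule 2: blank lines do not count."""
    return sum(1 for line in source.splitlines() if line.strip())


def code_only_lines(source):
    """Rule 3: neither blank lines nor whole-line comments count."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


if __name__ == "__main__":
    print(physical_lines(SAMPLE))   # 8
    print(non_blank_lines(SAMPLE))  # 7
    print(code_only_lines(SAMPLE))  # 5
```

The same eight-line fragment is 8, 7, or 5 LOC depending on the rule, a spread of more than a third before we have even asked about language or programmer differences.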
Controversy
Let the controversy begin!
I already came down hard on this fallacy in Fact 8, where I said "this idea would be laughable (in the sense that it is probably harder to know how many LOC a system will contain than what its schedule and cost will be) if it were not for the fact that so many otherwise bright computer scientists advocate it."
You think that was harsh? You haven't begun to experience the ferocious opposition that exists to this fallacy. Capers Jones, in most of his writings, goes absolutely ballistic about LOC approaches. In identifying the biggest risks in the software field, he places inaccurate metrics at number one and loses no time in saying that LOC metrics are the reason he ranked it number one. "It was proven in 1978 that 'lines of code' . . . cannot be safely used to aggregate productivity and quality data" (Jones 1994). He goes on to list "six serious problems with LOC metrics," and later, in case you didn't connect "inaccurate metrics" specifically to LOC, he says, "The usage of LOC metrics ranks as the most serious problem."
In case that number one risk didn't sufficiently deter you from believing in LOC approaches, Jones (1994) goes on to list these additional "top 10" risks that are related in some way to the use of LOC (Jones's rankings are shown in parentheses):
Inadequate measurement (2)
Management malpractice (4)
Inaccurate cost estimating (5)
It would be possible to list here some others who stir the controversy over the use of LOC in estimation. But all of them would pale into insignificance next to the vitriol of the Jones opposition!
Source
Jones (1994), a wonderful and unique book in spite of (not because of) Jones's strident opposition to LOC, is listed in the following Reference section.
Reference
Jones, Capers. 1994. Assessment and Control of Software Risks. Englewood Cliffs, NJ: Yourdon Press.