Cargo Cult Programming
- Preincrement Is More Efficient!
- Goto Considered Faster?
- String Theory
- Cargo Cult Languages
- Don't Try This at Home
Using a human brain to simulate thinking is quite hard. It has evolved over a very long period to recognize patterns, and is so good at this that it often recognizes them even when they are not present. Humans aren't the only animals to experience this. If you feed pigeons at random intervals, then after a while they start associating whatever they were doing with the appearance of food. If a pigeon is always walking in a counterclockwise circle when you feed it, then it will start walking in a counterclockwise circle the next time it is hungry, in an attempt to make food appear.
There are lots of cases of humans doing this, but the most notable example is the cargo cults of the Melanesian islands, which produced elaborate models of airplanes in an attempt to bring back the planes that had dropped supplies during the Second World War.
Programming, as a discipline based on logic and modern technology, is perhaps the last place you would expect to find this kind of thinking, but unfortunately, it creeps into a lot of code. In this article, I'm going to look at a number of common patterns that I've seen where it does: cases where something made sense once, and its form has been copied without its substance.
Preincrement Is More Efficient!
I see a lot of code, typically written by C++ programmers, that goes out of its way to avoid the postincrement operator. In C, there are two increment operators: preincrement and postincrement, written ++i and i++, respectively. The preincrement operator increments i and the expression evaluates to the result. Postincrement does the same, but the expression evaluates to the old version of i.
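As a minimal sketch of that difference, the following C++ fragment records both the value of the expression and the value of the variable afterwards. The names Outcome, pre_increment, post_increment, and check_increment_semantics are invented for this illustration.

```cpp
#include <cassert>

// Hypothetical helper type: what the expression evaluated to, and what
// the variable held once the statement had finished.
struct Outcome {
    int expression_value;
    int variable_after;
};

Outcome pre_increment(int i) {
    int result = ++i;  // i is incremented, and the expression yields the new value
    return {result, i};
}

Outcome post_increment(int i) {
    int result = i++;  // the expression yields the old value, then i is incremented
    return {result, i};
}

// Small self-check; call it from main() to confirm the behavior.
void check_increment_semantics() {
    assert(pre_increment(5).expression_value == 6);
    assert(pre_increment(5).variable_after == 6);
    assert(post_increment(5).expression_value == 5);
    assert(post_increment(5).variable_after == 6);
}
```

In both cases the variable ends up one larger. Only the value of the expression itself differs, and that only matters when the result is actually used.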
People who vaguely remember the bit of wisdom that preincrement is more efficient end up writing abominations like this:
++item->use_count;
I've shown this line to a few people, and about half of them were able to correctly identify what it did. This was slightly easier for me, because there was a comment on the previous line:
// Increment the use count for this item
It's a pretty good indication that you're writing terrible code if even you realize that you need to write a comment describing what a single statement does, rather than describing the high-level details of an algorithm, or the reasoning behind a particular approach. For those who aren't familiar with C operator precedence, the field access operator (->) has a higher precedence than the preincrement operator, so this is equivalent to:
++(item->use_count);
Or, to the much more readable:
item->use_count++;
Why is the latter more readable? Because the ++ is written next to the thing that it's incrementing: the structure field. When used as a statement like this, the two will generate exactly the same code in any vaguely competent compiler.
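To make the equivalence concrete, here is a small sketch using a hypothetical Item structure (the original code's real type is not shown, so this definition is an assumption). Both functions leave the field in the same state, because neither one uses the value of the increment expression.

```cpp
#include <cassert>

// Hypothetical stand-in for the structure behind the quoted line.
struct Item {
    int use_count;
};

// The form from the original code: parsed as ++(item->use_count).
void bump_prefix(Item *item) { ++item->use_count; }

// The more readable form: the ++ sits next to the field it changes.
void bump_postfix(Item *item) { item->use_count++; }

// Small self-check; call it from main() to confirm both have the same effect.
void check_same_effect() {
    Item a{41};
    Item b{41};
    bump_prefix(&a);
    bump_postfix(&b);
    assert(a.use_count == 42);
    assert(b.use_count == 42);
}
```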
So why did this developer prefer the less readable version, and why would they believe that it was faster? It turns out that the answer is C++.
C++ introduces the idea of operator overloading. This is a nice idea, in theory, but the implementation in C++ has so many traps and gotchas that the only sane approach is to pretend that it doesn't exist. In C, the increment operators are only defined for arithmetic and pointer types, and typically translate directly to a single instruction. In C++, they may be defined for any class or structure.
When they are defined on a structure, the difference in the semantics of the two is a lot more pronounced. By convention, the preincrement operator returns a reference, while the postincrement version returns a value. This means that the preincrement version can modify the object in place and hand back a reference to it, while the postincrement version must always make a copy of the old state to return.
In the C case, if you don't use the result of a postincrement operation, the compiler will just discard the old value, so nothing extra is generated. For a C++ compiler to do the same transformation on an arbitrary postfix increment, it needs to first inline the overloaded method, which means that it can't be virtual and that its definition must be visible in the same compilation unit as the caller. It must also be able to inline the copy constructor, and then show that it has no side effects. This is a very complex optimization for something that seems obvious to a human.
So, in C++, when using structures, the prefix version can indeed be faster. But that's no excuse for avoiding the postfix version in C code, or even in C++ code using primitive types.
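The conventional shape of the two overloads can be sketched as follows. Counter is a hypothetical class invented for this illustration, and its copy constructor bumps a static tally purely so that the copies become visible. The exact number of copies made by the postfix form depends on whether the compiler elides the copy on return, so the sketch only relies on it being at least one.

```cpp
#include <cassert>

// Hypothetical class, invented for illustration only.
class Counter {
public:
    static int copies;  // number of copy-constructor calls so far

    explicit Counter(int start) : value_(start) {}
    Counter(const Counter &other) : value_(other.value_) { ++copies; }

    // Prefix form: modify in place and return a reference to this object.
    Counter &operator++() {
        ++value_;
        return *this;
    }

    // Postfix form: the unused int parameter selects this overload.
    Counter operator++(int) {
        Counter old(*this);  // snapshot of the state before the increment
        ++value_;
        return old;          // the old state is returned by value
    }

    int value() const { return value_; }

private:
    int value_;
};

int Counter::copies = 0;

// Number of copy-constructor calls caused by one prefix increment.
int copies_during_prefix() {
    Counter c(0);
    int before = Counter::copies;
    ++c;
    return Counter::copies - before;
}

// Number of copy-constructor calls caused by one postfix increment
// whose result is thrown away.
int copies_during_postfix() {
    Counter c(0);
    int before = Counter::copies;
    c++;
    return Counter::copies - before;
}
```

Here copies_during_prefix() reports no copies at all, while copies_during_postfix() reports at least one, even though the caller discards the result. Because this copy constructor has a visible side effect, the compiler is not free to drop the snapshot.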