- A Load of Shift
- Global Arguments
- Recursion Is Slow!
- But It Was Faster!
- You Can't See the Tree for the Branches
- The VM Does What?
You Can't See the Tree for the Branches
One of my favorite mis-optimizations is to test for special cases at the start of a function and use them to shortcut special cases. It's one of my favorites because I've been guilty of doing it, and the speed boost from removing it was most satisfying.
In my case, it was in a sparse-array lookup function, where I was using NULL to indicate an empty subtree. At each point, I tested whether a pointer was NULL, and returned NULL if it was. It turned out that this was a very uncommon case. Most of the time, when you looked for something it was there. (If it wasn't, you wanted to do some recovery, and therefore you were in a slow code-path anyway.) The cost of the extra branch in that bit of code was imposing a penalty on every case, and marginally speeding up an unusual case.
The solution in my example was to have a self-referential tree node where every child pointed back to itself, and I would use that instead of NULL as the "not present" value. This approach meant that every lookup returned something, but sometimes it would navigate down a statically defined tree to get to NULL. The interface didn't change, but the code ran 2050% faster, depending on the machine. The branch overhead was a major part of the total time spent.
This issue is part of a more general problem. Optimizing for special cases can add some overhead to every use of a function, providing a speed improvement only in special cases. Unless a special case is particularly common, or orders of magnitude slower than every other case, then it's typically not worth the bother.