Solution
Clarity: A Short Sermon
-
Who benefits from clear, understandable code?
In short, just about everyone benefits.
First, clear code is easier to follow while debugging and, for that matter, is less likely to have as many bugs in the first place, so writing clean code makes your own life easier even in the very short term. (For a case in point, see the discussion surrounding Example 27-2 in Item 27.) Further, when you return to the code a month or ayear lateras you surely will if the code is any good and is actually being usedit's much easier to pick it up again and understand what's going on. Most programmers find keeping full details of code in their heads difficult for even a few weeks, especially after having moved on to other work; after a few months or even a few years, it's too easy to go back to your own code and imagine it was written by a strangeralbeit a stranger who curiously happened to follow your personal coding style.
But enough about selfishness. Let's turn to altruism: Those who have to maintain your code also benefit from clarity and readability. After all, to maintain code well one must first grok the code. "To grok," as coined by Robert Heinlein, means to comprehend deeply and fully; in this case, that includes understanding the internal workings of the code itself, as well as its side effects and interactions with other sub-systems. It is altogether too easy to introduce new errors when changing code one does not fully understand. Code that is clear and understandable is easier to grok, and therefore, fixes to such code become less fragile, less risky, less likely to have un-intended side effects.
Most important, however, your end users benefit from clear and understandable code for all these reasons: Such code is likely to have had fewer initial bugs in the first place, and it's likely to have been maintained more correctly without as many new bugs being introduced.
Guideline
By default, prefer to write for clarity and correctness first.
Dissecting Index Tables
-
The following code presents an interesting and genuinely useful idiom for creating index tables into existing containers. For a more detailed explanation, see the original article [Hicks00].
Critique this code and identify:
-
Mechanical errors, such as invalid syntax or nonportable conventions.
-
Stylistic improvements that would improve code clarity, reusability, and maintainability.
-
Again, let me repeat that which bears repeating: This code presents an interesting and genuinely useful idiom. I've frequently found it necessary to access the same container in different ways, such as using different sort orders. For this reason it can be useful indeed to have one principal container that holds the data (for example, a vector<Employee>) and secondary containers of iterators into the main container that support variant access methods (for example, a set<vector<Employee>::iterator, Funct> where Funct is a functor that compares Employee objects indirectly, yielding a different ordering than the order in which the objects are physically stored in the vector).
Having said that, style matters too. The original author has kindly allowed me to use his code as a case in point, and I'm not trying to pick on him here; I'm just adopting the technique, pioneered long ago by such luminaries as P. J. Plauger, of expounding coding style guidelines via the dissection and critique of published code. I've critiqued other published material before and have had other people critique my own, and I'm positive that further dissections will no doubt follow.
Having said all that, let's see what we might be able to improve in this particular piece of code.
Correcting Mechanical Errors
-
Mechanical errors, such as invalid syntax or nonportable conventions.
The first area for constructive criticism is mechanical errors in the code, which on most platforms won't compile as shown.
#include <algorith>
-
Spell standard headers correctly. Here the header <algorithm> is misspelled as <algorith>. My first guess was that this is probably an artifact of an 8-character file system used to test the original code, but even my old version of VC++ on an old version of Windows (based on the 8.3 filename system) rejected this code. Anyway, it's not standard, and even on hobbled file systems the compiler itself is required to support any standard long header names, even if it silently maps it onto a shorter filename (or onto no file at all).
Next, consider:
main()
-
Define main correctly. This unadorned signature for main has never been standard C++ [C++98], although it is a conforming extension as long as the compiler warns about it. It used to be valid in pre-1999 C, which had an implicit int rule, but it's nonstandard in both C++ (which never had implicit int) and C99 [C99] (which as far as I can tell didn't merely deprecate implicit int, but actually removed it outright). In the C++ standard, see:
-
§3.6.1/2: portable code must define main as either int main() or int main(int,char*[])
-
§7/7 footnote 78, and §7.1.5/2 footnote 80: implicit int banned
-
Annex C (Compatibility), comment on 7.1.5/4: explicitly notes that bare main() is invalid C++, and must be written int main()
Guideline
Don't rely on implicit int; it's not standard-conforming portable C++. In particular "void main()" or just "main()" has never been standard C++, although many compilers still support them as conforming extensions.
cout << "#################" << endl;
-
-
Always #include the headers for the types whose definitions you need. The program uses cout and endl but fails to #include <iostream>. Why did this probably work on the original developer's system? Because C++ standard headers can #include each other, but unlike C, C++ does not specify which standard headers #include which other standard headers.
In this case, the program does #include <vector> and <algorithm>, and on the original system it probably just so happened that one of those headers also happened to indirectly #include <iostream> too. That might work on the library implementation used to develop the original code, and it happens to work on mine too, but it's not portable and not good style.
-
Follow the guidelines in Item 36 in More Exceptional C++ [Sutter02] about using namespaces. Speaking of cout and endl, the program must also qualify them with std:: or write using std::cout; using std::endl;. Unfortunately it's still common for authors to forget namespace scope qualifiersI hasten to point out that this author did correctly scope vector and sort, which is good.
Improving Style
-
Stylistic improvements that would improve code clarity, reusability, and maintainability.
Beyond the mechanical errors, there were several things I personally would have done differently in the code example. First, a couple of comments about the helper struct:
template <class RAIter> struct sort_idxtbl_pair { RAIter it; int i; bool operator<(const sort_idxtbl_pair& s) { return (*it) < (*(s.it)); } void set(const RAIter& _it, int _i) { it=_it; i=_i; } sort_idxtbl_pair() {} };
-
Be const correct. In particular, sort_idxtbl_pair::operator< doesn't modify *this, so it ought to be declared as a const member function.
Guideline
Practice const correctness.
-
Remove redundant code. The program explicitly writes class sort_idxtbl_pair's default constructor even though it's no different from the implicitly generated version. There doesn't seem to be much point to this. Also, as long as sort_idxbl_pair is a struct with public data, having a distinct set operation adds a little syntactic sugar but because it's called in only one place the minor extra complexity doesn't gain much.
Guideline
Avoid code duplication and redundancy.
Next, we come to the core function, sort_idxtbl:
template <class RAIter> void sort_idxtbl(RAIter first, RAIter last, int* pidxtbl) { int iDst = last-first; typedef std::vector< sort_idxtbl_pair<RAIter> > V; V v(iDst); int i=0; RAIter it = first; V::iterator vit = v.begin(); for(i=0; it<last; it++, vit++, i++) (*vit).set(it,i); std::sort(v.begin(), v.end()); int *pi = pidxtbl; vit = v.begin(); for(; vit<v.end(); pi++, vit++) *pi = (*vit).i; }
-
Choose meaningful and appropriate names. In this case, sort_idxtbl is a misleading name because the function doesn't sort an index table. . it creates one! On the other hand, the code gets good marks for using the template parameter name RAIter to indicate a random-access iterator; that's what's required in this version of the code, so naming the parameter to indicate this is a good reminder.
Guideline
Choose clear and meaningful names.
-
Be consistent. In sort_idxtbl, sometimes variables are initialized (or set) in for loop initialization statements, and sometimes they aren't. This just makes things harder to read, at least for me. Your mileage may vary on this one.
-
Remove gratuitous complexity. This function adores gratuitous local variables! It contains three examples. First, the variable iDst is initialized to last-first and then used only once; why not just write last-first where it's used and get rid of clutter? Second, the vector iterator vit is created where a subscript was already available and could have been used just as well, and the code would have been clearer. Third, the local variable it gets initialized to the value of a function parameter, after which the function parameter is never used; my personal preference in that case is just to use the function parameter (even if you change its valuethat's okay!) instead of introducing another name.
-
Reuse Part 1: Reuse more of the standard library. Now, the original program gets good marks for reusing std::sortthat's good. But why handcraft that final loop to perform a copy when std::copy does the same thing? Why reinvent a special-purpose sort_idxtbl_pair class when the only thing it has that std::pair doesn't is a comparison function? Besides being easier, reuse also makes our own code more readable. Humble thyself and reuse!
Guideline
Know about and (re)use the standard library's facilities wherever appropriate instead of hand-rolling your own.
-
Reuse Part 2: Kill two birds with one stone by making the implementation itself more re-usable. Of the original implementation, nothing is directly reusable other than the function itself. The helper sort_idxtbl_pair class is hardwired for its purpose and is not independently reusable.
-
Reuse Part 3: Improve the signature. The original signature
template <class RAIter> void sort_idxtbl(RAIter first, RAIter last, int* pidxtbl)
takes a bald int* pointer to the output area, which I generally try to avoid in favor of managed storage (say, a vector). But at the end of the day the user ought to be able to call sort_idxtbl and put the output into a plain array or a vector or anything else. Well, the requirement "be able to put the output into any container" simply cries out for an iterator, doesn't it? (See also Items 5 and 6.)
template< class RAIn, class Out > void sort_idxtbl(RAIn first, RAIn last, Out result)
Guideline
Widen the reusability of your generic components by not hardcoding types that don't need to be hardcoded.
-
Reuse Part 4, or Prefer comparing iterators using !=: When comparing iterators, always use != (which works for all kinds of iterators) instead of < (which works only for random-access iterators), unless of course you really need to use < and only intend to support random-access iterators. The original program uses < to compare the iterators it's given to work on, which is fine for random-access iterators, which was the program's initial intent: to create indexes into vectors and arrays, both of which support random-access iteration. But there's no reason we might not want to do exactly the same thing for other kinds of containers, like lists and sets, that don't support random-access iteration, and the only reason the original code won't work for such containers is that it uses < instead of != to compare iterators.
As Scott Meyers puts it eloquently in Item 32 of [Meyers96], "Program in the future tense." He elaborates:
Good software adapts well to change. It accommodates new features, it ports to new platforms, it adjusts to new demands, it handles new inputs. Software this flexible, this robust, and this reliable does not come about by accident. It is designed and implemented by programmers who conform to the constraints of today while keeping in mind the probable needs of tomorrow. This kind of softwaresoftware that accepts change gracefullyis written by people who program in the future tense.
Guideline
Prefer to compare iterators using !=, not <.
-
Prefer preincrement unless you really need the old value. Here, for the iterators, writing preincrement (++i) should habitually be preferred over writing postincrement (i++); see [Sutter00, Item 39]. True, that might not make a material difference in the original code because vector<T>::iterator might be a cheap-to-copy T* (although it might not be, particularly on checked STL implementations), but if we implement point 13 we may no longer be limited to just vector<T>::iterators, and most other iterators are of class typeperhaps often still not too expensive to copy, but why introduce this possible inefficiency needlessly?
Guideline
Prefer to write preincrement rather than postincrement, unless you really need to use the previous value.
That covers most of the important issues. There are other things we could critique but that I didn't consider important enough to call attention to here; for example, production code should have comments that document each class's and function's purpose and semantics, but that doesn't apply to code accompanying magazine articles where the explanation is typically written in better English and in greater detail than code comments have any right to expect.
I'm deliberately not critiquing the mainline for style (as opposed to the mechanical errors already noted that would cause the mainline to fail to compile), because, after all, this particular mainline is only meant to be a demonstration harness to help readers of the magazine article see how the index table apparatus is meant to work, and it's the index table apparatus that's the intended focus.
Summary
Let's preserve the original code's basic interface choice instead of straying far afield and proposing alternate design choices. [40] Limiting our critique just to correcting the code for mechanical errors and basic style, then, consider the three alternative improved versions below. Each has its own benefits, drawbacks, and style preferences as explained in the accompanying comments. What all three versions have in common is that they are clearer, more understandable, and more portable codeand that ought to count for something, in your company and in mine.
// An improved version of the code originally published in [Hicks00]. // #include <vector> #include <map> #include <algorithm> // Solution 1 does some basic cleanup but still preserves the general structure // of the original's approach. We're down to 17 lines (even if you count "public:" // and "private:" as lines), where the original had 23. // namespace Solution1 { template<class Iter> class sort_idxtbl_pair { public: void set(const Iter& it, int i) { it_ = it; i_ = i; } bool operator<(const sort_idxtbl_pair& other) const { return *it_ < *other.it_; } operator int() const { return i_; } private: Iter it_; int i_; }; // This function is where most of the clarity savings came from; it has 5 lines, // where the original had 13. After each code line, I'll show the corresponding // original code for comparison. Prefer to write code that is clear and concise, // not unnecessarily complex or obscure! // template<class IterIn, class IterOut> void sort_idxtbl(IterIn first, IterIn last, IterOut out) { std::vector<sort_idxtbl_pair<IterIn> > v(last-first); // int iDst = last-first; // typedef std::vector< sort_idxtbl_pair<RAIter> > V; // V v(iDst); for(int i=0; i < last-first; ++i) v[i].set(first+i, i); // int i=0; // RAIter it = first; // V::iterator vit = v.begin(); // for (i=0; it<last; it++, vit++, i++) // (*vit).set(it,i); std::sort(v.begin(), v.end()); // std::sort(v.begin(), v.end()); std::copy(v.begin(), v.end(), out); // int *pi = pidxtbl; // vit = v.begin(); // for (; vit<v.end(); pi++, vit++) // *pi = (*vit).i; } } // Solution 2 uses a pair instead of reinventing a pair-like helper class. Now we're // down to 13 lines, from the original 23. Of the 14 lines, 9 are purpose-specific, // and 5 are directly reusable in other contexts. // namespace Solution2 { template<class T, class U> struct ComparePair1stDeref { bool operator()(const std::pair<T,U>& a, const std::pair<T,U>& b) const { return *a.first < *b.first; } }; template<class IterIn, class IterOut> void sort_idxtbl(IterIn first, IterIn last, IterOut out) { std::vector< std::pair<IterIn,int> > s(last-first); for(int i=0; i < s.size(); ++i) s[i] = std::make_pair(first+i, i); std::sort(s.begin(), s.end(), ComparePair1stDeref<IterIn,int>()); for(int i=0; i < s.size(); ++i, ++out) *out = s[i].second; } } // Solution 3 just shows a couple of alternative detailsit uses a map to avoid a // separate sorting step, and it uses std::transform() instead of a handcrafted loop. // Here we still have 15 lines, but more are reusable. This version uses more space // overhead and probably more time overhead too, so I prefer Solution 2, but this // is an example of finding alternative approaches to a problem. // namespace Solution3 { template<class T> struct CompareDeref { bool operator()(const T& a, const T& b) const { return *a < *b; } }; template<class T, class U> struct Pair2nd { const U& operator()(const std::pair<T,U>& a) const { return a.second; } }; template<class IterIn, class IterOut> void sort_idxtbl(IterIn first, IterIn last, IterOut out) { std::multimap<IterIn, int, CompareDeref<IterIn> > v; for(int i=0; first != last; ++i, ++first) v.insert(std::make_pair(first, i)); std::transform(v.begin(), v.end(), out, Pair2nd<IterIn const,int>()); } } // I left the test harness essentially unchanged, except to demonstrate putting // the output in an output iterator (instead of necessarily an int*) and using the // source array directly as a container. // #include <iostream> int main() { int ai[10] = { 15,12,13,14,18,11,10,17,16,19 }; std::cout << "#################" << std::endl; std::vector<int> aidxtbl(10); // use another namespace name to test a different solution Solution3::sort_idxtbl(ai, ai+10, aidxtbl.begin()); for(int i=0; i<10; ++i) std::cout << "i="<< i << ", aidxtbl[i]="<< aidxtbl[i] << ", ai[aidxtbl[i]]="<< ai[aidxtbl[i]] << std::endl; std::cout << "#################" << std::endl; }