C Digression
How would the same problem be solved in C, where there are no virtual functions? A good way would be to try to implement a virtual table by hand and make calls through function pointers. You'd be surprised how often this is done. It's called "object-oriented programming in C." The more traditional approach would be to define a struct Node that has enough fields to accommodate the needs of binary operators as well as numeric nodes. The identity of the node would be stored in a separate field (see Listing 4).
Listing 4 C Implementation of a Polymorphic Node
#define NUM_NODE 1 #define ADD_NODE 2 #define MULT_NODE 3 struct Node { /* Node type */ int type; /* Used only in numeric nodes */ double val; /* Used only in binary nodes */ struct Node * pLeft; struct Node * pRight; };
Virtual functions in C++ would be turned into multi-way conditionals, like this switch statement:
double Calc (struct Node * pNode) { double x; x = 0.0; switch (pNode->type) { case NUM_NODE: x = pNode->val; break; case ADD_NODE: x = Calc (pNode->pLeft) + Calc (pNode->pRight); break; case MULT_NODE: x = Calc (pNode->pLeft) * Calc (pNode->pRight); break; } return x; }
Obviously, this solution would be too expensive when applied to trees with hundreds of nodes. This is when a C programmer would fetch his or her powerful secret weapon. When the going gets tough, type safety is the first to gouse casts. (I'm being sarcastic here.) Instead of packing everything into a single struct, the programmer creates a variety of structs (see Listing 5):
Listing 5 A Less Wasteful C Implementation of Polymorphic Nodes
struct Node { int type; }; struct NumNode { int type; double val; }; struct BinNode { int type; struct Node * pLeft; struct Node * pRight; };
Some would even use a byte to store typetrading speed for size (on many processors, fetching a word-aligned value is faster than fetching a byte-aligned value). The function Calc takes a pointer to Node and, based on its type field, casts it to the appropriate pointer (see Listing 6). In fact, you could even get rid of the last pretense of type safety and pass Calc a void pointer.
Listing 6 Extensive Casting in the C Implementation of Calc
double Calc (struct Node * pNode) { double x = 0.0; switch (pNode->type) { case NUM_NODE: x =((struct NumNode *)pNode)->val; break; case ADD_NODE: { struct BinNode * pN = (struct BinNode *) pNode; x = Calc (pN->pLeft) + Calc (pN->pRight); break; } case MULT_NODE: { struct BinNode * pN = (struct BinNode *) pNode; x = Calc (pN->pLeft) * Calc (pN->pRight); break; } default: printf ("Bad node type\n"); } return x; }
I haven't explained casting yet, and I'm very reluctant to do it at this stage. Casting means cheating the compiler. You have an object that's declared as a pointer to Node and you're telling the compiler to treat it as a pointer to BinNode. This way, you're bypassing the compiler's elaborate mechanism of type checking. Now it's up to you, the programmer, to make sure that the pointer in question actually points to BinNode and not something else.
Notice that even the "constructor"CreateNumNodehas to use casts. The destructor would use both casts and conditionals:
struct Node * CreateNumNode (double value) { struct NumNode * pNode = malloc (sizeof (struct NumNode)); pNode->type = NUM_NODE; pNode->val = value; return (struct Node *) pNode; }
How do these C solutions compare with the C++ polymorphic approach? As far as the size of the data structures is concerned, C doesn't offer much in the way of savings. In fact, the first C solution consumes substantially more memory.
As far as the speed of execution is concerned, we have to compare a multi-way conditional plus a direct call against a doubly indirect call through the vtable. The result of the comparison depends on how well the compiler optimizes conditionals. In our case, since the case labels are consecutive, the compiler will probably create a jump table indexed by the value of type. Before doing an indirect jump, the compiler will have to check whether the index is within bounds (if it's not, the compiler will jump to the default label). So in fact we have a conditional, an indirect jump, and a function call. In C++ we have a doubly indirect function call. The C++ compiler usually optimizes the passing of the this pointer by putting it into a register, so the calling sequence may be simpler in C++. The bottom line is that without actually timing both versions, there's no way to tell which one will be faster.
As far as correctness and ease of maintenance is concerned, just consider what it would be like to add a new type of node to each of the implementations. Notice also that, even though we've been writing progressively more complex programs in C++, we haven't felt the need for either casts or switch statements.
Even C programmers will admit that using casts should be a last resort. C++ programmers feel the same about switch statements: Use switch statements only when polymorphism cannot be used.
The parse tree presented here will be used in a later article describing the implementation of a symbolic calculator.