2.5 Data Abstraction
Modularity is a fundamental aspect of all successful large programs. However, modules in the form described previously are not sufficient to express complex systems cleanly. Here, I first present a way of using modules to provide a form of user-defined types and then show how to overcome some problems with that approach by defining user-defined types directly.
2.5.1 Modules Defining Types
Programming with modules leads to the centralization of all data of a type under the control of a type manager module. For example, if we wanted many stacksrather than the single one provided by the Stack module abovewe could define a stack manager with an interface like this:
namespace Stack { struct Rep; // definition of stack layout is elsewhere typedef Rep& stack; stack create(); // make a new stack void destroy(stack s); // delete s void push(stack s, char c); // push c onto s char pop(stack s); // pop s
The declaration
struct Rep;
says that Rep is the name of a type, but it leaves the type to be defined later. The declaration
typedef Rep& stack;
gives the name stack to a "reference to Rep." The idea is that a stack is identified by its Stack::stack and that further details are hidden from users.
A Stack::stack acts much like a variable of a built-in type:
struct Bad_pop { }; void f() { Stack::stack s1 = Stack::create(); // make a new stack Stack::stack s2 = Stack::create(); // make another new stack Stack::push(s1,'c'); Stack::push(s2,'k'); if (Stack::pop(s1) != 'c') throw Bad_pop(); if (Stack::pop(s2) != 'k') throw Bad_pop(); Stack::destroy(s1); Stack::destroy(s2); }
We could implement this Stack in several ways. It is important that a user doesn't need to know how we do it. As long as we keep the interface unchanged, a user will not be affected if we decide to re-implement Stack.
An implementation might preallocate a few stack representations and let Stack::create() hand out a reference to an unused one. Stack::destroy() could then mark a representation "unused" so that Stack::create() can recycle it:
namespace Stack { // representation const int max_size = 200; struct Rep { char v[max_size]; int top; }; const int max = 16; // maximum number of stacks Rep stacks[max]; // preallocated stack representations bool used[max]; // used[i] is true if stacks[i] is in use typedef Rep& stack; } void Stack::push(stack s, char c) { /* check s for overflow and push c */ } char Stack::pop(stack s) { /* check s for underflow and pop */ } Stack::stack Stack::create() { // pick an unused Rep, mark it used, initialize it, and return a reference to it } void Stack::destroy(stack s) { /* mark s unused */ }
What we have done is to wrap a set of interface functions around the representation type. How the resulting "stack type" behaves depends partly on how we defined these interface functions, partly on how we presented the representation type to the users of Stacks, and partly on the design of the representation type itself.
This is often less than ideal. A significant problem is that the presentation of such "fake types" to the users can vary greatly depending on the details of the representation typeand users ought to be insulated from knowledge of the representation type. For example, had we chosen to use a more elaborate data structure to identify a stack, the rules for assignment and initialization of Stack::stacks would have changed dramatically. This may indeed be desirable at times. However, it shows that we have simply moved the problem of providing convenient stacks from the Stack module to the Stack::stack representation type.
More fundamentally, user-defined types implemented through a module providing access to an implementation type don't behave like built-in types and receive less and different support than do built-in types. For example, the time that a Stack::Rep can be used is controlled through Stack::create() and Stack::destroy() rather than by the usual language rules.
2.5.2 User-Defined Types
C++ attacks this problem by allowing a user to directly define types that behave in (nearly) the same way as built-in types. Such a type is often called an abstract data type. I prefer the term user-defined type. A more reasonable definition of abstract data type would require a mathematical "abstract" specification. Given such a specification, what are called types here would be concrete examples of such truly abstract entities. The programming paradigm becomes this:
Decide which types you want; provide a full set of operations for each type.
Where there is no need for more than one object of a type, the data-hiding programming style using modules suffices.
Arithmetic types such as rational and complex numbers are common examples of user-defined types. Consider:
class complex { double re, im; public: complex(double r, double i) { re=r; im=i; } // construct complex from two scalars complex(double r) { re=r; im=0; } // construct complex from one scalar complex() { re = im = 0; } // default complex: (0,0) friend complex operator+(complex, complex); friend complex operator-(complex, complex); // binary friend complex operator-(complex); // unary friend complex operator*(complex, complex); friend complex operator/(complex, complex); friend bool operator==(complex, complex); // equal friend bool operator!=(complex, complex); // not equal // ... };
The declaration of class (that is, user-defined type) complex specifies the representation of a complex number and the set of operations on a complex number. The representation is private; that is, re and im are accessible only to the functions specified in the declaration of class complex. Such functions can be defined like this:
complex operator+(complex a1, complex a2) { return complex(a1.re+a2.re,a1.im+a2.im); }
A member function with the same name as its class is called a constructor. A constructor defines a way to initialize an object of its class. Class complex provides three constructors. One makes a complex from a double, another takes a pair of doubles, and the third makes a complex with a default value.
Class complex can be used like this:
void f(complex z) { complex a = 2.3; complex b = 1/a; complex c = a+b*complex(1,2.3); // ... if (c != b) c = -(b/a)+2*b; }
The compiler converts operators involving complex numbers into appropriate function calls. For example, c!=b means operator!=(c,b) and 1/a means operator/(complex(1) ,a).
Most, but not all, modules are better expressed as user-defined types.
2.5.3 Concrete Types
User-defined types can be designed to meet a wide variety of needs. Consider a user-defined Stack type along the lines of the complex type. To make the example a bit more realistic, this Stack type is defined to take its number of elements as an argument:
class Stack { char* v; int top; int max_size; public: class Underflow { }; // used as exception class Overflow { }; // used as exception class Bad_size { }; // used as exception Stack(int s); // constructor ˜Stack(); // destructor void push(char c); char pop(); };
The constructor Stack(int) will be called whenever an object of the class is created. This takes care of initialization. If any cleanup is needed when an object of the class goes out of scope, a complement to the constructorcalled the destructorcan be declared:
Stack::Stack(int s) // constructor { top = 0; if (s<0 || 10000<s) throw Bad_size(); // "||" means "or" max_size = s; v = new char[s]; // allocate elements on the free store (heap, dynamic store) } Stack::˜Stack() // destructor { delete[] v; // free the elements for possible reuse of their space }
The constructor initializes a new Stack variable. To do so, it allocates some memory on the free store (also called the heap or dynamic store) using the new operator. The destructor cleans up by freeing that memory. This is all done without intervention by users of Stacks. The users simply create and use Stacks much as they would variables of built-in types. For example:
Stack s_var1(10); // global stack with 10 elements void f(Stack& s_ref, int i) // reference to Stack { Stack s_var2(i); // local stack with i elements Stack* s_ptr = new Stack(20); // pointer to Stack allocated on free store s_var1.push('a'); // access through variable s_var2.push('b'); s_ref.push('c'); // access through reference s_ptr->push('d'); // access through pointer // ... }
This Stack type obeys the same rules for naming, scope, allocation, lifetime, etc., as does a built-in type such as int and char.
Naturally, the push() and pop() member functions must also be defined somewhere:
void Stack::push(char c) { if (top == max_size) throw Overflow(); v[top] = c; top = top + 1; } char Stack::pop() { if (top == 0) throw Underflow(); top = top - 1; return v[top]; }
Types such as complex and Stack are called concrete types, in contrast to abstract types, where the interface more completely insulates a user from implementation details.
2.5.4 Abstract Types
One property was lost in the transition from Stack as a "fake type" implemented by a module (§2.5.1) to a proper type (§2.5.3). The representation is not decoupled from the user interface; rather, it is a part of what would be included in a program fragment using Stacks. The representation is private, and therefore accessible only through the member functions, but it is present. If it changes in any significant way, a user must recompile. This is the price to pay for having concrete types behave exactly like built-in types. In particular, we cannot have genuine local variables of a type without knowing the size of the type's representation.
For types that don't change often, and where local variables provide much-needed clarity and efficiency, this is acceptable and often ideal. However, if we want to completely isolate users of a stack from changes to its implementation, this last Stack is insufficient. Then, the solution is to decouple the interface from the representation and give up genuine local variables.
First, we define the interface:
class Stack { public: class Underflow { }; // used as exception class Overflow { }; // used as exception virtual void push(char c) = 0; virtual char pop() = 0; };
The word virtual means "may be redefined later in a class derived from this one" in Simula and C++. A class derived from Stack provides an implementation for the Stack interface. The curious =0 syntax says that some class derived from Stack must define the function. Thus, this Stack can serve as the interface to any class that implements its push() and pop() functions.
This Stack could be used like this:
void f(Stack& s_ref) { s_ref.push('c'); if (s_ref.pop() != 'c') throw Bad_pop(); }
Note how f() uses the Stack interface in complete ignorance of implementation details. A class that provides the interface to a variety of other classes is often called a polymorphic type.
Not surprisingly, the implementation could consist of everything from the concrete class Stack that we left out of the interface Stack:
class Array_stack : public Stack { // Array_stack implements Stack char* p; int max_size; int top; public: Array_stack(int s); ~Array_stack(); void push(char c); char pop(); };
The ":public" can be read as "is derived from," "implements," and "is a subtype of."
For a function like f() to use a Stack in complete ignorance of implementation details, some other function will have to make an object on which it can operate. For example:
void g() { Array_stack as(200); f(as); }
Since f() doesn't know about Array_stacks but only knows the Stack interface, it will work just as well for a different implementation of a Stack. For example:
class List_stack : public Stack { // List_stack implements Stack list<char> lc; // (standard library) list of characters public: List_stack() { } void push(char c) { lc.push_front(c); } char pop(); }; char List_stack::pop() { char x = lc.front(); // get first element lc.pop_front(); // remove first element return x; }
Here, the representation is a list of characters. The lc.push_front(c) adds c as the first element of lc, the call lc.pop_front() removes the first element, and lc.front() denotes lc's first element.
A function can create a List_stack and have f() use it:
void h() { List_stack ls; f(ls); }
2.5.5 Virtual Functions
How is the call s_ref.pop() in f() resolved to the right function definition? When f() is called from h(), List_stack::pop() must be called. When f() is called from g(), Array_stack::pop() must be called. To achieve this resolution, a Stack object must contain information to indicate the function to be called at run-time. A common implementation technique is for the compiler to convert the name of a virtual function into an index into a table of pointers to functions. That table is usually called "a virtual function table" or simply, a vtbl. Each class with virtual functions has its own vtbl identifying its virtual functions. This can be represented graphically like this:
The functions in the vtbl allow the object to be used correctly even when the size of the object and the layout of its data are unknown to the caller. All the caller needs to know is the location of the vtbl in a Stack and the index used for each virtual function. This virtual call mechanism can be made essentially as efficient as the "normal function call" mechanism. Its space overhead is one pointer in each object of a class with virtual functions plus one vtbl for each such class.