- Locality of Declaration
- Returning a Class Object
- Initialization versus Assignment of Class Objects
Returning a Class Object
There was a small controversy in the late 1980s with regard to the perceived inability of the C++ language to return a class object efficiently. Although it is more efficient to return a class object by reference or pointer rather than by value, the language does not provide any direct support for that. For example, there is no simple alternative to returning the local Matrix object by value:
```cpp
Matrix operator+( const Matrix& m1, const Matrix& m2 )
{
    Matrix result;
    // do the arithmetic ...
    return result;
}
```
Although this revised implementation improves the operator's performance:
```cpp
// more efficient, but results in a dangling reference
Matrix& operator+( const Matrix& m1, const Matrix& m2 )
{
    Matrix result;
    // do the addition ...
    return result;
}
```
It also results in a dangling reference, because the local object is destroyed when the function returns, leaving the caller with a reference to an object that no longer exists.
We can solve the problem of the dangling reference by allocating the object on the heap, such as the following:
```cpp
// No way to guarantee the memory is not lost
Matrix& operator+( const Matrix& m1, const Matrix& m2 )
{
    Matrix *result = new Matrix;
    // do the addition ...
    return *result;
}
```
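This is likely to result in memory leaks, however: nothing in the interface tells the caller to delete the returned object, and in an expression such as m1 + m2 + m3 the intermediate sum is a heap object that no code ever deletes.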
In practice, the problem is not quite as severe as it seems. The class object is not passed back by value. Rather, it is internally passed back through an additional reference parameter the compiler adds to the function. For example, the Matrix addition operator is generally transformed as follows:
```
// Pseudo C++ code:
// general internal transformation of a function
// returning a class object by value
void operator+( Matrix &_retvalue,
                const Matrix &m1, const Matrix &m2 )
{
    Matrix sum;
    // invoke default constructor
    sum.Matrix::Matrix();

    // do the math

    // copy construct local sum object into _retvalue
    _retvalue.Matrix::Matrix( sum );

    // local sum object is destroyed on exit
}
```
The objection to this solution was based on the unnecessary invocation of the class copy constructor to initialize the additional parameter with a copy of the local class object. It would be more efficient, it was argued, if the function replaced all uses of the local class object with the parameter. The computations are applied directly to the parameter, eliminating the need both for the local class object and the copy constructor invocation. Here is the addition operator under this more aggressive transformation:
```
// Pseudo C++ code:
// a more aggressive internal transformation
void operator+( Matrix &_retvalue,
                const Matrix &m1, const Matrix &m2 )
{
    // replace local object with the object to be returned;
    // construct and do the math directly with that object

    // invoke default constructor
    _retvalue.Matrix::Matrix();

    // do the math

    // copy constructor no longer required
}
```
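To make the two transformations concrete, here is a minimal sketch that hand-writes them as ordinary C++, assuming a hypothetical 2x2 `Matrix` that counts its copy-constructor calls. The caller's result object is modeled as raw storage passed in through a `void*` (standing in for the hidden `_retvalue` parameter) and filled with placement `new`; each function returns the constructed object's address only so the caller can use it safely. `add_general`, `add_aggressive`, and `copies_made` are illustrative names, not anything a compiler actually emits.

```cpp
#include <cassert>  // for the checks in the usage example
#include <new>      // placement new

// Hypothetical 2x2 Matrix that counts copy-constructor calls.
struct Matrix {
    static int copies;
    double v[4];
    Matrix() : v{0, 0, 0, 0} {}
    Matrix(const Matrix& rhs) {
        ++copies;
        for (int i = 0; i < 4; ++i) v[i] = rhs.v[i];
    }
};
int Matrix::copies = 0;

// General transformation: compute into a local, then copy construct the
// local into the caller's storage (the stand-in for _retvalue).
Matrix* add_general(void* retvalue, const Matrix& m1, const Matrix& m2) {
    Matrix sum;                                    // local object
    for (int i = 0; i < 4; ++i) sum.v[i] = m1.v[i] + m2.v[i];
    return ::new (retvalue) Matrix(sum);           // copy constructor runs
}                                                  // sum is destroyed on return

// Aggressive transformation: construct directly in the caller's storage
// and do the math there; no local object and no copy.
Matrix* add_aggressive(void* retvalue, const Matrix& m1, const Matrix& m2) {
    Matrix* r = ::new (retvalue) Matrix;           // default constructor only
    for (int i = 0; i < 4; ++i) r->v[i] = m1.v[i] + m2.v[i];
    return r;
}

// Runs one transformation, stores result element v[3], and returns the
// number of copy-constructor calls it made.
int copies_made(Matrix* (*add)(void*, const Matrix&, const Matrix&), double& v3) {
    Matrix a, b;
    for (int i = 0; i < 4; ++i) { a.v[i] = i; b.v[i] = 10 * i; }
    alignas(Matrix) unsigned char storage[sizeof(Matrix)];  // caller-owned result slot
    Matrix::copies = 0;
    Matrix* result = add(storage, a, b);
    v3 = result->v[3];
    result->~Matrix();                             // the caller destroys the result
    return Matrix::copies;
}
```

Called from `main`, `copies_made(add_general, v3)` reports one copy and `copies_made(add_aggressive, v3)` reports none, with `v3` equal to 33 in both cases: the arithmetic is identical, only the copy disappears.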
How much of a difference can that additional transformation make? I did a test a few years back, using the UNIX timex command to measure the performance with and without the additional transformation:

```
Not applied : 1:48.52
Applied     :   46.73
```

That is, 1 minute 48.52 seconds (108.52 seconds) without the transformation versus 46.73 seconds with it, a speedup of roughly 2.3 times.
One proposed solution was a language extension, in which the class object to hold the return value was named following the parameter list of the function:
```
Matrix
operator+( const Matrix& m1, const Matrix& m2 )
name result // proposed language extension
{
    // no longer write: Matrix result;
    // ...
    // no longer write: return result;
}
```
The language extension never became part of the language, but the optimization itself has become nearly universal. It is called the named return value (NRV) optimization. A compiler applies it implicitly when the same named object is returned from all the exit points of a function.
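The eligibility condition can be illustrated with a small sketch, assuming a hypothetical `Counted` class that counts copy-constructor calls. In `same_name` every return statement names the same local object, so a compiler may build it directly in the caller's object. In `two_names` both locals are alive when either could be returned, so the compiler cannot place both in the caller's object and at least one path must copy.

```cpp
#include <cassert>  // for the checks in the usage example

// Hypothetical class that counts copy-constructor calls.
struct Counted {
    static int copies;
    int val = 0;
    Counted() = default;
    Counted(const Counted& rhs) : val(rhs.val) { ++copies; }
};
int Counted::copies = 0;

// Eligible: every exit point returns the same named object.
Counted same_name(int v) {
    Counted result;
    if (v < 0) return result;   // val stays 0 for negative input
    result.val = v;
    return result;
}

// Not eligible: two named objects are alive at once and either may be
// returned, so they cannot both occupy the caller's storage.
Counted two_names(bool first) {
    Counted a, b;
    a.val = 1;
    b.val = 2;
    if (first) return a;
    return b;
}
```

Whether `same_name` actually avoids the copy is up to the compiler. GCC and Clang apply the optimization by default, and their `-fno-elide-constructors` flag turns it off, which can be handy when running experiments such as the foobar test program.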
Because it is an optimization rather than explicitly defined within the language, however, a compiler is not required to apply it. (At the time of this writing, I am aware of only two compilers that do not provide the named return value optimization: the GNU compiler gcc and Visual C++.)
To test whether your compiler applies it, run the following simple program:
```cpp
#include <iostream>
using namespace std;

class foobar {
public:
    foobar() { cout << "foobar::foobar()\n"; }
    ~foobar() { cout << "foobar::~foobar()\n"; }

    foobar( const foobar &rhs ) : _ival( rhs._ival )
        { cout << "foobar::foobar( const foobar & )\n"; }

    foobar& operator=( const foobar &rhs )
    {
        cout << "foobar::operator=( const foobar & )\n";
        _ival = rhs._ival;
        return *this;
    }

    void ival( int nval ) { _ival = nval; }
private:
    int _ival;
};

foobar f( int val )
{
    foobar local;
    local.ival( val );
    return local;
}

int main()
{
    foobar ml = f( 1024 );
    return 0;
}
```
If the compiler does not apply the optimization, the program makes two constructor calls (the construction of the local object and the copy construction of the return object) and two destructor calls. Here is the output generated by the GNU C++ compiler, gcc, running under Linux:
```
{stan@andromeda} : gcc nrv.cc -lstdc++ && a.out
foobar::foobar()
foobar::foobar( const foobar & )
foobar::~foobar()
foobar::~foobar()
```
By removing the local named object and computing the results directly within the object to be returned, the optimization eliminates the copy constructor call and one of the destructor calls. For example, here is the output of the same program with the optimization applied (generated by the KAI C++ compiler running under Linux):
```
{stan@andromeda} : KCC nrv.cc && a.out
foobar::foobar()
foobar::~foobar()
```
So, what if your compiler does not apply the named return value optimization? An alternative programming idiom is to provide a "computational constructor" that does the work previously done in the body of the function, and to have the function return an object created by invoking that constructor. Here's a simple example using our foobar class:
```cpp
class foobar {
public:
    // simple example of a computational constructor
    foobar( int val ) : _ival( val )
        { cout << "foobar::foobar( " << val << " )\n"; }
    // ...
};

/* previous version using local object and 'computation'
foobar f( int val )
{
    foobar local;
    local.ival( val );
    return local;
}
*/

// revised version invoking the 'computational' constructor
foobar f( int val )
{
    return foobar( val );
}
```
In effect, we've hand-optimized the function: the local object is removed, and the computation is applied directly to the constructed object. The compiler completes the optimization by applying the constructor to the class reference parameter introduced to hold the result:
```
// C++ pseudo code:
// internal transformation of our hand-optimized function
void f( foobar &_result, int val )
{
    _result.foobar::foobar( val );
}
```
When compiled and executed, our revised program mimics the result of having the named return value optimization applied:
```
foobar::foobar( 1024 )
foobar::~foobar()
```
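Later versions of the language strengthen this idiom. Since C++17, returning a temporary such as `foobar( val )` falls under guaranteed copy elision: the object is constructed directly in the caller's object, no copy or move constructor is invoked, and the code compiles even if the copy constructor is deleted. Returning a named local, as in the original `f`, remains an optional optimization. The sketch below assumes a hypothetical variant of `foobar` that counts copies instead of printing and adds an `ival()` accessor for checking the result.

```cpp
#include <cassert>  // for the checks in the usage example

// Hypothetical foobar variant with a computational constructor
// and a copy counter in place of the printed trace.
class foobar {
public:
    static int copies;
    explicit foobar( int val ) : _ival( val ) {}       // computational constructor
    foobar( const foobar &rhs ) : _ival( rhs._ival ) { ++copies; }
    int ival() const { return _ival; }
private:
    int _ival;
};
int foobar::copies = 0;

// Returns a temporary built by the computational constructor; the
// temporary is constructed directly in the caller's object.
foobar f( int val )
{
    return foobar( val );
}
```

Initializing `foobar ml = f( 1024 );` leaves `ml.ival()` equal to 1024 with `foobar::copies` still 0.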