Memory Hygiene in C and C++ Part 3: Safe Programming with Risky Data
In the first two installments in this series, I emphasized how crucial memory management is for C and C++ development, arguably our most difficult single coding job. I also discussed the most prominent techniques for engineering it safely: programming discipline and commercial memory-debugging tools. This time, let's look at a few of the memory debuggers that are available at no charge, along with other memory techniques you can try on your own.
Quite a few free tools are available for memory debugging, as I have begun to document at my site. Two sophisticated ones currently appear to be in wide use: mpatrol and Valgrind. While the latter is both more ambitious and more actively developed, mpatrol is currently more portable; if you're working on an embedded processor, a mainframe, or anything that's not a conventional desktop or server, mpatrol is likelier to work "out of the box."
The examples in this article are based on mpatrol. Keep in mind that you can achieve essentially all the same results with Valgrind.
Command-line Power
mpatrol is a command-line tool. One advantage the commercial tools (mentioned in the previous article) bring is their inviting point-and-click interfaces. It's not hard to use mpatrol, though, and its freely available 264-page manual provides more than enough guidance to reach success. The examples that follow were run on a Debian Linux system with mpatrol 1.4.8 installed from its Debian package.
mpatrol is constructed as an alternative memory allocator. Rather than letting the conventional malloc() or new[] run-time routines do the work unobserved, mpatrol intercepts calls to them and diagnoses memory use at run time. The most portable way to arrange this requires recompiling the sources with an #include "mpatrol.h" directive.
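To make the idea of an alternative allocator concrete, here is a minimal sketch of how a header can route malloc() and free() through bookkeeping functions using the preprocessor. It is not mpatrol's code: the names dbg_malloc and dbg_free, the fixed-size table, and the error message are hypothetical, and a real tool also covers calloc(), realloc(), new[], and many more error classes. The sketch records each live block and refuses to hand the real free() a pointer that is not the start of one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical table of live blocks; a real tool uses a far more
   scalable structure. */
#define MAX_TRACKED 1024

struct tracked_block {
    void *addr;        /* address returned to the caller */
    size_t size;       /* requested size, kept for leak reports */
    const char *file;  /* where the block was allocated */
    int line;
};

static struct tracked_block tracked[MAX_TRACKED];
static size_t tracked_count;
static unsigned long mismatch_errors;  /* frees matching no live block */

void *dbg_malloc(size_t size, const char *file, int line)
{
    assert(tracked_count <= MAX_TRACKED);
    if (tracked_count == MAX_TRACKED)
        return NULL;                 /* table full: fail the request */
    void *p = malloc(size);          /* the real allocator */
    if (p != NULL) {
        tracked[tracked_count].addr = p;
        tracked[tracked_count].size = size;
        tracked[tracked_count].file = file;
        tracked[tracked_count].line = line;
        tracked_count++;
    }
    return p;
}

void dbg_free(void *p, const char *file, int line)
{
    if (p == NULL)
        return;                      /* free(NULL) is a no-op */
    for (size_t i = 0; i < tracked_count; i++) {
        if (tracked[i].addr == p) {
            free(p);                 /* the real free() */
            tracked[i] = tracked[--tracked_count];  /* drop the entry */
            return;
        }
    }
    /* p is not the start of any live block (an offset pointer such as
       p + 1, or a block already freed). Report it rather than passing it
       to the real free(), where the behavior would be undefined. */
    mismatch_errors++;
    fprintf(stderr, "ERROR: free(%p) at %s:%d matches no live allocation\n",
            p, file, line);
}

/* From here on, application code calls the checking versions. Defining
   macros with standard library names is technically reserved by the C
   standard, but debugging headers commonly do it. */
#define malloc(n) dbg_malloc((n), __FILE__, __LINE__)
#define free(p)   dbg_free((p), __FILE__, __LINE__)
```

Because the macros capture __FILE__ and __LINE__ at each call site, an error report can name the offending source line without a debugger, which is the same kind of information the [main|ex1.c|12] annotations carry in an mpatrol log.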
For a first experiment, create the source file example1.c:
/*
 * Allocates a block of 16 bytes and then attempts to free the
 * memory returned at an offset of 1 byte into the block.
 */

#include "mpatrol.h"
#include <stdlib.h>

int main(void)
{
    char *p;

    if ((p = (char *) malloc(16)) != NULL)
        free(p + 1);
    return EXIT_SUCCESS;
}
And compile it:
cc -o example1 example1.c -lmpatrol -lelf
The corresponding Windows invocation is:
cl -I/mpatrol/include -Zi example1.c -link -libpath:/mpatrol/lib -defaultlib:libmpatrol -defaultlib:imagehlp -pdb:none
Other varieties of Unix might require minor adjustments to the referenced libraries.
mpatrol reads its run-time settings from an environment variable, MPATROL_OPTIONS. To begin, set it to log all allocations and deallocations:
export MPATROL_OPTIONS=LOGALL
Now, execute the instrumented program:
./example1
It completes without incident and exits with EXIT_SUCCESS. To all outward appearances, it is a healthy program.
mpatrol knows better, though. If you look in the mpatrol.log file created in the local directory, you'll see:
@(#) mpatrol 1.4.8 (02/01/08)
Copyright (C) 1997-2002 Graeme S. Roy
. . .
Log file generated on Sun Feb 29 03:35:54 2004
. . .
ALLOC: malloc (25, 16 bytes, 4 bytes) [main|ex1.c|11]
    0x08048520 main+92
    0x4007ADC6 __libc_start_main+198
    0x08048421 _start+33

returns 0x0804A1C4

FREE: free (0x0804A1C5) [main|ex1.c|12]
    0x0804855E main+154
    0x4007ADC6 __libc_start_main+198
    0x08048421 _start+33

ERROR: [MISMAT]: free: 0x0804A1C5 does not match allocation of 0x0804A1C4
    0x0804A1C4 (16 bytes) {malloc:25:0} [main|ex1.c|11]
    0x08048520 main+92
    0x4007ADC6 __libc_start_main+198
    0x08048421 _start+33
. . .
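This is mpatrol's way of saying that our free() doesn't line up with our malloc(): the address passed to free(), 0x0804A1C5, is one byte past the start of the block that malloc() returned at 0x0804A1C4. It's an error that can be extremely difficult to isolate (or even recognize) with conventional debugging.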
mpatrol is similarly descriptive for other classes of errors: writing into unallocated memory, duplicate freeing, use of freed memory, and so on. The manual provides exceptionally clear explanations of all the information available in an mpatrol.log file, and of its meaning in terms of C source code.
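Writing past the end of a block is commonly detected with guard bytes: the debugger allocates a little more than requested, fills the extra bytes with a known pattern, and checks the pattern when the block is freed. The following is a minimal sketch of that general technique, not of mpatrol's internals; fenced_malloc, fenced_free, and the 8-byte fence of 0xAB bytes are hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 8     /* hypothetical: guard bytes after each block */
#define FENCE_BYTE 0xAB  /* hypothetical fill pattern */

/* Header stored in front of each block; the union keeps the pointer
   returned to the caller suitably aligned for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} fence_header;

/* Allocates size usable bytes followed by a fence of known bytes. */
void *fenced_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(fence_header) - FENCE_SIZE)
        return NULL;                              /* request too large */
    fence_header *h = malloc(sizeof *h + size + FENCE_SIZE);
    if (h == NULL)
        return NULL;
    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, FENCE_BYTE, FENCE_SIZE);  /* lay down the fence */
    return user;
}

/* Frees a non-NULL block obtained from fenced_malloc(). Returns 1 if the
   fence is intact, 0 if the program wrote past the end of the block. */
int fenced_free(void *p)
{
    assert(p != NULL);                            /* caller's contract */
    fence_header *h = (fence_header *)p - 1;
    const unsigned char *fence = (const unsigned char *)p + h->size;
    int intact = 1;
    for (size_t i = 0; i < FENCE_SIZE; i++) {
        if (fence[i] != FENCE_BYTE)
            intact = 0;
    }
    free(h);
    return intact;
}
```

Because this check runs only when the block is freed, it reveals that an overrun happened but not where. More elaborate schemes, such as placing each block against an inaccessible memory page, can catch the faulting write itself, at a considerable cost in memory.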
Industrial Strength
All of the better memory-testing tools, both proprietary and free, advertise roughly the same capabilities to detect heap corruption, memory leaks, and so on. Their specific value to you becomes apparent in two stages. It's easy enough to try out a memory debugger and form a first impression: whether the product's style suits you, and especially whether you find its error reports readable. That first impression matters; if a product doesn't feel right to you, it's unlikely to benefit you much, and you won't be comfortable using it.
Working with "toy" programs and tutorial instances, though, reveals only part of what these tools can do for you. Graeme S. Roy, mpatrol's creator, invented it to help with difficult problems in large, serious C++ applications, and a number of mpatrol features serve that motivation.
Paramount among these is how intelligently mpatrol handles recompilation. example1, above, required compilation with an #include "mpatrol.h" insertion. That is, indeed, necessary for the general, portable use of mpatrol.
In many environments, though, modifying potentially hundreds of source files would be an unnecessary handicap. Most dynamically linked Windows and Linux programs do not have to be recompiled; it's only necessary to relink them against the mpatrol library.
I can hardly overstate what a difference this makes. It means that, in a consulting situation, I can walk into a development lab "cold" and, in just a few minutes, know what's going on. I relink the same object files the organization builds for its production runs into a modified executable, which immediately gives me memory diagnostics. I am often able to spot problems even without debugging turned on at the compiler level; narrowing a memory problem down to a specific function is frequently enough for an effective diagnosis.
For more on this subject, see the dynamic linkage section of InformIT's C++ Guide.
Exemplary Case Study
Memory debugging tools, such as mpatrol, multiply my productivity. One of the surprises in software is that these tools are not used more widely—at least, it surprises me. After considering this puzzle for over a decade, one of my conclusions is that memory debuggers are inherently "intermediate" tools that leverage experience, without substituting for it. Memory errors are so often subtle or compound that even a direct pointer to them yields solutions only when combined with expertise. Even novices can push nails in with hammers, but the same beginners don't have the background to get a good cross-cut handsaw moving.
A recent case illustrated this point for me. A large, complex program had intermittent faults. It took little time for mpatrol to point to a suspicious call:
memcpy(complicated_union, graphics_data, data_length)
The most typical mistake in using memcpy() is writing past the end of the space allocated at the destination address. That wasn't the case this time. The memcpy() was a bit of a performance hack, a slightly opaque piece of code at the junction of database, networking, and GUI modules, where bytes had to be pumped through as fast as possible.
mpatrol diagnostics and deeper investigation showed what was really happening: in unusual circumstances, data_length was longer than it should have been, so the memcpy() transferred "bad data" following the real content of graphics_data into complicated_union. Even that was a problem only in rare cases, depending on the semantics of complicated_union.
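The article doesn't show the real types involved, so the following is only a hypothetical model of this failure mode: struct graphics_buffer, its valid_len field, the 32-byte sizes, and copy_graphics are all invented for illustration. The source buffer is reused, so bytes beyond its real content are leftovers from earlier data, and a too-long length copies them along.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for graphics_data: a reused buffer whose tail
   may still hold bytes from an earlier use. */
struct graphics_buffer {
    size_t valid_len;             /* bytes of real content */
    unsigned char bytes[32];
};

/* Hypothetical stand-in for complicated_union. */
union complicated_union {
    unsigned char raw[32];
    struct {
        unsigned int id;
        unsigned char payload[28];
    } record;
};

/* Mirrors the original call: trusts the caller's data_length rather
   than the length of the real content. */
void copy_graphics(union complicated_union *dst,
                   const struct graphics_buffer *src, size_t data_length)
{
    /* Both objects are large enough, so nothing overflows... */
    assert(data_length <= sizeof dst->raw && data_length <= sizeof src->bytes);
    /* ...but any bytes past src->valid_len are stale data. */
    memcpy(dst->raw, src->bytes, data_length);
}
```

In this model the copy stays inside both objects; the damage appears only later, when some consumer of the union interprets the stale bytes as meaningful, which matches the intermittent symptoms described above.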
The fix called for refactoring. While it was tempting just to recompute data_length, the fact that its miscalculation had survived so long was a symptom of deeper confusion. The right approach was to clean up the data structures and their associated operations so that it was obvious what they should do; we needed to rewrite four functions to simplify their interfaces. What I eventually realized was that there was no way to move automatically from the memory diagnostic to the refactoring. Even the less experienced coders could see that something was wrong, but there were too many intermediate steps for them to recognize the correct remedy on their own.
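The refactored code isn't shown either, so here is only a hypothetical sketch of the general direction: make the length travel with the data, so that no caller can supply a mismatched data_length. The type and function names, and the 32-byte capacity, are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GRAPHICS_CAPACITY 32   /* hypothetical buffer size */

/* The buffer carries its own length; callers no longer pass one. */
struct graphics_buffer {
    size_t valid_len;                        /* invariant: <= GRAPHICS_CAPACITY */
    unsigned char bytes[GRAPHICS_CAPACITY];
};

/* The destination records how much of it is meaningful. */
struct graphics_record {
    size_t len;
    unsigned char bytes[GRAPHICS_CAPACITY];
};

/* Copies exactly the valid content. Returns 0 on success, or -1 if the
   source violates its own invariant, instead of copying garbage. */
int graphics_to_record(struct graphics_record *dst,
                       const struct graphics_buffer *src)
{
    assert(dst != NULL && src != NULL);
    if (src->valid_len > GRAPHICS_CAPACITY)
        return -1;
    memcpy(dst->bytes, src->bytes, src->valid_len);
    dst->len = src->valid_len;
    return 0;
}
```

The design choice is to remove the length parameter from the interface rather than to validate it more carefully: a value that no longer exists cannot be miscalculated.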
That reinforces the importance of working through a memory debugger's tutorial on your own. Good programmers seem to vary widely in how they respond to the messages from various tools. Try out mpatrol and one or two others, and see how well they "speak" to you.
Accuracy and Speed
A second notable aspect of mpatrol's design has to do with performance. Does that surprise you? So far in this series, I've presented memory hygiene as a negative sort of virtue: use memory correctly, or bad things (insecurity, unreliability, and so on) will happen. That can be a tough sell in many organizations. I'm certain that correcting the memcpy() misuse I just described saved the organization that owns that code at least $10,000 in customer disappointment and support time; it can be hard, though, to remember the value of prevention.
Performance improvements are easier for some decision-makers to understand, and mpatrol has a direct role to play there, too.
First, understand that use of mpatrol hurts performance, sometimes dramatically, in the "short term." Linking against mpatrol libraries slows down an application quite a bit. This is another reason I suspect many organizations don't take advantage of memory debuggers: their configuration management isn't up to juggling one version of executables for memory debugging, and another for performance-testing and delivery to customers.
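One way to ease that configuration burden, at least for the portable #include route, is to make the memory-debugging build a compile-time switch over the same sources. A minimal sketch, assuming a hypothetical USE_MPATROL macro that the memory-debugging build defines (for example with -DUSE_MPATROL, plus the mpatrol link libraries) and the release build leaves undefined; duplicate_string and build_variant are invented examples.

```c
/* USE_MPATROL is a hypothetical macro name: define it only in the
   memory-debugging build, which must also link the mpatrol library. */
#ifdef USE_MPATROL
#include "mpatrol.h"
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Ordinary application code, identical in both variants. */
char *duplicate_string(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;        /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Lets logs and bug reports record which variant produced them. */
const char *build_variant(void)
{
#ifdef USE_MPATROL
    return "memory-debug (mpatrol)";
#else
    return "release";
#endif
}
```

The point of the design is that one build flag, rather than a separately edited copy of the sources, distinguishes the two executables, so the configuration-management system only has to track two sets of build options.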
The information mpatrol gathers while slowing down one particular run can be invaluable, though. Many applications coded in C and especially in C++ spend a significant portion of their runtime allocating and deallocating memory. While the system-level memory managers found in conventional run-time libraries are good for general-purpose use, custom memory managers can improve the perceived performance of more applications than even most programmers recognize.
The mpatrol manual includes a good chapter on this subject; you'll also want to read InformIT's guide to Memory Management.
Memory profiling for performance can lead to at least two distinct kinds of improvement: rewriting the application code to use memory more efficiently, and substituting a customized memory allocator for the default one. Choosing between these is an advanced topic, but keep in mind that mpatrol analysis has a place in both kinds of tuning.
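As an illustration of the second kind of tuning, here is a minimal fixed-size pool allocator. It is a sketch under stated assumptions, not a recipe from the mpatrol manual: it assumes profiling has shown that most allocations are for a single small size (SLOT_SIZE, 64 bytes here, is a hypothetical value), and the names pool_init, pool_alloc, pool_release, and pool_destroy are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOT_SIZE 64   /* hypothetical: the one object size being pooled */

/* A slot holds a user object or, while free, a link to the next free slot. */
typedef union pool_slot {
    union pool_slot *next;
    max_align_t align;                  /* keeps slots suitably aligned */
    unsigned char payload[SLOT_SIZE];
} pool_slot;

typedef struct {
    pool_slot *slots;      /* one malloc() backs every slot */
    pool_slot *free_list;  /* singly linked list of free slots */
    size_t capacity;
} pool;

/* Returns 0 on success, -1 on a bad capacity or allocation failure. */
int pool_init(pool *p, size_t capacity)
{
    if (capacity == 0 || capacity > SIZE_MAX / sizeof(pool_slot))
        return -1;
    p->slots = malloc(capacity * sizeof(pool_slot));
    if (p->slots == NULL)
        return -1;
    p->capacity = capacity;
    p->free_list = NULL;
    for (size_t i = capacity; i > 0; i--) {  /* link slots in address order */
        p->slots[i - 1].next = p->free_list;
        p->free_list = &p->slots[i - 1];
    }
    return 0;
}

/* Constant time: pop the first free slot, or NULL if the pool is exhausted. */
void *pool_alloc(pool *p)
{
    pool_slot *s = p->free_list;
    if (s == NULL)
        return NULL;
    p->free_list = s->next;
    return s;
}

/* Constant time: push the slot back onto the free list. */
void pool_release(pool *p, void *ptr)
{
    pool_slot *s = ptr;
    assert(s >= p->slots && s < p->slots + p->capacity);
    s->next = p->free_list;
    p->free_list = s;
}

void pool_destroy(pool *p)
{
    free(p->slots);
    p->slots = NULL;
    p->free_list = NULL;
    p->capacity = 0;
}
```

The speed comes from giving up generality: every slot has the same size, and allocation and release are a single list operation with no searching. The cost is that individual objects become invisible to a malloc()-level debugger, so a use of a slot after pool_release() would go unnoticed; it's worth running the memory-debugging build before substituting such an allocator.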
The first three articles of this series have mostly lumped together C and C++ in examining the advantages and limits of memory debuggers. The final installment will concentrate on aspects that are specific to C++: whether C++ is enough of an improvement on C to obviate the need for memory debuggers, what's special about C++ memory use, and how to code in C++ to avoid memory errors.