Memory Woes
The C memory model divides memory into two locations—the heap and the stack. Memory on the heap is allocated and freed manually. Memory on the stack is lexically scoped; it's allocated automatically when a block is entered and freed when the block is left.
This practice leads to problems when passing data back to a calling function. For structures, the obvious thing to do is simply return the structure. When this is compiled, the caller is responsible for allocating the structure and passes a pointer to the callee, which then copies the data into this space. If you pass a structure back through a lot of function calls, however, this results in a lot of copying.
Worse, this doesn't work at all for dynamically sized data. Consider something like sprintf(). This analog of printf() writes to a buffer rather than to the standard output. The problem with sprintf is that the caller needs to know the length of the buffer, which isn't always easy—in fact, it requires implementing a lot of sprintf in the caller.
In fact, it's almost impossible to use sprintf safely. You need to specify lengths for every single element in your format string. (This is why snprintf was created.) This can result in truncation (which can also cause security problems, and we'll come back to that later), so some libc implementations also include asprintf().
The asprintf() function is like sprintf, except that it allocates its own buffer using malloc. This prevents suffering from truncation or overflow. Unfortunately, it exposes a new problem: When should the caller free pointers returned by the callee?
A lot of C code has this problem. The solution is typically to put "the caller must free the returned pointer" in the documentation for a function. Unfortunately, this practice makes it very difficult to glance at a piece of code and see whether it's valid. One solution is to wrap every pointer return in something like this:
typedef struct _RefCountedBuffer { void *buffer; int refcount; void (free*)(struct _RefCountedBuffer*); } *RefCountedBuffer;
When you return a pointer, you create one of these structures and set refcount to 1. When you receive one, you always call a function that decrements the refcount and calls the associated free() function if it reaches zero.
This technique will be familiar to Objective-C programmers, because OpenStep implements a similar mechanism. GNUstep goes one step further, providing ASSIGN() and DESTROY() macros. These macros make memory bugs a lot more rare, and we can do something similar in pure C.
First, we need to define retain() and release() functions:
RefCountedBuffer retain(RefCountedBuffer *buf) { buffer->refcount++; return buffer; } void release(RefCountedBuffer *buf) { buf->refcount--; if (buf->refcount == 0) buf->free(buf); }
Note that these are simplified versions of the functions you would really need. The most obvious problem is that they're not thread-safe. The ++ and -- operators are usually compiled down to a sequence of load, add (or subtract), store. If two threads to a retain at once (for example), they might both do the load before either has done the store, and one retain will be lost. You can get around this problem by using CPU-specific assembly language or GCC intrinsics for atomic operations.
Once you have these functions defined, you can define SET and FREE macros like this:
#define SET(var, val) do { RefCountedBuffer __tmp = retain(val); if (NULL != var) release(var) var = __tmp; } while(0)
Note that you retain the new value before releasing the old one, which avoids problems in the case where the new and old values are the same. The corresponding FREE() macro is quite simple:
#define FREE(var) do { release(var) var = NULL; } while(0)
This macro ensures that every pointer is always set to NULL after freeing it. Even if you don't use reference counting, this practice is worthwhile.
If you use both of these macros, you'll have very few equals signs in your code. This makes it easy to look through and find cases where you might have memory-related bugs.
Reference counting is a good solution for read-only data. You can return some internal component of a large data structure as a reference, not deleting it from the original structure until it's finished in both uses—as long as it's wrapped in a RefCountedBuffer structure in both cases.
This design doesn't solve our original problem of something like asprintf, however. It wants to return a string that's often used only in the caller. For this, allocating on the heap and wrapping in a reference counting structure is overkill. What you want is a way of allocating the space in the caller's stack frame.
The designers of the Dovecot IMAP server have an elegant solution. In addition to the normal control stack, they provide a separate data stack and an analog of asprintf that uses it. The caller first calls a function that allocates a new data stack frame. It then calls the asprintf() analog, which allocates space on the data stack for the result and returns. This remains valid until the caller pops the top data stack frame.
Because the control and data stacks move at different speeds, you can easily return data on the data stack. As long as the function that created a data stack frame also destroys it, you have no problems. The flow for something like asprintf() is to first create a data stack frame, and then call asprintf(), which allocates space in the current data stack frame and puts the result into it. You use the result and then pop the current stack frame.
Nothing stops you from having multiple independent memory regions. You can use mmap() on /dev/zero to allocate a contiguous blob of memory somewhere and do whatever you want in it. One possible idea is to allocate a data stack that moves at the same speed as the control stack. Use this trick for all arrays that you would otherwise allocate on the stack. Unlike the control stack, this can grow upward in memory. You can make it relocatable by always addressing it via a global pointer to the start; for example, with macros like these:
#define NEW_ARRAY(type, name, elements) __attribute__((cleanup(popDataStack))) type *name = dataStackCalloc(sizeof(type), elements)
The __attribute__(cleanup) part is a GCC extension. It causes the popDataStack() function to be called with a pointer to the variable as the argument when the variable goes out of scope. You now have a pointer to something in the data stack. Rather than employing direct access, you might have macros that add this address to a pointer. This setup allows the size of your data stack to keep growing until you don't have enough contiguous memory free to store all of it.
You can still run over the end of an array, however. You won't clobber a return address, but you'll potentially overwrite some data or have other problems. You can avoid going over the end of the data stack by using the mprotect() call on the last page in the stack to remove all access rights. Most malloc() implementations have a debugging mode that inserts this sort of guard page after every allocation. You can do this by making your data stack a stripe of accessible and inaccessible pages, with every array allocated so that it ends just before an inaccessible page, but this system is quite costly. A sensible operating system won't allocate any real memory for the guard pages, but you'll waste a lot of space in the gaps from the start of pages to the start of arrays, and you'll use a lot of address space.