Giving Closure to C
- Nested Functions
- Why Closures Are Hard
- Blocks
- C++0x Lambdas
- Should You Use Closures?
In the last couple of decades, Ruby programmers have started shouting about how closures are new and exciting (like all features of Ruby that other languages had decades earlier). But closures aren't a new concept in programming languagesLisp had them back in 1958. They've just started to enter mainstream awareness.
So what is a closure?
Closures are based on a mathematical concept from the lambda calculus worked out by Alonzo Church, who worked with Alan Turing back when computer "programs" were defined by how the components were wired together. Church's lambda calculus, which provided the inspiration for Lisp, was created as a way of reasoning about function applications. It's one of the earliest universal models of computing. A term in the lambda calculus might look something like this:
λx.yx
This represents a function that takes x as an argument and applies the function y to it. All terms in lambda calculus are functions, although some are constant functions (that is, functions that take no arguments and return a value), which are equivalent to simple values.
In the lambda term above, the value x is bound to the expression and is equivalent to a function argument. The value y is not passed as an argument, so it's a free variable. Before you can use this function, you must bind y to a value. A pure mathematical function is a lambda expression that has no free variablesits result depends entirely on its input.
In most programming languages, functions are allowed to refer to global variables. This lambda term might be a function in such a language, where the entire program would have a λy at the start, binding the global value y to the function. From here, it's a relatively small jump to seeing that the λy could be in a wider lambda term representing the entire program, with the scopes of the variables being nested. In simple terms, it's a function that can reference some variables outside of its lexical scopenot just local and global variables.
This is why languages like Lisp and Python use lambda as a keyword for creating closures. Somewhat confusingly, JavaScript uses function to define closures. In Smalltalk, they're informally called blocks, and are instances of the BlockClosure class.
Although C was created two decades after Lisp, it nonetheless lacks support for closures. The reason was simple: Closures don't map trivially to anything that the PDP-11 executes, and the point of C was to be a slightly abstract assembly-language substitute, rather than a high-level language.
Since then, C and C-derived languages, such as C++ and Objective-C, have been used for a lot more than low-level programming, and people have started to complain about the lack of closures. A lot of attempts have been made to add closures, and we'll look at some of those attempts in this article.
Nested Functions
One of the simplest attempts was GCC's nested-functions extension, which had one very large advantage: It didn't change the syntax of C. The only change to the grammar was allowing function definitions inside other functions. For example, you could define a function something like this:
void log(struct someStruct *s) { int visited = 0; void print(struct someStruct *s) { visited++; someStruct->print(); }; visit(s, print); }
The print() function refers to the visited variable, and a pointer to it is passed down to the visit() function as an argument. Unlike true closures, the inner function persists only as long as its enclosing scope. The visit() function may not store the function pointer anywhere (or very bad things will happen), but it may call this function, which can update the visited variable in the outer function.
The implementation of this extension is quite simple, but also quite dangerous. The print() function is compiled to a modified version of the function that has an extrahiddenargument, which is a pointer to the variables in the outer scope. That's pretty much how closures are implemented in any language, but there are some complications with C: The function pointer that is passed down to the visit() function doesn't have anywhere to store this hidden argument.
To get around this problem, the compiler generates some code that in turn generates a tiny function stub on the stack. The function stub contains a pointer to the on-stack address of the outer function's variables and then a small bit of code that loads this address and calls the real function.
The big security problem with this approach is that, because the stub is on the stack, it requires an executable stack. In theory, you could work around this hazard by using some malloc()'d memory and clean it up when the function returned, but I don't think anyone has tried it.
The other problem with this approach is that it only solves the so-called "downwards funarg problem": The pseudo-closure can be passed down the stack, but not up the stack. For example, you couldn't do something like this:
int(*)(void) getCounter() { int c = 0; int counter(void) { return c++; } return counter; }
If counter() were a true closure, you could return it and have each call to the getCounter() function return a new function that implemented a counter. With GCC nested functions, the first time you try to call the result of this function, you'll get some undefined behaviora crash if you're lucky, overwriting some random bits of stack memory if you're not.