It's Not a Graphics Accelerator
Over the last few years, one word has dominated GPU marketing: shaders. A shader is a small program that runs on the GPU. The two modern 3D APIs, DirectX 11 and OpenGL 3.1, are all about shaders. While older graphics cards focused on implementing parts of the fixed-function pipeline from OpenGL, newer ones focused on allowing developers to replace parts of it. With the latest APIs, there is no fixed-function pipeline, just a sequence of shader programs that are run first on the vertices and then on the fragments. The drivers for the older APIs typically implement the fixed-function pipeline in software as shader programs.
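To make the idea concrete, here is a minimal sketch of a fixed-function pipeline emulated as two shader programs. It is an illustration, not how any real driver is written: the names `run_pipeline` and `make_fixed_function_shaders` are hypothetical, and rasterization is reduced to treating each transformed vertex as a single fragment.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[float, float, float, float]


def run_pipeline(
    vertices: List[Vec3],
    vertex_shader: Callable[[Vec3], Vec3],
    fragment_shader: Callable[[Vec3], Color],
) -> List[Color]:
    """Run the vertex stage first, then the fragment stage."""
    positions = [vertex_shader(v) for v in vertices]
    # Simplification: a real GPU rasterizes primitives into many fragments.
    # Here each transformed vertex stands in for exactly one fragment.
    return [fragment_shader(p) for p in positions]


def make_fixed_function_shaders(offset: Vec3, current_color: Color):
    """Build shaders that mimic a tiny fixed-function pipeline (hypothetical helper).

    The 'fixed' behavior (translate every vertex, fill with one color) is
    expressed as two ordinary shader programs.
    """

    def vertex_shader(v: Vec3) -> Vec3:
        return (v[0] + offset[0], v[1] + offset[1], v[2] + offset[2])

    def fragment_shader(_position: Vec3) -> Color:
        return current_color

    return vertex_shader, fragment_shader
```

A developer who wants different behavior simply passes different functions to `run_pipeline`, which is the sense in which the newer hardware lets you replace parts of the pipeline.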
The idea that a CPU is a general-purpose processor and a GPU is a special-purpose processor is slightly misleading. In reality, there is no such thing as a general-purpose processor. When you start designing a processor, you start by profiling the kind of application code that you expect it to run and measuring how often each type of instruction occurs. You then allocate transistor space in proportion to those frequencies. One early attempt at doing this was the Berkeley RISC project. Its designers discovered that code generated by compilers tended to use a small subset of instructions, so they could simplify the decoder a lot by only supporting this subset. They also found that function calls were quite common in code compiled from high-level languages, so they optimized for this.
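The profile-then-allocate step can be sketched in a few lines. This is only a toy model: the trace format, the opcode names, and the idea of a single divisible "budget" are assumptions made for illustration, and `instruction_mix` and `allocate_budget` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable


def instruction_mix(trace: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of the trace taken up by each instruction type."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


def allocate_budget(mix: Dict[str, float], total_units: float) -> Dict[str, float]:
    """Split a transistor budget in proportion to instruction frequency."""
    return {op: share * total_units for op, share in mix.items()}


# A made-up four-instruction trace, purely for illustration.
trace = ["load", "add", "add", "branch"]
mix = instruction_mix(trace)
budget = allocate_budget(mix, total_units=100)
```

Real designs weigh far more than raw frequency, but the sketch shows why the workload you profile determines the processor you end up with.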
One rule of thumb that has remained roughly constant from the Berkeley RISC era to the present is that most code contains a branch roughly every seven instructions. This is why a huge amount of effort is spent on branch prediction. The kind of code that typically runs on a GPU has very different characteristics. It sometimes doesn't contain any branches, and if it does, they are typically a few hundred instructions apart. A modern GPU is just as much a general-purpose processor as a modern CPU; it's just optimized for a different subset of all algorithms.
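The "one branch every seven instructions" figure is just the length of a trace divided by the number of branches in it. The following sketch shows that arithmetic; the trace format and the name `instructions_per_branch` are hypothetical, and the sample traces are invented rather than measured.

```python
from typing import FrozenSet, Optional, Sequence


def instructions_per_branch(
    trace: Sequence[str],
    branch_ops: FrozenSet[str] = frozenset({"branch"}),
) -> Optional[float]:
    """Average number of instructions per branch, or None for branch-free code."""
    branches = sum(1 for op in trace if op in branch_ops)
    if branches == 0:
        return None
    return len(trace) / branches


# Invented CPU-style trace: two branches in fourteen instructions.
cpu_like = (["add"] * 6 + ["branch"]) * 2
# Invented GPU-style trace: straight-line arithmetic with no branches at all.
gpu_like = ["mul", "add"] * 100
```

Here `instructions_per_branch(cpu_like)` gives 7.0, while the branch-free trace returns `None`, the case where branch prediction hardware would have nothing to do.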
The requirements for a driver for such hardware are very different. With older, fixed-function GPUs, the obvious way to implement a driver was to provide a software OpenGL implementation and just replace the parts that are implemented in hardware with code that sent the relevant commands to the hardware. This is more or less what happened with DRI drivers. It wasn't ideal, because it involved a lot of code duplication between drivers, but it worked.
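The structure described here, a complete software implementation with selected parts swapped for hardware commands, can be sketched with ordinary method overriding. The class and method names below are hypothetical and are not taken from DRI or any real driver; the returned strings merely record which path handled each call.

```python
class SoftwareRenderer:
    """Reference implementation: every operation is done on the CPU."""

    def clear(self) -> str:
        return "software: clear"

    def draw_triangles(self) -> str:
        return "software: draw_triangles"

    def blend(self) -> str:
        return "software: blend"


class HypotheticalCardDriver(SoftwareRenderer):
    """Driver for an imaginary card that accelerates only triangle drawing."""

    def draw_triangles(self) -> str:
        # On real hardware this would write commands to the device.
        return "hardware: draw_triangles"


driver = HypotheticalCardDriver()
calls = [driver.clear(), driver.draw_triangles(), driver.blend()]
```

Everything the card cannot do falls through to the software path. The cost the text points out shows up when each driver carries its own copy of that shared base instead of inheriting a common one.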
The code duplication was in the form of the OpenGL state tracker. OpenGL, like a lot of graphics APIs, has a shared context which persists between calls. You modify the state (for example, by changing the current color), then you send some other commands. The hardware may or may not have a similar internal state. If it doesn't, then the driver has to keep track of the OpenGL state. You don't need a full OpenGL implementation in each driver, but there's still a lot of code that ought to be shared but isn't.
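A state tracker for stateless hardware can be sketched as follows. This is a minimal illustration under stated assumptions: `StateTracker`, `set_color`, `draw_point`, and the tuple-shaped command format are all hypothetical, and the only piece of state tracked is the current color.

```python
from typing import Callable, List, Tuple

Color = Tuple[float, float, float, float]
Command = Tuple[str, float, float, Color]


class StateTracker:
    """Remember API state on behalf of hardware that keeps none of its own."""

    def __init__(self, send: Callable[[Command], None]) -> None:
        self._send = send
        self._color: Color = (1.0, 1.0, 1.0, 1.0)  # assumed default: opaque white

    def set_color(self, r: float, g: float, b: float, a: float = 1.0) -> None:
        # Nothing goes to the hardware yet; only the tracked state changes.
        self._color = (r, g, b, a)

    def draw_point(self, x: float, y: float) -> None:
        # The stateless hardware needs the color supplied with every command.
        self._send(("draw_point", x, y, self._color))


sent: List[Command] = []
tracker = StateTracker(sent.append)
tracker.draw_point(0.0, 0.0)
tracker.set_color(1.0, 0.0, 0.0)
tracker.draw_point(1.0, 2.0)
```

After these calls, `sent` holds two draw commands, the first carrying the default color and the second carrying red. Bookkeeping like this is the same for every driver, which is why it ought to be shared.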
The flip side of this code duplication is that it ties the drivers to the OpenGL model. I mentioned earlier that things like the XRender extension benefit a lot from acceleration, but implementing this support is nontrivial on top of existing OpenGL-oriented drivers.