The State of Open Source 3D
Most people using UNIX (or computers in general) back when it was new didn't use a graphical display. They connected via a dumb terminal, a simple device that pretended to be a teletype, with a text-only display and a keyboard.
Early graphical workstations appeared a decade or so later. The graphics hardware in these machines was simple: a frame buffer. Every pixel on the screen had a corresponding memory address. Because memory was so expensive, these often used a palette rather than true color. With a true-color display, each memory address stores a color value; typically this requires 3 bytes per pixel (24-bit color), although some modern hardware uses more. With a palettized display, each pixel value is an index into a lookup table, which stores the real color value. If you have a 16-color display, then each pixel needs just 4 bits of memory.
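To make the memory saving concrete, here is a minimal C sketch of how a palettized pixel is turned into a true-color value. The palette contents and the function name are invented for illustration, not taken from any real hardware.

    #include <stddef.h>
    #include <stdint.h>

    /* A hypothetical 16-entry palette; each entry is a packed 24-bit RGB value. */
    static const uint32_t palette[16] = {
        0x000000, 0xFFFFFF, 0xFF0000, 0x00FF00,   /* black, white, red, green  */
        0x0000FF, 0xFFFF00, 0x00FFFF, 0xFF00FF,   /* blue, yellow, cyan, magenta */
        /* remaining entries default to 0 (black) */
    };

    /* With a 16-color display, two pixels fit in one byte (4 bits each).
     * Decode the pixel at (x, y) from a packed, palettized frame buffer
     * into the true-color value it represents. */
    uint32_t pixel_color(const uint8_t *fb, size_t pitch, size_t x, size_t y)
    {
        uint8_t byte  = fb[y * pitch + x / 2];
        uint8_t index = (x & 1) ? (byte & 0x0F) : (byte >> 4);
        return palette[index];
    }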
The display hardware would read the pixel values in turn and feed them (or the color values found by looking up the index in the palette) to a digital-to-analog converter that drove the CRT. A lot of tricks were used to let you write to the display memory while it was being sent to the screen. Later, when memory became a bit cheaper, it was common for graphics cards to support double-buffering, where you drew into one frame buffer while the other was being displayed, then swapped them around.
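Here is a rough C sketch of the double-buffering idea: draw into whichever buffer is not being scanned out, then swap. The hw_set_scanout() function stands in for whatever register write a real card would use to choose the displayed buffer; it and the other names are invented for illustration.

    #include <stdint.h>

    #define WIDTH  640
    #define HEIGHT 480

    /* Two frame buffers: the display hardware scans one out while the CPU
     * draws into the other. */
    static uint8_t buffers[2][WIDTH * HEIGHT];
    static int front = 0;                  /* index of the buffer being displayed */

    /* Stand-in for the register write that would point the scan-out logic
     * at a buffer on real hardware; here it does nothing. */
    static void hw_set_scanout(const uint8_t *fb) { (void)fb; }

    /* Drawing code always gets the buffer that is NOT on screen. */
    uint8_t *back_buffer(void) { return buffers[1 - front]; }

    /* Called once per frame, ideally during the vertical blanking interval. */
    void present_frame(void)
    {
        front = 1 - front;                 /* the back buffer becomes the front */
        hw_set_scanout(buffers[front]);
    }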
If you ever wrote a game for MS-DOS, you've probably programmed hardware about as complicated as this. A device driver for such a display was a very simple piece of code. It needed to send a few commands to the hardware to set up the resolution, but after that it just mapped the frame buffer memory into some process's address space and let that process write pixels to the screen directly.
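Linux's fbdev interface is a much later descendant of those devices, but it still exposes the same model, so it makes a convenient sketch of what "map the frame buffer and write pixels directly" looks like in practice. This assumes /dev/fb0 exists and is configured for a 32-bit true-color mode.

    #include <fcntl.h>
    #include <linux/fb.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/fb0", O_RDWR);
        if (fd < 0) { perror("open /dev/fb0"); return 1; }

        struct fb_var_screeninfo var;
        struct fb_fix_screeninfo fix;
        ioctl(fd, FBIOGET_VSCREENINFO, &var);   /* resolution, bits per pixel */
        ioctl(fd, FBIOGET_FSCREENINFO, &fix);   /* buffer size, row pitch     */

        /* Map the frame buffer memory into this process's address space... */
        uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
        if (fb == MAP_FAILED) { perror("mmap"); return 1; }

        /* ...and write pixels to the screen directly: a grey 100x100 square. */
        for (unsigned y = 0; y < 100; y++)
            for (unsigned x = 0; x < 100; x++) {
                uint32_t *pixel = (uint32_t *)(fb + y * fix.line_length
                                                  + x * (var.bits_per_pixel / 8));
                *pixel = 0x00808080;
            }

        munmap(fb, fix.smem_len);
        close(fd);
        return 0;
    }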
The Windows Accelerator
These simple frame buffers occasionally came with some accelerated features. One of the most common, BitBlt (pronounced bit-blit), originally appeared on the Xerox Alto, where Dan Ingalls invented it to make Smalltalk run fast enough. It was a very simple operation: copying a range of memory addresses into the frame buffer with a single command. Variants of it were often used for sprite acceleration, where a region of memory (and, optionally, a mask) was assigned a unique identifier and could then be drawn to the screen very quickly.
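In software, the operation amounts to little more than a nested copy loop. The sketch below shows a masked blit in C, roughly the sprite case described above; the function and parameter names are illustrative, not a real hardware or library interface.

    #include <stddef.h>
    #include <stdint.h>

    /* Copy a width-by-height block of 8-bit source pixels into a destination
     * frame buffer at (dst_x, dst_y). If a mask is supplied, pixels whose
     * mask entry is zero are skipped, which is how sprites get transparency. */
    void bitblt(uint8_t *dst, size_t dst_pitch, size_t dst_x, size_t dst_y,
                const uint8_t *src, size_t src_pitch,
                size_t width, size_t height, const uint8_t *mask)
    {
        for (size_t y = 0; y < height; y++) {
            for (size_t x = 0; x < width; x++) {
                if (mask && !mask[y * src_pitch + x])
                    continue;                       /* transparent pixel */
                dst[(dst_y + y) * dst_pitch + (dst_x + x)] =
                    src[y * src_pitch + x];
            }
        }
    }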
The next evolution of graphics hardware was designed to run primitive windowing systems quickly. Most windowing systems draw a lot of lines, so graphics chips gained commands to draw them, rather than requiring the CPU to write each pixel directly.
On UNIX workstations, the accelerated functions usually corresponded closely to the drawing primitives available in the windowing system. When you drew a line in X11, you typically made a call to the Xlib XDrawLine() function. This sent a DrawLine message to the server, which would then handle the drawing. If possible, the server would issue a single command to the graphics hardware to draw the line.
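A minimal Xlib client doing exactly this might look something like the following (error handling trimmed). XDrawLine() is the real Xlib call that generates a single protocol request; the window setup around it is just boilerplate. Build with: cc line.c -lX11

    #include <X11/Xlib.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

        int screen = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, screen),
                                         0, 0, 200, 200, 1,
                                         BlackPixel(dpy, screen),
                                         WhitePixel(dpy, screen));
        XSelectInput(dpy, win, ExposureMask);
        XMapWindow(dpy, win);

        GC gc = XCreateGC(dpy, win, 0, NULL);
        XSetForeground(dpy, gc, BlackPixel(dpy, screen));

        /* Wait until the window is visible before drawing into it. */
        XEvent ev;
        do { XNextEvent(dpy, &ev); } while (ev.type != Expose);

        XDrawLine(dpy, win, gc, 10, 10, 190, 190);  /* one DrawLine request */
        XFlush(dpy);

        sleep(5);
        XCloseDisplay(dpy);
        return 0;
    }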
Drivers in this era became a lot more complicated. Every graphics chip could handle some subset of the X11 drawing commands, and simple frame buffers didn't accelerate any of them. Every driver had to advertise the commands that it could accelerate, and the general-purpose code had to fall back to software rendering for everything else.
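One way to picture this arrangement is a table of function pointers: the driver fills in the operations it can accelerate, leaves the rest NULL, and the generic layer falls back to software for those. The structure and function names below are invented for illustration; they are not the real X server interfaces.

    #include <stdio.h>

    struct draw_ops {
        void (*draw_line)(int x1, int y1, int x2, int y2); /* NULL = not accelerated */
        void (*fill_rect)(int x, int y, int w, int h);
    };

    /* The software fallback always exists. */
    static void sw_draw_line(int x1, int y1, int x2, int y2)
    {
        printf("software line (%d,%d)-(%d,%d)\n", x1, y1, x2, y2);
    }

    /* A hypothetical chip that accelerates lines but nothing else. */
    static void chip_draw_line(int x1, int y1, int x2, int y2)
    {
        printf("hardware line (%d,%d)-(%d,%d)\n", x1, y1, x2, y2);
    }

    static const struct draw_ops chip_ops = {
        .draw_line = chip_draw_line,
        .fill_rect = NULL,                    /* falls back to software */
    };

    /* The general-purpose layer: use the hardware path if the driver
     * advertised it, otherwise do the work in software. */
    static void draw_line(const struct draw_ops *ops, int x1, int y1, int x2, int y2)
    {
        if (ops->draw_line)
            ops->draw_line(x1, y1, x2, y2);   /* accelerated path */
        else
            sw_draw_line(x1, y1, x2, y2);     /* generic software path */
    }

    int main(void)
    {
        draw_line(&chip_ops, 0, 0, 100, 100);
        return 0;
    }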
One of the changes in a modern X server's driver model is to dispense with most of this layer. 2D acceleration, at least in terms of line drawing and similar operations, is now irrelevant. Doing these entirely in software, even on a relatively old computer, is fast enough and, more importantly, is typically faster than using 2D hardware (if it's even available, which it isn't on modern GPUs).