8.5. Textures and Surfaces
The instructions that read and write textures and surfaces refer to much more implicit state than do other instructions; parameters such as the base address, dimensions, format, and interpretation of the texture contents are contained in a header, an intermediate data structure whose software abstraction is called a texture reference or surface reference. As developers manipulate the texture or surface references, the CUDA runtime and driver must translate those changes into the headers, which the texture or surface instruction references as an index.16
Before launching a kernel that operates on textures or surfaces, the driver must ensure that all this state is set correctly on the hardware. As a result, launching such kernels may take longer than launching kernels that use neither. Texture reads are serviced through a specialized cache subsystem that is separate from the L1/L2 caches in Fermi, and also separate from the constant cache. Each SM has an L1 texture cache, and the TPCs (texture processor clusters) or GPCs (graphics processor clusters) each additionally have an L2 texture cache. Surface reads and writes are serviced through the same L1/L2 caches that service global memory traffic.
Kepler added two technologies of note with respect to textures: the ability to read from global memory via the texture cache hierarchy without binding a texture reference, and the ability to specify a texture header by address rather than by index. The latter technology is known as “bindless textures.”
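In the CUDA runtime, bindless textures are exposed as texture objects. The following is a minimal sketch, not a definitive implementation: it wraps a linear device allocation in a cudaTextureObject_t and passes that object to the kernel as an ordinary argument, with no file-scope texture reference. The kernel name copyFromTexture and the helper countTextureObjectMismatches are hypothetical names chosen for illustration, and the sketch assumes SM 3.0 or later hardware.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The texture object arrives as a plain kernel argument; nothing is bound
// to a file-scope texture reference before the launch.
__global__ void copyFromTexture(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = tex1Dfetch<float>(tex, i);  // read through the texture path
    }
}

// Hypothetical helper (illustration only): returns the number of elements
// that did not round-trip through the texture, or -1 if setup failed.
int countTextureObjectMismatches(int n)
{
    std::vector<float> host(n), result(n, -1.0f);
    for (int i = 0; i < n; ++i) host[i] = 0.5f * i;

    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaTextureObject_t tex = 0;
    int mismatches = -1;

    if (cudaMalloc(&dIn, bytes) == cudaSuccess &&
        cudaMalloc(&dOut, bytes) == cudaSuccess &&
        cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess) {

        // Describe the memory that backs the texture: plain linear memory.
        cudaResourceDesc resDesc = {};
        resDesc.resType = cudaResourceTypeLinear;
        resDesc.res.linear.devPtr = dIn;
        resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
        resDesc.res.linear.sizeInBytes = bytes;

        // Describe how the contents are interpreted: raw element values.
        cudaTextureDesc texDesc = {};
        texDesc.readMode = cudaReadModeElementType;

        if (cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr) == cudaSuccess) {
            int threads = 256;
            int blocks = (n + threads - 1) / threads;
            copyFromTexture<<<blocks, threads>>>(tex, dOut, n);

            // The blocking copy also waits for the kernel to finish.
            if (cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
                mismatches = 0;
                for (int i = 0; i < n; ++i) {
                    if (result[i] != host[i]) ++mismatches;
                }
            }
            cudaDestroyTextureObject(tex);
        }
    }
    cudaFree(dOut);
    cudaFree(dIn);
    return mismatches;
}
```

Calling countTextureObjectMismatches from a host main() and checking for a zero return is enough to exercise the sketch. Because the object is just a value, an application can create many of them and choose among them at run time.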
On SM 3.5 and later hardware, reading global memory via the texture cache can be requested by using const __restrict__ pointers or by explicitly invoking the __ldg() intrinsics in sm_35_intrinsics.h.
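The two approaches can be sketched side by side. This is an illustrative example under stated assumptions, not a definitive implementation: the kernel names scaleRestrict and scaleLdg and the helper countLdgMismatches are hypothetical, and the code is assumed to be compiled for SM 3.5 or later. In the first kernel the qualifiers only permit the compiler to use the read-only path; it may or may not do so. In the second kernel the load is requested explicitly.

```cuda
#include <cuda_runtime.h>
#include <vector>

// const plus __restrict__ tells the compiler that 'in' is read-only and not
// aliased by 'out', so it is allowed (not required) to use the texture path.
__global__ void scaleRestrict(const float * __restrict__ in,
                              float * __restrict__ out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

// __ldg() requests the read-only load explicitly for this one access.
__global__ void scaleLdg(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * __ldg(&in[i]);
}

// Hypothetical helper (illustration only): runs both kernels and returns the
// total number of wrong output elements, or -1 if a CUDA call failed.
int countLdgMismatches(int n)
{
    const float k = 2.0f;
    std::vector<float> host(n), result(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    int mismatches = -1;

    if (cudaMalloc(&dIn, bytes) == cudaSuccess &&
        cudaMalloc(&dOut, bytes) == cudaSuccess &&
        cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        mismatches = 0;

        for (int pass = 0; pass < 2 && mismatches >= 0; ++pass) {
            cudaMemset(dOut, 0, bytes);  // clear results from the previous pass
            if (pass == 0) scaleRestrict<<<blocks, threads>>>(dIn, dOut, k, n);
            else           scaleLdg<<<blocks, threads>>>(dIn, dOut, k, n);

            if (cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
                mismatches = -1;
                break;
            }
            for (int i = 0; i < n; ++i) {
                if (result[i] != k * host[i]) ++mismatches;
            }
        }
    }
    cudaFree(dOut);
    cudaFree(dIn);
    return mismatches;
}
```

Either form is safe only when the data is not written for the lifetime of the kernel, because the texture cache is not kept coherent with global memory writes made during that kernel.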