- Launching 2D Computational Grids
- Live Display via Graphics Interop
- Application: Stability
- Summary
- Suggested Projects
Live Display via Graphics Interop
Now that we can construct apps that produce image data, it makes sense to start displaying those images and exploring what CUDA’s massive parallelism enables us to do in real time.
Real-time graphic interactivity will involve CUDA’s provision for interoperability with a standard graphics package. We will be using OpenGL, which could be (and is) the subject of numerous books all by itself [2,4,5], so we will take our usual need-to-know approach. We introduce just enough OpenGL to display a single textured rectangle and provide a few examples of code to support interactions via keyboard and mouse with the help of the OpenGL Utility Toolkit (GLUT). The idea is that the rectangle provides a window into the world of your app, and you can use CUDA to compute the pixel shading values corresponding to whatever scene you want the user to see. CUDA/OpenGL interop provides interactive controls and displays the changing scene as a texture on the displayed rectangle in real time (or, more accurately, at a rate comparable to the ~60 Hz refresh rate typical of modern visual display systems).
Here we present the code for a sample app that opens a graphics window and displays an image based on distance to a reference point that can be moved interactively using keyboard or mouse input. We call the app flashlight because it produces a directable circle of light whose intensity diminishes away from the center of the “spot.” Figure 4.1 shows a screenshot of the app in its finished state.
Figure 4.1 Interactive spot of light in the finished application
This entire app requires fewer than 200 lines of code, which we have organized into three files:
- main.cpp contains the essentials of the CUDA/OpenGL setup and interop. It is about 100 lines of code (half of the total), and while we will provide a brief explanation of its contents, you should be able to create your own apps from the flashlight template with only minor changes to main.cpp.
- kernel.cu contains the essential CUDA code, including the clip() function described above, the definition of the kernelLauncher() function, and the definition of the actual kernel function (here distanceKernel()), which must write its output to a uchar4 array.
- interactions.h defines the callback functions keyboard(), mouseMove(), and mouseDrag() to specify how the system should respond to inputs.
While we will go through the entire code, the important point is that you can use the flashlight app as a template to readily create your own apps in just a few steps:
- Create a new app based on flashlight by making a copy of the code directory under Linux or by creating a new project using flashlight as a template in Visual Studio under Windows.
- Edit the kernel function to produce whatever data you want to display.
- In interactions.h, edit the callback functions to specify how your app should respond to keyboard and mouse inputs, and edit printInstructions() to customize the instructions for user interactions.
- Optionally, edit the #define TITLE_STRING statement in interactions.h to customize the app name in the title bar of the graphics window.
Listings 4.5, 4.6, 4.7, and 4.8 show all the code necessary to display a distance image on your screen using CUDA/OpenGL interop, and we will walk you through the necessities while trying not to get hung up on too many details.
Listing 4.5 flashlight/main.cpp
1 #include "kernel.h"
2 #include <stdio.h>
3 #include <stdlib.h>
4 #ifdef _WIN32
5 #define WIN32_LEAN_AND_MEAN
6 #define NOMINMAX
7 #include <windows.h>
8 #endif
9 #ifdef __APPLE__
10 #include <GLUT/glut.h>
11 #else
12 #include <GL/glew.h>
13 #include <GL/freeglut.h>
14 #endif
15 #include <cuda_runtime.h>
16 #include <cuda_gl_interop.h>
17 #include "interactions.h"
18
19 // texture and pixel objects
20 GLuint pbo = 0; // OpenGL pixel buffer object
21 GLuint tex = 0; // OpenGL texture object
22 struct cudaGraphicsResource *cuda_pbo_resource;
23
24 void render() {
25 uchar4 *d_out = 0;
26 cudaGraphicsMapResources(1, &cuda_pbo_resource, 0);
27 cudaGraphicsResourceGetMappedPointer((void **)&d_out, NULL,
28 cuda_pbo_resource);
29 kernelLauncher(d_out, W, H, loc);
30 cudaGraphicsUnmapResources(1, &cuda_pbo_resource, 0);
31 }
32
33 void drawTexture() {
34 glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, W, H, 0, GL_RGBA,
35 GL_UNSIGNED_BYTE, NULL);
36 glEnable(GL_TEXTURE_2D);
37 glBegin(GL_QUADS);
38 glTexCoord2f(0.0f, 0.0f); glVertex2f(0, 0);
39 glTexCoord2f(0.0f, 1.0f); glVertex2f(0, H);
40 glTexCoord2f(1.0f, 1.0f); glVertex2f(W, H);
41 glTexCoord2f(1.0f, 0.0f); glVertex2f(W, 0);
42 glEnd();
43 glDisable(GL_TEXTURE_2D);
44 }
45
46 void display() {
47 render();
48 drawTexture();
49 glutSwapBuffers();
50 }
51
52 void initGLUT(int *argc, char **argv) {
53 glutInit(argc, argv);
54 glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
55 glutInitWindowSize(W, H);
56 glutCreateWindow(TITLE_STRING);
57 #ifndef __APPLE__
58 glewInit();
59 #endif
60 }
61
62 void initPixelBuffer() {
63 glGenBuffers(1, &pbo);
64 glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
65 glBufferData(GL_PIXEL_UNPACK_BUFFER, 4*W*H*sizeof(GLubyte), 0,
66 GL_STREAM_DRAW);
67 glGenTextures(1, &tex);
68 glBindTexture(GL_TEXTURE_2D, tex);
69 glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
70 cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo,
71 cudaGraphicsMapFlagsWriteDiscard);
72 }
73
74 void exitfunc() {
75 if (pbo) {
76 cudaGraphicsUnregisterResource(cuda_pbo_resource);
77 glDeleteBuffers(1, &pbo);
78 glDeleteTextures(1, &tex);
79 }
80 }
81
82 int main(int argc, char** argv) {
83 printInstructions();
84 initGLUT(&argc, argv);
85 gluOrtho2D(0, W, H, 0);
86 glutKeyboardFunc(keyboard);
87 glutSpecialFunc(handleSpecialKeypress);
88 glutPassiveMotionFunc(mouseMove);
89 glutMotionFunc(mouseDrag);
90 glutDisplayFunc(display);
91 initPixelBuffer();
92 atexit(exitfunc);
93 glutMainLoop();
94 return 0;
95 }
Here is a brief, high-level overview of what is happening in main.cpp. Lines 1–17 load the header files appropriate for your operating system to access the necessary supporting code. The rest of the explanation is best read starting from the bottom. Lines 82–95 define main(), which does the following things:
- Line 83 prints a few user interface instructions to the command window.
- initGLUT() initializes the GLUT library and sets up the specifications for the graphics window, including the display mode (RGBA), the buffering (double), size (W x H), and title.
- gluOrtho2D(0, W, H, 0) establishes the viewing transform (simple orthographic projection).
- Lines 86–89 indicate that keyboard and mouse interactions will be specified by the functions keyboard, handleSpecialKeypress, mouseMove, and mouseDrag (the details of which will be specified in interactions.h).
- glutDisplayFunc(display) says that what is to be shown in the window is determined by the function display(), which is all of three lines long. On lines 47–49, it calls render() to compute new pixel values, drawTexture() to draw the OpenGL texture, and then swaps the display buffers.
  - drawTexture() sets up a 2D OpenGL texture image and creates a single quadrangle graphics primitive. Its texture coordinates (0.0f, 0.0f), (0.0f, 1.0f), (1.0f, 1.0f), and (1.0f, 0.0f), the corners of the unit square, correspond to the pixel coordinates (0, 0), (0, H), (W, H), and (W, 0).
  - Double buffering is a common technique for enhancing the efficiency of graphics programs. One buffer provides memory that can be read to “feed” the display, while at the same time, the other buffer provides memory into which the contents of the next frame can be written. Between frames in a graphics sequence, the buffers swap their read/write roles.
- initPixelBuffer(), not surprisingly, initializes the pixel buffer on lines 62–72. The key for our purposes is the last call, cudaGraphicsGLRegisterBuffer(), which “registers” the OpenGL buffer with CUDA. This operation has some overhead, but it enables low-overhead “mapping” that turns over control of the buffer memory to CUDA to write output and “unmapping” that returns control of the buffer memory to OpenGL for display. Figure 4.2 shows a summary of the interop between CUDA and OpenGL.
Figure 4.2 Illustration of alternating access to device memory that is mapped to CUDA to store computational results and unmapped (i.e., returned to OpenGL control) for display of those results
- glutMainLoop(), on line 93, is where the real action happens. It repeatedly checks for input and calls for computation of updated images via display(), which calls render(), which does the following:
  - Maps the pixel buffer to CUDA and gets a CUDA pointer to the buffer memory so it can serve as the output device array
  - Calls the wrapper function kernelLauncher() that launches the kernel to compute the pixel values for the updated image
  - Unmaps the buffer so OpenGL can display the contents
- When you exit the app, exitfunc() performs the final cleanup by undoing the resource registration and deleting the OpenGL pixel buffer and texture. It is registered with atexit(exitfunc) on line 92, ahead of the main loop, because glutMainLoop() typically does not return and the Esc handler in interactions.h ends the program by calling exit(0); a registration placed after the loop would never be reached. The return 0 on line 94 runs only if glutMainLoop() does return.
Of all the code in main.cpp, the only thing you need to change when you create your own CUDA/OpenGL interop apps is the render() function, where you will need to update the argument list for kernelLauncher().
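The double-buffering idea described above can be sketched on the CPU without any graphics library. The block below is only an illustration under stated assumptions: the class name DoubleBuffer and its members are hypothetical, and it does not reflect how GLUT or the graphics driver implements glutSwapBuffers() internally.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU-only illustration of double buffering; all names here are hypothetical.
class DoubleBuffer {
 public:
  explicit DoubleBuffer(std::size_t n) : front_(n, 0), back_(n, 0) {
    assert(n > 0);  // an empty frame would make the sketch meaningless
  }

  // The "display" only ever reads the front buffer.
  const std::vector<unsigned char> &front() const { return front_; }

  // The next frame is written into the back buffer.
  std::vector<unsigned char> &back() { return back_; }

  // Between frames the two buffers exchange roles (compare glutSwapBuffers()).
  void swap() { std::swap(front_, back_); }

 private:
  std::vector<unsigned char> front_;
  std::vector<unsigned char> back_;
};
```

A frame loop built on this sketch would write new values through back(), call swap(), and then read the finished frame through front(), so the reader never sees a partially written frame.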
Listing 4.6 flashlight/kernel.cu
1 #include "kernel.h"
2 #define TX 32
3 #define TY 32
4
5 __device__
6 unsigned char clip(int n) { return n > 255 ? 255 : (n < 0 ? 0 : n); }
7
8 __global__
9 void distanceKernel(uchar4 *d_out, int w, int h, int2 pos) {
10 const int c = blockIdx.x*blockDim.x + threadIdx.x;
11 const int r = blockIdx.y*blockDim.y + threadIdx.y;
12 if ((c >= w) || (r >= h)) return; // Check if within image bounds
13 const int i = c + r*w; // 1D indexing
14 const int dist = sqrtf((c - pos.x)*(c - pos.x) +
15 (r - pos.y)*(r - pos.y));
16 const unsigned char intensity = clip(255 - dist);
17 d_out[i].x = intensity;
18 d_out[i].y = intensity;
19 d_out[i].z = 0;
20 d_out[i].w = 255;
21 }
22
23 void kernelLauncher(uchar4 *d_out, int w, int h, int2 pos) {
24 const dim3 blockSize(TX, TY);
25 const dim3 gridSize = dim3((w + TX - 1)/TX, (h + TY - 1)/TY);
26 distanceKernel<<<gridSize, blockSize>>>(d_out, w, h, pos);
27 }
The code from kernel.cu in Listing 4.6 should look familiar and require little explanation at this point. The primary change is a wrapper function kernelLauncher() that computes the grid dimensions and launches the kernel. Note that you will not find any mention of a host output array. Computation and display are both handled from the device, and there is no need to transfer data to the host. (Such a transfer of large quantities of image data across the PCIe bus could be time-consuming and greatly inhibit real-time interaction capabilities.) You will also not find a cudaMalloc() to create space for a device array. The render() function in main.cpp declares a pointer d_out that gets its value from cudaGraphicsResourceGetMappedPointer() and provides the CUDA pointer to the memory allocated for the pixel buffer.
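Because the kernel writes directly into the mapped pixel buffer, there is no host copy of the image to inspect. One way to sanity-check the arithmetic is a CPU reference that mirrors distanceKernel() and the grid-size computation in kernelLauncher(). The following is a minimal sketch, not part of the flashlight app: Pixel, clipHost(), blocksNeeded(), and distanceImageHost() are hypothetical names, and Pixel merely stands in for uchar4 so that the code compiles without the CUDA headers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side stand-in for CUDA's uchar4 (hypothetical, for illustration only).
struct Pixel { unsigned char x, y, z, w; };

// Same clamping rule as clip() in Listing 4.6.
unsigned char clipHost(int n) { return n > 255 ? 255 : (n < 0 ? 0 : n); }

// Ceiling division: number of blocks of size t needed to cover n elements.
int blocksNeeded(int n, int t) {
  assert(n > 0 && t > 0);
  return (n + t - 1) / t;
}

// CPU loop mirroring distanceKernel(): one iteration per pixel
// instead of one thread per pixel.
std::vector<Pixel> distanceImageHost(int w, int h, int posX, int posY) {
  assert(w > 0 && h > 0);
  std::vector<Pixel> out(static_cast<std::size_t>(w) * h);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int dist = static_cast<int>(std::sqrt(static_cast<float>(
          (c - posX) * (c - posX) + (r - posY) * (r - posY))));
      const unsigned char intensity = clipHost(255 - dist);
      out[c + r * w] = Pixel{intensity, intensity, 0, 255};  // yellow spot
    }
  }
  return out;
}
```

For the 600 x 600 window and 32 x 32 blocks used in the listings, blocksNeeded(600, 32) gives 19 blocks per dimension, which covers 608 threads; the extra 8 threads in each dimension are the ones rejected by the bounds check in the kernel.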
The header file associated with the kernel is shown in Listing 4.7. In addition to the include guard and the prototype for the wrapper function kernelLauncher(), kernel.h contains forward declarations for uchar4 and int2 so that the compiler knows these types exist when it reaches the prototype, even if the CUDA headers that define them have not yet been included (as is the case in main.cpp, where kernel.h is included first).
Listing 4.7 flashlight/kernel.h
1 #ifndef KERNEL_H
2 #define KERNEL_H
3
4 struct uchar4;
5 struct int2;
6
7 void kernelLauncher(uchar4 *d_out, int w, int h, int2 pos);
8
9 #endif
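To see why the forward declarations in kernel.h are sufficient, consider the following stand-alone sketch. It assumes nothing from CUDA: Vec2i and Rgba are hypothetical stand-ins for int2 and uchar4, and fillPixel() is a made-up function playing the role of kernelLauncher(). In C++, a function may be declared with parameters of an incomplete type; the complete definition is needed only where the function is defined or called.

```cpp
#include <cassert>

// Forward declarations: the compiler learns only that these types exist.
// (Vec2i and Rgba are hypothetical stand-ins for int2 and uchar4.)
struct Vec2i;
struct Rgba;

// Legal even though both types are still incomplete at this point.
void fillPixel(Rgba *out, Vec2i pos);

// Later the full definitions appear (in the real app they come from
// the CUDA headers rather than from user code).
struct Vec2i { int x, y; };
struct Rgba { unsigned char x, y, z, w; };

// The definition requires complete types, which are now available.
void fillPixel(Rgba *out, Vec2i pos) {
  assert(out != nullptr);
  out->x = static_cast<unsigned char>(pos.x & 0xFF);
  out->y = static_cast<unsigned char>(pos.y & 0xFF);
  out->z = 0;
  out->w = 255;
}
```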
Listing 4.8 flashlight/interactions.h that specifies callback functions controlling interactive behavior of the flashlight app
1 #ifndef INTERACTIONS_H
2 #define INTERACTIONS_H
3 #define W 600
4 #define H 600
5 #define DELTA 5 // pixel increment for arrow keys
6 #define TITLE_STRING "flashlight: distance image display app"
7 int2 loc = {W/2, H/2};
8 bool dragMode = false; // mouse tracking mode
9
10 void keyboard(unsigned char key, int x, int y) {
11 if (key == 'a') dragMode = !dragMode; // toggle tracking mode
12 if (key == 27) exit(0);
13 glutPostRedisplay();
14 }
15
16 void mouseMove(int x, int y) {
17 if (dragMode) return;
18 loc.x = x;
19 loc.y = y;
20 glutPostRedisplay();
21 }
22
23 void mouseDrag(int x, int y) {
24 if (!dragMode) return;
25 loc.x = x;
26 loc.y = y;
27 glutPostRedisplay();
28 }
29
30 void handleSpecialKeypress(int key, int x, int y) {
31 if (key == GLUT_KEY_LEFT) loc.x -= DELTA;
32 if (key == GLUT_KEY_RIGHT) loc.x += DELTA;
33 if (key == GLUT_KEY_UP) loc.y -= DELTA;
34 if (key == GLUT_KEY_DOWN) loc.y += DELTA;
35 glutPostRedisplay();
36 }
37
38 void printInstructions() {
39 printf("flashlight interactions\n");
40 printf("a: toggle mouse tracking mode\n");
41 printf("arrow keys: move ref location\n");
42 printf("esc: close graphics window\n");
43 }
44
45 #endif
The stated goal of the flashlight app is to display an image corresponding to the distance to a reference point that can be moved interactively, and we are now ready to define and implement the interactions. The code for interactions.h shown in Listing 4.8 allows the user to move the reference point (i.e., the center of the flashlight beam) by moving the mouse or pressing the arrow keys. Pressing a toggles between tracking mouse motions and tracking mouse drags (with the mouse button pressed), and the esc key closes the graphics window. Here’s a quick description of what the code does and how those interactions work:
- Lines 3–6 set the image dimensions, the text displayed in the title bar, and how far (in pixels) the reference point moves when an arrow key is pressed.
- Line 7 sets the initial reference location at {W/2, H/2}, the center of the image.
- Line 8 declares a Boolean variable dragMode that is initialized to false. We use dragMode to toggle back and forth between tracking mouse motions and “click-drag” motions.
- Lines 10–14 specify the defined interactions with the keyboard:
  - Pressing the a key toggles dragMode to switch the mouse tracking mode.
  - The ASCII code 27 corresponds to the Esc key. Pressing Esc closes the graphics window.
  - glutPostRedisplay() is called at the end of each callback function, telling GLUT to compute a new image for display (by calling display() in main.cpp) based on the interactive input.
- Lines 16–21 specify the response to a mouse movement. When dragMode is true, return ensures that no action is taken. Otherwise, the components of the reference location are set to be equal to the x and y coordinates of the mouse before computing and displaying an updated image (via glutPostRedisplay()).
- Lines 23–28 similarly specify the response to a “click-drag.” When dragMode is false, return ensures that no action is taken. Otherwise, the reference location is reset to the last location of the mouse while the mouse was clicked.
- Lines 30–36 specify the response to special keys with defined actions. (Note that standard keyboard interactions are handled based on ASCII key codes [6], so special keys like arrow keys and function keys that do not generate standard ASCII codes need to be handled separately.) The flashlight app is set up so that pressing the arrow keys moves the reference location DELTA pixels in the desired direction.
- The printInstructions() function on lines 38–43 consists of print statements that provide user interaction instructions via the console.
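The callbacks in Listing 4.8 update loc without checking it against the window bounds, so repeated arrow-key presses (and possibly coordinates reported during a drag) can carry the reference point outside the image. That is harmless for the distance computation, but if you would rather keep the spot on screen, one option is to clamp the location. The sketch below is an optional refinement and not part of the original app: Point2i, clampCoord(), and moveAndClamp() are hypothetical names, and Point2i stands in for int2 so the code compiles without the CUDA headers.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for CUDA's int2.
struct Point2i { int x, y; };

// Keep a coordinate inside [0, limit - 1].
int clampCoord(int v, int limit) {
  assert(limit > 0);
  return std::min(std::max(v, 0), limit - 1);
}

// Move the reference point by (dx, dy), then clamp it to a w x h window.
Point2i moveAndClamp(Point2i p, int dx, int dy, int w, int h) {
  return Point2i{clampCoord(p.x + dx, w), clampCoord(p.y + dy, h)};
}
```

In handleSpecialKeypress(), a call such as moveAndClamp(p, -DELTA, 0, W, H) would then take the place of the bare loc.x -= DELTA update, with p holding the current components of loc.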
While all the code and explanation for the flashlight app took about nine pages, let’s pause to put things in perspective. While we presented numbered listings totaling about 200 lines, if we were less concerned about readability, the entire code could be written in many fewer lines, so there is not a lot of code to digest. Perhaps more importantly, over half of those lines reside in main.cpp, which you should not really need to change at all to create your own apps other than to alter the list of arguments for the kernelLauncher() function or to customize the information displayed in the title bar. If you start with the flashlight app as a template, you should be able to (and are heartily encouraged to) harness the power of CUDA to create your own apps with interactive graphics by replacing the kernel function with one of your own design and by revising the collection of user interactions implemented in interactions.h.
Finally, the Makefile for building the app in Linux is provided in Listing 4.9.
Listing 4.9 flashlight/Makefile
1 UNAME_S := $(shell uname)
2
3 ifeq ($(UNAME_S), Darwin)
4 LDFLAGS = -Xlinker -framework,OpenGL -Xlinker -framework,GLUT
5 else
6 LDFLAGS += -L/usr/local/cuda/samples/common/lib/linux/x86_64
7 LDFLAGS += -lglut -lGL -lGLU -lGLEW
8 endif
9
10 NVCC = /usr/local/cuda/bin/nvcc
11 NVCC_FLAGS = -g -G -Xcompiler "-Wall -Wno-deprecated-declarations"
12
13 all: main.exe
14
15 main.exe: main.o kernel.o
16 $(NVCC) $^ -o $@ $(LDFLAGS)
17
18 main.o: main.cpp kernel.h interactions.h
19 $(NVCC) $(NVCC_FLAGS) -c $< -o $@
20
21 kernel.o: kernel.cu kernel.h
22 $(NVCC) $(NVCC_FLAGS) -c $< -o $@
Windows users will need to change one build customization and include two pairs of library files: the OpenGL Utility Toolkit (GLUT) and the OpenGL Extension Wrangler (GLEW). To keep things simple and ensure consistency of the library version, we find it convenient to simply make copies of the library files (which can be found by searching within the CUDA Samples directory for the filenames freeglut.dll, freeglut.lib, glew64.dll, and glew64.lib), save them to the project directory, and then add them to the project with PROJECT ⇒ Add Existing Item.
The build customization is specified using the Project Properties pages: Right-click on flashlight in the Solution Explorer pane, then select Properties ⇒ Configuration Properties ⇒ C/C++ ⇒ General ⇒ Additional Include Directories and edit the list to include the CUDA Samples’ common\inc directory. Its default install location is C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\inc.