Mixing Graphics and Compute with multiple GPUs, Alina Alt

Mixing Graphics andCompute with multipleGPUs, Alina Alt

AgendaCompute and Graphics API InteroperabilityInteroperability at a system levelApplication design considerationsNVIDIA Confidential

Compute and Visualize the same dataApplicationComputeGraphicsNVIDIA Confidential

Compute/Graphics interoperabilitySet up the objects in the graphics contextRegister objects with the compute contextMap/Unmap the objects from the compute contextApplicationCUDA BufferBufferCUDA ArrayTextureCUDAOpenGL/DXNVIDIA Confidential

Simple OpenGL-CUDA interop sample: Setupand Register of Buffer ObjectsGLuint imagePBO;cudaGraphicsResource_t//OpenGL buffer creationglGenBuffers(1, &imagePBO);cudaResourceBuf;glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, imagePBO);glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, size, NULL,GL_DYNAMIC_DRAW);glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB,0);//Registration with CUDAcudaGraphicsGLRegisterBuffer(&cudaResourceBuf, imagePBO,cudaGraphicsRegisterFlagsNone);NVIDIA Confidential

Simple OpenGL-CUDA interop sample: Setupand Register of Texture ObjectsGLuint imageTex;cudaGraphicsResource_t//OpenGL texture creationglGenTextures(1, &imageTex);cudaResourceTex;glBindTexture(GL_TEXTURE_2D, imageTex);//set texture parameters hereglTexImage2D(GL_TEXTURE_2D,0, GL_RGBA8UI_EXT, width, height, 0,GL_RGBA_INTEGER_EXT, GL_UNSIGNED_BYTE, NULL);glBindTexture(GL_TEXTURE_2D, 0);//Registration with CUDAcudaGraphicsGLRegisterImage (&cudaResourceTex, imageTex,GL_TEXTURE_2D, cudaGraphicsRegisterFlagsNone);NVIDIA Confidential

Simple OpenGL-CUDA interop sampleunsigned char *memPtr;cudaArray *arrayPtr;while (!done) {cudaGraphicsMapResources(1, &cudaResourceBuf, cudaStream);cudaGraphicsResourceGetMappedPointer((void **)&memPtr, &size,cudaResourceBuf);doWorkInCUDA(cudaArray, memPtr, cudaStream);cudaGraphicsUnmapResources(1, & cudaResourceBuf, cudaStream);doWorkInGL(imagePBO, imageTex);}NVIDIA Confidential

Simple OpenGL-CUDA interop sampleunsigned char *memPtr;cudaArray *arrayPtr;while (!done) {Context switchingcudaGraphicsMapResources(1, &cudaResourceBuf, cudaStream);cudaGraphicsResourceGetMappedPointer((void **)&memPtr, &size,cudaResourceBuf);doWorkInCUDA(cudaArray, memPtr, cudaStream);cudaGraphicsUnmapResources(1, & cudaResourceBuf, cudaStream);doWorkInGL(imagePBO, imageTex);}NVIDIA Confidential

Interoperability behavior:single GPUThe resource is sharedContext switch is fast and independent ondata sizeCUDAContextAPI InteropGPUOpenGLContextDatacomputeABCUDAdrawABOpenGL/DXtimeNVIDIA Confidential

Interoperability behavior: multiple GPUsEach context owns a copy of the resourceContext switch can be slow and isdependent on data sizeGPUCUDAContextAPI InteropGPUOpenGLContextDataDataSynchronize/MapcomputeSynchronize/UnmapAAABBBCUDAdrawAOpenGL/DXtimeNVIDIA Confidentialvsync

Simple OpenGL-CUDA interop sampleIf CUDA is the producer….Use mapping hint cudaGraphicsMapFlagsWriteDiscard withcudaGraphicsResourceSetMapFlags()unsigned char *memPtr;cudaGraphicsResourceSetMapFlags(cudaResourceBuf,while (!done) {}cudaGraphicsMapFlagsWriteDiscard);cudaGraphicsMapResources(1, &cudaResourceBuf, cudaStream);cudaGraphicsResourceGetMappedPointer((void **)&memPtr, &size, cudaResourceBuf);doWorkInCUDA(memPtr, cudaStream);cudaGraphicsUnmapResources(1, &cudaResourceBuf, cudaStream);doWorkInGL(imagePBO);NVIDIA Confidential

Interoperability behavior:single GPUTasks are still serialized….API InteropGPUCUDAContextOpenGLContextDatacomputeABCUDAdrawABOpenGL/DXtimeNVIDIA Confidential

Interoperability behavior: multiple GPUsTasks are overlappedGPUAPI InteropGPUCUDAContextOpenGLContextDataDataSynchronize/MapcomputeSynchronize/UnmapAABBAACUDAdrawBABOpenGL/DXtimeNVIDIA Confidentialvsync

Simple OpenGL-CUDA interop sampleSimilar considerations are applicable when OpenGL is theproducer and CUDA is consumer:cudaGraphicsMapFlagsReadOnlyNVIDIA Confidential

Application Example:Autodesk MayaCUDA plug-in within an OpenGL applicationNVIDIA Confidential }SignalWait(setupCompleted);wglMakeCurrent(hDC,workerCtx);//Register OpenGL objects with CUDAint count = 1;while (!done) {SignalWait(oglDone[count]);glWaitSync(endGLSync[count]);cudaGraphicsMapResources(1, &cudaResourceBuf[count], cudaStream[N]);cudaGraphicsResourceGetMappedPointer((void **)&memPtr, &size, cudaResourceBuf[count]);doWorkInCUDA(memPtr, cudaStream[N]);cudaGraphicsUnmapResources(1, &cudaResourceBuf[count], cudaStream[N]);cudaStreamSynchronize(cudaStreamN);EventSignal(cudaDone[count]);count = (count+1)%2;CUDA workerthread NmainCtx = wglCreateContext(hDC);workerCtx = wglCreateContextAttrib(hDC,mainCtx…);wglMakeCurrent(hDC, mainCtx);//Create OpenGL objectsEventSignal(setupCompleted);int count = 0;while (!done) {SignalWait(cudaDone[count]);doWorkInGL(imagePBO[count]);endGLSync[count] = glFenceSync(…);EventSignal(oglDone[count]);count = (count+1)%2;}Main OpenGLthread

API Interoperability hides all the complexityNVIDIA Confidential

Manual InetroperabilityA) B)GPU 1 GPU 2CUDAContextOpenGLContextCUDA memcpyglBufferSubDataBufferRAMNVIDIA Confidential

Application Design ConsiderationsContext switching performance varies with system configurationand OS.Provision for multi-GPU environments:No heuristics. Let the user chose the GPUs.Use cudaD3D[9|10|11]GetDevices/cudaGLGetDevices to match CUDA andOpenGL device enumerationsAvoid synchronized GPUs for CUDACUDA-OpenGL interoperability can perform slower if OpenGLcontext spans multiple GPUConsider manual interoperability for fine-grained controlNVIDIA Confidential

ResourcesCUDA samples/documentation:http://developer.nvidia.com/cuda-downloadsOpenGL Insights, Patric Cozzi, Christophe Riccio, 2012. ISBN1439893764. www.openglinsights.comNVIDIA Confidential

Mixing Graphics and Compute with multiple GPUs, Alina Alt

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?