Mixing Graphics and Compute with multiple GPUs, Alina Alt
Mixing Graphics and Compute with multiple GPUs, Alina Alt
Mixing Graphics and Compute with multiple GPUs, Alina Alt
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Mixing</strong> <strong>Graphics</strong> <strong>and</strong><strong>Compute</strong> <strong>with</strong> <strong>multiple</strong><strong>GPUs</strong>, <strong>Alina</strong> <strong>Alt</strong>
Agenda<strong>Compute</strong> <strong>and</strong> <strong>Graphics</strong> API InteroperabilityInteroperability at a system levelApplication design considerationsNVIDIA Confidential
<strong>Compute</strong> <strong>and</strong> Visualize the same dataApplication<strong>Compute</strong><strong>Graphics</strong>NVIDIA Confidential
<strong>Compute</strong>/<strong>Graphics</strong> interoperabilitySet up the objects in the graphics contextRegister objects <strong>with</strong> the compute contextMap/Unmap the objects from the compute contextApplicationCUDA BufferBufferCUDA ArrayTextureCUDAOpenGL/DXNVIDIA Confidential
Simple OpenGL-CUDA interop sample: Setup<strong>and</strong> Register of Buffer ObjectsGLuint imagePBO;cuda<strong>Graphics</strong>Resource_t//OpenGL buffer creationglGenBuffers(1, &imagePBO);cudaResourceBuf;glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, imagePBO);glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, size, NULL,GL_DYNAMIC_DRAW);glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB,0);//Registration <strong>with</strong> CUDAcuda<strong>Graphics</strong>GLRegisterBuffer(&cudaResourceBuf, imagePBO,cuda<strong>Graphics</strong>RegisterFlagsNone);NVIDIA Confidential
Simple OpenGL-CUDA interop sample: Setup<strong>and</strong> Register of Texture ObjectsGLuint imageTex;cuda<strong>Graphics</strong>Resource_t//OpenGL texture creationglGenTextures(1, &imageTex);cudaResourceTex;glBindTexture(GL_TEXTURE_2D, imageTex);//set texture parameters hereglTexImage2D(GL_TEXTURE_2D,0, GL_RGBA8UI_EXT, width, height, 0,GL_RGBA_INTEGER_EXT, GL_UNSIGNED_BYTE, NULL);glBindTexture(GL_TEXTURE_2D, 0);//Registration <strong>with</strong> CUDAcuda<strong>Graphics</strong>GLRegisterImage (&cudaResourceTex, imageTex,GL_TEXTURE_2D, cuda<strong>Graphics</strong>RegisterFlagsNone);NVIDIA Confidential
Simple OpenGL-CUDA interop sampleunsigned char *memPtr;cudaArray *arrayPtr;while (!done) {cuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf, cudaStream);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size,cudaResourceBuf);doWorkInCUDA(cudaArray, memPtr, cudaStream);cuda<strong>Graphics</strong>UnmapResources(1, & cudaResourceBuf, cudaStream);doWorkInGL(imagePBO, imageTex);}NVIDIA Confidential
Simple OpenGL-CUDA interop sampleunsigned char *memPtr;cudaArray *arrayPtr;while (!done) {Context switchingcuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf, cudaStream);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size,cudaResourceBuf);doWorkInCUDA(cudaArray, memPtr, cudaStream);cuda<strong>Graphics</strong>UnmapResources(1, & cudaResourceBuf, cudaStream);doWorkInGL(imagePBO, imageTex);}NVIDIA Confidential
Interoperability behavior:single GPUThe resource is sharedContext switch is fast <strong>and</strong> independent ondata sizeCUDAContextAPI InteropGPUOpenGLContextDatacomputeABCUDAdrawABOpenGL/DXtimeNVIDIA Confidential
Interoperability behavior: <strong>multiple</strong> <strong>GPUs</strong>Each context owns a copy of the resourceContext switch can be slow <strong>and</strong> isdependent on data sizeGPUCUDAContextAPI InteropGPUOpenGLContextDataDataSynchronize/MapcomputeSynchronize/UnmapAAABBBCUDAdrawAOpenGL/DXtimeNVIDIA Confidentialvsync
Simple OpenGL-CUDA interop sampleIf CUDA is the producer….Use mapping hint cuda<strong>Graphics</strong>MapFlagsWriteDiscard <strong>with</strong>cuda<strong>Graphics</strong>ResourceSetMapFlags()unsigned char *memPtr;cuda<strong>Graphics</strong>ResourceSetMapFlags(cudaResourceBuf,while (!done) {}cuda<strong>Graphics</strong>MapFlagsWriteDiscard);cuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf, cudaStream);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size, cudaResourceBuf);doWorkInCUDA(memPtr, cudaStream);cuda<strong>Graphics</strong>UnmapResources(1, &cudaResourceBuf, cudaStream);doWorkInGL(imagePBO);NVIDIA Confidential
Interoperability behavior:single GPUTasks are still serialized….API InteropGPUCUDAContextOpenGLContextDatacomputeABCUDAdrawABOpenGL/DXtimeNVIDIA Confidential
Interoperability behavior: <strong>multiple</strong> <strong>GPUs</strong>Tasks are overlappedGPUAPI InteropGPUCUDAContextOpenGLContextDataDataSynchronize/MapcomputeSynchronize/UnmapAABBAACUDAdrawBABOpenGL/DXtimeNVIDIA Confidentialvsync
Simple OpenGL-CUDA interop sampleSimilar considerations are applicable when OpenGL is theproducer <strong>and</strong> CUDA is consumer:cuda<strong>Graphics</strong>MapFlagsReadOnlyNVIDIA Confidential
Application Example:Autodesk MayaCUDA plug-in <strong>with</strong>in an OpenGL applicationNVIDIA Confidential }SignalWait(setupCompleted);wglMakeCurrent(hDC,workerCtx);//Register OpenGL objects <strong>with</strong> CUDAint count = 1;while (!done) {SignalWait(oglDone[count]);glWaitSync(endGLSync[count]);cuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf[count], cudaStream[N]);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size, cudaResourceBuf[count]);doWorkInCUDA(memPtr, cudaStream[N]);cuda<strong>Graphics</strong>UnmapResources(1, &cudaResourceBuf[count], cudaStream[N]);cudaStreamSynchronize(cudaStreamN);EventSignal(cudaDone[count]);count = (count+1)%2;CUDA workerthread NmainCtx = wglCreateContext(hDC);workerCtx = wglCreateContextAttrib(hDC,mainCtx…);wglMakeCurrent(hDC, mainCtx);//Create OpenGL objectsEventSignal(setupCompleted);int count = 0;while (!done) {SignalWait(cudaDone[count]);doWorkInGL(imagePBO[count]);endGLSync[count] = glFenceSync(…);EventSignal(oglDone[count]);count = (count+1)%2;}Main OpenGLthread
API Interoperability hides all the complexityNVIDIA Confidential
Manual InetroperabilityA) B)GPU 1 GPU 2CUDAContextOpenGLContextCUDA memcpyglBufferSubDataBufferRAMNVIDIA Confidential
Application Design ConsiderationsContext switching performance varies <strong>with</strong> system configuration<strong>and</strong> OS.Provision for multi-GPU environments:No heuristics. Let the user chose the <strong>GPUs</strong>.Use cudaD3D[9|10|11]GetDevices/cudaGLGetDevices to match CUDA <strong>and</strong>OpenGL device enumerationsAvoid synchronized <strong>GPUs</strong> for CUDACUDA-OpenGL interoperability can perform slower if OpenGLcontext spans <strong>multiple</strong> GPUConsider manual interoperability for fine-grained controlNVIDIA Confidential
ResourcesCUDA samples/documentation:http://developer.nvidia.com/cuda-downloadsOpenGL Insights, Patric Cozzi, Christophe Riccio, 2012. ISBN1439893764. www.openglinsights.comNVIDIA Confidential