12.07.2015 Views

Mixing Graphics and Compute with multiple GPUs, Alina Alt

Mixing Graphics and Compute with multiple GPUs, Alina Alt

Mixing Graphics and Compute with multiple GPUs, Alina Alt

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Mixing</strong> <strong>Graphics</strong> <strong>and</strong><strong>Compute</strong> <strong>with</strong> <strong>multiple</strong><strong>GPUs</strong>, <strong>Alina</strong> <strong>Alt</strong>


Agenda<strong>Compute</strong> <strong>and</strong> <strong>Graphics</strong> API InteroperabilityInteroperability at a system levelApplication design considerationsNVIDIA Confidential


<strong>Compute</strong> <strong>and</strong> Visualize the same dataApplication<strong>Compute</strong><strong>Graphics</strong>NVIDIA Confidential


<strong>Compute</strong>/<strong>Graphics</strong> interoperabilitySet up the objects in the graphics contextRegister objects <strong>with</strong> the compute contextMap/Unmap the objects from the compute contextApplicationCUDA BufferBufferCUDA ArrayTextureCUDAOpenGL/DXNVIDIA Confidential


Simple OpenGL-CUDA interop sample: Setup<strong>and</strong> Register of Buffer ObjectsGLuint imagePBO;cuda<strong>Graphics</strong>Resource_t//OpenGL buffer creationglGenBuffers(1, &imagePBO);cudaResourceBuf;glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, imagePBO);glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, size, NULL,GL_DYNAMIC_DRAW);glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB,0);//Registration <strong>with</strong> CUDAcuda<strong>Graphics</strong>GLRegisterBuffer(&cudaResourceBuf, imagePBO,cuda<strong>Graphics</strong>RegisterFlagsNone);NVIDIA Confidential


Simple OpenGL-CUDA interop sample: Setup<strong>and</strong> Register of Texture ObjectsGLuint imageTex;cuda<strong>Graphics</strong>Resource_t//OpenGL texture creationglGenTextures(1, &imageTex);cudaResourceTex;glBindTexture(GL_TEXTURE_2D, imageTex);//set texture parameters hereglTexImage2D(GL_TEXTURE_2D,0, GL_RGBA8UI_EXT, width, height, 0,GL_RGBA_INTEGER_EXT, GL_UNSIGNED_BYTE, NULL);glBindTexture(GL_TEXTURE_2D, 0);//Registration <strong>with</strong> CUDAcuda<strong>Graphics</strong>GLRegisterImage (&cudaResourceTex, imageTex,GL_TEXTURE_2D, cuda<strong>Graphics</strong>RegisterFlagsNone);NVIDIA Confidential


Simple OpenGL-CUDA interop sampleunsigned char *memPtr;cudaArray *arrayPtr;while (!done) {cuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf, cudaStream);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size,cudaResourceBuf);doWorkInCUDA(cudaArray, memPtr, cudaStream);cuda<strong>Graphics</strong>UnmapResources(1, & cudaResourceBuf, cudaStream);doWorkInGL(imagePBO, imageTex);}NVIDIA Confidential


Simple OpenGL-CUDA interop sampleunsigned char *memPtr;cudaArray *arrayPtr;while (!done) {Context switchingcuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf, cudaStream);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size,cudaResourceBuf);doWorkInCUDA(cudaArray, memPtr, cudaStream);cuda<strong>Graphics</strong>UnmapResources(1, & cudaResourceBuf, cudaStream);doWorkInGL(imagePBO, imageTex);}NVIDIA Confidential


Interoperability behavior:single GPUThe resource is sharedContext switch is fast <strong>and</strong> independent ondata sizeCUDAContextAPI InteropGPUOpenGLContextDatacomputeABCUDAdrawABOpenGL/DXtimeNVIDIA Confidential


Interoperability behavior: <strong>multiple</strong> <strong>GPUs</strong>Each context owns a copy of the resourceContext switch can be slow <strong>and</strong> isdependent on data sizeGPUCUDAContextAPI InteropGPUOpenGLContextDataDataSynchronize/MapcomputeSynchronize/UnmapAAABBBCUDAdrawAOpenGL/DXtimeNVIDIA Confidentialvsync


Simple OpenGL-CUDA interop sampleIf CUDA is the producer….Use mapping hint cuda<strong>Graphics</strong>MapFlagsWriteDiscard <strong>with</strong>cuda<strong>Graphics</strong>ResourceSetMapFlags()unsigned char *memPtr;cuda<strong>Graphics</strong>ResourceSetMapFlags(cudaResourceBuf,while (!done) {}cuda<strong>Graphics</strong>MapFlagsWriteDiscard);cuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf, cudaStream);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size, cudaResourceBuf);doWorkInCUDA(memPtr, cudaStream);cuda<strong>Graphics</strong>UnmapResources(1, &cudaResourceBuf, cudaStream);doWorkInGL(imagePBO);NVIDIA Confidential


Interoperability behavior:single GPUTasks are still serialized….API InteropGPUCUDAContextOpenGLContextDatacomputeABCUDAdrawABOpenGL/DXtimeNVIDIA Confidential


Interoperability behavior: <strong>multiple</strong> <strong>GPUs</strong>Tasks are overlappedGPUAPI InteropGPUCUDAContextOpenGLContextDataDataSynchronize/MapcomputeSynchronize/UnmapAABBAACUDAdrawBABOpenGL/DXtimeNVIDIA Confidentialvsync


Simple OpenGL-CUDA interop sampleSimilar considerations are applicable when OpenGL is theproducer <strong>and</strong> CUDA is consumer:cuda<strong>Graphics</strong>MapFlagsReadOnlyNVIDIA Confidential


Application Example:Autodesk MayaCUDA plug-in <strong>with</strong>in an OpenGL applicationNVIDIA Confidential }SignalWait(setupCompleted);wglMakeCurrent(hDC,workerCtx);//Register OpenGL objects <strong>with</strong> CUDAint count = 1;while (!done) {SignalWait(oglDone[count]);glWaitSync(endGLSync[count]);cuda<strong>Graphics</strong>MapResources(1, &cudaResourceBuf[count], cudaStream[N]);cuda<strong>Graphics</strong>ResourceGetMappedPointer((void **)&memPtr, &size, cudaResourceBuf[count]);doWorkInCUDA(memPtr, cudaStream[N]);cuda<strong>Graphics</strong>UnmapResources(1, &cudaResourceBuf[count], cudaStream[N]);cudaStreamSynchronize(cudaStreamN);EventSignal(cudaDone[count]);count = (count+1)%2;CUDA workerthread NmainCtx = wglCreateContext(hDC);workerCtx = wglCreateContextAttrib(hDC,mainCtx…);wglMakeCurrent(hDC, mainCtx);//Create OpenGL objectsEventSignal(setupCompleted);int count = 0;while (!done) {SignalWait(cudaDone[count]);doWorkInGL(imagePBO[count]);endGLSync[count] = glFenceSync(…);EventSignal(oglDone[count]);count = (count+1)%2;}Main OpenGLthread


API Interoperability hides all the complexityNVIDIA Confidential


Manual InetroperabilityA) B)GPU 1 GPU 2CUDAContextOpenGLContextCUDA memcpyglBufferSubDataBufferRAMNVIDIA Confidential


Application Design ConsiderationsContext switching performance varies <strong>with</strong> system configuration<strong>and</strong> OS.Provision for multi-GPU environments:No heuristics. Let the user chose the <strong>GPUs</strong>.Use cudaD3D[9|10|11]GetDevices/cudaGLGetDevices to match CUDA <strong>and</strong>OpenGL device enumerationsAvoid synchronized <strong>GPUs</strong> for CUDACUDA-OpenGL interoperability can perform slower if OpenGLcontext spans <strong>multiple</strong> GPUConsider manual interoperability for fine-grained controlNVIDIA Confidential


ResourcesCUDA samples/documentation:http://developer.nvidia.com/cuda-downloadsOpenGL Insights, Patric Cozzi, Christophe Riccio, 2012. ISBN1439893764. www.openglinsights.comNVIDIA Confidential

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!