11.07.2015 Views

CUPTI User's Guide


CUPTI_CBID_RESOURCE_CONTEXT_CREATED. Using the stream queues is optional, but may be useful to reduce or eliminate application perturbations caused by the need to process or save the activity records returned in the buffers. For example, if a stream queue is used, that queue can be flushed when the stream is synchronized.

Each activity buffer must be allocated by the CUPTI client, and passed to CUPTI using the cuptiActivityEnqueueBuffer function. Enqueuing a buffer passes ownership to CUPTI, and so the client should not read or write the contents of a buffer once it is enqueued. Ownership of a buffer is regained by using the cuptiActivityDequeueBuffer function.

As the application executes, the activity buffers will fill. It is the CUPTI client’s responsibility to ensure that a sufficient number of appropriately sized buffers are enqueued to avoid dropped activity records. Activity buffers can be enqueued and dequeued at the following points. Enqueuing and dequeuing activity buffers at any other point may result in corrupt activity records.

Before CUDA initialization: Buffers can be enqueued and dequeued to/from the global queue before any CUDA driver or runtime API is called.

In synchronization or resource callbacks: At context creation, destruction, or synchronization, buffers may be enqueued or dequeued to/from the corresponding context queue, and from any stream queues associated with streams in that context. At stream creation, destruction, or synchronization, buffers may be enqueued or dequeued to/from the corresponding stream queue. Buffers may also be enqueued to or dequeued from the global queue at this time.

After device synchronization: After a CUDA device is synchronized or reset (with cudaDeviceSynchronize or cudaDeviceReset), and before any subsequent CUDA driver or runtime API is invoked, buffers can be enqueued and dequeued to/from any activity queue.

The activity_trace sample described on page 25 shows how to use global, context, and stream queues to collect a trace of CPU and GPU activity for a simple application.

CUPTI Callback API

The CUPTI Callback API allows you to register a callback into your own code. Your callback will be invoked when the application being profiled calls a CUDA runtime or driver function, or when certain events occur in the CUDA driver. The following terminology is used by the callback API.

Callback Domain: Callbacks are grouped into domains to make it easier to associate your callback functions with groups of related CUDA functions or events. There are currently four callback domains, as defined by CUpti_CallbackDomain: a domain for CUDA runtime functions, a domain for CUDA driver functions, a domain for CUDA

CUDA Tools SDK CUPTI User’s Guide DA-05679-001_v01 | 5
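The buffer ownership rule from the activity buffer discussion (enqueuing hands a client-allocated buffer over, dequeuing hands it back, and records are dropped when no enqueued buffer has room) can be illustrated with a small stand-alone model. This is a sketch in plain C and does not call CUPTI: ToyQueue, ToyBuffer, toy_enqueue, toy_record, toy_dequeue, and QUEUE_CAPACITY are hypothetical names, and the fill policy (oldest buffer first, full buffers never revisited) is an assumption made for illustration, not documented CUPTI behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only. None of these names belong to the CUPTI API. */
#define QUEUE_CAPACITY 4

typedef struct {
    uint8_t *data;   /* storage allocated by the client */
    size_t   size;   /* total bytes in the buffer */
    size_t   valid;  /* bytes written while the queue owned the buffer */
} ToyBuffer;

typedef struct {
    ToyBuffer slots[QUEUE_CAPACITY];
    size_t head;     /* index of the oldest enqueued buffer */
    size_t count;    /* number of buffers currently owned by the queue */
    size_t fill;     /* offset from head of the buffer being filled */
    size_t dropped;  /* records lost because no buffer had room */
} ToyQueue;

/* Client gives a buffer to the queue; the client must not touch it
   again until toy_dequeue returns it. Returns 0 on success, -1 if the
   queue cannot hold another buffer. */
int toy_enqueue(ToyQueue *q, uint8_t *data, size_t size)
{
    assert(q != NULL && data != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -1;
    ToyBuffer *slot = &q->slots[(q->head + q->count) % QUEUE_CAPACITY];
    slot->data = data;
    slot->size = size;
    slot->valid = 0;
    q->count++;
    return 0;
}

/* Producer side: append one record to the first enqueued buffer that
   still has room. If none has room, the record is dropped (assumed
   policy). Returns 0 if stored, -1 if dropped. */
int toy_record(ToyQueue *q, const uint8_t *rec, size_t len)
{
    assert(q != NULL && rec != NULL);
    while (q->fill < q->count) {
        ToyBuffer *b = &q->slots[(q->head + q->fill) % QUEUE_CAPACITY];
        if (b->size - b->valid >= len) {
            memcpy(b->data + b->valid, rec, len);
            b->valid += len;
            return 0;
        }
        q->fill++;   /* this buffer is treated as full from now on */
    }
    q->dropped++;
    return -1;
}

/* Client takes back the oldest buffer and learns how many bytes of it
   are valid. Returns 0 on success, -1 if the queue is empty. */
int toy_dequeue(ToyQueue *q, uint8_t **data, size_t *valid)
{
    assert(q != NULL && data != NULL && valid != NULL);
    if (q->count == 0)
        return -1;
    ToyBuffer *b = &q->slots[q->head];
    *data = b->data;
    *valid = b->valid;
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    if (q->fill > 0)
        q->fill--;   /* keep fill relative to the new head */
    return 0;
}
```

In this model a client would zero-initialize a ToyQueue, enqueue a few buffers before any records are produced, and read a buffer's contents only after toy_dequeue has returned it. A growing dropped counter corresponds to the case the guide warns about, where too few or too small buffers are enqueued.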
