11.07.2015 Views

CUPTI User's Guide


CUPTI Activity API

The CUPTI Activity API allows you to asynchronously collect a trace of an application’s CPU and GPU CUDA activity. The following terminology is used by the activity API.

Activity Record: CPU and GPU activity is reported in C data structures called activity records. There is a different C structure type for each activity kind (e.g. CUpti_ActivityMemcpy). Records are generically referred to using the CUpti_Activity type. This type contains only a kind field that indicates the kind of the activity record. Using this kind, the object can be cast from the generic CUpti_Activity type to the specific type representing the activity. See the printActivity function in the activity_trace sample for an example.

Activity Buffer: CUPTI fills activity buffers with activity records as the corresponding activities occur on the CPU and GPU. The CUPTI client is responsible for providing activity buffers as necessary to ensure that no records are dropped.

Activity Queue: CUPTI maintains queues of activity buffers. There are three types of queues: global, context, and stream.

Global Queue: The global queue collects all activity records that are not associated with a valid context. All device, context, and API activity records are collected in the global queue. A buffer is enqueued in the global queue by specifying NULL for the context argument.

Context Queue: Each context queue collects activity records associated with that context that are not associated with a specific stream or that are associated with the default stream. A buffer is enqueued in a context queue by specifying 0 for the streamId argument and a valid context for the context argument.

Stream Queue: Each stream queue collects memcpy, memset, and kernel activity records associated with the stream. A buffer is enqueued in a stream queue by specifying a non-zero value for the streamId argument and a valid context for the context argument.
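The three queue rules come down to a small decision on the context and streamId arguments. The following is a minimal sketch of that decision only. It does not call CUPTI, and every name in it (DemoQueueKind, demo_select_queue, demo_check_queue_rules) is hypothetical. It also assumes that streamId is ignored when the context is NULL, which the text above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these are CUPTI identifiers. */
typedef enum {
    DEMO_QUEUE_GLOBAL,
    DEMO_QUEUE_CONTEXT,
    DEMO_QUEUE_STREAM
} DemoQueueKind;

/* Returns the queue that an enqueued buffer would feed, given the context
   and streamId arguments. `context` stands in for a CUcontext handle.
   Assumption: streamId is ignored when context is NULL. */
DemoQueueKind demo_select_queue(const void *context, uint32_t stream_id)
{
    if (context == NULL) {
        return DEMO_QUEUE_GLOBAL;   /* NULL context: global queue */
    }
    /* Valid context: streamId 0 selects the context queue,
       any non-zero streamId selects that stream's queue. */
    return (stream_id == 0) ? DEMO_QUEUE_CONTEXT : DEMO_QUEUE_STREAM;
}

/* Self-check that exercises each rule once. */
void demo_check_queue_rules(void)
{
    int fake_context = 0;  /* any non-NULL address serves as a "valid" context here */
    assert(demo_select_queue(NULL, 0) == DEMO_QUEUE_GLOBAL);
    assert(demo_select_queue(&fake_context, 0) == DEMO_QUEUE_CONTEXT);
    assert(demo_select_queue(&fake_context, 7) == DEMO_QUEUE_STREAM);
}
```

Calling demo_check_queue_rules() from a test program runs all three cases. In a real client the same arguments would be passed to the CUPTI buffer-enqueue call used in the activity_trace sample.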
A streamId can be obtained from a CUstream object by using the cuptiGetStreamId function.

CUPTI must be initialized in a specific manner to ensure that activity records are collected correctly. Most importantly, CUPTI must be initialized before any CUDA driver or runtime API is invoked. Initialization can be done by enqueuing one or more buffers in the global queue, as shown in the initTrace function of the activity_trace sample. Also, to ensure that device activity records are collected, you must enable device records before CUDA is initialized (also shown in the initTrace function).

The other important requirement for correct activity API operation is the need to enqueue at least one buffer in the context queue of each context as it is created. Thus, as shown in the activity_trace example, the CUPTI client should use the resource callback to enqueue at least one buffer when context creation is indicated by

CUDA Tools SDK CUPTI User’s Guide DA-05679-001_v01 | 4
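The Activity Record entry earlier on this page says that a record is cast from the generic CUpti_Activity type to a specific type according to its kind field. The following is a minimal sketch of that pattern. It uses hypothetical stand-in types (DemoActivity, DemoActivityMemcpy, DemoActivityKernel) and not the real CUPTI structures, whose fields are defined in the CUPTI headers and are not reproduced here. The bytes and name fields are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for CUPTI types; the real definitions differ. */
typedef enum {
    DEMO_KIND_MEMCPY = 1,
    DEMO_KIND_KERNEL = 2
} DemoActivityKind;

/* Generic record: carries only the kind, mirroring the role of CUpti_Activity. */
typedef struct {
    DemoActivityKind kind;
} DemoActivity;

/* Specific records begin with the same kind field, so a pointer to the
   generic type can be cast to the specific type once the kind is known. */
typedef struct {
    DemoActivityKind kind;
    uint64_t bytes;        /* assumed field, for illustration only */
} DemoActivityMemcpy;

typedef struct {
    DemoActivityKind kind;
    const char *name;      /* assumed field, for illustration only */
} DemoActivityKernel;

/* Formats one record into out. Returns the number of characters written
   (snprintf semantics), or -1 for a kind this sketch does not handle. */
int demo_format_activity(const DemoActivity *record, char *out, size_t out_size)
{
    assert(record != NULL && out != NULL);
    switch (record->kind) {
    case DEMO_KIND_MEMCPY: {
        const DemoActivityMemcpy *m = (const DemoActivityMemcpy *)record;
        return snprintf(out, out_size, "MEMCPY %llu bytes",
                        (unsigned long long)m->bytes);
    }
    case DEMO_KIND_KERNEL: {
        const DemoActivityKernel *k = (const DemoActivityKernel *)record;
        return snprintf(out, out_size, "KERNEL %s", k->name);
    }
    default:
        return -1;
    }
}
```

A caller passes each record as a pointer to the generic type, for example demo_format_activity((const DemoActivity *)&memcpy_record, buf, sizeof buf). The printActivity function in the activity_trace sample shows the same dispatch against the real record types.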
