CUPTI User's Guide

CUPTI Activity API

The CUPTI Activity API allows you to asynchronously collect a trace of an application’s CPU and GPU CUDA activity. The following terminology is used by the activity API.

Activity Record: CPU and GPU activity is reported in C data structures called activity records. There is a different C structure type for each activity kind (e.g. CUpti_ActivityMemcpy). Records are generically referred to using the CUpti_Activity type. This type contains only a kind field that indicates the kind of the activity record. Using this kind, the object can be cast from the generic CUpti_Activity type to the specific type representing the activity. See the printActivity function in the activity_trace sample for an example.

Activity Buffer: CUPTI fills activity buffers with activity records as the corresponding activities occur on the CPU and GPU. The CUPTI client is responsible for providing activity buffers as necessary to ensure that no records are dropped.

Activity Queue: CUPTI maintains queues of activity buffers. There are three types of queues: global, context, and stream.

Global Queue: The global queue collects all activity records that are not associated with a valid context. All device, context, and API activity records are collected in the global queue. A buffer is enqueued in the global queue by specifying NULL for the context argument.

Context Queue: Each context queue collects activity records associated with that context that are not associated with a specific stream or that are associated with the default stream. A buffer is enqueued in a context queue by specifying 0 for the streamId argument and a valid context for the context argument.

Stream Queue: Each stream queue collects memcpy, memset, and kernel activity records associated with the stream. A buffer is enqueued in a stream queue by specifying a non-zero value for the streamId argument and a valid context for the context argument.
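As a rough illustration of the three queue types, the sketch below classifies a (context, streamId) pair the way the enqueue rules above describe: a NULL context addresses the global queue, a valid context with streamId 0 addresses that context's queue, and a valid context with a non-zero streamId addresses a stream queue. It does not call CUPTI. SketchContext, SketchQueueKind, sketch_queue_kind, and sketch_queue_name are hypothetical names introduced only for this illustration, and treating a NULL context with a non-zero streamId as the global queue is an assumption that the text above does not address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CUDA context handle; NOT the real CUcontext. */
typedef struct SketchContext {
    int id;
} SketchContext;

/* The three queue types named in the guide (enum names are hypothetical). */
typedef enum {
    SKETCH_QUEUE_GLOBAL,
    SKETCH_QUEUE_CONTEXT,
    SKETCH_QUEUE_STREAM
} SketchQueueKind;

/*
 * Classify which queue a (context, streamId) pair addresses:
 *   - NULL context                    -> global queue
 *   - valid context, streamId == 0    -> context queue
 *   - valid context, streamId != 0    -> stream queue
 * Assumption: a NULL context selects the global queue regardless of
 * stream_id; the guide only states the NULL-context rule.
 */
SketchQueueKind sketch_queue_kind(const SketchContext *context,
                                  uint32_t stream_id)
{
    if (context == NULL) {
        return SKETCH_QUEUE_GLOBAL;
    }
    return (stream_id == 0) ? SKETCH_QUEUE_CONTEXT : SKETCH_QUEUE_STREAM;
}

/* Human-readable name for a queue kind, e.g. for logging. */
const char *sketch_queue_name(SketchQueueKind kind)
{
    switch (kind) {
    case SKETCH_QUEUE_GLOBAL:  return "global";
    case SKETCH_QUEUE_CONTEXT: return "context";
    case SKETCH_QUEUE_STREAM:  return "stream";
    }
    /* Only reached if a value outside the enum is passed in. */
    assert(!"unknown SketchQueueKind");
    return "unknown";
}
```

A client could use a helper like this to log which queue a buffer is about to be enqueued in; for example, sketch_queue_kind(NULL, 0) yields SKETCH_QUEUE_GLOBAL, while a valid context with stream ID 7 yields SKETCH_QUEUE_STREAM.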
A streamId can be obtained from a CUstream object by using the cuptiGetStreamId function.

CUPTI must be initialized in a specific manner to ensure that activity records are collected correctly. Most importantly, CUPTI must be initialized before any CUDA driver or runtime API is invoked. Initialization can be done by enqueuing one or more buffers in the global queue, as shown in the initTrace function of the activity_trace sample. Also, to ensure that device activity records are collected, you must enable device records before CUDA is initialized (also shown in the initTrace function).

The other important requirement for correct activity API operation is the need to enqueue at least one buffer in the context queue of each context as it is created. Thus, as shown in the activity_trace example, the CUPTI client should use the resource callback to enqueue at least one buffer when context creation is indicated by CUPTI_CBID_RESOURCE_CONTEXT_CREATED. Using the stream queues is optional, but may be useful to reduce or eliminate application perturbations caused by the need to process or save the activity records returned in the buffers. For example, if a stream queue is used, that queue can be flushed when the stream is synchronized.

CUDA Tools SDK CUPTI User’s Guide DA-05679-001_v01 | 4

Each activity buffer must be allocated by the CUPTI client and passed to CUPTI using the cuptiActivityEnqueueBuffer function. Enqueuing a buffer passes ownership to CUPTI, so the client should not read or write the contents of a buffer once it is enqueued. Ownership of a buffer is regained by using the cuptiActivityDequeueBuffer function.

As the application executes, the activity buffers will fill. It is the CUPTI client’s responsibility to ensure that a sufficient number of appropriately sized buffers are enqueued to avoid dropped activity records. Activity buffers can be enqueued and dequeued at the following points. Enqueuing and dequeuing activity buffers at any other point may result in corrupt activity records.

Before CUDA initialization: Buffers can be enqueued and dequeued to/from the global queue before any CUDA driver or runtime API is called.

In synchronization or resource callbacks: At context creation, destruction, or synchronization, buffers may be enqueued or dequeued to/from the corresponding context queue, and from any stream queues associated with streams in that context. At stream creation, destruction, or synchronization, buffers may be enqueued or dequeued to/from the corresponding stream queue.
Buffers may also be enqueued to or dequeued from the global queue at this time.

After device synchronization: After a CUDA device is synchronized or reset (with cudaDeviceSynchronize or cudaDeviceReset), and before any subsequent CUDA driver or runtime API is invoked, buffers can be enqueued and dequeued to/from any activity queue.

The activity_trace sample described on page 25 shows how to use global, context, and stream queues to collect a trace of CPU and GPU activity for a simple application.

CUPTI Callback API

The CUPTI Callback API allows you to register a callback into your own code. Your callback will be invoked when the application being profiled calls a CUDA runtime or driver function, or when certain events occur in the CUDA driver. The following terminology is used by the callback API.

Callback Domain: Callbacks are grouped into domains to make it easier to associate your callback functions with groups of related CUDA functions or events. There are currently four callback domains, as defined by CUpti_CallbackDomain: a domain for CUDA runtime functions, a domain for CUDA driver functions, a domain for CUDA