11.07.2015 Views

CUPTI User's Guide

CUPTI User's Guide

CUPTI User's Guide

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CUDA Tools SDK<strong>CUPTI</strong> User’s <strong>Guide</strong>DA-05679-001_v01 | October 2011


Document Change HistoryVer Date Resp Reason for changev01 2011/1/19 DG Initial revision for CUDA Tools SDK 4.0v02 2011/8/15 DG Revisions for CUDA Tools SDK 4.1CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong>DA-05679-001_v01 | ii


<strong>CUPTI</strong>The CUDA Profiling Tools Interface (<strong>CUPTI</strong>) enables the creation of profiling and tracingtools that target CUDA applications. <strong>CUPTI</strong> provides four APIs, the Activity API, theCallback API, the Event API, and the Metric API. Using these APIs, you can developprofiling tools that give insight into the CPU and GPU behavior of CUDA applications.<strong>CUPTI</strong> is delivered as a dynamic library on all platforms supported by CUDA.<strong>CUPTI</strong> Compatibility and RequirementsNew versions of the CUDA driver are backwards compatible with older versions of <strong>CUPTI</strong>.For example, a developer using a profiling tool based on <strong>CUPTI</strong> 4.0 can update theirCUDA driver to 4.1. However, new versions of <strong>CUPTI</strong> are not backwards compatible witholder versions of the CUDA driver. For example, a developer using a profiling tool basedon <strong>CUPTI</strong> 4.1 must have the 4.1 version of the CUDA driver (or later) installed as well.<strong>CUPTI</strong> calls will fail with <strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if the CUDA driver version isnot compatible with the <strong>CUPTI</strong> version.<strong>CUPTI</strong> Initialization<strong>CUPTI</strong> initialization occurs lazily the first time you invoke any <strong>CUPTI</strong> function. The theEvent, Metric, and Callback APIs there are no requirements on when this initializationmust occur (i.e. you can invoke the first <strong>CUPTI</strong> function at any point). For correctoperation, the Activity API does require that <strong>CUPTI</strong> be initialized before any CUDAdriver or runtime API is invoked. See the <strong>CUPTI</strong> Activity API section for moreinformation on <strong>CUPTI</strong> initialization requirements for the activity API.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 3


<strong>CUPTI</strong> Activity APIThe <strong>CUPTI</strong> Activity API allows you to asychronously collect a trace of an application’sCPU and GPU CUDA activity. The following terminology is used by the activity API.Activity Record: CPU and GPU activity is reported in C data structures called activityrecords. There is a different C structure type for each activity kind (e.g.CUpti_ActivityMemcpy). Records are generically referred to using theCUpti_Activity type. This type contains only a kind field that indicates the kind ofthe activity record. Using this kind, the object can be cast from the genericCUpti_Activity type to the specific type representing the activity. See theprintActivity function in the activity_trace sample for an example.Activity Buffer: <strong>CUPTI</strong> fills activity buffers with activity records as the correspondingactivities occur on the CPU and GPU. The <strong>CUPTI</strong> client is responsible for providingactivity buffers as necessary to ensure that no records are dropped.Activity Queue: <strong>CUPTI</strong> maintains queues of activity buffers. There are three types ofqueues: global, context, and stream.Global Queue: The global queue collects all activity records that are not associatedwith a valid context. All device, context, and API activity records are collected inthe global queue. A buffer is enqueued in the global queue by specifying NULL forthe context argument.Context Queue: Each context queue collects activity records associated with thatcontext that are not associated with a specific stream or that are associated with thedefault stream. A buffer is enqueued in a context queue by specifying 0 for thestreamId argument and a valid context for the context argument.Stream Queue: Each stream queue collects memcpy, memset, and kernel activityrecords associated with the stream. A buffer is enqueued in a stream queue byspecifying a non-zero value for the streamId argument and a valid context for thecontext argument. A streamId can be obtained from a CUstream object by usingthe cuptiGetStreamId function.<strong>CUPTI</strong> must be initialized in a specific manner to ensure that activity records arecollected correctly. Most importantly, <strong>CUPTI</strong> must be initialized before any CUDA driveror runtime API is invoked. Initialization can be done by enqueuing one or more buffers inthe global queue, as shown in the initTrace function of the activity_trace sample.Also, to ensure that device activity records are collected, you must enable device recordsbefore CUDA is initialized (also shown in the initTrace function).The other important requirement for correct activity API operation is the need to enqueueat least one buffer in the context queue of each context as it is created. Thus, as shown inthe activity_trace example, the <strong>CUPTI</strong> client should use the resource callback toenqueue at least one buffer when context creation is indicated byCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 4


<strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_CREATED. Using the stream queues is optional, but maybe useful to reduce or eliminate application perturbations caused by the need to process orsave the activity records returned in the buffers. For example, if a stream queue is used,that queue can be flushed when the stream is synchronized.Each activity buffer must be allocated by the <strong>CUPTI</strong> client, and passed to <strong>CUPTI</strong> usingthe cuptiActivityEnqueueBuffer function. Enqueuing a buffer passes ownership to<strong>CUPTI</strong>, and so the client should not read or write the contents of a buffer once it isenqueued. Ownership of a buffer is regained by using the cuptiActivityDequeueBufferfunction.As the application executes, the activity buffers will fill. It is the <strong>CUPTI</strong> client’sresponsibility to ensure that a sufficient number of appropriately sized buffers areenqueued to avoid dropped activity records. Activity buffers can be enqueued anddequeued at the following points. Enqueuing and dequeuing activity buffers at any otherpoint may result in corrupt activity records.Before CUDA initialization: Buffers can be enqueued and dequeued to/from the globalqueue before CUDA driver or runtime API is called.In synchronization or resource callbacks: At context creation, destruction, orsynchronization, buffers may be enqueued or dequeued to/from the correspondingcontext queue, and from any stream queues associated with streams in that context.At stream creation, destruction, or synchronization, buffers may be enqueued ordequeued to/from the corresponding stream queue. The global queue may also beenqueued or dequeued at this time.After device synchronization: After a CUDA device is synchronized or reset (withcudaDeviceSynchronize or cudaDeviceReset), and before any subsequent CUDAdriver or runtime API is invoked, buffers can enqueued and dequeued to/from anyactivity queue.The activity_trace sample described on page 25 shows how to use global, context, andstream queues to collect a trace of CPU and GPU activity for a simple application.<strong>CUPTI</strong> Callback APIThe <strong>CUPTI</strong> Callback API allows you to register a callback into your own code. Yourcallback will be invoked when the application being profiled calls a CUDA runtime ordriver function, or when certain events occur in the CUDA driver. The followingterminology is used by the callback API.Callback Domain: Callbacks are grouped into domains to make it easier to associate yourcallback functions with groups of related CUDA functions or events. There arecurrently four callback domains, as defined by CUpti_CallbackDomain: a domain forCUDA runtime functions, a domain for CUDA driver functions, a domain for CUDACUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 5


esource tracking, and a domain for CUDA synchronization notification.Callback ID: Each callback is given a unique ID within the corresponding callback domainso that you can identify it within your callback function. The CUDA driver API IDsare defined in cupti_driver_cbid.h and the CUDA runtime API IDs are defined incupti_runtime_cbid.h. Both of these headers are included for you when youinclude cupti.h. The CUDA resource callback IDs are defined byCUpti_CallbackIdResource and the CUDA synchronization callback IDs are definedby CUpti_CallbackIdSync.Callback Function: Your callback function must be of type CUpti_CallbackFunc. Thisfunction type has two arguments that specify the callback domain and ID so thatyou know why the callback is occurring. The type also has a cbdata argument thatis used to pass data specific to the callback.Subscriber: A subscriber is used to associate each of your callback functions with one ormore CUDA API functions. There can be at most one subscriber initialized withcuptiSubscribe() at any time. Before initializing a new subscriber, the existingsubscriber must be finalized with cuptiUnsubscribe().Each callback domain is described in detail below.Driver and Runtime API CallbacksUsing the callback API with the <strong>CUPTI</strong>_CB_DOMAIN_DRIVER_API or<strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API domains, you can associate a callback function with oneor more CUDA API functions. When those CUDA functions are invoked in theapplication, your callback function is invoked as well. For these domains, the cbdataargument to your callback function will be of the type CUpti_CallbackData.The following code shows a typical sequence used to associate a callback function with oneor more CUDA API functions. To simplify the presentation error checking code has beenremoved.CUpti_SubscriberHandle subscriber ;MyDataStruct * my_data = ...;...cuptiSubscribe (& subscriber ,( CUpti_CallbackFunc ) my_callback , my_data );cuptiEnableDomain (1, subscriber ,<strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API );First, cuptiSubscribe is used to initialize a subscriber with the my_callback callbackfunction. Next, cuptiEnableDomain is used to associate that callback with all the CUDAruntime API functions. Using this code sequence will cause my_callback to be calledCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 6


twice each time any of the CUDA runtime API functions are invoked, once on entry to theCUDA function and once just before exit from the CUDA function. <strong>CUPTI</strong> callback APIfunctions cuptiEnableCallback and cuptiEnableAllDomains can also be used toassociate CUDA API functions with a callback (see reference below for more information).The following code shows a typical callback function.void <strong>CUPTI</strong>APImy_callback ( void * userdata , CUpti_CallbackDomain domain ,CUpti_CallbackId cbid , const void * cbdata ){const CUpti_CallbackData * cbInfo = ( CUpti_CallbackData *)←↪cbdata ;MyDataStruct * my_data = ( MyDataStruct *) userdata ;if (( domain == <strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API ) &&( cbid == <strong>CUPTI</strong>_RUNTIME_TRACE_CBID_cudaMemcpy_v3020 )) {if ( cbInfo -> callbackSite == <strong>CUPTI</strong>_API_ENTER ) {cudaMemcpy_v3020_params * funcParams =( cudaMemcpy_v3020_params *)(cbInfo ->functionParams );...}size_t count = funcParams -> count ;enum cudaMemcpyKind kind = funcParams -> kind ;...In your callback function, you use the CUpti_CallbackDomain and CUpti_CallbackIDparameters to determine which CUDA API function invocation is causing this callback. Inthe example above, we are checking for the CUDA runtime cudaMemCpy function. TheCUpti_CallbackData parameter holds a structure of useful information that can be usedwithin the callback. In this case we use the callbackSite member of the structure todetect that the callback is occurring on entry to cudaMemCpy, and we use thefunctionParams member to access the parameters that were passed to cudaMemCpy. Toaccess the parameters we first cast functionParams to a structure type corresponding tothe cudaMemCpy function. These parameter structures are contained ingenerated_cuda_runtime_api_meta.h, generated_cuda_meta.h, and a number of otherfiles. When possible these files are included for you by cupti.h.The callback_event and callback_timestamp samples described on page 25 bothshow how to use the callback API for the driver and runtime API domains.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 7


Resource CallbacksUsing the callback API with the <strong>CUPTI</strong>_CB_DOMAIN_RESOURCE domain, you can associate acallback function with some CUDA resource creation and destruction events. For example,when a CUDA context is created, your callback function will be invoked with a callback IDequal to <strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_CREATED. For this domain, the cbdata argumentto your callback function will be of the type CUpti_ResourceData.The activity_trace sample described on page 25 shows how to use the resource callback.Synchronization CallbacksUsing the callback API with the <strong>CUPTI</strong>_CB_DOMAIN_SYNCHRONIZE domain, you canassociate a callback function with CUDA context and stream synchronizations. Forexample, when a CUDA context is synchronized, your callback function will be invokedwith a callback ID equal to <strong>CUPTI</strong>_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED. For thisdomain, the cbdata argument to your callback function will be of the typeCUpti_SynchronizeData.The activity_trace sample described on page 25 shows how to use the synchronizationcallback.<strong>CUPTI</strong> Event APIThe <strong>CUPTI</strong> Event API allows you to query, configure, start, stop, and read the eventcounters on a CUDA-enabled device. The following terminology is used by the event API.Event: An event is a countable activity, action, or occurrence on a device.Event ID: Each event is assigned a unique identifier. A named event will represent thesame activity, action, or occurence on all device types. But the named event mayhave different IDs on different device families. Use cuptiEventGetIdFromName to getthe ID for a named event on a particular device.Event Category: Each event is placed in one of the categories defined byCUpti_EventCategory. The category indicates the general type of activity, action, oroccurrence measured by the event.Event Domain: A device exposes one or more event domains. Each event domainrepresents a group of related events available on that device. A device may havemultiple instances of a domain, indicating that the device can simultaneously recordmultiple instances of each event within that domain.Event Group: An event group is a collection of events that are managed together. Thenumber and type of events that can be added to an event group are subject toCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 8


device-specific limits. At any given time, a device may be configured to count eventsfrom a limited number of event groups. All events in an event group must belong tothe same event domain.Event Group Set: An event group set is a collection of event groups that can be enabled atthe same time. Event group sets are created by cuptiEventGroupSetsCreate andcuptiMetricCreateEventGroupSets.The tables included in this section list the events available for each device, as determinedby the device’s compute capability. You can also determine the events available on adevice using the cuptiDeviceEnumEventDomains and cuptiEventDomainEnumEventsfunctions. The cupti_query sample described on page 25 shows how to use thesefunctions. You can also enumerate all the <strong>CUPTI</strong> events available on any device using thecuptiEnumEventDomains function.Configuring and reading event counts requires the following steps. First, select your eventcollection mode. If you want to count events that occur during the execution of a kernel,use cuptiSetEventCollectionMode to set mode <strong>CUPTI</strong>_EVENT_COLLECTION_MODE_KERNEL.If you want to continuously sample the event counts, use mode<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_CONTINUOUS. Next determine the names of the events thatyou want to count, and then use the cuptiEventGroupCreate, cuptiEventGetIdFromName,and cuptiEventGroupAddEvent functions to create and initialize an event group with thoseevents. If you are unable to add all the events to a single event group then you will need tocreate multiple event groups. Alternatively, you can use the cuptiEventGroupSetsCreatefunction to automatically create the event group(s) required for a set of events.To begin counting a set of events, enable the event group or groups that contain thoseevents by using the cuptiEventGroupEnable function. If your events are contained inmultiple event groups you may be unable to enable all of the event groups at the sametime, due to device limitations. In this case, you will need to gather the events acrossmultiple executions of the application.Use the cuptiEventGroupReadEvent and/or cuptiEventGroupReadAllEvents functions toread the event values. When you are done collecting events, use thecuptiEventGroupDisable function to stop counting of the events contained in an eventgroup. The callback_event sample described on page 25 shows how to use thesefunctions to create, enable, and disable event groups, and how to read event counts.Collecting Kernel Execution EventsA common use of the event API is to count a set of events during the execution of a kernel(as demonstrated by the callback_event sample). The following code shows a typicalcallback used for this purpose. Assume that the callback was enabled only for a kernellaunch using the CUDA runtime (i.e. by cuptiEnableCallback(1, subscriber,<strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API, <strong>CUPTI</strong>_RUNTIME_TRACE_CBID_cudaLaunch_v3020). ToCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 9


simplify the presentation error checking code has been removed.static void <strong>CUPTI</strong>APIgetEventValueCallback ( void * userdata ,CUpti_CallbackDomain domain ,CUpti_CallbackId cbid ,const void * cbdata ){const CUpti_CallbackData * cbData =( CUpti_CallbackData *) cbdata ;if ( cbData -> callbackSite == <strong>CUPTI</strong>_API_ENTER ) {cudaThreadSynchronize ();cuptiSetEventCollectionMode ( cbInfo -> context ,<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_KERNEL←↪);cuptiEventGroupEnable ( eventGroup );}if ( cbData -> callbackSite == <strong>CUPTI</strong>_API_EXIT ) {cudaThreadSynchronize ();cuptiEventGroupReadEvent ( eventGroup ,<strong>CUPTI</strong>_EVENT_READ_FLAG_ACCUMULATE ,eventId ,& bytesRead , & eventVal );}}cuptiEventGroupDisable ( eventGroup );Two synchronization points are used to ensure that events are counted only for theexecution of the kernel. If the application contains other threads that launch kernels, thenadditional thread-level synchronization must also be introduced to ensure that thosethreads do not launch kernels while the callback is collecting events. When thecudaLaunch API is entered (that is, before the kernel is actually launched on the device),cudaThreadSynchronize is used to wait until the GPU is idle. The event collection modeis set to <strong>CUPTI</strong>_EVENT_COLLECTION_MODE_KERNEL so that the event counters areautomatically started and stopped just before and after the kernel executes. Then eventcollection is enabled with cuptiEventGroupEnable.When the cudaLaunch API is exited (that is, after the kernel is queued for execution onthe GPU) another cudaThreadSynchronize is used to cause the CPU thread to wait forthe kernel to finish execution. Finally, the event counts are read withcuptiEventGroupReadEvent.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 10


Sampling EventsThe event API can also be used to sample event values while a kernel or kernels areexecuting (as demonstrated by the event_sampling sample). The sample shows onepossible way to perform the sampling. The event collection mode is set to<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_CONTINUOUS so that the event counters run continuously.Two threads are used in event_sampling: one thread schedules the kernels and memcpysthat perform the computation, while another thread wakes periodically to sample an eventcounter. In this sample there is no correlation of the event samples with what is happeningon the GPU. To get some coarse correlation, you can use cuptiDeviceGetTimestamp tocollect the GPU timestamp at the time of the sample and also at other interesting pointsin your application.Interpreting Event ValuesThe tables below describe the events available for each device. Each event has a type thatindicates how the activity or action associated with that event is collected. The eventtypes are SM, TPC, and FB.SM Event TypeThe SM event type indicates that the event is collected for an action or activity thatoccurs on one or more of the device’s streaming multiprocessors (SMs). A streamingmultiprocessor creates, manages, schedules, and executes threads in groups of 32 threadscalled warps.The SM event values typically represent activity or action of thread warps, and not theactivity or action of individual threads. Details of how each event is incremented are givenin the event tables below.Two factors will impact the accuracy of the values collected for SM type events. First, dueto variations in system state, event values can vary across different, identical, runs of thesame application. Second, for devices with compute capability less than 2.0, SM events arecounted only for one SM. For devices with compute capability greater than 2.0, SM eventsfrom domain_d are counted for all SMs but for SM events from domain_a are countedfor multiple but not all, SMs. To get the most consistent results inspite of these factors, itis best to have number of blocks for each kernel launched to be a multiple of the totalnumber of SMs on a device. In other words, the grid configuration should be chosen suchthat the number of blocks launched on each SM is the same and also the amount of workof interest per block is the same.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 11


TPC Event TypeThe TPC event type indicates that the event is collected for an action or activity thatoccurs on the SMs within the device’s first Texture Processing Cluster (TPC). Deviceswith compute capabiliity less than 1.3 have two SMs per TPC, and devices with computecapability 1.3 have three SMs per TPC.Several of the TPC type events measure coherent and incoherent memory transactions. Acoherent (coalesced) access is said to occur when the memory required for a half-warp’sexecution of a single global load or global store instruction can be accessed with a singlememory transaction of 32, 64, or 128 bytes. If the memory cannot be accessed with asingle memory transaction the access is incoherent. For an incoherent (non-coalesced)access a separate memory transaction is issued for each thread in the half-warp,significantly reducing performance. The requirements for coherent access vary based oncompute capability. Refer to the CUDA C Programming <strong>Guide</strong> for details.FB Event TypeThe FB event type indicates that the event is collected for an action or activity thatoccurs on a DRAM partition.Event Reference - Compute Capability 1.0 to 1.3Devices with compute capability less than 2.0 implement two event domains, calleddomain_a and domain_b. Table 1 and Table 2 give a description of each event availablein these domains. The Type column indicates the event type, as described above in theInterpreting Event Values section. For the Capability columns, a Y indicates that the eventis available for that compute capability and an N indicates that the event is not available.CapabilityEvent Name Description Type 1.0 1.1 1.2 1.3tex_cache_hit Number of texture cache hits SM Y Y Y Ytex_cache_miss Number of texture cache misses SM Y Y Y YTable 1: Capability 1.x Events For domain_aCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 12


CapabilityEvent Name Description Type 1.0 1.1 1.2 1.3branchNumber of branches taken by threads SM Y Y Y Yexecuting a kernel. This event isincremented by one if at least onethread in a warp takes the branch.Note that barrier instructions (__-syncThreads()) also get counted asbranchesdivergent_branch Number of divergent branches within SM Y Y Y Ya warp. This event is incremented byone if at least one thread in a warp diverges(that is, follows a different executionpath) via a data dependent conditionalbranch. The event is incrementedby one at each point of divergencein a warpinstructions Number of instructions executed SM Y Y Y Ywarp_serialize If two addresses of a memory request SM Y Y Y Yfall in the same memory bank, thereis a bank conflict and the access hasto be serialized. This event gives thenumber of thread warps that serializeon address conflicts to either shared orconstant memorygld_incoherent Number of non-coalesced global memoryTPC Y Y N Nloadsgld_coherent Number of coalesced global memory TPC Y Y N Nloadsgld_32bNumber of 32 byte global memory TPC N N Y Yload transactions; incremented by 1for each 32 byte transactiongld_64bNumber of 64 byte global memory TPC N N Y Yload transactions; incremented by 1for each 64 byte transactiongld_128b Number of 128 byte global memory TPC N N Y Yload transactions; incremented by 1for each 128 byte transactiongst_incoherent Number of non-coalesced global memoryTPC Y Y N Nstoresgst_coherent Number of coalesced global memorystoresTPC Y Y N NCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 13


CapabilityEvent Name Description Type 2.0 2.1sm_cta_launched Number of thread blocks launched SM Y Yl1_local_load_hit Number of local load hits in L1 cache. SM Y YThis increments by 1, 2, or 4 for 32, 64and 128 bit accesses respectivelyl1_local_load_miss Number of local load misses in L1 SM Y Ycache This increments by 1, 2, or 4 for32, 64 and 128 bit accesses respectivelyl1_local_store_hit Number of local store hits in L1 cache. SM Y YThis increments by 1, 2, or 4 for 32, 64and 128 bit accesses respectivelyl1_local_store_miss Number of local store misses in L1 SM Y Ycache. This increments by 1, 2, or 4for 32, 64 and 128 bit accesses respectivelyl1_global_load_hit Number of global load hits in L1 cache. SM Y YThis increments by 1, 2, or 4 for 32, 64and 128 bit accesses respectivelyl1_global_load_miss Number of global load misses in L1cache. This increments by 1, 2, or 4for 32, 64 and 128 bit accesses respectivelySM Y Yuncached_global_-load_transactionNumber of uncached global load transactions.This increments by 1, 2, or 4for 32, 64 and 128 bit accesses respectivelyNumber of global store transactions.This increments by 1, 2, or 4 for 32,64 and 128 bit accesses respectivelyNumber of shared bank conflictscaused due to addresses for two ormore shared memory requests fall inthe same memory bankNumber of texture cache requests.This increments by 1 for each 32-byteaccessNumber of texture cache misses. Thisincrements by 1 for each 32-byte accessNumber of texture cache requests.This increments by 1 for each 32-byteaccessglobal_store_-transactionl1_shared_bank_-conflicttex0_cache_sector_-queriestex0_cache_sector_-missestex1_cache_sector_-queriesSM Y YSM Y YSM Y YSM Y YSM Y YSM N YCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 15


CapabilityEvent Name Description Type 2.0 2.1tex1_cache_sector_-missesNumber of texture cache misses. Thisincrements by 1 for each 32-byte accessSM N YTable 3: Capability 2.x Events For domain_aCapabilityEvent Name Description Type 2.0 2.1l2_subp0_write_-sector_missesNumber of write misses in slice 0 of L2cache. This increments by 1 for each32-byte accessFB Y Yl2_subp1_write_-sector_missesl2_subp0_read_-sector_missesl2_subp1_read_-sector_missesl2_subp0_write_-sector_queriesl2_subp1_write_-sector_queriesl2_subp0_read_-sector_queriesl2_subp1_read_-sector_queriesl2_subp0_read_hit_-sectorsl2_subp1_read_hit_-sectorsl2_subp0_read_-tex_sector_queriesNumber of write misses in slice 1 of L2cache. This increments by 1 for each32-byte accessNumber of read misses in slice 0 of L2cache. This increments by 1 for each32-byte accessNumber of read misses in slice 1 of L2cache. This increments by 1 for each32-byte accessNumber of write requests from L1 toslice 0 of L2 cache. This increments by1 for each 32-byte accessNumber of write requests from L1 toslice 1 of L2 cache. This increments by1 for each 32-byte accessNumber of read requests from L1 toslice 0 of L2 cache. This incrementsby 1 for each 32-byte accessNumber of read requests from L1 toslice 1 of L2 cache. This incrementsby 1 for each 32-byte accessNumber of read requests from L1 thathit in slice 0 of L2 cache. This incrementsby 1 for each 32-byte accessNumber of read requests from L1 thathit in slice 1 of L2 cache. This incrementsby 1 for each 32-byte accessNumber of read requests from TEX toslice 0 of L2 cache. This increments by1 for each 32-byte accessFB Y YFB Y YFB Y YFB Y YFB Y YFB Y YFB Y YFB Y YFB Y YFB Y YCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 16


CapabilityEvent Name Description Type 2.0 2.1l2_subp1_read_-tex_sector_queriesNumber of read requests from TEX toslice 1 of L2 cache. This increments by1 for each 32-byte accessFB Y Yl2_subp0_read_-tex_hit_sectorsl2_subp1_read_-tex_hit_sectorsfb_subp0_read_-sectorsfb_subp1_read_-sectorsfb_subp0_write_-sectorsfb_subp1_write_-sectorsfb0_subp0_read_-sectorsfb0_subp1_read_-sectorsfb0_subp0_write_-sectorsfb0_subp1_write_-sectorsfb1_subp0_read_-sectorsfb1_subp1_read_-sectorsNumber of read requests from L1 thathit in slice 0 of L2 cache. This incrementsby 1 for each 32-byte accessNumber of read requests from L1 thathit in slice 1 of L2 cache. This incrementsby 1 for each 32-byte accessNumber of DRAM read requests tosub partition 0, increments by 1 for 32byte accessNumber of DRAM read requests tosub partition 1, increments by 1 for 32byte accessNumber of DRAM write requests tosub partition 0, increments by 1 for 32byte accessNumber of DRAM write requests tosub partition 1, increments by 1 for 32byte accessNumber of DRAM read requests tosub partition 0 of DRAM unit 0, incrementsby 1 for 32 byte accessNumber of DRAM read requests tosub partition 1 of DRAM unit 0, incrementsby 1 for 32 byte accessNumber of DRAM write requests tosub partition 0 of DRAM unit 0, incrementsby 1 for 32 byte accessNumber of DRAM write requests tosub partition 1 of DRAM unit 0, incrementsby 1 for 32 byte accessNumber of DRAM read requests tosub partition 0 of DRAM unit 1, incrementsby 1 for 32 byte accessNumber of DRAM read requests tosub partition 1 of DRAM unit 1, incrementsby 1 for 32 byte accessFB Y YFB Y YFB Y YFB Y YFB Y YFB Y YFB N Y**FB N Y**FB N Y**FB N Y**FB N Y**FB N Y**CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 17


CapabilityEvent Name Description Type 2.0 2.1fb1_subp0_write_-sectorsNumber of DRAM write requests tosub partition 0 of DRAM unit 1, incrementsby 1 for 32 byte accessFB N Y**fb1_subp1_write_-sectorsNumber of DRAM write requests tosub partition 1 of DRAM unit 1, incrementsby 1 for 32 byte accessFB N Y**Table 4: Capability 2.x Events For domain_bNotes:◮ Y**: Devices will have either fb_** counters or fb0_** and fb1_** counters. TotalDRAM reads and writes are calculated by adding values for all subpartitions.◮ fb* and l2_*_misses events often give a large value when a display is connected tothe device. To get accurate values do not connect a display to the device collectingevent counts.◮ l2_*_queries event values can be greater than l2_*_misses event values becausel2_*_queries counts only the requests from L1 to L2 (does not include, for example,texture requests) while l2_*_misses counts all misses◮ Initializing device memory on the host fetches data from DRAM to L2, which canmodify the fb*_read_sectors event values for a kernelCapabilityEvent Name Description Type 2.0 2.1gld_inst_8bit Total number of 8-bit global load instructionsSM Y Ythat are executed by all thethreads across all thread blocksgld_inst_16bit Total number of 16-bit global load instructionsSM Y Ythat are executed by all thethreads across all thread blocksgld_inst_32bit Total number of 32-bit global load instructionsSM Y Ythat are executed by all thethreads across all thread blocksgld_inst_64bit Total number of 64-bit global load instructionsSM Y Ythat are executed by all thethreads across all thread blocksgld_inst_128bit Total number of 128-bit global load instructionsthat are executed by all thethreads across all thread blocksSM Y YCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 18


CapabilityEvent Name Description Type 2.0 2.1gst_inst_8bit Total number of 8-bit global store instructionsSM Y Ythat are executed by all thethreads across all thread blocksgst_inst_16bit Total number of 16-bit global store instructionsSM Y Ythat are executed by all thethreads across all thread blocksgst_inst_32bit Total number of 32-bit global store instructionsSM Y Ythat are executed by all thethreads across all thread blocksgst_inst_64bit Total number of 64-bit global store instructionsSM Y Ythat are executed by all thethreads across all thread blocksgst_inst_128bit Total number of 128-bit global storeinstructions that are executed by allthe threads across all thread blocksSM Y YTable 5: Capability 2.x Events For domain_cCapabilityEvent Name Description Type 2.0 2.1branchNumber of branches taken by threads SM Y Yexecuting a kernel. This counter willbe incremented by one if at least onethread in a warp takes the branchdivergent_branch Number of divergent branches within SM Y Ya warp. This counter will be incrementedby one if at least one threadin a warp diverges (that is, follows adifferent execution path) via a data dependentconditional branchwarps_launched Number of warps launched SM Y Ythreads_launched Number of threads launched SM Y Yactive_warpsAccumulated number of active warps SM Y Yper cycle. For every cycle it incrementsby the number of active warpsin the cycle which can be in the range0 to 48active_cyclesNumber of cycles a multiprocessor hasat least one active warpSM Y YCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 19


CapabilityEvent Name Description Type 2.0 2.1local_loadNumber of local load instructions per SM Y Ywarplocal_storeNumber of local store instructions per SM Y Ywarpgld_requestNumber of global load instructions per SM Y Ywarpgst_requestNumber of global store instructions SM Y Yper warpshared_loadNumber of shared load instructions SM Y Yper warpshared_storeNumber of shared store instructions SM Y Yper warpprof_trigger_XX There are 8 such triggers (00-07) that SM Y Yuser can profile. The triggers aregeneric and can be inserted in anyplace of the code to collect the relatedinformationinst_issuedNumber of instructions issued includingSM Y Nreplaysinst_issued1_0 Number of times instruction group 0 SM N Y*issued one instructioninst_issued2_0 Number of times instruction group 0 SM N Y*issued two instructionsinst_issued1_1 Number of times instruction group 1 SM N Y*issued one instructioninst_issued2_1 Number of times instruction group 1 SM N Y*issued two instructionsinst_executed Number of instructions executed, not SM Y Yincluding replaysthread_inst_- Number of instructions executed by SM Y Yexecuted_0all threads, not including replays, inpipeline 0. For each instruction executedincrements by the number ofthreads in the warpthread_inst_-executed_1Number of instructions executed byall threads, not including replays, inSM Y Ypipeline 1. For each instruction executedincrements by the number ofthreads in the warpTable 6: Capability 2.x Events For domain_dCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 20


Metric Name Description Formulagld_throughput Global memory load throughput ((128*global_load_hit) + (l2_-subp0_read_requests + l2_-subp1_read_requests) * 32 -(l1_cached_local_ld_misses *128))/(gputime)gst_throughputGlobal memory store throughputgld_requested_-throughput(l2_subp0_write_requests +l2_subp1_write_requests) * 32- (l1_cached_local_ld_misses* 128))/(gputime)(gld_inst_8bit + 2*gld_inst_-16bit + 4*gld_inst_32bit +8*gld_inst_64bit + 16*gld_-inst_128bit) / gputime(gst_inst_8bit + 2*gst_inst_-16bit + 4*gst_inst_32bit +8*gst_inst_64bit + 16*gst_-inst_128bit) / gputime(fb_subp0_read + fb_subp1_-read) * 32 / gputimegst_requested_-throughputdram_read_-throughputdram_write_-throughputl1_cache_global_-hit_ratel1_cache_local_-hit_rate100*l1_cached_local_ld_-hits+l1_cached_local_st_-hits/(l1_cached_local_ld_-hits+l1_cached_local_ld_-misses+l1_cached_local_st_-hits+l1_cached_local_st_-misses)tex_cache_hit_-ratetex_cache_-throughputRequested global memory loadthroughputRequested global memory storethroughputDRAM read throughputDRAM write throughput (fb_subp0_write + fb_-subp1_write) * 32 / gputimeHit rate in L1 cache for global 100*l1_cached_global_ld_-loadshits/(l1_cached_global_ld_-hits+l1_cached_global_ld_-misses)Hit rate in L1 cache for localloads and storesTexture cache hit rate 100 * (tex0Queries -tex0Misses)/tex0QueriesTexture cache throughput tex_cache_sector_queries * 32/ gputimeTable 8: Capability 2.x MetricsCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 24


<strong>CUPTI</strong> Reference<strong>CUPTI</strong> VersionDefines◮ #define <strong>CUPTI</strong>_API_VERSION 2The API version for this implementation of <strong>CUPTI</strong>.Functions◮ CUptiResult cuptiGetVersion (uint32_t ∗version)Get the <strong>CUPTI</strong> API version.Define Documentation#define <strong>CUPTI</strong>_API_VERSION 2The API version for this implementation of <strong>CUPTI</strong>. This define along withcuptiGetVersion can be used to dynamically detect if the version of <strong>CUPTI</strong> compiledagainst matches the version of the loaded <strong>CUPTI</strong> library.v1 : CUDAToolsSDK 4.0 v2 : CUDAToolsSDK 4.1Function DocumentationCUptiResult cuptiGetVersion (uint32_t ∗ version)Return the API version in ∗version.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 26


Parameters:version Returns the versionReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if version is NULLSee also:<strong>CUPTI</strong>_API_VERSIONCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 27


<strong>CUPTI</strong> Result CodesEnumerations◮ enum CUptiResult {<strong>CUPTI</strong>_SUCCESS = 0,<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER = 1,<strong>CUPTI</strong>_ERROR_INVALID_DEVICE = 2,<strong>CUPTI</strong>_ERROR_INVALID_CONTEXT = 3,<strong>CUPTI</strong>_ERROR_INVALID_EVENT_DOMAIN_ID = 4,<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID = 5,<strong>CUPTI</strong>_ERROR_INVALID_EVENT_NAME = 6,<strong>CUPTI</strong>_ERROR_INVALID_OPERATION = 7,<strong>CUPTI</strong>_ERROR_OUT_OF_MEMORY = 8,<strong>CUPTI</strong>_ERROR_HARDWARE = 9,<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT = 10,<strong>CUPTI</strong>_ERROR_API_NOT_IMPLEMENTED = 11,<strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED = 12,<strong>CUPTI</strong>_ERROR_NOT_READY = 13,<strong>CUPTI</strong>_ERROR_NOT_COMPATIBLE = 14,<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED = 15,<strong>CUPTI</strong>_ERROR_INVALID_METRIC_ID = 16,<strong>CUPTI</strong>_ERROR_INVALID_METRIC_NAME = 17,<strong>CUPTI</strong>_ERROR_QUEUE_EMPTY = 18,<strong>CUPTI</strong>_ERROR_INVALID_HANDLE = 19,<strong>CUPTI</strong>_ERROR_INVALID_STREAM = 20,<strong>CUPTI</strong>_ERROR_INVALID_KIND = 21,<strong>CUPTI</strong>_ERROR_INVALID_EVENT_VALUE = 22,<strong>CUPTI</strong>_ERROR_DISABLED = 100,<strong>CUPTI</strong>_ERROR_UNKNOWN = 999 }CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 28


Functions◮ CUptiResult cuptiGetResultString (CUptiResult result, const char ∗∗str)Get the descriptive string for a CUptiResult.Enumeration Type Documentationenum CUptiResultResult codes.Enumerator:<strong>CUPTI</strong>_SUCCESSNo error.<strong>CUPTI</strong>_ERROR_INVALID_PARAMETERinvalid.<strong>CUPTI</strong>_ERROR_INVALID_DEVICECUDA device.<strong>CUPTI</strong>_ERROR_INVALID_CONTEXT<strong>CUPTI</strong>_ERROR_INVALID_EVENT_DOMAIN_IDinvalid.<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID<strong>CUPTI</strong>_ERROR_INVALID_EVENT_NAMEOne or more of the parameters isThe device does not correspond to a validThe context is NULL or not valid.The event id is invalid.The event domain id isThe event name is invalid.<strong>CUPTI</strong>_ERROR_INVALID_OPERATION The current operation cannot beperformed due to dependency on other factors.<strong>CUPTI</strong>_ERROR_OUT_OF_MEMORYperform the requested operation.Unable to allocate enough memory to<strong>CUPTI</strong>_ERROR_HARDWARE The performance monitoring hardware could notbe reserved or some other hardware error occurred.<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENTsize is not sufficient to return all requested data.<strong>CUPTI</strong>_ERROR_API_NOT_IMPLEMENTED<strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED<strong>CUPTI</strong>_ERROR_NOT_READYrequested operation.<strong>CUPTI</strong>_ERROR_NOT_COMPATIBLEwith the current state of the object<strong>CUPTI</strong>_ERROR_NOT_INITIALIZEDconnection to the CUDA driver.The output bufferAPI is not implemented.The maximum limit is reached.The object is not yet ready to perform theThe current operation is not compatible<strong>CUPTI</strong> is unable to initialize itsCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 29


<strong>CUPTI</strong>_ERROR_INVALID_METRIC_ID<strong>CUPTI</strong>_ERROR_INVALID_METRIC_NAME<strong>CUPTI</strong>_ERROR_QUEUE_EMPTY<strong>CUPTI</strong>_ERROR_INVALID_HANDLE<strong>CUPTI</strong>_ERROR_INVALID_STREAM<strong>CUPTI</strong>_ERROR_INVALID_KINDThe metric id is invalid.The queue is empty.The metric name is invalid.Invalid handle (internal?).Invalid stream.Invalid kind.<strong>CUPTI</strong>_ERROR_INVALID_EVENT_VALUE<strong>CUPTI</strong>_ERROR_DISABLEDprofiling mode<strong>CUPTI</strong>_ERROR_UNKNOWNFunction DocumentationInvalid event value.<strong>CUPTI</strong> profiling is not compatible with currentAn unknown internal error has occurred.CUptiResult cuptiGetResultString (CUptiResult result, const char∗∗ str)Return the descriptive string for a CUptiResult in ∗str.Note:Thread-safety: this function is thread safe.Parameters:result The result to get the string forstr Returns the stringReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if str is NULL or result is not a validCUptiResultCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 30


<strong>CUPTI</strong> Activity APIData Structures◮ struct CUpti_ActivityThe base activity record.◮ struct CUpti_ActivityAPIThe activity record for a driver or runtime API invocation.◮ struct CUpti_ActivityContextThe activity record for a context.◮ struct CUpti_ActivityDeviceThe activity record for a device.◮ struct CUpti_ActivityEventThe activity record for a <strong>CUPTI</strong> event.◮ struct CUpti_ActivityKernelThe activity record for kernel.◮ struct CUpti_ActivityMemcpyThe activity record for memory copies.◮ struct CUpti_ActivityMemsetThe activity record for memset.◮ struct CUpti_ActivityMetricThe activity record for a <strong>CUPTI</strong> metric.Enumerations◮ enum CUpti_ActivityComputeApiKind {<strong>CUPTI</strong>_ACTIVITY_COMPUTE_API_UNKNOWN = 0,<strong>CUPTI</strong>_ACTIVITY_COMPUTE_API_CUDA = 1,CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 31


<strong>CUPTI</strong>_ACTIVITY_COMPUTE_API_OPENCL = 2 }The kind of a compute API, indicating if the context was created for CUDA api orOpenCL APIs.◮ enum CUpti_ActivityFlag {<strong>CUPTI</strong>_ACTIVITY_FLAG_NONE = 0,<strong>CUPTI</strong>_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS = 1


<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_HTOH = 9 }The kind of a memory copy, indicating the source and destination targets of the copy.◮ enum CUpti_ActivityMemoryKind {<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_UNKNOWN = 0,<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_PAGEABLE = 1,<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_PINNED = 2,<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_DEVICE = 3,<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_ARRAY = 4 }The kinds of memory accessed by a memory copy.Functions◮ CUptiResult cuptiActivityDequeueBuffer (CUcontext context, uint32_t streamId,uint8_t ∗∗buffer, size_t ∗validBufferSizeBytes)Dequeue a buffer containing activity records.◮ CUptiResult cuptiActivityDisable (CUpti_ActivityKind kind)Disable collection of a specific kind of activity record.◮ CUptiResult cuptiActivityEnable (CUpti_ActivityKind kind)Enable collection of a specific kind of activity record.◮ CUptiResult cuptiActivityEnqueueBuffer (CUcontext context, uint32_t streamId,uint8_t ∗buffer, size_t bufferSizeBytes)Queue a buffer for activity record collection.◮ CUptiResult cuptiActivityGetNextRecord (uint8_t ∗buffer, size_tvalidBufferSizeBytes, CUpti_Activity ∗∗record)Iterate over the activity records in a buffer.◮ CUptiResult cuptiActivityGetNumDroppedRecords (CUcontext context, uint32_tstreamId, size_t ∗dropped)Get the number of activity records that were dropped from a queue because of insufficientbuffer space.◮ CUptiResult cuptiActivityQueryBuffer (CUcontext context, uint32_t streamId,CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 33


size_t ∗validBufferSizeBytes)Query the status of the buffer at the head of a queue.◮ CUptiResult cuptiGetStreamId (CUcontext context, CUstream stream, uint32_t∗streamId)Get the ID of a stream.Enumeration Type Documentationenum CUpti_ActivityComputeApiKindEnumerator:<strong>CUPTI</strong>_ACTIVITY_COMPUTE_API_UNKNOWN The compute API is notknown.<strong>CUPTI</strong>_ACTIVITY_COMPUTE_API_CUDA The compute APIs are for CUDA.<strong>CUPTI</strong>_ACTIVITY_COMPUTE_API_OPENCL The compute APIs are forOpenCL.enum CUpti_ActivityFlagActivity record flags. Flags can be combined by bitwise OR to associated multiple flagswith an activity record. Each flag is specific to a certain activity kind, as noted below.Enumerator:<strong>CUPTI</strong>_ACTIVITY_FLAG_NONEIndicates the activity record has no flags.<strong>CUPTI</strong>_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS Indicatesthe activity represents a device that supports concurrent kernel execution. Validfor <strong>CUPTI</strong>_ACTIVITY_KIND_DEVICE.<strong>CUPTI</strong>_ACTIVITY_FLAG_MEMCPY_ASYNCan asychronous memcpy operation. Valid for<strong>CUPTI</strong>_ACTIVITY_KIND_MEMCPY.enum CUpti_ActivityKindIndicates the activity representsEach activity record kind represents information about a GPU or an activity occurring ona CPU or GPU. Each kind is associated with a activity record structure that holds theinformation associated with the kind.See also:CUpti_ActivityCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 34


CUpti_ActivityAPICUpti_ActivityDeviceCUpti_ActivityEventCUpti_ActivityKernelCUpti_ActivityMemcpyCUpti_ActivityMemsetCUpti_ActivityMetricEnumerator:<strong>CUPTI</strong>_ACTIVITY_KIND_INVALIDThe activity record is invalid.<strong>CUPTI</strong>_ACTIVITY_KIND_MEMCPY A hosthost, hostdevice, ordevicedevice memory copy. The corresponding activity record structure isCUpti_ActivityMemcpy.<strong>CUPTI</strong>_ACTIVITY_KIND_MEMSET A memory set executing on the GPU. Thecorresponding activity record structure is CUpti_ActivityMemset.<strong>CUPTI</strong>_ACTIVITY_KIND_KERNEL A kernel executing on the GPU. Thecorresponding activity record structure is CUpti_ActivityKernel.<strong>CUPTI</strong>_ACTIVITY_KIND_DRIVER A CUDA driver API function execution.The corresponding activity record structure is CUpti_ActivityAPI.<strong>CUPTI</strong>_ACTIVITY_KIND_RUNTIME A CUDA runtime API function execution.The corresponding activity record structure is CUpti_ActivityAPI.<strong>CUPTI</strong>_ACTIVITY_KIND_EVENT An event value. The corresponding activityrecord structure is CUpti_ActivityEvent.<strong>CUPTI</strong>_ACTIVITY_KIND_METRIC A metric value. The corresponding activityrecord structure is CUpti_ActivityMetric.<strong>CUPTI</strong>_ACTIVITY_KIND_DEVICE Information about a device. Thecorresponding activity record structure is CUpti_ActivityDevice.<strong>CUPTI</strong>_ACTIVITY_KIND_CONTEXT Information about a context. Thecorresponding activity record structure is CUpti_ActivityContext.enum CUpti_ActivityMemcpyKindEach kind represents the source and destination targets of a memory copy. Targets arehost, device, and array.Enumerator:<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_UNKNOWNnot known.<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_HTOD<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_DTOHThe memory copy kind isA host to device memory copy.A device to host memory copy.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 35


<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_HTOAcopy.<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_ATOHcopy.<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_ATOAmemory copy.<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_ATODcopy.<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_DTOAcopy.<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_DTOD<strong>CUPTI</strong>_ACTIVITY_MEMCPY_KIND_HTOHA host to device array memoryA device array to host memoryA device array to device arrayA device array to device memoryA device to device array memoryA device to device memory copy.A host to host memory copy.enum CUpti_ActivityMemoryKindEach kind represents the type of the source or destination memory accessed by a memorycopy.Enumerator:<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_UNKNOWNmemory kind is unknown.<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_PAGEABLEmemory is pageable.<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_PINNEDmemory is pinned.<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_DEVICEmemory is on the device.<strong>CUPTI</strong>_ACTIVITY_MEMORY_KIND_ARRAYmemory is an array.Function DocumentationThe source or destinationThe source or destinationThe source or destinationThe source or destinationThe source or destinationCUptiResult cuptiActivityDequeueBuffer (CUcontext context,uint32_t streamId, uint8_t ∗∗ buffer, size_t ∗ validBufferSizeBytes)Remove the buffer from the head of the specified queue. See cuptiActivityEnqueueBuffer()for description of queues. Calling this function transfers ownership of the buffer from<strong>CUPTI</strong>. <strong>CUPTI</strong> will no add any activity records to the buffer after it is dequeued.Parameters:context The context, or NULL to dequeue from the global queueCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 36


streamId The stream IDbuffer Returns the dequeued buffervalidBufferSizeBytes Returns the number of bytes in the buffer that contain activityrecordsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if buffer or validBufferSizeBytesare NULL<strong>CUPTI</strong>_ERROR_QUEUE_EMPTY the queue is empty, buffer returns NULL andvalidBufferSizeBytes returns 0CUptiResult cuptiActivityDisable (CUpti_ActivityKind kind)Disable collection of a specific kind of activity record. Multiple kinds can be disabled bycalling this function multiple times. By default all activity kinds are disabled for collection.Parameters:kind The kind of activity record to stop collectingReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZEDCUptiResult cuptiActivityEnable (CUpti_ActivityKind kind)Enable collection of a specific kind of activity record. Multiple kinds can be enabled bycalling this function multiple times. By default all activity kinds are disabled for collection.Parameters:kind The kind of activity record to collectReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_NOT_COMPATIBLE if the activity kind cannot be enabledCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 37


CUptiResult cuptiActivityEnqueueBuffer (CUcontext context,uint32_t streamId, uint8_t ∗ buffer, size_t bufferSizeBytes)Queue a buffer for activity record collection. Calling this function transfers ownership ofthe buffer to <strong>CUPTI</strong>. The buffer should not be accessed or modified until ownership isregained by calling cuptiActivityDequeueBuffer().There are three types of queues:Global Queue: The global queue collects all activity records that are not associated with avalid context. All device and API activity records are collected in the global queue. Abuffer is enqueued in the global queue by specifying context == NULL.Context Queue: Each context queue collects activity records associated with that contextthat are not associated with a specific stream or that are associated with the defaultstream. A buffer is enqueued in a context queue by specifying the context and a streamIdof 0.Stream Queue: Each stream queue collects memcpy, memset, and kernel activity recordsassociated with the stream. A buffer is enqueued in a stream queue by specifying a contextand a non-zero stream ID.Multiple buffers can be enqueued on each queue, and buffers can be enqueue on multiplequeues.When a new activity record needs to be recorded, <strong>CUPTI</strong> searches for a non-empty queueto hold the record in this order: 1) the appropriate stream queue, 2) the appropriatecontext queue, and 3) the global queue. If the search does not find any queue with a bufferthen the activity record is dropped. If the search finds a queue containing a buffer, butthat buffer is full, then the activity record is dropped and the dropped record count for thequeue is incremented. If the search finds a queue containing a buffer with space availableto hold the record, then the record is recorded in the buffer.At a minimum, one or more buffers must be queued in the global queue at all times toavoid dropping activity records. For correct operation it is also necessary to enqueue atleast one buffer in the context queue of each context as it is created. The stream queuesare optional and can be used to reduce or eliminate application perturbations caused bythe need to process or save the activity records returned in the buffers. For example, if astream queue is used, that queue can be flushed when the stream is synchronized.Parameters:context The context, or NULL to enqueue on the global queuestreamId The stream IDbuffer The pointer to user supplied buffer for storing activity records.The buffer mustbe at least 8 byte aligned, and the size of the buffer must be at least 1024 bytes.bufferSizeBytes The size of the buffer, in bytes. The size of the buffer must be at least1024 bytes.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 38


Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if buffer is NULL, does not havealignment of at least 8 bytes, or is not at least 1024 bytes in sizeCUptiResult cuptiActivityGetNextRecord (uint8_t ∗ buffer, size_tvalidBufferSizeBytes, CUpti_Activity ∗∗ record)This is a helper function to iterate over the activity records in a buffer. A buffer of activityrecords is typically obtained by using the cuptiActivityDequeueBuffer() function.An example of typical usage:CUpti_Activity *record = NULL;CUptiResult status = <strong>CUPTI</strong>_SUCCESS;do {status = cuptiActivityGetNextRecord(buffer, validSize, &record);if(status == <strong>CUPTI</strong>_SUCCESS) {// Use record here...}else if (status == <strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED)break;else {goto Error;}} while (1);Parameters:buffer The buffer containing activity recordsrecord Inputs the previous record returned by cuptiActivityGetNextRecord andreturns the next activity record from the buffer. If input value if NULL, returnsthe first activity record in the buffer.validBufferSizeBytes The number of valid bytes in the buffer.Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED if no more records in the buffer<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if buffer is NULL.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 39


CUptiResult cuptiActivityGetNumDroppedRecords (CUcontextcontext, uint32_t streamId, size_t ∗ dropped)Get the number of records that were dropped from a queue because all the buffers in thequeue are full. See cuptiActivityEnqueueBuffer() for description of queues. Calling thisfunction does not transfer ownership of the buffer. The dropped count maintained for thequeue is reset to zero when this function is called.Parameters:context The context, or NULL to get dropped count from global queuestreamId The stream IDdropped The number of records that were dropped since the last call to this function.Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if dropped is NULLCUptiResult cuptiActivityQueryBuffer (CUcontext context, uint32_tstreamId, size_t ∗ validBufferSizeBytes)Query the status of buffer at the head in the queue. See cuptiActivityEnqueueBuffer() fordescription of queues. Calling this function does not transfer ownership of the buffer.Parameters:context The context, or NULL to query the global queuestreamId The stream IDvalidBufferSizeBytes Returns the number of bytes in the buffer that contain activityrecordsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if buffer or validBufferSizeBytesare NULL<strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED if buffer is full<strong>CUPTI</strong>_ERROR_QUEUE_EMPTY the queue is empty, validBufferSizeBytesreturns 0CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 40


CUptiResult cuptiGetStreamId (CUcontext context, CUstreamstream, uint32_t ∗ streamId)Get the ID of a stream. The stream ID is needed to enqueue and dequeue activity recordbuffers on a stream queue.Parameters:context The context containing the streamstream The streamstreamId Returns the ID for the streamReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if streamId is NULLSee also:cuptiActivityEnqueueBuffercuptiActivityDequeueBufferCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 41


CUpti_Activity Type ReferenceThe base activity record.Data Fields◮ CUpti_ActivityKind kindDetailed DescriptionThe activity API uses a CUpti_Activity as a generic representation for any activity. The’kind’ field is used to determine the specific activity kind, and from that theCUpti_Activity object can be cast to the specific activity record type appropriate for thatkind.Note that all activity record types are padded and aligned to ensure that each member ofthe record is naturally aligned.See also:CUpti_ActivityKindField DocumentationCUpti_ActivityKind CUpti_Activity::kindThe kind of this activity.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 42


CUpti_ActivityAPI Type ReferenceThe activity record for a driver or runtime API invocation.Data Fields◮ CUpti_CallbackId cbid◮ uint32_t correlationId◮ uint64_t end◮ CUpti_ActivityKind kind◮ uint32_t processId◮ uint32_t returnValue◮ uint64_t start◮ uint32_t threadIdDetailed DescriptionThis activity record represents an invocation of a driver or runtime API(<strong>CUPTI</strong>_ACTIVITY_KIND_DRIVER and <strong>CUPTI</strong>_ACTIVITY_KIND_RUNTIME).Field DocumentationCUpti_CallbackId CUpti_ActivityAPI::cbidThe ID of the driver or runtime function.uint32_t CUpti_ActivityAPI::correlationIdThe correlation ID of the driver or runtime CUDA function. Each function invocation isassigned a unique correlation ID that is identical to the correlation ID in the memcpy,memset, or kernel activity record that is associated with this function.uint64_t CUpti_ActivityAPI::endThe end timestamp for the function, in ns.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 43


CUpti_ActivityKind CUpti_ActivityAPI::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_DRIVER or<strong>CUPTI</strong>_ACTIVITY_KIND_RUNTIME.uint32_t CUpti_ActivityAPI::processIdThe ID of the process where the driver or runtime CUDA function is executing.uint32_t CUpti_ActivityAPI::returnValueThe return value for the function. For a CUDA driver function with will be a CUresultvalue, and for a CUDA runtime function this will be a cudaError_t value.uint64_t CUpti_ActivityAPI::startThe start timestamp for the function, in ns.uint32_t CUpti_ActivityAPI::threadIdThe ID of the thread where the driver or runtime CUDA function is executing.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 44


CUpti_ActivityDevice Type ReferenceThe activity record for a device.Data Fields◮ uint32_t computeCapabilityMajor◮ uint32_t computeCapabilityMinor◮ uint32_t constantMemorySize◮ uint32_t coreClockRate◮ uint32_t flags◮ uint64_t globalMemoryBandwidth◮ uint64_t globalMemorySize◮ uint32_t id◮ CUpti_ActivityKind kind◮ uint32_t l2CacheSize◮ uint32_t maxBlockDimX◮ uint32_t maxBlockDimY◮ uint32_t maxBlockDimZ◮ uint32_t maxBlocksPerMultiprocessor◮ uint32_t maxGridDimX◮ uint32_t maxGridDimY◮ uint32_t maxGridDimZ◮ uint32_t maxIPC◮ uint32_t maxRegistersPerBlock◮ uint32_t maxSharedMemoryPerBlock◮ uint32_t maxThreadsPerBlock◮ uint32_t maxWarpsPerMultiprocessor◮ const char ∗ name◮ uint32_t numMemcpyEngines◮ uint32_t numMultiprocessors◮ uint32_t numThreadsPerWarpCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 45


Detailed DescriptionThis activity record represents information about a GPU device(<strong>CUPTI</strong>_ACTIVITY_KIND_DEVICE).Field Documentationuint32_t CUpti_ActivityDevice::computeCapabilityMajorCompute capability for the device, major number.uint32_t CUpti_ActivityDevice::computeCapabilityMinorCompute capability for the device, minor number.uint32_t CUpti_ActivityDevice::constantMemorySizeThe amount of constant memory on the device, in bytes.uint32_t CUpti_ActivityDevice::coreClockRateThe core clock rate of the device, in kHz.uint32_t CUpti_ActivityDevice::flagsThe flags associated with the device.See also:CUpti_ActivityFlaguint64_t CUpti_ActivityDevice::globalMemoryBandwidthThe global memory bandwidth available on the device, in kBytes/sec.uint64_t CUpti_ActivityDevice::globalMemorySizeThe amount of global memory on the device, in bytes.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 46


uint32_t CUpti_ActivityDevice::idThe device ID.CUpti_ActivityKind CUpti_ActivityDevice::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_DEVICE.uint32_t CUpti_ActivityDevice::l2CacheSizeThe size of the L2 cache on the device, in bytes.uint32_t CUpti_ActivityDevice::maxBlockDimXMaximum allowed X dimension for a block.uint32_t CUpti_ActivityDevice::maxBlockDimYMaximum allowed Y dimension for a block.uint32_t CUpti_ActivityDevice::maxBlockDimZMaximum allowed Z dimension for a block.uint32_t CUpti_ActivityDevice::maxBlocksPerMultiprocessorMaximum number of blocks that can be present on a multiprocessor at any given time.uint32_t CUpti_ActivityDevice::maxGridDimXMaximum allowed X dimension for a grid.uint32_t CUpti_ActivityDevice::maxGridDimYMaximum allowed Y dimension for a grid.uint32_t CUpti_ActivityDevice::maxGridDimZMaximum allowed Z dimension for a grid.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 47


uint32_t CUpti_ActivityDevice::maxIPCThe maximum "instructions per cycle" possible on each device multiprocessor.uint32_t CUpti_ActivityDevice::maxRegistersPerBlockMaximum number of registers that can be allocated to a block.uint32_t CUpti_ActivityDevice::maxSharedMemoryPerBlockMaximum amount of shared memory that can be assigned to a block, in bytes.uint32_t CUpti_ActivityDevice::maxThreadsPerBlockMaximum number of threads allowed in a block.uint32_t CUpti_ActivityDevice::maxWarpsPerMultiprocessorMaximum number of warps that can be present on a multiprocessor at any given time.const char∗ CUpti_ActivityDevice::nameThe device name. This name is shared across all activity records representing instances ofthe device, and so should not be modified.uint32_t CUpti_ActivityDevice::numMemcpyEnginesNumber of memory copy engines on the device.uint32_t CUpti_ActivityDevice::numMultiprocessorsNumber of multiprocessors on the device.uint32_t CUpti_ActivityDevice::numThreadsPerWarpThe number of threads per warp on the device.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 48


CUpti_ActivityEvent Type ReferenceThe activity record for a <strong>CUPTI</strong> event.Data Fields◮ uint32_t correlationId◮ CUpti_EventDomainID domain◮ CUpti_EventID id◮ CUpti_ActivityKind kind◮ uint64_t valueDetailed DescriptionThis activity record represents the collection of a <strong>CUPTI</strong> event value(<strong>CUPTI</strong>_ACTIVITY_KIND_EVENT). This activity record kind is not produced by theactivity API but is included for completeness and ease-of-use. Profile frameworks built ontop of <strong>CUPTI</strong> that collect event data may choose to use this type to store the collectedevent data.Field Documentationuint32_t CUpti_ActivityEvent::correlationIdThe correlation ID of the event. Use of this ID is user-defined, but typically this ID valuewill equal the correlation ID of the kernel for which the event was gathered.CUpti_EventDomainID CUpti_ActivityEvent::domainThe event domain ID.CUpti_EventID CUpti_ActivityEvent::idThe event ID.CUpti_ActivityKind CUpti_ActivityEvent::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_EVENT.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 49


uint64_t CUpti_ActivityEvent::valueThe event value.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 50


CUpti_ActivityKernel Type ReferenceThe activity record for kernel.Data Fields◮ int32_t blockX◮ int32_t blockY◮ int32_t blockZ◮ uint8_t cacheConfigExecuted◮ uint8_t cacheConfigRequested◮ uint32_t contextId◮ uint32_t correlationId◮ uint32_t deviceId◮ int32_t dynamicSharedMemory◮ uint64_t end◮ int32_t gridX◮ int32_t gridY◮ int32_t gridZ◮ CUpti_ActivityKind kind◮ uint32_t localMemoryPerThread◮ uint32_t localMemoryTotal◮ const char ∗ name◮ uint32_t pad◮ uint16_t registersPerThread◮ void ∗ reserved0◮ uint32_t runtimeCorrelationId◮ uint64_t start◮ int32_t staticSharedMemory◮ uint32_t streamIdDetailed DescriptionThis activity record represents a kernel execution(<strong>CUPTI</strong>_ACTIVITY_KIND_KERNEL).CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 51


Field Documentationint32_t CUpti_ActivityKernel::blockXThe X-dimension block size for the kernel.int32_t CUpti_ActivityKernel::blockYThe Y-dimension block size for the kernel.int32_t CUpti_ActivityKernel::blockZThe Z-dimension grid size for the kernel.uint8_t CUpti_ActivityKernel::cacheConfigExecutedThe cache configuration used for the kernel. The value is one of the CUfunc_cacheenumeration values from cuda.h.uint8_t CUpti_ActivityKernel::cacheConfigRequestedThe cache configuration requested by the kernel. The value is one of the CUfunc_cacheenumeration values from cuda.h.uint32_t CUpti_ActivityKernel::contextIdThe ID of the context where the kernel is executing.uint32_t CUpti_ActivityKernel::correlationIdThe correlation ID of the kernel. Each kernel execution is assigned a unique correlation IDthat is identical to the correlation ID in the driver API activity record that launched thekernel.uint32_t CUpti_ActivityKernel::deviceIdThe ID of the device where the kernel is executing.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 52


int32_t CUpti_ActivityKernel::dynamicSharedMemoryThe dynamic shared memory reserved for the kernel, in bytes.uint64_t CUpti_ActivityKernel::endThe end timestamp for the kernel execution, in ns.int32_t CUpti_ActivityKernel::gridXThe X-dimension grid size for the kernel.int32_t CUpti_ActivityKernel::gridYThe Y-dimension grid size for the kernel.int32_t CUpti_ActivityKernel::gridZThe Z-dimension grid size for the kernel.CUpti_ActivityKind CUpti_ActivityKernel::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_KERNEL.uint32_t CUpti_ActivityKernel::localMemoryPerThreadThe amount of local memory reserved for each thread, in bytes.uint32_t CUpti_ActivityKernel::localMemoryTotalThe total amount of local memory reserved for the kernel, in bytes.const char∗ CUpti_ActivityKernel::nameThe name of the kernel. This name is shared across all activity records representing thesame kernel, and so should not be modified.uint32_t CUpti_ActivityKernel::padUndefined. Reserved for internal use.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 53


uint16_t CUpti_ActivityKernel::registersPerThreadThe number of registers required for each thread executing the kernel.void∗ CUpti_ActivityKernel::reserved0Undefined. Reserved for internal use.uint32_t CUpti_ActivityKernel::runtimeCorrelationIdThe runtime correlation ID of the kernel. Each kernel execution is assigned a uniqueruntime correlation ID that is identical to the correlation ID in the runtime API activityrecord that launched the kernel.uint64_t CUpti_ActivityKernel::startThe start timestamp for the kernel execution, in ns.int32_t CUpti_ActivityKernel::staticSharedMemoryThe static shared memory allocated for the kernel, in bytes.uint32_t CUpti_ActivityKernel::streamIdThe ID of the stream where the kernel is executing.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 54


CUpti_ActivityMemcpy Type ReferenceThe activity record for memory copies.Data Fields◮ uint64_t bytes◮ uint32_t contextId◮ uint8_t copyKind◮ uint32_t correlationId◮ uint32_t deviceId◮ uint8_t dstKind◮ uint64_t end◮ uint8_t flags◮ CUpti_ActivityKind kind◮ void ∗ reserved0◮ uint32_t runtimeCorrelationId◮ uint8_t srcKind◮ uint64_t start◮ uint32_t streamIdDetailed DescriptionThis activity record represents a memory copy (<strong>CUPTI</strong>_ACTIVITY_KIND_MEMCPY).Field Documentationuint64_t CUpti_ActivityMemcpy::bytesThe number of bytes transferred by the memory copy.uint32_t CUpti_ActivityMemcpy::contextIdThe ID of the context where the memory copy is occurring.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 55


uint8_t CUpti_ActivityMemcpy::copyKindThe kind of the memory copy, stored as a byte to reduce record size.See also:CUpti_ActivityMemcpyKinduint32_t CUpti_ActivityMemcpy::correlationIdThe correlation ID of the memory copy. Each memory copy is assigned a uniquecorrelation ID that is identical to the correlation ID in the driver API activity record thatlaunched the memory copy.uint32_t CUpti_ActivityMemcpy::deviceIdThe ID of the device where the memory copy is occurring.uint8_t CUpti_ActivityMemcpy::dstKindThe destination memory kind read by the memory copy, stored as a byte to reduce recordsize.See also:CUpti_ActivityMemoryKinduint64_t CUpti_ActivityMemcpy::endThe end timestamp for the memory copy, in ns.uint8_t CUpti_ActivityMemcpy::flagsThe flags associated with the memory copy.See also:CUpti_ActivityFlagCUpti_ActivityKind CUpti_ActivityMemcpy::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_MEMCPY.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 56


void∗ CUpti_ActivityMemcpy::reserved0Undefined. Reserved for internal use.uint32_t CUpti_ActivityMemcpy::runtimeCorrelationIdThe runtime correlation ID of the memory copy. Each memory copy is assigned a uniqueruntime correlation ID that is identical to the correlation ID in the runtime API activityrecord that launched the memory copy.uint8_t CUpti_ActivityMemcpy::srcKindThe source memory kind read by the memory copy, stored as a byte to reduce record size.See also:CUpti_ActivityMemoryKinduint64_t CUpti_ActivityMemcpy::startThe start timestamp for the memory copy, in ns.uint32_t CUpti_ActivityMemcpy::streamIdThe ID of the stream where the memory copy is occurring.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 57


CUpti_ActivityMemset Type ReferenceThe activity record for memset.Data Fields◮ uint64_t bytes◮ uint32_t contextId◮ uint32_t correlationId◮ uint32_t deviceId◮ uint64_t end◮ CUpti_ActivityKind kind◮ void ∗ reserved0◮ uint32_t runtimeCorrelationId◮ uint64_t start◮ uint32_t streamId◮ uint32_t valueDetailed DescriptionThis activity record represents a memory set operation(<strong>CUPTI</strong>_ACTIVITY_KIND_MEMSET).Field Documentationuint64_t CUpti_ActivityMemset::bytesThe number of bytes being set by the memory set.uint32_t CUpti_ActivityMemset::contextIdThe ID of the context where the memory set is occurring.uint32_t CUpti_ActivityMemset::correlationIdThe correlation ID of the memory set. Each memory set is assigned a unique correlationID that is identical to the correlation ID in the driver API activity record that launchedCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 58


the memory set.uint32_t CUpti_ActivityMemset::deviceIdThe ID of the device where the memory set is occurring.uint64_t CUpti_ActivityMemset::endThe end timestamp for the memory set, in ns.CUpti_ActivityKind CUpti_ActivityMemset::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_MEMSET.void∗ CUpti_ActivityMemset::reserved0Undefined. Reserved for internal use.uint32_t CUpti_ActivityMemset::runtimeCorrelationIdThe runtime correlation ID of the memory set. Each memory set is assigned a uniqueruntime correlation ID that is identical to the correlation ID in the runtime API activityrecord that launched the memory set.uint64_t CUpti_ActivityMemset::startThe start timestamp for the memory set, in ns.uint32_t CUpti_ActivityMemset::streamIdThe ID of the stream where the memory set is occurring.uint32_t CUpti_ActivityMemset::valueThe value being assigned to memory by the memory set.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 59


CUpti_ActivityMetric Type ReferenceThe activity record for a <strong>CUPTI</strong> metric.Data Fields◮ uint32_t correlationId◮ CUpti_MetricID id◮ CUpti_ActivityKind kind◮ uint32_t pad◮ CUpti_MetricValue valueDetailed DescriptionThis activity record represents the collection of a <strong>CUPTI</strong> metric value(<strong>CUPTI</strong>_ACTIVITY_KIND_METRIC). This activity record kind is not produced bythe activity API but is included for completeness and ease-of-use. Profile frameworks builton top of <strong>CUPTI</strong> that collect metric data may choose to use this type to store thecollected metric data.Field Documentationuint32_t CUpti_ActivityMetric::correlationIdThe correlation ID of the metric. Use of this ID is user-defined, but typically this ID valuewill equal the correlation ID of the kernel for which the metric was gathered.CUpti_MetricID CUpti_ActivityMetric::idThe metric ID.CUpti_ActivityKind CUpti_ActivityMetric::kindThe activity record kind, must be <strong>CUPTI</strong>_ACTIVITY_KIND_METRIC.uint32_t CUpti_ActivityMetric::padUndefined. Reserved for internal use.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 60


CUpti_MetricValue CUpti_ActivityMetric::valueThe metric value.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 61


<strong>CUPTI</strong> Callback APIData Structures◮ struct CUpti_CallbackDataData passed into a runtime or driver API callback function.◮ struct CUpti_ResourceDataData passed into a resource callback function.◮ struct CUpti_SynchronizeDataData passed into a synchronize callback function.Typedefs◮ typedef void(∗ CUpti_CallbackFunc )(void ∗userdata, CUpti_CallbackDomaindomain, CUpti_CallbackId cbid, const void ∗cbdata)Function type for a callback.◮ typedef uint32_t CUpti_CallbackIdAn ID for a driver API, runtime API, resource or synchronization callback.◮ typedef CUpti_CallbackDomain ∗ CUpti_DomainTablePointer to an array of callback domains.◮ typedef struct CUpti_Subscriber_st ∗ CUpti_SubscriberHandleA callback subscriber.Enumerations◮ enum CUpti_ApiCallbackSite {<strong>CUPTI</strong>_API_ENTER = 0,<strong>CUPTI</strong>_API_EXIT = 1 }Specifies the point in an API call that a callback is issued.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 62


◮ enum CUpti_CallbackDomain {<strong>CUPTI</strong>_CB_DOMAIN_INVALID = 0,<strong>CUPTI</strong>_CB_DOMAIN_DRIVER_API = 1,<strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API = 2,<strong>CUPTI</strong>_CB_DOMAIN_RESOURCE = 3,<strong>CUPTI</strong>_CB_DOMAIN_SYNCHRONIZE = 4 }Callback domains.◮ enum CUpti_CallbackIdResource {<strong>CUPTI</strong>_CBID_RESOURCE_INVALID = 0,<strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_CREATED = 1,<strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_DESTROY_STARTING = 2,<strong>CUPTI</strong>_CBID_RESOURCE_STREAM_CREATED = 3,<strong>CUPTI</strong>_CBID_RESOURCE_STREAM_DESTROY_STARTING = 4 }Callback IDs for resource domain.◮ enum CUpti_CallbackIdSync {<strong>CUPTI</strong>_CBID_SYNCHRONIZE_INVALID = 0,<strong>CUPTI</strong>_CBID_SYNCHRONIZE_STREAM_SYNCHRONIZED = 1,<strong>CUPTI</strong>_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED = 2 }Callback IDs for synchronization domain.Functions◮ CUptiResult cuptiEnableAllDomains (uint32_t enable, CUpti_SubscriberHandlesubscriber)Enable or disable all callbacks in all domains.◮ CUptiResult cuptiEnableCallback (uint32_t enable, CUpti_SubscriberHandlesubscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid)Enable or disabled callbacks for a specific domain and callback ID.◮ CUptiResult cuptiEnableDomain (uint32_t enable, CUpti_SubscriberHandlesubscriber, CUpti_CallbackDomain domain)CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 63


Enable or disabled all callbacks for a specific domain.◮ CUptiResult cuptiGetCallbackState (uint32_t ∗enable, CUpti_SubscriberHandlesubscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid)Get the current enabled/disabled state of a callback for a specific domain and function ID.◮ CUptiResult cuptiSubscribe (CUpti_SubscriberHandle ∗subscriber,CUpti_CallbackFunc callback, void ∗userdata)Initialize a callback subscriber with a callback function and user data.◮ CUptiResult cuptiSupportedDomains (size_t ∗domainCount, CUpti_DomainTable∗domainTable)Get the available callback domains.◮ CUptiResult cuptiUnsubscribe (CUpti_SubscriberHandle subscriber)Unregister a callback subscriber.Typedef Documentationtypedef void( ∗ CUpti_CallbackFunc)(void ∗userdata,CUpti_CallbackDomain domain, CUpti_CallbackId cbid, constvoid ∗cbdata)Function type for a callback. The type of the data passed to the callback in cbdatadepends on the domain. If domain is <strong>CUPTI</strong>_CB_DOMAIN_DRIVER_API or<strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API the type of cbdata will beCUpti_CallbackData. If domain is <strong>CUPTI</strong>_CB_DOMAIN_RESOURCE the type ofcbdata will be CUpti_ResourceData. If domain is<strong>CUPTI</strong>_CB_DOMAIN_SYNCHRONIZE the type of cbdata will beCUpti_SynchronizeData.Parameters:userdata User data supplied at subscription of the callbackdomain The domain of the callbackcbid The ID of the callbackcbdata Data passed to the callback.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 64


typedef uint32_t CUpti_CallbackIdAn ID for a driver API, runtime API, resource or synchronization callback. Within adriver API callback this should be interpreted as a CUpti_driver_api_trace_cbid value.Within a runtime API callback this should be interpreted as aCUpti_runtime_api_trace_cbid value. Within a resource API callback this should beinterpreted as a CUpti_CallbackIdResource value. Within a synchronize API callback thisshould be interpreted as a CUpti_CallbackIdSync value.Enumeration Type Documentationenum CUpti_ApiCallbackSiteSpecifies the point in an API call that a callback is issued. This value is communicated tothe callback function via CUpti_CallbackData::callbackSite.Enumerator:<strong>CUPTI</strong>_API_ENTER<strong>CUPTI</strong>_API_EXITenum CUpti_CallbackDomainThe callback is at the entry of the API call.The callback is at the exit of the API call.Callback domains. Each domain represents callback points for a group of related APIfunctions or CUDA driver activity.Enumerator:<strong>CUPTI</strong>_CB_DOMAIN_INVALID<strong>CUPTI</strong>_CB_DOMAIN_DRIVER_APIdriver API functions.<strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_APIall runtime API functions.<strong>CUPTI</strong>_CB_DOMAIN_RESOURCECUDA resource tracking.<strong>CUPTI</strong>_CB_DOMAIN_SYNCHRONIZECUDA synchronization.enum CUpti_CallbackIdResourceInvalid domain.Domain containing callback points for allDomain containing callback points forDomain containing callback points forDomain containing callback points forCallback IDs for resource domain, <strong>CUPTI</strong>_CB_DOMAIN_RESOURCE. This value iscommunicated to the callback function via the cbid parameter.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 65


Enumerator:<strong>CUPTI</strong>_CBID_RESOURCE_INVALID<strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_CREATEDcreated.Invalid resource callback ID.A new context has been<strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_DESTROY_STARTINGabout to be destroyed.<strong>CUPTI</strong>_CBID_RESOURCE_STREAM_CREATEDcreated.A context isA new stream has been<strong>CUPTI</strong>_CBID_RESOURCE_STREAM_DESTROY_STARTINGabout to be destroyed.enum CUpti_CallbackIdSyncA stream isCallback IDs for synchronization domain, <strong>CUPTI</strong>_CB_DOMAIN_SYNCHRONIZE. Thisvalue is communicated to the callback function via the cbid parameter.Enumerator:<strong>CUPTI</strong>_CBID_SYNCHRONIZE_INVALIDInvalid synchronize callback ID.<strong>CUPTI</strong>_CBID_SYNCHRONIZE_STREAM_SYNCHRONIZEDsynchronization has completed for the stream.<strong>CUPTI</strong>_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZEDsynchronization has completed for the context.Function DocumentationCUptiResult cuptiEnableAllDomains (uint32_t enable,CUpti_SubscriberHandle subscriber)Enable or disable all callbacks in all domains.Note:StreamContextThread-safety: a subscriber must serialize access to cuptiGetCallbackState,cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example,if cuptiGetCallbackState(sub, d, ∗) and cuptiEnableAllDomains(sub) are calledconcurrently, the results are undefined.Parameters:enable New enable state for all callbacks in all domain. Zero disables all callbacks,non-zero enables all callbacks.subscriber - Handle to callback subscriptionCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 66


Return values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialized <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if subscriber is invalidCUptiResult cuptiEnableCallback (uint32_t enable,CUpti_SubscriberHandle subscriber, CUpti_CallbackDomaindomain, CUpti_CallbackId cbid)Enable or disabled callbacks for a subscriber for a specific domain and callback ID.Note:Thread-safety: a subscriber must serialize access to cuptiGetCallbackState,cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example,if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are calledconcurrently, the results are undefined.Parameters:enable New enable state for the callback. Zero disables the callback, non-zero enablesthe callback.subscriber - Handle to callback subscriptiondomain The domain of the callbackcbid The ID of the callbackReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialized <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if subscriber, domain or cbid isinvalid.CUptiResult cuptiEnableDomain (uint32_t enable,CUpti_SubscriberHandle subscriber, CUpti_CallbackDomaindomain)Enable or disabled all callbacks for a specific domain.Note:Thread-safety: a subscriber must serialize access to cuptiGetCallbackState,cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example,if cuptiGetCallbackEnabled(sub, d, ∗) and cuptiEnableDomain(sub, d) are calledconcurrently, the results are undefined.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 67


Parameters:enable New enable state for all callbacks in the domain. Zero disables all callbacks,non-zero enables all callbacks.subscriber - Handle to callback subscriptiondomain The domain of the callbackReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialized <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if subscriber or domain is invalidCUptiResult cuptiGetCallbackState (uint32_t ∗ enable,CUpti_SubscriberHandle subscriber, CUpti_CallbackDomaindomain, CUpti_CallbackId cbid)Returns non-zero in ∗enable if the callback for a domain and callback ID is enabled, andzero if not enabled.Note:Thread-safety: a subscriber must serialize access to cuptiGetCallbackState,cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example,if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are calledconcurrently, the results are undefined.Parameters:enable Returns non-zero if callback enabled, zero if not enabledsubscriber Handle to the initialize subscriberdomain The domain of the callbackcbid The ID of the callbackReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialized <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if enabled is NULL, or if subscriber,domain or cbid is invalid.CUptiResult cuptiSubscribe (CUpti_SubscriberHandle ∗subscriber, CUpti_CallbackFunc callback, void ∗ userdata)Initializes a callback subscriber with a callback function and (optionally) a pointer to userdata. The returned subscriber handle can be used to enable and disable the callback forCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 68


specific domains and callback IDs.Note:Only a single subscriber can be registered at a time.This function does not enable any callbacks.Thread-safety: this function is thread safe.Parameters:subscriber Returns handle to initialize subscribercallback The callback functionuserdata A pointer to user data. This data will be passed to the callback function viathe userdata paramater.Return values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialize <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED if there is already a <strong>CUPTI</strong> subscriber<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if subscriber is NULLCUptiResult cuptiSupportedDomains (size_t ∗ domainCount,CUpti_DomainTable ∗ domainTable)Returns in ∗domainTable an array of size ∗domainCount of all the available callbackdomains.Note:Thread-safety: this function is thread safe.Parameters:domainCount Returns number of callback domainsdomainTable Returns pointer to array of available callback domainsReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialize <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if domainCount or domainTable areNULLCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 69


CUptiResult cuptiUnsubscribe (CUpti_SubscriberHandlesubscriber)Removes a callback subscriber so that no future callbacks will be issued to that subscriber.Note:Thread-safety: this function is thread safe.Parameters:subscriber Handle to the initialize subscriberReturn values:<strong>CUPTI</strong>_SUCCESS on success<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED if unable to initialized <strong>CUPTI</strong><strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if subscriber is NULL or notinitializedCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 70


CUpti_CallbackData Type ReferenceData passed into a runtime or driver API callback function.Data Fields◮ CUpti_ApiCallbackSite callbackSite◮ CUcontext context◮ uint32_t contextUid◮ uint64_t ∗ correlationData◮ uint32_t correlationId◮ const char ∗ functionName◮ const void ∗ functionParams◮ void ∗ functionReturnValue◮ const char ∗ symbolNameDetailed DescriptionData passed into a runtime or driver API callback function as the cbdata argument toCUpti_CallbackFunc. The cbdata will be this type for domain equal to<strong>CUPTI</strong>_CB_DOMAIN_DRIVER_API or <strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API.The callback data is valid only within the invocation of the callback function that is passedthe data. If you need to retain some data for use outside of the callback, you must make acopy of that data. For example, if you make a shallow copy of CUpti_CallbackData withina callback, you cannot dereference functionParams outside of that callback to access thefunction parameters. functionName is an exception: the string pointed to byfunctionName is a global constant and so may be accessed outside of the callback.Field DocumentationCUpti_ApiCallbackSite CUpti_CallbackData::callbackSitePoint in the runtime or driver function from where the callback was issued.CUcontext CUpti_CallbackData::contextDriver context current to the thread, or null if no context is current. This value can changefrom the entry to exit callback of a runtime API function if the runtime initializes aCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 71


context.uint32_t CUpti_CallbackData::contextUidUnique ID for the CUDA context associated with the thread. The UIDs are assignedsequentially as contexts are created and are unique within a process.uint64_t∗ CUpti_CallbackData::correlationDataPointer to data shared between the entry and exit callbacks of a given runtime or driveAPI function invocation. This field can be used to pass 64-bit values from the entrycallback to the corresponding exit callback.uint32_t CUpti_CallbackData::correlationIdThe activity record correlation ID for this callback. For a driver domain callback (i.e.domain <strong>CUPTI</strong>_CB_DOMAIN_DRIVER_API) this ID will equal the correlation ID inthe CUpti_ActivityAPI record corresponding to the CUDA driver function call. For aruntime domain callback (i.e. domain <strong>CUPTI</strong>_CB_DOMAIN_RUNTIME_API) this IDwill equal the correlation ID in the CUpti_ActivityAPI record corresponding to theCUDA runtime function call. Within the callback, this ID can be recorded to correlateuser data with the activity record. This field is new in 4.1.const char∗ CUpti_CallbackData::functionNameName of the runtime or driver API function which issued the callback. This string is aglobal constant and so may be accessed outside of the callback.const void∗ CUpti_CallbackData::functionParamsPointer to the arguments passed to the runtime or driver API call. Seegenerated_cuda_runtime_api_meta::h and generated_cuda_meta::h for structuredefinitions for the parameters for each runtime and driver API function.void∗ CUpti_CallbackData::functionReturnValuePointer to the return value of the runtime or driver API call. This field is only valid withinthe exit::<strong>CUPTI</strong>_API_EXIT callback. For a runtime API functionReturnValue pointsto a cudaError_t. For a driver API functionReturnValue points to a CUresult.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 72


const char∗ CUpti_CallbackData::symbolNameName of the symbol operated on by the runtime or driver API function which issued thecallback. This entry is valid only for the runtime cudaLaunch callback (i.e.<strong>CUPTI</strong>_RUNTIME_TRACE_CBID_cudaLaunch_v3020), where it returns the name ofthe kernel.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 73


CUpti_ResourceData Type ReferenceData passed into a resource callback function.Data Fields◮ CUcontext context◮ void ∗ resourceDescriptor◮ CUstream streamDetailed DescriptionData passed into a resource callback function as the cbdata argument toCUpti_CallbackFunc. The cbdata will be this type for domain equal to<strong>CUPTI</strong>_CB_DOMAIN_RESOURCE. The callback data is valid only within theinvocation of the callback function that is passed the data. If you need to retain some datafor use outside of the callback, you must make a copy of that data.Field DocumentationCUcontext CUpti_ResourceData::contextFor <strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_CREATED and<strong>CUPTI</strong>_CBID_RESOURCE_CONTEXT_DESTROY_STARTING, the context beingcreated or destroyed. For <strong>CUPTI</strong>_CBID_RESOURCE_STREAM_CREATED and<strong>CUPTI</strong>_CBID_RESOURCE_STREAM_DESTROY_STARTING, the contextcontaining the stream being created or destroyed.void∗ CUpti_ResourceData::resourceDescriptorReserved for future use.CUstream CUpti_ResourceData::streamFor <strong>CUPTI</strong>_CBID_RESOURCE_STREAM_CREATED and<strong>CUPTI</strong>_CBID_RESOURCE_STREAM_DESTROY_STARTING, the stream beingcreated or destroyed.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 74


CUpti_SynchronizeData Type ReferenceData passed into a synchronize callback function.Data Fields◮ CUcontext context◮ CUstream streamDetailed DescriptionData passed into a synchronize callback function as the cbdata argument toCUpti_CallbackFunc. The cbdata will be this type for domain equal to<strong>CUPTI</strong>_CB_DOMAIN_SYNCHRONIZE. The callback data is valid only within theinvocation of the callback function that is passed the data. If you need to retain some datafor use outside of the callback, you must make a copy of that data.Field DocumentationCUcontext CUpti_SynchronizeData::contextThe context of the stream being synchronized.CUstream CUpti_SynchronizeData::streamThe stream being synchronized.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 75


<strong>CUPTI</strong> Event APIData Structures◮ struct CUpti_EventGroupSetA set of event groups.◮ struct CUpti_EventGroupSetsA set of event group sets.Defines◮ #define <strong>CUPTI</strong>_EVENT_OVERFLOW ((uint64_t)0xFFFFFFFFFFFFFFFFULL)The overflow value for a <strong>CUPTI</strong> event.Typedefs◮ typedef uint32_t CUpti_EventDomainIDID for an event domain.◮ typedef void ∗ CUpti_EventGroupA group of events.◮ typedef uint32_t CUpti_EventIDID for an event.Enumerations◮ enum CUpti_DeviceAttribute {<strong>CUPTI</strong>_DEVICE_ATTR_MAX_EVENT_ID = 1,<strong>CUPTI</strong>_DEVICE_ATTR_MAX_EVENT_DOMAIN_ID = 2,<strong>CUPTI</strong>_DEVICE_ATTR_GLOBAL_MEMORY_BANDWIDTH = 3,CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 76


<strong>CUPTI</strong>_DEVICE_ATTR_INSTRUCTION_PER_CYCLE = 4,<strong>CUPTI</strong>_DEVICE_ATTR_INSTRUCTION_THROUGHPUT_SINGLE_PRECISION= 5 }Device attributes.◮ enum CUpti_EventAttribute {<strong>CUPTI</strong>_EVENT_ATTR_NAME = 0,<strong>CUPTI</strong>_EVENT_ATTR_SHORT_DESCRIPTION = 1,<strong>CUPTI</strong>_EVENT_ATTR_LONG_DESCRIPTION = 2,<strong>CUPTI</strong>_EVENT_ATTR_CATEGORY = 3 }Event attributes.◮ enum CUpti_EventCategory {<strong>CUPTI</strong>_EVENT_CATEGORY_INSTRUCTION = 0,<strong>CUPTI</strong>_EVENT_CATEGORY_MEMORY = 1,<strong>CUPTI</strong>_EVENT_CATEGORY_CACHE = 2,<strong>CUPTI</strong>_EVENT_CATEGORY_PROFILE_TRIGGER = 3 }An event category.◮ enum CUpti_EventCollectionMode {<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_CONTINUOUS = 0,<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_KERNEL = 1 }Event collection modes.◮ enum CUpti_EventDomainAttribute {<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_NAME = 0,<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_INSTANCE_COUNT = 1,<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT = 3 }Event domain attributes.◮ enum CUpti_EventGroupAttribute {<strong>CUPTI</strong>_EVENT_GROUP_ATTR_EVENT_DOMAIN_ID = 0,<strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES =1,<strong>CUPTI</strong>_EVENT_GROUP_ATTR_USER_DATA = 2,CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 77


<strong>CUPTI</strong>_EVENT_GROUP_ATTR_NUM_EVENTS = 3,<strong>CUPTI</strong>_EVENT_GROUP_ATTR_EVENTS = 4,<strong>CUPTI</strong>_EVENT_GROUP_ATTR_INSTANCE_COUNT = 5 }Event group attributes.◮ enum CUpti_ReadEventFlags { <strong>CUPTI</strong>_EVENT_READ_FLAG_NONE = 0 }Flags for cuptiEventGroupReadEvent an cuptiEventGroupReadAllEvents.Functions◮ CUptiResult cuptiDeviceEnumEventDomains (CUdevice device, size_t∗arraySizeBytes, CUpti_EventDomainID ∗domainArray)Get the event domains for a device.◮ CUptiResult cuptiDeviceGetAttribute (CUdevice device, CUpti_DeviceAttributeattrib, size_t ∗valueSize, void ∗value)Read a device attribute.◮ CUptiResult cuptiDeviceGetEventDomainAttribute (CUdevice device,CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t∗valueSize, void ∗value)Read an event domain attribute.◮ CUptiResult cuptiDeviceGetNumEventDomains (CUdevice device, uint32_t∗numDomains)Get the number of domains for a device.◮ CUptiResult cuptiDeviceGetTimestamp (CUcontext context, uint64_t ∗timestamp)Read a device timestamp.◮ CUptiResult cuptiEnumEventDomains (size_t ∗arraySizeBytes,CUpti_EventDomainID ∗domainArray)Get the event domains available on any device.◮ CUptiResult cuptiEventDomainEnumEvents (CUpti_EventDomainID eventDomain,size_t ∗arraySizeBytes, CUpti_EventID ∗eventArray)Get the events in a domain.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 78


◮ CUptiResult cuptiEventDomainGetAttribute (CUpti_EventDomainID eventDomain,CUpti_EventDomainAttribute attrib, size_t ∗valueSize, void ∗value)Read an event domain attribute.◮ CUptiResult cuptiEventDomainGetNumEvents (CUpti_EventDomainIDeventDomain, uint32_t ∗numEvents)Get number of events in a domain.◮ CUptiResult cuptiEventGetAttribute (CUpti_EventID event,CUpti_EventAttribute attrib, size_t ∗valueSize, void ∗value)Get an event attribute.◮ CUptiResult cuptiEventGetIdFromName (CUdevice device, const char ∗eventName,CUpti_EventID ∗event)Find an event by name.◮ CUptiResult cuptiEventGroupAddEvent (CUpti_EventGroup eventGroup,CUpti_EventID event)Add an event to an event group.◮ CUptiResult cuptiEventGroupCreate (CUcontext context, CUpti_EventGroup∗eventGroup, uint32_t flags)Create a new event group for a context.◮ CUptiResult cuptiEventGroupDestroy (CUpti_EventGroup eventGroup)Destroy an event group.◮ CUptiResult cuptiEventGroupDisable (CUpti_EventGroup eventGroup)Disable an event group.◮ CUptiResult cuptiEventGroupEnable (CUpti_EventGroup eventGroup)Enable an event group.◮ CUptiResult cuptiEventGroupGetAttribute (CUpti_EventGroup eventGroup,CUpti_EventGroupAttribute attrib, size_t ∗valueSize, void ∗value)Read an event group attribute.◮ CUptiResult cuptiEventGroupReadAllEvents (CUpti_EventGroup eventGroup,CUpti_ReadEventFlags flags, size_t ∗eventValueBufferSizeBytes, uint64_tCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 79


∗eventValueBuffer, size_t ∗eventIdArraySizeBytes, CUpti_EventID ∗eventIdArray,size_t ∗numEventIdsRead)Read the values for all the events in an event group.◮ CUptiResult cuptiEventGroupReadEvent (CUpti_EventGroup eventGroup,CUpti_ReadEventFlags flags, CUpti_EventID event, size_t∗eventValueBufferSizeBytes, uint64_t ∗eventValueBuffer)Read the value for an event in an event group.◮ CUptiResult cuptiEventGroupRemoveAllEvents (CUpti_EventGroup eventGroup)Remove all events from an event group.◮ CUptiResult cuptiEventGroupRemoveEvent (CUpti_EventGroup eventGroup,CUpti_EventID event)Remove an event from an event group.◮ CUptiResult cuptiEventGroupResetAllEvents (CUpti_EventGroup eventGroup)Zero all the event counts in an event group.◮ CUptiResult cuptiEventGroupSetAttribute (CUpti_EventGroup eventGroup,CUpti_EventGroupAttribute attrib, size_t valueSize, void ∗value)Write an event group attribute.◮ CUptiResult cuptiEventGroupSetsCreate (CUcontext context, size_teventIdArraySizeBytes, CUpti_EventID ∗eventIdArray, CUpti_EventGroupSets∗∗eventGroupPasses)For a set of events, get the grouping that indicates the number of passes and the eventgroups necessary to collect the events.◮ CUptiResult cuptiEventGroupSetsDestroy (CUpti_EventGroupSets∗eventGroupSets)Destroy a CUpti_EventGroupSets object.◮ CUptiResult cuptiGetNumEventDomains (uint32_t ∗numDomains)Get the number of event domains available on any device.◮ CUptiResult cuptiGetTimestamp (uint64_t ∗timestamp)Get the <strong>CUPTI</strong> timestamp.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 80


◮ CUptiResult cuptiSetEventCollectionMode (CUcontext context,CUpti_EventCollectionMode mode)Set the event collection mode.Define Documentation#define<strong>CUPTI</strong>_EVENT_OVERFLOW ((uint64_t)0xFFFFFFFFFFFFFFFFULL)The <strong>CUPTI</strong> event value that indicates an overflow.Typedef Documentationtypedef uint32_t CUpti_EventDomainIDID for an event domain. An event domain represents a group of related events. A devicemay have multiple instances of a domain, indicating that the device can simultaneouslyrecord multiple instances of each event within that domain.typedef void∗ CUpti_EventGroupAn event group is a collection of events that are managed together. All events in an eventgroup must belong to the same domain.typedef uint32_t CUpti_EventIDAn event represents a countable activity, action, or occurrence on the device.Enumeration Type Documentationenum CUpti_DeviceAttribute<strong>CUPTI</strong> device attributes. These attributes can be read using cuptiDeviceGetAttribute.Enumerator:<strong>CUPTI</strong>_DEVICE_ATTR_MAX_EVENT_IDValue is a uint32_t.Number of event IDs for a device.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 81


<strong>CUPTI</strong>_DEVICE_ATTR_MAX_EVENT_DOMAIN_IDdomain IDs for a device. Value is a uint32_t.<strong>CUPTI</strong>_DEVICE_ATTR_GLOBAL_MEMORY_BANDWIDTHmemory bandwidth in Kbytes/sec. Value is a uint64_t.<strong>CUPTI</strong>_DEVICE_ATTR_INSTRUCTION_PER_CYCLEinstructions per cycle (for Fermi). Value is a uint32_t.Number of eventGet globalGet theoretical<strong>CUPTI</strong>_DEVICE_ATTR_INSTRUCTION_THROUGHPUT_SINGLE_PRECISIONGet theoretical number of single precision instructions that can be executed persecond. Value is a uint64_t.enum CUpti_EventAttributeEvent attributes. These attributes can be read using cuptiEventGetAttribute.Enumerator:<strong>CUPTI</strong>_EVENT_ATTR_NAMEc-string.<strong>CUPTI</strong>_EVENT_ATTR_SHORT_DESCRIPTIONValue is a null terminated const c-string.<strong>CUPTI</strong>_EVENT_ATTR_LONG_DESCRIPTIONValue is a null terminated const c-string.<strong>CUPTI</strong>_EVENT_ATTR_CATEGORYCUpti_EventCategory.enum CUpti_EventCategoryEvent name. Value is a null terminated constShort description of event.Long description of event.Category of event. Value isEach event is assigned to a category that represents the general type of the event. Aevent’s category is accessed using cuptiEventGetAttribute and the<strong>CUPTI</strong>_EVENT_ATTR_CATEGORY attribute.Enumerator:<strong>CUPTI</strong>_EVENT_CATEGORY_INSTRUCTION<strong>CUPTI</strong>_EVENT_CATEGORY_MEMORY<strong>CUPTI</strong>_EVENT_CATEGORY_CACHEAn instruction related event.A memory related event.A cache related event.<strong>CUPTI</strong>_EVENT_CATEGORY_PROFILE_TRIGGERenum CUpti_EventCollectionModeA profile-trigger event.The event collection mode determines the period over which the events within the enabledevent groups will be collected.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 82


Enumerator:<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_CONTINUOUS Events are collectedfor the entire duration between the cuptiEventGroupEnable andcuptiEventGroupDisable calls. This is the default mode.<strong>CUPTI</strong>_EVENT_COLLECTION_MODE_KERNEL Events are collected only forthe durations of kernel executions that occur between thecuptiEventGroupEnable and cuptiEventGroupDisable calls. Event collectionbegins when a kernel execution begins, and stops when kernel executioncompletes. If multiple kernel executions occur between thecuptiEventGroupEnable and cuptiEventGroupDisable calls then the event valuesmust be read after each kernel launch if those events need to be associated withthe specific kernel launch.enum CUpti_EventDomainAttributeEvent domain attributes. Except where noted, all the attributes can be read using eithercuptiDeviceGetEventDomainAttribute or cuptiEventDomainGetAttribute.Enumerator:<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_NAMEterminated const c-string.Event domain name. Value is a null<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_INSTANCE_COUNT Number of instancesof the domain for which event counts will be collected. The domain may haveadditional instances that cannot be profiled (see<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT). Can beread only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t.<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT Totalnumber of instances of the domain, including instances that cannot be profiled.Use <strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_INSTANCE_COUNT to get thenumber of instances that can be profiled. Can be read only withcuptiDeviceGetEventDomainAttribute. Value is a uint32_t.enum CUpti_EventGroupAttributeEvent group attributes. These attributes can be read using cuptiEventGroupGetAttribute.Attributes marked [rw] can also be written using cuptiEventGroupSetAttribute.Enumerator:<strong>CUPTI</strong>_EVENT_GROUP_ATTR_EVENT_DOMAIN_ID The domain to whichthe event group is bound. This attribute is set when the first event is added tothe group. Value is a CUpti_EventDomainID.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 83


<strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES[rw] Profile all the instances of the domain for this eventgroup. This feature canbe used to get load balancing across all instances of a domain. Value is aninteger.<strong>CUPTI</strong>_EVENT_GROUP_ATTR_USER_DATA<strong>CUPTI</strong>_EVENT_GROUP_ATTR_NUM_EVENTSgroup. Value is a uint32_t.[rw] Reserved for user data.Number of events in the<strong>CUPTI</strong>_EVENT_GROUP_ATTR_EVENTS Enumerates events in the group.Value is a pointer to buffer of size sizeof(CUpti_EventID) ∗ num_of_events inthe eventgroup. num_of_events can be queried using<strong>CUPTI</strong>_EVENT_GROUP_ATTR_NUM_EVENTS.<strong>CUPTI</strong>_EVENT_GROUP_ATTR_INSTANCE_COUNT Number of instances ofthe domain bound to this event group that will be counted. Value is a uint32_t.enum CUpti_ReadEventFlagsFlags for cuptiEventGroupReadEvent an cuptiEventGroupReadAllEvents.Enumerator:<strong>CUPTI</strong>_EVENT_READ_FLAG_NONEFunction DocumentationNo flags.CUptiResult cuptiDeviceEnumEventDomains (CUdevice device,size_t ∗ arraySizeBytes, CUpti_EventDomainID ∗ domainArray)Returns the event domains IDs in domainArray for a device. The size of the domainArraybuffer is given by ∗arraySizeBytes. The size of the domainArray buffer must be at leastnumdomains ∗ sizeof(CUpti_EventDomainID) or else all domains will not be returned. Thevalue returned in ∗arraySizeBytes contains the number of bytes returned in domainArray.Parameters:device The CUDA devicearraySizeBytes The size of domainArray in bytes, and returns the number of byteswritten to domainArraydomainArray Returns the IDs of the event domains for the deviceReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICECUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 84


<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if arraySizeBytes or domainArray areNULLCUptiResult cuptiDeviceGetAttribute (CUdevice device,CUpti_DeviceAttribute attrib, size_t ∗ valueSize, void ∗ value)Read a device attribute and return it in ∗value.Parameters:device The CUDA deviceattrib The attribute to readvalueSize Size of buffer pointed by the value, and returns the number of bytes writtento valuevalue Returns the value of the attributeReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not a device attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT For non-c-stringattribute values, indicates that the value buffer is too small to hold the attributevalue.CUptiResult cuptiDeviceGetEventDomainAttribute(CUdevice device, CUpti_EventDomainID eventDomain,CUpti_EventDomainAttribute attrib, size_t ∗ valueSize, void ∗value)Returns an event domain attribute in ∗value. The size of the value buffer is given by∗valueSize. The value returned in ∗valueSize contains the number of bytes returned invalue.If the attribute value is a c-string that is longer than ∗valueSize, then only the first∗valueSize characters will be returned and there will be no terminating null byte.Parameters:device The CUDA deviceeventDomain ID of the event domainattrib The event domain attribute to readCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 85


valueSize The size of the value buffer in bytes, and returns the number of byteswritten to valuevalue Returns the attribute’s valueReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_EVENT_DOMAIN_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not an event domain attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT For non-c-stringattribute values, indicates that the value buffer is too small to hold the attributevalue.CUptiResult cuptiDeviceGetNumEventDomains (CUdevice device,uint32_t ∗ numDomains)Returns the number of domains in numDomains for a device.Parameters:device The CUDA devicenumDomains Returns the number of domainsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if numDomains is NULLCUptiResult cuptiDeviceGetTimestamp (CUcontext context,uint64_t ∗ timestamp)Returns the device timestamp in ∗timestamp. The timestamp is reported in nanosecondsand indicates the time since the device was last reset.Parameters:context A context on the device from which to get the timestamptimestamp Returns the device timestampCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 86


Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_CONTEXT<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER is timestamp is NULLCUptiResult cuptiEnumEventDomains (size_t ∗ arraySizeBytes,CUpti_EventDomainID ∗ domainArray)Returns all the event domains available on any CUDA-capable device. Event domain IDsare returned in domainArray. The size of the domainArray buffer is given by∗arraySizeBytes. The size of the domainArray buffer must be at least numDomains ∗sizeof(CUpti_EventDomainID) or all domains will not be returned. The value returned in∗arraySizeBytes contains the number of bytes returned in domainArray.Parameters:arraySizeBytes The size of domainArray in bytes, and returns the number of byteswritten to domainArraydomainArray Returns all the event domainsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if arraySizeBytes or domainArray areNULLCUptiResult cuptiEventDomainEnumEvents(CUpti_EventDomainID eventDomain, size_t ∗arraySizeBytes, CUpti_EventID ∗ eventArray)Returns the event IDs in eventArray for a domain. The size of the eventArray buffer isgiven by ∗arraySizeBytes. The size of the eventArray buffer must be at leastnumdomainevents ∗ sizeof(CUpti_EventID) or else all events will not be returned. Thevalue returned in ∗arraySizeBytes contains the number of bytes returned in eventArray.Parameters:eventDomain ID of the event domainarraySizeBytes The size of eventArray in bytes, and returns the number of byteswritten to eventArrayeventArray Returns the IDs of the events in the domainCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 87


Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_DOMAIN_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if arraySizeBytes or eventArray areNULLCUptiResult cuptiEventDomainGetAttribute(CUpti_EventDomainID eventDomain,CUpti_EventDomainAttribute attrib, size_t ∗valueSize, void ∗ value)Returns an event domain attribute in ∗value. The size of the value buffer is given by∗valueSize. The value returned in ∗valueSize contains the number of bytes returned invalue.If the attribute value is a c-string that is longer than ∗valueSize, then only the first∗valueSize characters will be returned and there will be no terminating null byte.Parameters:eventDomain ID of the event domainattrib The event domain attribute to readvalueSize The size of the value buffer in bytes, and returns the number of byteswritten to valuevalue Returns the attribute’s valueReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_DOMAIN_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not an event domain attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT For non-c-stringattribute values, indicates that the value buffer is too small to hold the attributevalue.CUptiResult cuptiEventDomainGetNumEvents(CUpti_EventDomainID eventDomain, uint32_t ∗ numEvents)Returns the number of events in numEvents for a domain.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 88


Parameters:eventDomain ID of the event domainnumEvents Returns the number of events in the domainReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_DOMAIN_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if numEvents is NULLCUptiResult cuptiEventGetAttribute (CUpti_EventID event,CUpti_EventAttribute attrib, size_t ∗ valueSize, void ∗ value)Returns an event attribute in ∗value. The size of the value buffer is given by ∗valueSize.The value returned in ∗valueSize contains the number of bytes returned in value.If the attribute value is a c-string that is longer than ∗valueSize, then only the first∗valueSize characters will be returned and there will be no terminating null byte.Parameters:event ID of the eventattrib The event attribute to readvalueSize The size of the value buffer in bytes, and returns the number of byteswritten to valuevalue Returns the attribute’s valueReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not an event attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT For non-c-stringattribute values, indicates that the value buffer is too small to hold the attributevalue.CUptiResult cuptiEventGetIdFromName (CUdevice device, constchar ∗ eventName, CUpti_EventID ∗ event)Find an event by name and return the event ID in ∗event.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 89


Parameters:device The CUDA deviceeventName The name of the event to findevent Returns the ID of the found event or undefined if unable to find the eventReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_EVENT_NAME if unable to find an event with nameeventName. In this case ∗event is undefined<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventName or event are NULLCUptiResult cuptiEventGroupAddEvent (CUpti_EventGroupeventGroup, CUpti_EventID event)Add an event to an event group. The event add can fail for a number of reasons:◮ The event group is enabled◮ The event does not belong to the same event domain as the events that are alreadyin the event group◮ Device limitations on the events that can belong to the same group◮ The event group is fullParameters:eventGroup The event groupevent The event to add to the groupReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID<strong>CUPTI</strong>_ERROR_OUT_OF_MEMORY<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if eventGroup is enabled<strong>CUPTI</strong>_ERROR_NOT_COMPATIBLE if event belongs to a different eventdomain than the events already in eventGroup, or if a device limitation preventsevent from being collected at the same time as the events already in eventGroup<strong>CUPTI</strong>_ERROR_MAX_LIMIT_REACHED if eventGroup is full<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 90


CUptiResult cuptiEventGroupCreate (CUcontext context,CUpti_EventGroup ∗ eventGroup, uint32_t flags)Creates a new event group for context and returns the new group in ∗eventGroup.Note:flags are reserved for future use and should be set to zero.Parameters:context The context for the event groupeventGroup Returns the new event groupflags Reserved - must be zeroReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_CONTEXT<strong>CUPTI</strong>_ERROR_OUT_OF_MEMORY<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupDestroy (CUpti_EventGroupeventGroup)Destroy an eventGroup and free its resources. An event group cannot be destroyed if it isenabled.Parameters:eventGroup The event group to destroyReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if the event group is enabled<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupDisable (CUpti_EventGroupeventGroup)Disable an event group. Disabling an event group stops collection of events contained inthe group.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 91


Parameters:eventGroup The event groupReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_HARDWARE<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupEnable (CUpti_EventGroupeventGroup)Enable an event group. Enabling an event group zeros the value of all the events in thegroup and then starts collection of those events.Parameters:eventGroup The event groupReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_HARDWARE<strong>CUPTI</strong>_ERROR_NOT_READY if eventGroup does not contain any events<strong>CUPTI</strong>_ERROR_NOT_COMPATIBLE if eventGroup cannot be enabled due toother already enabled event groups<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupGetAttribute (CUpti_EventGroupeventGroup, CUpti_EventGroupAttribute attrib, size_t ∗valueSize, void ∗ value)Read an event group attribute and return it in ∗value.Parameters:eventGroup The event groupattrib The attribute to readvalueSize Size of buffer pointed by the value, and returns the number of bytes writtento valuevalue Returns the value of the attributeCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 92


Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not an eventgroup attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT For non-c-stringattribute values, indicates that the value buffer is too small to hold the attributevalue.CUptiResult cuptiEventGroupReadAllEvents (CUpti_EventGroupeventGroup, CUpti_ReadEventFlags flags, size_t ∗eventValueBufferSizeBytes, uint64_t ∗ eventValueBuffer, size_t ∗eventIdArraySizeBytes, CUpti_EventID ∗ eventIdArray, size_t ∗numEventIdsRead)Read the values for all the events in an event group. The event values are returned in theeventValueBuffer buffer. eventValueBufferSizeBytes indicates the size ofeventValueBuffer. The buffer must be at least (sizeof(uint64) ∗ number of events ingroup) if <strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCESis not set on the group containing the events. The buffer must be at least (sizeof(uint64) ∗number of domain instances ∗ number of events in group) if<strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set onthe group.The data format returned in eventValueBuffer is:◮ domain instance 0: event0 event1 ... eventN◮ domain instance 1: event0 event1 ... eventN◮ ...◮ domain instance M: event0 event1 ... eventNThe event order in eventValueBuffer is returned in eventIdArray. The size ofeventIdArray is specified in eventIdArraySizeBytes. The size should be at least(sizeof(CUpti_EventID) ∗ number of events in group).If any instance of any event counter overflows, the value returned for that event instancewill be <strong>CUPTI</strong>_EVENT_OVERFLOW.The only allowed value for flags is <strong>CUPTI</strong>_EVENT_READ_FLAG_NONE.Reading events from a disabled event group is not allowed.Parameters:eventGroup The event groupCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 93


flags Flags controlling the reading modeeventValueBufferSizeBytes The size of eventValueBuffer in bytes, and returns thenumber of bytes written to eventValueBuffereventValueBuffer Returns the event valueseventIdArraySizeBytes The size of eventIdArray in bytes, and returns the number ofbytes written to eventIdArrayeventIdArray Returns the IDs of the events in the same order as the values return ineventValueBuffer.numEventIdsRead Returns the number of event IDs returned inReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_HARDWARE<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if eventGroup is disabled<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup,eventValueBufferSizeBytes, eventValueBuffer, eventIdArraySizeBytes,eventIdArray or numEventIdsRead is NULLCUptiResult cuptiEventGroupReadEvent (CUpti_EventGroupeventGroup, CUpti_ReadEventFlags flags, CUpti_EventIDevent, size_t ∗ eventValueBufferSizeBytes, uint64_t ∗eventValueBuffer)Read the value for an event in an event group. The event value is returned in theeventValueBuffer buffer. eventValueBufferSizeBytes indicates the size of theeventValueBuffer buffer. The buffer must be at least sizeof(uint64) if<strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not seton the group containing the event. The buffer must be at least (sizeof(uint64) ∗ number ofdomain instances) if<strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set onthe group.If any instance of an event counter overflows, the value returned for that event instancewill be <strong>CUPTI</strong>_EVENT_OVERFLOW.The only allowed value for flags is <strong>CUPTI</strong>_EVENT_READ_FLAG_NONE.Reading an event from a disabled event group is not allowed.Parameters:eventGroup The event groupCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 94


flags Flags controlling the reading modeevent The event to readeventValueBbufferSizeBytes The size of eventValueBuffer in bytes, and returns thenumber of bytes written to eventValueBuffereventValueBuffer Returns the event value(s)Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID<strong>CUPTI</strong>_ERROR_HARDWARE<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if eventGroup is disabled<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup,eventValueBufferSizeBytes or eventValueBuffer is NULLCUptiResult cuptiEventGroupRemoveAllEvents(CUpti_EventGroup eventGroup)Remove all events from an event group. Events cannot be removed if the event group isenabled.Parameters:eventGroup The event groupReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if eventGroup is enabled<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupRemoveEvent (CUpti_EventGroupeventGroup, CUpti_EventID event)Remove event from the an event group. The event cannot be removed if the event groupis enabled.Parameters:eventGroup The event groupevent The event to remove from the groupCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 95


Return values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if eventGroup is enabled<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupResetAllEvents (CUpti_EventGroupeventGroup)Zero all the event counts in an event group.Parameters:eventGroup The event groupReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_HARDWARE<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroup is NULLCUptiResult cuptiEventGroupSetAttribute (CUpti_EventGroupeventGroup, CUpti_EventGroupAttribute attrib, size_tvalueSize, void ∗ value)Write an event group attribute.Parameters:eventGroup The event groupattrib The attribute to writevalueSize The size, in bytes, of the valuevalue The attribute value to writeReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not an event group attribute, or if attrib is not a writable attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT Indicates that thevalue buffer is too small to hold the attribute value.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 96


CUptiResult cuptiEventGroupSetsCreate (CUcontext context,size_t eventIdArraySizeBytes, CUpti_EventID ∗ eventIdArray,CUpti_EventGroupSets ∗∗ eventGroupPasses)The number of events that can be collected simultaneously varies by device and by thetype of the events. When events can be collected simultaneously, they may need to begrouped into multiple event groups because they are from different event domains. Thisfunction takes a set of events and determines how many passes are required to collect allthose events, and which events can be collected simultaneously in each pass.The CUpti_EventGroupSets returned in eventGroupPasses indicates how many passesare required to collect the events with the numSets field. The sets array indicates theevent groups that should be collected on each pass.Parameters:context The context for event collectioneventIdArraySizeBytes Size of eventIdArray in byteseventIdArray Array of event IDs that need to be groupedeventGroupPasses Returns a CUpti_EventGroupSets object that indicates thenumber of passes required to collect the events and the events to collect on eachpassReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_CONTEXT<strong>CUPTI</strong>_ERROR_INVALID_EVENT_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventIdArray or eventGroupPassesis NULLCUptiResult cuptiEventGroupSetsDestroy(CUpti_EventGroupSets ∗ eventGroupSets)Destroy a CUpti_EventGroupSets object.Parameters:eventGroupSets The object to destroyReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZEDCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 97


<strong>CUPTI</strong>_ERROR_INVALID_OPERATION if any of the event groups contained inthe sets is enabled<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventGroupSets is NULLCUptiResult cuptiGetNumEventDomains (uint32_t ∗ numDomains)Returns the total number of event domains available on any CUDA-capable device.Parameters:numDomains Returns the number of domainsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if numDomains is NULLCUptiResult cuptiGetTimestamp (uint64_t ∗ timestamp)Returns a timestamp normalized to correspond with the start and end timestampsreported in the <strong>CUPTI</strong> activity records. The timestamp is reported in nanoseconds.Parameters:timestamp Returns the <strong>CUPTI</strong> timestampReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if timestamp is NULLCUptiResult cuptiSetEventCollectionMode (CUcontext context,CUpti_EventCollectionMode mode)Set the event collection mode for a context. The mode controls the event collectionbehavior of all events in event groups created in the context.Parameters:context The contextmode The event collection modeReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_CONTEXTCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 98


<strong>CUPTI</strong> Metric APIData Structures◮ union CUpti_MetricValueA metric value.Typedefs◮ typedef uint32_t CUpti_MetricIDID for a metric.Enumerations◮ enum CUpti_MetricAttribute {<strong>CUPTI</strong>_METRIC_ATTR_NAME = 0,<strong>CUPTI</strong>_METRIC_ATTR_SHORT_DESCRIPTION = 1,<strong>CUPTI</strong>_METRIC_ATTR_LONG_DESCRIPTION = 2,<strong>CUPTI</strong>_METRIC_ATTR_CATEGORY = 3,<strong>CUPTI</strong>_METRIC_ATTR_VALUE_KIND = 4 }Metric attributes.◮ enum CUpti_MetricCategory {<strong>CUPTI</strong>_METRIC_CATEGORY_MEMORY = 0,<strong>CUPTI</strong>_METRIC_CATEGORY_INSTRUCTION = 1,<strong>CUPTI</strong>_METRIC_CATEGORY_MULTIPROCESSOR = 2,<strong>CUPTI</strong>_METRIC_CATEGORY_CACHE = 3,<strong>CUPTI</strong>_METRIC_CATEGORY_TEXTURE = 4 }A metric category.◮ enum CUpti_MetricValueKind {<strong>CUPTI</strong>_METRIC_VALUE_KIND_DOUBLE = 0,<strong>CUPTI</strong>_METRIC_VALUE_KIND_UINT64 = 1,CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 99


<strong>CUPTI</strong>_METRIC_VALUE_KIND_PERCENT = 2,<strong>CUPTI</strong>_METRIC_VALUE_KIND_THROUGHPUT = 3 }Kinds of metric values.Functions◮ CUptiResult cuptiDeviceEnumMetrics (CUdevice device, size_t ∗arraySizeBytes,CUpti_MetricID ∗metricArray)Get the metrics for a device.◮ CUptiResult cuptiDeviceGetNumMetrics (CUdevice device, uint32_t ∗numMetrics)Get the number of metrics for a device.◮ CUptiResult cuptiEnumMetrics (size_t ∗arraySizeBytes, CUpti_MetricID∗metricArray)Get all the metrics available on any device.◮ CUptiResult cuptiGetNumMetrics (uint32_t ∗numMetrics)Get the total number of metrics available on any device.◮ CUptiResult cuptiMetricCreateEventGroupSets (CUcontext context, size_tmetricIdArraySizeBytes, CUpti_MetricID ∗metricIdArray, CUpti_EventGroupSets∗∗eventGroupPasses)For a set of metrics, get the grouping that indicates the number of passes and the eventgroups necessary to collect the events required for those metrics.◮ CUptiResult cuptiMetricEnumEvents (CUpti_MetricID metric, size_t∗eventIdArraySizeBytes, CUpti_EventID ∗eventIdArray)Get the events required to calculating a metric.◮ CUptiResult cuptiMetricGetAttribute (CUpti_MetricID metric,CUpti_MetricAttribute attrib, size_t ∗valueSize, void ∗value)Get a metric attribute.◮ CUptiResult cuptiMetricGetIdFromName (CUdevice device, const char∗metricName, CUpti_MetricID ∗metric)Find an metric by name.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 100


◮ CUptiResult cuptiMetricGetNumEvents (CUpti_MetricID metric, uint32_t∗numEvents)Get number of events required to calculate a metric.◮ CUptiResult cuptiMetricGetValue (CUdevice device, CUpti_MetricID metric, size_teventIdArraySizeBytes, CUpti_EventID ∗eventIdArray, size_teventValueArraySizeBytes, uint64_t ∗eventValueArray, uint64_t timeDuration,CUpti_MetricValue ∗metricValue)Calculate the value for a metric.Typedef Documentationtypedef uint32_t CUpti_MetricIDA metric provides a measure of some aspect of the device.Enumeration Type Documentationenum CUpti_MetricAttributeMetric attributes describe properties of a metric. These attributes can be read usingcuptiMetricGetAttribute.Enumerator:<strong>CUPTI</strong>_METRIC_ATTR_NAMEc-string.<strong>CUPTI</strong>_METRIC_ATTR_SHORT_DESCRIPTIONValue is a null terminated const c-string.<strong>CUPTI</strong>_METRIC_ATTR_LONG_DESCRIPTIONValue is a null terminated const c-string.<strong>CUPTI</strong>_METRIC_ATTR_CATEGORYCUpti_MetricCategory.<strong>CUPTI</strong>_METRIC_ATTR_VALUE_KINDtype CUpti_MetricValueKind.Metric name. Value is a null terminated constShort description of metric.Long description of metric.Category of the metric. Value is of typeValue type of the metric. Value is ofCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 101


enum CUpti_MetricCategoryEach metric is assigned to a category that represents the general type of the metric. Ametric’s category is accessed using cuptiMetricGetAttribute and the<strong>CUPTI</strong>_METRIC_ATTR_CATEGORY attribute.Enumerator:<strong>CUPTI</strong>_METRIC_CATEGORY_MEMORY<strong>CUPTI</strong>_METRIC_CATEGORY_INSTRUCTION<strong>CUPTI</strong>_METRIC_CATEGORY_MULTIPROCESSORmetric.<strong>CUPTI</strong>_METRIC_CATEGORY_CACHE<strong>CUPTI</strong>_METRIC_CATEGORY_TEXTUREenum CUpti_MetricValueKindA memory related metric.An instruction related metric.A cache related metric.A multiprocessor relatedA texture related metric.Metric values can be one of several different kinds. Corresponding to each kind is amember of the CUpti_MetricValue union. The metric value returned bycuptiMetricGetValue should be accessed using the appropriate member of that unionbased on its value kind.Enumerator:<strong>CUPTI</strong>_METRIC_VALUE_KIND_DOUBLEThe metric value is a 64-bit double.<strong>CUPTI</strong>_METRIC_VALUE_KIND_UINT64The metric value is a 64-bit integer.<strong>CUPTI</strong>_METRIC_VALUE_KIND_PERCENT The metric value is a percentagerepresented by a 64-bit double. For example, 57.5% is represented by the value57.5.<strong>CUPTI</strong>_METRIC_VALUE_KIND_THROUGHPUT The metric value is athroughput represented by a 64-bit integer. The unit for throughput values isbytes/second.Function DocumentationCUptiResult cuptiDeviceEnumMetrics (CUdevice device, size_t ∗arraySizeBytes, CUpti_MetricID ∗ metricArray)Returns the metric IDs in metricArray for a device. The size of the metricArray buffer isgiven by ∗arraySizeBytes. The size of the metricArray buffer must be at leastnumMetrics ∗ sizeof(CUpti_MetricID) or else all metric IDs will not be returned. Thevalue returned in ∗arraySizeBytes contains the number of bytes returned in metricArray.CUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 102


Parameters:device The CUDA devicearraySizeBytes The size of metricArray in bytes, and returns the number of byteswritten to metricArraymetricArray Returns the IDs of the metrics for the deviceReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if arraySizeBytes or metricArray areNULLCUptiResult cuptiDeviceGetNumMetrics (CUdevice device, uint32_t∗ numMetrics)Returns the number of metrics available for a device.Parameters:device The CUDA devicenumMetrics Returns the number of metrics available for the deviceReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if numMetrics is NULLCUptiResult cuptiEnumMetrics (size_t ∗ arraySizeBytes,CUpti_MetricID ∗ metricArray)Returns the metric IDs in metricArray for all CUDA-capable devices. The size of themetricArray buffer is given by ∗arraySizeBytes. The size of the metricArray buffermust be at least numMetrics ∗ sizeof(CUpti_MetricID) or all metric IDs will not bereturned. The value returned in ∗arraySizeBytes contains the number of bytes returnedin metricArray.Parameters:arraySizeBytes The size of metricArray in bytes, and returns the number of byteswritten to metricArrayCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 103


metricArray Returns the IDs of the metricsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if arraySizeBytes or metricArray areNULLCUptiResult cuptiGetNumMetrics (uint32_t ∗ numMetrics)Returns the total number of metrics available on any CUDA-capable devices.Parameters:numMetrics Returns the number of metricsReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if numMetrics is NULLCUptiResult cuptiMetricCreateEventGroupSets (CUcontext context,size_t metricIdArraySizeBytes, CUpti_MetricID ∗ metricIdArray,CUpti_EventGroupSets ∗∗ eventGroupPasses)For a set of metrics, get the grouping that indicates the number of passes and the eventgroups necessary to collect the events required for those metrics.See also:cuptiEventGroupSetsCreate for details on event group set creation.Parameters:context The context for event collectionmetricIdArraySizeBytes Size of the metricIdArray in bytesmetricIdArray Array of metric IDseventGroupPasses Returns a CUpti_EventGroupSets object that indicates thenumber of passes required to collect the events and the events to collect on eachpassReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_CONTEXT<strong>CUPTI</strong>_ERROR_INVALID_METRIC_IDCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 104


<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if metricIdArray oreventGroupPasses is NULLCUptiResult cuptiMetricEnumEvents (CUpti_MetricID metric,size_t ∗ eventIdArraySizeBytes, CUpti_EventID ∗ eventIdArray)Gets the event IDs in eventIdArray required to calculate a metric. The size of theeventIdArray buffer is given by ∗eventIdArraySizeBytes and must be at leastnumEvents ∗ sizeof(CUpti_EventID) or all events will not be returned. The value returnedin ∗eventIdArraySizeBytes contains the number of bytes returned in eventIdArray.Parameters:metric ID of the metriceventIdArraySizeBytes The size of eventIdArray in bytes, and returns the number ofbytes written to eventIdArrayeventIdArray Returns the IDs of the events required to calculate metricReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_METRIC_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if eventIdArraySizeBytes oreventIdArray are NULL.CUptiResult cuptiMetricGetAttribute (CUpti_MetricID metric,CUpti_MetricAttribute attrib, size_t ∗ valueSize, void ∗ value)Returns a metric attribute in ∗value. The size of the value buffer is given by ∗valueSize.The value returned in ∗valueSize contains the number of bytes returned in value.If the attribute value is a c-string that is longer than ∗valueSize, then only the first∗valueSize characters will be returned and there will be no terminating null byte.Parameters:metric ID of the metricattrib The metric attribute to readvalueSize The size of the value buffer in bytes, and returns the number of byteswritten to valuevalue Returns the attribute’s valueReturn values:<strong>CUPTI</strong>_SUCCESSCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 105


<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_METRIC_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or ifattrib is not a metric attribute<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT For non-c-stringattribute values, indicates that the value buffer is too small to hold the attributevalue.CUptiResult cuptiMetricGetIdFromName (CUdevice device, constchar ∗ metricName, CUpti_MetricID ∗ metric)Find a metric by name and return the metric ID in ∗metric.Parameters:device The CUDA devicemetricName The name of metric to findmetric Returns the ID of the found metric or undefined if unable to find the metricReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_DEVICE<strong>CUPTI</strong>_ERROR_INVALID_METRIC_NAME if unable to find a metric with namemetricName. In this case ∗metric is undefined<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if metricName or metric are NULL.CUptiResult cuptiMetricGetNumEvents (CUpti_MetricID metric,uint32_t ∗ numEvents)Returns the number of events in numEvents that are required to calculate a metric.Parameters:metric ID of the metricnumEvents Returns the number of events required for the metricReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_METRIC_ID<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if numEvents is NULLCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 106


CUptiResult cuptiMetricGetValue (CUdevice device,CUpti_MetricID metric, size_t eventIdArraySizeBytes,CUpti_EventID ∗ eventIdArray, size_t eventValueArraySizeBytes,uint64_t ∗ eventValueArray, uint64_t timeDuration,CUpti_MetricValue ∗ metricValue)Use the events collected for a metric to calculate the metric value. Metric value calculationassumes that event values are normalized to represent all domain instances on a device.For the most accurate metric collection, the events required for the metric should becollected for all profiled domain instances. For example, to collect all instances of an event,set the <strong>CUPTI</strong>_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCESattribute on the group containing the event to 1. The normalized value for the event isthen: (sum_event_values ∗ totalInstanceCount) / instanceCount, wheresum_event_values is the summation of the event values across all profiled domaininstances, totalInstanceCount is obtained from querying<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT andinstanceCount is obtained from querying<strong>CUPTI</strong>_EVENT_GROUP_ATTR_INSTANCE_COUNT (or<strong>CUPTI</strong>_EVENT_DOMAIN_ATTR_INSTANCE_COUNT).Parameters:device The CUDA device that the metric is being calculated formetric The metric IDeventIdArraySizeBytes The size of eventIdArray in byteseventIdArray The event IDs required to calculate metriceventValueArraySizeBytes The size of eventValueArray in byteseventValueArray The normalized event values required to calculate metric. Thevalues must be order to match the order of events in eventIdArraytimeDuration The duration over which the events were collected, in nsmetricValue Returns the value for the metricReturn values:<strong>CUPTI</strong>_SUCCESS<strong>CUPTI</strong>_ERROR_NOT_INITIALIZED<strong>CUPTI</strong>_ERROR_INVALID_METRIC_ID<strong>CUPTI</strong>_ERROR_INVALID_OPERATION<strong>CUPTI</strong>_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT if the eventIdArraydoes not contain all the events needed for metric<strong>CUPTI</strong>_ERROR_INVALID_EVENT_VALUE if any of the event values requiredfor the metric is <strong>CUPTI</strong>_EVENT_OVERFLOWCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 107


<strong>CUPTI</strong>_ERROR_INVALID_PARAMETER if metricValue, eventIdArray oreventValueArray is NULLCUDA Tools SDK <strong>CUPTI</strong> User’s <strong>Guide</strong> DA-05679-001_v01 | 108


NoticeALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHERDOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NOWARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, ANDEXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FORA PARTICULAR PURPOSE.Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes noresponsibility for the consequences of use of such information or for any infringement of patents or otherrights of third parties that may result from its use. No license is granted by implication of otherwise underany patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to changewithout notice. This publication supersedes and replaces all other information previously supplied. NVIDIACorporation products are not authorized as critical components in life support devices or systems withoutexpress written approval of NVIDIA Corporation.TrademarksNVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S.and other countries. Other company and product names may be trademarks of the respective companies withwhich they are associated.Copyright© 2011 NVIDIA Corporation. All rights reserved.www.nvidia.com

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!