CUPTI User's Guide

simplify the presentation error checking code has been removed.

    static void CUPTIAPI
    getEventValueCallback(void *userdata,
                          CUpti_CallbackDomain domain,
                          CUpti_CallbackId cbid,
                          const void *cbdata)
    {
        const CUpti_CallbackData *cbData = (CUpti_CallbackData *)cbdata;

        /* Before the kernel is launched: drain the GPU, then start counting. */
        if (cbData->callbackSite == CUPTI_API_ENTER) {
            cudaThreadSynchronize();
            cuptiSetEventCollectionMode(cbData->context,
                                        CUPTI_EVENT_COLLECTION_MODE_KERNEL);
            cuptiEventGroupEnable(eventGroup);
        }

        /* After the kernel is queued: wait for it to finish, then read. */
        if (cbData->callbackSite == CUPTI_API_EXIT) {
            cudaThreadSynchronize();
            cuptiEventGroupReadEvent(eventGroup,
                                     CUPTI_EVENT_READ_FLAG_ACCUMULATE,
                                     eventId,
                                     &bytesRead, &eventVal);
            cuptiEventGroupDisable(eventGroup);
        }
    }

Two synchronization points are used to ensure that events are counted only for the execution of the kernel. If the application contains other threads that launch kernels, then additional thread-level synchronization must also be introduced to ensure that those threads do not launch kernels while the callback is collecting events.

When the cudaLaunch API is entered (that is, before the kernel is actually launched on the device), cudaThreadSynchronize is used to wait until the GPU is idle. The event collection mode is set to CUPTI_EVENT_COLLECTION_MODE_KERNEL so that the event counters are automatically started and stopped just before and after the kernel executes. Then event collection is enabled with cuptiEventGroupEnable.

When the cudaLaunch API is exited (that is, after the kernel is queued for execution on the GPU), another cudaThreadSynchronize is used to cause the CPU thread to wait for the kernel to finish execution. Finally, the event counts are read with cuptiEventGroupReadEvent.

CUDA Tools SDK CUPTI User’s Guide DA-05679-001_v01 | 10
Sampling Events

The event API can also be used to sample event values while a kernel or kernels are executing (as demonstrated by the event_sampling sample). The sample shows one possible way to perform the sampling. The event collection mode is set to CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS so that the event counters run continuously. Two threads are used in event_sampling: one thread schedules the kernels and memcpys that perform the computation, while another thread wakes periodically to sample an event counter. In this sample there is no correlation of the event samples with what is happening on the GPU. To get some coarse correlation, you can use cuptiDeviceGetTimestamp to collect the GPU timestamp at the time of the sample and also at other interesting points in your application.

Interpreting Event Values

The tables below describe the events available for each device. Each event has a type that indicates how the activity or action associated with that event is collected. The event types are SM, TPC, and FB.

SM Event Type

The SM event type indicates that the event is collected for an action or activity that occurs on one or more of the device’s streaming multiprocessors (SMs). A streaming multiprocessor creates, manages, schedules, and executes threads in groups of 32 threads called warps.

The SM event values typically represent activity or action of thread warps, and not the activity or action of individual threads. Details of how each event is incremented are given in the event tables below.

Two factors will impact the accuracy of the values collected for SM type events. First, due to variations in system state, event values can vary across different but otherwise identical runs of the same application. Second, for devices with compute capability less than 2.0, SM events are counted only for one SM. For devices with compute capability greater than 2.0, SM events from domain_d are counted for all SMs, but SM events from domain_a are counted for multiple, but not all, SMs.
To get the most consistent results in spite of these factors, it is best to have the number of blocks for each kernel launch be a multiple of the total number of SMs on the device. In other words, the grid configuration should be chosen such that the number of blocks launched on each SM is the same and the amount of work of interest per block is also the same.
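The grid-size advice above reduces to a small piece of arithmetic. The following is a minimal sketch, not part of CUPTI: the helper name blocks_multiple_of_sms is hypothetical, and the SM count is passed in directly (in a real application it would typically be read from the multiProcessorCount field returned by cudaGetDeviceProperties). It rounds a desired block count up to the next multiple of the SM count. The work per block still has to be balanced by the application itself.

```c
#include <assert.h>

/* Hypothetical helper: round a desired block count up to the nearest
 * multiple of the SM count, so that the blocks can be divided evenly
 * among the SMs. num_sms is supplied by the caller. */
unsigned int blocks_multiple_of_sms(unsigned int desired_blocks,
                                    unsigned int num_sms)
{
    assert(num_sms > 0);

    /* Launch at least one block per SM. */
    if (desired_blocks == 0)
        return num_sms;

    /* Ceiling division gives the number of full "rounds" of blocks. */
    unsigned int rounds = (desired_blocks + num_sms - 1) / num_sms;
    return rounds * num_sms;
}
```

For example, asking for 100 blocks on a device assumed to have 14 SMs yields a grid of 112 blocks, which is 8 blocks per SM.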
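Returning to the coarse correlation suggested under Sampling Events: once GPU timestamps have been recorded both at each sample and at interesting points in the application, matching them up is a simple lookup. The sketch below is an illustration only. The Marker type and find_marker_for_sample are hypothetical names, and the timestamps are plain integers supplied by the caller rather than values obtained from cuptiDeviceGetTimestamp. It assumes the markers are sorted by ascending timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an interesting point in the application,
 * tagged with the GPU timestamp captured at that point. */
typedef struct {
    uint64_t timestamp;
    const char *label;
} Marker;

/* Return the index of the latest marker whose timestamp is <= sample_ts,
 * or -1 if the sample was taken before the first marker.
 * Assumes markers[] is sorted by ascending timestamp. */
int find_marker_for_sample(const Marker *markers, size_t count,
                           uint64_t sample_ts)
{
    assert(markers != NULL || count == 0);

    int found = -1;
    for (size_t i = 0; i < count; ++i) {
        if (markers[i].timestamp > sample_ts)
            break;              /* later markers cannot match either */
        found = (int)i;
    }
    return found;
}
```

Each event sample can then be attributed to the most recent marker that precedes it, for example "after kernel A was scheduled". This is only as precise as the spacing of the markers.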