A CIL Tutorial - Department of Computer Science - ETH Zürich


29.01.2014

CHAPTER 10. ADDING A NEW KIND OF STATEMENT

../ciltut-lib/src/tut10.c

    #define _GNU_SOURCE // Needed for RTLD_NEXT

    #include <stdint.h>  // for uint64_t
    #include <pthread.h> // for pthread_create
    #include <dlfcn.h>   // for RTLD_NEXT
    #include
    #include
    #include
    #include

We'll omit from the text most of the code for setting up the performance counters. You can find it in ciltut_libc.c. The important thing to note about this code is that perf_get_cache_refs() returns the cumulative number of cache references performed by the calling thread, and perf_get_cache_miss() returns the cumulative number of those references that missed the cache.

For each thread we'll keep a stack of records. Each record stores the cumulative number of cache misses (start_miss), the cumulative number of cache references (start_refs), and the starting time (start_time) at the beginning of a cache_report statement.

    struct cache_stack_entry {
      uint64_t start_miss;
      uint64_t start_refs;
      uint64_t start_time;
    };

../ciltut-lib/src/tut10.c

    #define MAX_CACHE_STACK_ENTRIES 256

    struct cache_stack {
      struct cache_stack_entry s[MAX_CACHE_STACK_ENTRIES];
      int t;
    };

Next, we set up the thread local storage for the stack, and a pointer to the top of the stack. Linux supports gcc's __thread storage modifier, but other OSes might not, for example Mac OS X. Thus, we'll just stick to using pthread_setspecific and pthread_getspecific for implementing thread local storage. The CONSTRUCTOR attribute on init_CS_key ensures that the stack is initialized in the first thread of the program. Next, we'll ensure that each new thread also initializes its own stack by calling init_CS.
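Returning to the cache_stack structure defined above, the following is a minimal sketch of how such a stack might be used to handle nested cache_report statements. The helpers cs_push and cs_pop_miss_rate are hypothetical names that do not appear in the tutorial, and the sketch assumes that the field t counts the entries currently on the stack. The counter values are passed in as plain arguments; in the tutorial they would come from perf_get_cache_miss() and perf_get_cache_refs().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CACHE_STACK_ENTRIES 256

struct cache_stack_entry {
  uint64_t start_miss;
  uint64_t start_refs;
  uint64_t start_time;
};

struct cache_stack {
  struct cache_stack_entry s[MAX_CACHE_STACK_ENTRIES];
  int t; /* assumption: number of entries currently on the stack */
};

/* Hypothetical helper: record the cumulative counters when a
 * cache_report block is entered. */
void cs_push(struct cache_stack *cs, uint64_t miss, uint64_t refs,
             uint64_t now)
{
  assert(cs->t < MAX_CACHE_STACK_ENTRIES); /* nesting too deep */
  cs->s[cs->t].start_miss = miss;
  cs->s[cs->t].start_refs = refs;
  cs->s[cs->t].start_time = now;
  cs->t++;
}

/* Hypothetical helper: pop the innermost record when its block exits and
 * return the miss rate observed inside that block. */
double cs_pop_miss_rate(struct cache_stack *cs, uint64_t miss, uint64_t refs)
{
  assert(cs->t > 0); /* pop without a matching push */
  struct cache_stack_entry e = cs->s[--cs->t];
  uint64_t d_miss = miss - e.start_miss;
  uint64_t d_refs = refs - e.start_refs;
  /* Avoid dividing by zero when the block made no cache references. */
  return d_refs == 0 ? 0.0 : (double)d_miss / (double)d_refs;
}
```

Because the counters are cumulative, subtracting the values saved at entry from the values read at exit isolates the activity of one block, and keeping the saved values on a stack lets an inner block be measured without disturbing the record of the block that encloses it.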

../ciltut-lib/src/tut10.c

    pthread_key_t CS_key;

    static void init_CS()
    {
      struct cache_stack *CS = calloc(1, sizeof(*CS));
      pthread_setspecific(CS_key, CS);
    }

    CONSTRUCTOR static void init_CS_key()
    {
      pthread_key_create(&CS_key, &free);
      init_CS();
    }

    static struct cache_stack *get_CS()
    {
      return (struct cache_stack *)pthread_getspecific(CS_key);
    }

The next three items demonstrate how to override the C library's pthread_create function. First, we set up a function pointer (pthread_create_orig) to store a reference to the original function, because we'll need to call it from inside our version. Second, we wrap the call to the dynamic linker in some error-checking code in checked_dlsym. Finally, we set up a constructor function, which runs before main, to call checked_dlsym, which returns a pointer to the original pthread_create.

../ciltut-lib/src/tut10.c

    int (*pthread_create_orig)(pthread_t *__restrict,
                               __const pthread_attr_t *__restrict,
                               void *(*)(void *),
                               void *__restrict) = NULL;

    extern void *checked_dlsym(void *handle, const char *sym);

    CONSTRUCTOR static void init_cache_stack()
    {
      pthread_create_orig = checked_dlsym(RTLD_NEXT, "pthread_create");
    }

The goal of wrapping pthread_create is to ensure that spawned threads set up the state needed for the cache_report statement. Therefore, we need to wrap every function passed to pthread_create in a function that performs the initialization before calling the passed function. We accomplish this by defining a structure type for the function pointer and its argument. Then, in tfunc_wrapper, we initialize thread local storage for the cache stack, and initialize the performance counters for the new thread with perf_init before calling the function pointer on its argument.
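The wrapping scheme described in the last paragraph can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's code: the structure name tfunc_arg, the functions create_with_init, per_thread_init and add_one, and the counter init_calls are all hypothetical. Only the name tfunc_wrapper comes from the text, and its body here is a guess at the shape. The original thread-creation function is passed in as a pointer, standing in for pthread_create_orig, and per_thread_init stands in for the calls that would initialize the cache stack and run perf_init.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Type of the original thread-creation function. */
typedef int (*create_fn)(pthread_t *, const pthread_attr_t *,
                         void *(*)(void *), void *);

/* Hypothetical structure pairing the user's function with its argument. */
struct tfunc_arg {
  void *(*fn)(void *);
  void *arg;
};

/* Counts how often the per-thread setup ran; for illustration only. */
atomic_int init_calls;

/* Stand-in for the per-thread setup (cache stack and perf counters). */
void per_thread_init(void)
{
  atomic_fetch_add(&init_calls, 1);
}

/* Runs first in the new thread: do the setup, then call the user's
 * function on the user's argument and pass its result through. */
void *tfunc_wrapper(void *p)
{
  struct tfunc_arg *w = p;
  void *(*fn)(void *) = w->fn;
  void *arg = w->arg;
  free(w); /* the wrapper owns the heap-allocated pair */
  per_thread_init();
  return fn(arg);
}

/* Hypothetical replacement body: package fn and arg, then hand
 * tfunc_wrapper to the original creation function instead of fn. */
int create_with_init(create_fn create_orig, pthread_t *thread,
                     const pthread_attr_t *attr,
                     void *(*fn)(void *), void *arg)
{
  assert(create_orig != NULL);
  struct tfunc_arg *w = malloc(sizeof(*w));
  if (w == NULL)
    return EAGAIN; /* pthread_create reports resource shortage this way */
  w->fn = fn;
  w->arg = arg;
  int rc = create_orig(thread, attr, tfunc_wrapper, w);
  if (rc != 0)
    free(w); /* the thread never started, so the pair is still ours */
  return rc;
}

/* Example thread body used to exercise the wrapper. */
void *add_one(void *p)
{
  int *n = p;
  *n += 1;
  return p;
}
```

The pair is allocated on the heap rather than on the caller's stack because the caller may return before the new thread starts running. A call such as create_with_init(pthread_create, &tid, NULL, add_one, &n), followed by pthread_join, would run add_one in a new thread after the setup has completed there.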

