Detecting MySQL IO problems on Linux at different ... - Percona
Detecting MySQL IO problems on Linux at different abstraction layers
Nickolay Ihalainen<br />
Percona Live London 2011
Agenda
● Dataflow layers
● OS tools
● MySQL instrumentation
● Inside InnoDB: story of one insert
www.perc<strong>on</strong>a.com
Layers
● Hardware level
  ● On-disk queue (NCQ, TCQ) and cache
  ● RAID controller queues and caches
● OS level
  ● Block device level
  ● IO scheduler
  ● File system
  ● Page cache
Software layers
● AIO
  ● Kernel AIO (Windows, and Linux with O_DIRECT)
  ● Glibc AIO (not used)
● Normal read/pread and write/pwrite
● Directory operations
● MySQL
  ● Data: InnoDB simulated AIO
  ● Dictionaries (table cache and dictionary cache)
High-level tools
● vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 1216856 134096 1882012    0    0     0 26640  714 1215  3  2 88  7
 0  0      0 1216820 134096 1882360    0    0     0     0  501  697  2  0 98  0
● iostat
Device: rrqm/s wrqm/s  r/s  w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.32   0.70 1.26 3.19  0.03  0.08    51.20     0.02  4.22    1.77    5.19  0.42  0.19
● pt-diskstats
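The iostat columns are not independent, which makes a quick sanity check possible. A minimal sketch, using standard queueing approximations rather than iostat's own code: by Little's law avgqu-sz should be close to (r/s + w/s) * await, and %util close to (r/s + w/s) * svctm. The function name `iostat_checks` is hypothetical.

```python
# Cross-check an `iostat -x` sample. Column names follow the iostat header above;
# the formulas are approximations, not what sysstat computes internally.

def iostat_checks(r_s, w_s, await_ms, svctm_ms, rmb_s, wmb_s):
    """Return (avgqu_sz, util_pct, avgrq_sz_sectors) derived from the other columns."""
    iops = r_s + w_s
    # Little's law: requests in flight = arrival rate * average time in the system.
    avgqu_sz = iops * await_ms / 1000.0
    # Share of each second the device spent servicing requests, in percent.
    util_pct = iops * svctm_ms / 1000.0 * 100.0
    # Average request size in 512-byte sectors (1 MB in iostat = 2048 sectors).
    avgrq_sz = (rmb_s + wmb_s) * 2048 / iops
    return avgqu_sz, util_pct, avgrq_sz


# Values taken from the sda line above.
q, util, rq = iostat_checks(1.26, 3.19, 4.22, 0.42, 0.03, 0.08)
print(f"avgqu-sz ~ {q:.2f}, %util ~ {util:.2f}, avgrq-sz ~ {rq:.1f} sectors")
```

For the sda sample this prints about 0.02, 0.19 and 50.6, matching the displayed 0.02, 0.19 and 51.20 within the rounding of the MB/s columns. Newer sysstat releases warn that svctm is unreliable, so treat the %util derivation as approximate.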
MySQL IO tracing features
● Slow query log with high verbosity (Percona Server)
● InnoDB status: look for pending* values
● information_schema.innodb_io_pattern
● pt-mext
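pt-mext shows status counters side by side as per-interval differences. A minimal sketch of the same arithmetic for two SHOW GLOBAL STATUS snapshots, assuming the standard `Innodb_data_*` status variables: cumulative counters become per-second rates, while the pending* variables are gauges and are reported as-is. The function name and the snapshot values are made up for illustration.

```python
# Turn two SHOW GLOBAL STATUS snapshots into per-second InnoDB IO rates.

COUNTERS = ("Innodb_data_reads", "Innodb_data_writes", "Innodb_data_fsyncs")
PENDING = ("Innodb_data_pending_reads", "Innodb_data_pending_writes",
           "Innodb_data_pending_fsyncs")


def io_rates(before, after, interval_s):
    """Counters -> per-second rates; pending* -> current value (they are gauges)."""
    rates = {name: (int(after[name]) - int(before[name])) / interval_s for name in COUNTERS}
    pending = {name: int(after[name]) for name in PENDING}
    return rates, pending


# Hypothetical snapshots taken 10 seconds apart (strings, as a MySQL client returns them).
before = {"Innodb_data_reads": "1000", "Innodb_data_writes": "500", "Innodb_data_fsyncs": "100",
          "Innodb_data_pending_reads": "0", "Innodb_data_pending_writes": "0",
          "Innodb_data_pending_fsyncs": "0"}
after = {"Innodb_data_reads": "1600", "Innodb_data_writes": "800", "Innodb_data_fsyncs": "130",
         "Innodb_data_pending_reads": "2", "Innodb_data_pending_writes": "0",
         "Innodb_data_pending_fsyncs": "1"}

rates, pending = io_rates(before, after, 10)
print(rates)    # {'Innodb_data_reads': 60.0, 'Innodb_data_writes': 30.0, 'Innodb_data_fsyncs': 3.0}
print(pending)  # {'Innodb_data_pending_reads': 2, 'Innodb_data_pending_writes': 0, 'Innodb_data_pending_fsyncs': 1}
```

Non-zero pending values that persist across samples are the interesting signal: they mean requests are queuing below InnoDB.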
Debug IO
● PMP (poor man's profiler)
● strace: syscalls per second and requests per second
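PMP boils down to: dump all thread stacks with gdb (for example `gdb -batch -ex "thread apply all bt" -p <pid>`), collapse each stack to a list of function names and count identical stacks. A minimal Python sketch of the aggregation step; the function name is hypothetical and the sample backtrace is hand-written, not captured from a real server.

```python
import re
from collections import Counter

# A gdb frame: "#N  0xADDR in func (...)" or, for some frames, "#N  func (...)".
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def pmp(gdb_output):
    """Collapse each thread's backtrace to 'f0,f1,...' and count identical stacks."""
    stacks, current = Counter(), []
    for line in gdb_output.splitlines():
        if line.startswith("Thread "):
            if current:
                stacks[",".join(current)] += 1
            current = []
            continue
        m = FRAME.match(line)
        if m:
            current.append(m.group(1))
    if current:
        stacks[",".join(current)] += 1
    return stacks.most_common()


sample = """Thread 3 (Thread 0x7f01 (LWP 101)):
#0  0x00007f8bd5a1e0bd in pread64 () from /lib64/libpthread.so.0
#1  0x00007f8bd7111e10 in os_file_pread ()
Thread 2 (Thread 0x7f02 (LWP 102)):
#0  0x00007f8bd5a1e0bd in pread64 () from /lib64/libpthread.so.0
#1  0x00007f8bd7111e10 in os_file_pread ()
Thread 1 (Thread 0x7f03 (LWP 103)):
#0  0x00007f8bd5a1b2f0 in pthread_cond_wait () from /lib64/libpthread.so.0
"""
for stack, count in pmp(sample):
    print(count, stack)
```

Many threads parked in the same pread/pwrite or fsync stack across several samples is a strong hint that the server is IO-bound.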
Block level read load
blktrace /dev/sdg -a issue -a complete -w 3600 -o - | blkiomon -I 10 -h -
sizes read (bytes): num 3203, min 4096, max 262144, avg 20504.3
d2c read (usec): num 3203, min 85, max 36013, avg 2982.8
throughput read (bytes/msec): num 3203, min 147, max 265059, avg 71893.0
sizes histogram (bytes):
0: 0 1024: 0 2048: 0 4096: 472
8192: 19 16384: 1719 32768: 929 65536: 30
131072: 37 262144: 10 524288: 1 1048576: 0
d2c histogram (usec):
64: 0 128: 40 256: 1685 512: 322
1024: 59 2048: 41 4096: 144 8192: 414
16384: 459 32768: 52 65536: 1 131072: 0
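The averages hide the shape of the d2c (driver-to-completion) distribution. A minimal sketch that reads rough percentiles off a blkiomon histogram, assuming each label is the upper bound of its bucket (the read sample is consistent with that: the 85 usec minimum can only be counted in the 128 bucket). The function name `percentile_bucket` is hypothetical.

```python
# Rough percentiles from a blkiomon-style histogram {bucket_upper_bound: count}.

def percentile_bucket(hist, p):
    """Return the smallest bucket label whose cumulative count reaches fraction p."""
    target = p * sum(hist.values())
    running = 0
    for label in sorted(hist):
        running += hist[label]
        if running >= target:
            return label
    return max(hist)


# d2c read histogram (usec) from the slide above.
d2c_read = {64: 0, 128: 40, 256: 1685, 512: 322, 1024: 59, 2048: 41,
            4096: 144, 8192: 414, 16384: 459, 32768: 52, 65536: 1, 131072: 0}

for p in (0.5, 0.9, 0.99):
    print(f"p{round(p * 100)} <= {percentile_bucket(d2c_read, p)} usec")

fast = sum(v for k, v in d2c_read.items() if k <= 512) / sum(d2c_read.values())
print(f"{fast:.0%} of reads completed within 512 usec")
```

For this sample the median is at most 256 usec but p90 is up to 16 ms, and about 64% of reads finish within 512 usec. The distribution is bimodal, which plausibly separates cache hits (controller or disk cache) from physical seeks.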
Block level write load
blktrace /dev/sdg -a issue -a complete -w 3600 -o - | blkiomon -I 10 -h -
sizes write (bytes): num 1141, min 4096, max 294912, avg 6748.9
d2c write (usec): num 1141, min 192, max 54462, avg 1835.4
throughput write (bytes/msec): num 1141, min 75, max 175021, avg 25573.3
sizes histogram (bytes):
0: 0 1024: 0 2048: 0 4096: 771
8192: 210 16384: 228 32768: 15 65536: 7
131072: 8 262144: 10 524288: 1 1048576: 0
d2c histogram (usec):
64: 0 128: 2 256: 989 512: 87
1024: 10 2048: 8 4096: 1 8192: 62
16384: 25 32768: 57 65536: 9 131072: 0
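When blkiomon runs for an hour with a 10 second interval, the summary lines are easier to compare after parsing. A minimal sketch, assuming the "metric direction (unit): num N, min N, max N, avg X" layout shown above; the regex and function name are hypothetical, not part of blktrace.

```python
import re

SUMMARY = re.compile(
    r"^(?P<metric>\w+) (?P<dir>read|write) \((?P<unit>[^)]+)\): "
    r"num (?P<num>\d+), min (?P<min>\d+), max (?P<max>\d+), avg (?P<avg>[\d.]+)")


def parse_summary(text):
    """Map (metric, direction) -> {'unit', 'num', 'min', 'max', 'avg'} for blkiomon summary lines."""
    out = {}
    for line in text.splitlines():
        m = SUMMARY.match(line.strip())
        if m:
            out[(m["metric"], m["dir"])] = {
                "unit": m["unit"], "num": int(m["num"]), "min": int(m["min"]),
                "max": int(m["max"]), "avg": float(m["avg"])}
    return out


sample = """sizes write (bytes): num 1141, min 4096, max 294912, avg 6748.9
d2c write (usec): num 1141, min 192, max 54462, avg 1835.4
throughput write (bytes/msec): num 1141, min 75, max 175021, avg 25573.3"""

s = parse_summary(sample)
print(f"mean write size {s[('sizes', 'write')]['avg'] / 1024:.1f} KiB, "
      f"worst d2c {s[('d2c', 'write')]['max'] / 1000:.1f} ms")
```

Here writes are small (mostly 4 KiB) and usually fast, but the worst case is above 50 ms, which is the kind of outlier that shows up as a stalled commit.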
File system level
● perf stat -e 'ext4:*' -p `pgrep -x mysqld`
       1,303 ext4:ext4_mark_inode_dirty
       3,357 ext4:ext4_da_write_begin
       3,357 ext4:ext4_da_write_end
         634 ext4:ext4_da_writepages
       1,328 ext4:ext4_da_write_pages
         634 ext4:ext4_da_writepages_result
         634 ext4:ext4_sync_file_enter
         634 ext4:ext4_sync_file_exit
          10 ext4:ext4_da_reserve_space
         580 ext4:ext4_ext_map_blocks_enter
         580 ext4:ext4_ext_map_blocks_exit
36.567472319 seconds time elapsed
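Raw tracepoint counts mean more as rates. For example, ext4_sync_file_enter fires when an fsync()/fdatasync() reaches ext4, so dividing by the elapsed time gives an fsync rate. A minimal sketch that parses `perf stat` text output like the listing above; the regexes and the function name are hypothetical.

```python
import re

COUNT = re.compile(r"^\s*([\d,]+)\s+(\S+)")
ELAPSED = re.compile(r"^\s*([\d.]+) seconds time elapsed")


def perf_rates(text):
    """Return {event: events per second} from `perf stat` text output."""
    counts, elapsed = {}, None
    for line in text.splitlines():
        m = ELAPSED.match(line)
        if m:
            elapsed = float(m.group(1))
            continue
        m = COUNT.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    if elapsed is None:
        raise ValueError("no 'seconds time elapsed' line found")
    return {event: n / elapsed for event, n in counts.items()}


sample = """
         634 ext4:ext4_sync_file_enter
       1,303 ext4:ext4_mark_inode_dirty
       3,357 ext4:ext4_da_write_begin

  36.567472319 seconds time elapsed
"""
for event, rate in perf_rates(sample).items():
    print(f"{rate:8.2f}/s  {event}")
```

In the sample above mysqld issued roughly 17 fsyncs per second to ext4.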
File system level (cont.)
● SystemTap tools such as disktop.stp, io_submit.stp and more
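When SystemTap is not available, /proc/&lt;pid&gt;/io gives a coarser per-process view. This is a swapped-in technique, not what disktop.stp does (disktop instruments VFS reads and writes). A minimal sketch, assuming the standard /proc/&lt;pid&gt;/io fields: read_bytes/write_bytes count storage-layer IO, while syscr/syscw count read- and write-type syscalls. The helper names and sample values are hypothetical.

```python
# Per-process IO rates from two samples of /proc/<pid>/io.

def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    result = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            result[key.strip()] = int(value)
    return result


def read_proc_io(pid):
    # Requires the same user as the process, or root.
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def io_per_second(before, after, interval_s):
    fields = ("read_bytes", "write_bytes", "syscr", "syscw")
    return {k: (after[k] - before[k]) / interval_s for k in fields}


# Illustrative values; in practice call read_proc_io(pid), sleep, and call it again.
t0 = parse_proc_io("rchar: 1000\nwchar: 2000\nsyscr: 10\nsyscw: 20\n"
                   "read_bytes: 0\nwrite_bytes: 4096\ncancelled_write_bytes: 0\n")
t1 = parse_proc_io("rchar: 5000\nwchar: 42000\nsyscr: 14\nsyscw: 30\n"
                   "read_bytes: 8192\nwrite_bytes: 45056\ncancelled_write_bytes: 0\n")
print(io_per_second(t0, t1, 2))
```

A large gap between rchar and read_bytes means most reads are served from the page cache.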
Understand how MySQL and InnoDB work
● Use non-stripped binaries built from source with -g (debug symbols)
● GDB
● Full strace -ff listing
● SystemTap with ubacktrace()
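`strace -ff -o <prefix>` writes one file per thread, named `<prefix>.<tid>`, which makes it easy to see which thread does which IO. A minimal sketch that counts syscalls per thread file, assuming the default strace line format with an optional `-t`/`-tt` timestamp; the function names and sample lines are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# "name(" at line start, optionally after an strace -t/-tt timestamp.
# "<... name resumed>" continuation lines do not match, so each call is counted once.
CALL = re.compile(r"^(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    counts = Counter()
    for line in lines:
        m = CALL.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def count_per_thread(prefix):
    """Count syscalls in every <prefix>.<tid> file written by `strace -ff -o <prefix>`."""
    base = Path(prefix)
    result = {}
    for path in sorted(base.parent.glob(base.name + ".*")):
        with open(path, errors="replace") as f:
            result[path.name[len(base.name) + 1:]] = count_syscalls(f)
    return result


sample = [
    '10:15:01.123456 pread64(12, "\\0\\0"..., 16384, 49152) = 16384',
    '10:15:01.123500 pwrite64(3, "x"..., 512, 1024) = 512',
    '10:15:01.123600 fsync(3 <unfinished ...>',
    '10:15:01.125000 <... fsync resumed> ) = 0',
]
print(count_syscalls(sample))
```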
Avoid high IO load
● More RAM
● Avoid randomness
  ● Random primary keys
● Useless updates/inserts
● Redundant indexes and fields
● Decrease the number of rows fetched by using correct indexes
● More disks in RAID and/or SSD disks
● Always test RAID adapters
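Why random primary keys hurt can be shown with a toy model (not InnoDB's real page management): assume every leaf page holds a fixed number of consecutive keys and ignore page splits. Both constants below are hypothetical.

```python
import random

ROWS_PER_PAGE = 100          # hypothetical rows per leaf page
EXISTING_ROWS = 1_000_000    # hypothetical table size


def leaf_pages_touched(keys, rows_per_page=ROWS_PER_PAGE):
    """Toy model: the leaf page of a key is key // rows_per_page (no splits, no fill factor)."""
    return len({k // rows_per_page for k in keys})


def sequential_keys(n, start=EXISTING_ROWS):
    # AUTO_INCREMENT-like: every new row is appended after the current maximum key.
    return range(start, start + n)


def random_keys(n, rng, keyspace=EXISTING_ROWS):
    # UUID-like: every new row lands at a random position inside the existing index.
    return [rng.randrange(keyspace) for _ in range(n)]


rng = random.Random(42)
print("sequential:", leaf_pages_touched(sequential_keys(1000)))  # sequential: 10
print("random:    ", leaf_pages_touched(random_keys(1000, rng)))
```

In this model 1,000 sequential inserts touch 10 leaf pages, while 1,000 random keys touch on the order of 950 distinct pages, each of which has to be in the buffer pool (or read from disk) and flushed later.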
Benchmarking
● Bare-metal disk with a dumb RAID controller
● RAID adapter (stripe size, etc.)
● File system
● Tools
  ● dd, dbench
  ● iozone
  ● sysbench
  ● Replaying the general log or queries captured with tcpdump
  ● MySQL replication slave
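Before trusting benchmark numbers it helps to know which layer is being measured. A minimal random-read sketch with block-aligned os.pread() calls; it does not use O_DIRECT, so once the test file is cached it measures the page cache rather than the disk, which is exactly the trap to avoid when benchmarking storage. The function name and default parameters are hypothetical.

```python
import os
import random
import tempfile
import time


def random_read_benchmark(path, block_size=4096, reads=2000, seed=1):
    """Issue `reads` random, block-aligned pread() calls on `path`; return reads per second."""
    blocks = os.path.getsize(path) // block_size
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            offset = rng.randrange(blocks) * block_size
            if len(os.pread(fd, block_size, offset)) != block_size:
                raise IOError("short read")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return reads / elapsed


if __name__ == "__main__":
    # A 16 MiB file fits in the page cache, so this result is an OS-cache number.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
        tmp.flush()
        print(f"{random_read_benchmark(tmp.name):.0f} reads/s (page cache, not disk)")
```

For disk numbers, use a data set much larger than RAM or a tool that opens files with O_DIRECT.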
Story of one INSERT
● Preparation
#!/usr/bin/env stap
# Usage: innodbio.stp --vp 00001 -d /usr/sbin/mysqld --ldd -x `pgrep -x mysqld` | c++filt | tee -a innodbio.log
# For every InnoDB file helper (os_file_*) and AIO call (os_aio*), print the time,
# thread id and function name, followed by the user-space stack that led to the call.
probe process("mysqld").function("os_file*") {
  printf("%d %d %s\n", gettimeofday_s(), tid(), probefunc());
  print_ustack(ubacktrace()); printf("\n");
}
probe process("mysqld").function("os_aio*") {
  printf("%d %d %s\n", gettimeofday_s(), tid(), probefunc());
  print_ustack(ubacktrace()); printf("\n");
}
Story of one INSERT
● A long way to open a table
1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70bc3f5 : buf_read_page_low+0xd5/0x210 [/usr/sbin/mysqld]
0x7f8bd70bc755 : buf_read_page+0x225/0x3e0 [/usr/sbin/mysqld]
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd70a0d69 : btr_cur_search_to_nth_level+0x399/0xf40 [/usr/sbin/mysqld]
0x7f8bd7178464 : btr_pcur_open_on_user_rec+0x64/0x2e0 [/usr/sbin/mysqld]
0x7f8bd70c77b1 : dict_load_table+0x1f1/0x2450 [/usr/sbin/mysqld]
0x7f8bd70c65a0 : dict_table_get+0x160/0x190 [/usr/sbin/mysqld]
0x7f8bd709cd0d : ha_innobase::open(char const*, int, unsigned int)+0x1cd/0x6a0 [/usr/sbin/mysqld]
Story of one INSERT
● A long way to open a table
1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70bc3f5 : buf_read_page_low+0xd5/0x210 [/usr/sbin/mysqld]
0x7f8bd70bc755 : buf_read_page+0x225/0x3e0 [/usr/sbin/mysqld]
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd70a0d69 : btr_cur_search_to_nth_level+0x399/0xf40 [/usr/sbin/mysqld]
0x7f8bd7178464 : btr_pcur_open_on_user_rec+0x64/0x2e0 [/usr/sbin/mysqld]
0x7f8bd70ca5ba : dict_load_indexes+0x26a/0x13e0 [/usr/sbin/mysqld]
0x7f8bd70c82d4 : dict_load_table+0xd14/0x2450 [/usr/sbin/mysqld]
0x7f8bd70c65a0 : dict_table_get+0x160/0x190 [/usr/sbin/mysqld]
0x7f8bd709cd0d : ha_innobase::open(char const*, int, unsigned int)+0x1cd/0x6a0
0x7f8bd6fde56d : handler::ha_open(st_table*, char const*, int, int)+0x3d/0x180
Story of one INSERT
● A long way to open a table: statistics
1319306314 7273 os_file_create_simple_no_error_handling
0x7f8bd7111650 : os_file_create_simple_no_error_handling+0x0/0x1a0 [/usr/sbin/mysqld]
0x7f8bd70cf8c0 : fil_node_open_file+0x170/0x440 [/usr/sbin/mysqld]
0x7f8bd70cfbe9 : fil_node_prepare_for_io+0x59/0x190 [/usr/sbin/mysqld]
0x7f8bd70d8986 : fil_space_get_size+0xe6/0x140 [/usr/sbin/mysqld]
0x7f8bd70bc5ce : buf_read_page+0x9e/0x3e0 [/usr/sbin/mysqld]
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd7170c0f : btr_get_size+0x1af/0x330 [/usr/sbin/mysqld]
0x7f8bd70c631b : dict_update_statistics_low+0x4b/0x170 [/usr/sbin/mysqld]
0x7f8bd70c6554 : dict_table_get+0x114/0x190 [/usr/sbin/mysqld]
0x7f8bd709cd0d : ha_innobase::open(char const*, int, unsigned int)+0x1cd/0x6a0 [/usr/sbin/mysqld]
Story of one INSERT
● And statistics gathering is not finished yet; it also does:
  ● os_file_pread
  ● os_file_close
  ● os_file_create
  ● os_file_lock
  ● os_aio reads
  ● Insert buffer operations: ibuf_merge_or_delete_for_page
  ● Reserved pages: fseg_n_reserved_pages
Story of one INSERT
● Start transaction
1319306314 7273 os_aio
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd715f34b : trx_sys_flush_max_trx_id+0x9b/0xe0 [/usr/sbin/mysqld]
0x7f8bd7161e05 : trx_start_low+0xe5/0x1c0 [/usr/sbin/mysqld]
0x7f8bd7161f4f : trx_start+0x6f/0xe0 [/usr/sbin/mysqld]
● Read the required pages before the transaction starts
● Save the row ID in the dictionary header page: dict_hdr_flush_row_id
Story of one INSERT
● Commit
  ● XA two-phase commit (prepare step)
1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70ffd6b : log_group_write_buf+0x1ab/0x370 [/usr/sbin/mysqld]
0x7f8bd710037e : log_write_up_to+0x44e/0x7d0 [/usr/sbin/mysqld]
0x7f8bd71644c9 : trx_prepare_off_kernel+0x2d9/0x330 [/usr/sbin/mysqld]
0x7f8bd7164589 : trx_prepare_for_mysql+0x69/0x180 [/usr/sbin/mysqld]
0x7f8bd709830b : innobase_xa_prepare(handlerton*, THD*, bool)+0xeb/0x150 [/usr/sbin/mysqld]
  ● And the same again for the normal log write at commit
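With the binary log enabled and innodb_support_xa on (the default in MySQL 5.x), every commit runs this prepare/commit protocol between the binlog and InnoDB. A simplified sketch, assuming the classic pre-group-commit path (prepare with a redo write, binlog write, commit with a redo write), that counts the fsync() calls one commit can cost. It ignores group commit and treats sync_binlog greater than 1 as no per-commit fsync; the function name is hypothetical.

```python
def fsyncs_per_commit(flush_log_at_trx_commit, binlog_enabled, sync_binlog):
    """Simplified fsync count on the commit path of one transaction.

    1. InnoDB prepare writes the redo log (log_write_up_to from trx_prepare_off_kernel).
    2. The binary log is written, and fsynced when sync_binlog == 1.
    3. InnoDB commit writes the redo log again.
    Redo writes are fsynced per commit only when innodb_flush_log_at_trx_commit == 1.
    """
    redo_fsync = 1 if flush_log_at_trx_commit == 1 else 0
    if not binlog_enabled:
        # Single engine, no binlog: no two-phase commit, just the redo write at commit.
        return redo_fsync
    binlog_fsync = 1 if sync_binlog == 1 else 0
    return redo_fsync + binlog_fsync + redo_fsync


for args in [(1, True, 1), (2, True, 1), (1, True, 0), (1, False, 0)]:
    print(args, fsyncs_per_commit(*args))
```

With innodb_flush_log_at_trx_commit=1 and sync_binlog=1 this model gives 3 fsyncs per commit, so without a write cache on the RAID controller the commit rate is bounded by disk write latency.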
Story of one INSERT
● Periodic operations:
  ● Write dirty pages
1319306314 7273 os_aio
1319306324 7270 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70b69ba : buf_flush_buffered_writes+0x26a/0x620 [/usr/sbin/mysqld]
0x7f8bd70b75f7 : buf_flush_batch+0x157/0x11f0 [/usr/sbin/mysqld]
0x7f8bd714b666 : srv_master_thread+0xbe6/0xc60 [/usr/sbin/mysqld]
  ● log_checkpoint is also called from the same thread
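Since the master thread drives both flushing and checkpoints, it is worth watching how far the checkpoint lags behind the current log position. A minimal sketch, assuming the "Log sequence number" and "Last checkpoint at" lines of the LOG section in SHOW ENGINE INNODB STATUS; older builds print an LSN as two 32-bit halves, which the parser combines. Function names and sample values are hypothetical.

```python
import re


def _lsn(text, label):
    m = re.search(rf"^{label}\s+(\d+)(?:\s+(\d+))?\s*$", text, re.M)
    if m is None:
        raise ValueError(f"{label!r} not found")
    high, low = m.group(1), m.group(2)
    # Old format "0 12345" = high and low 32-bit halves; new format is a single number.
    return int(high) * 2**32 + int(low) if low is not None else int(high)


def checkpoint_age(innodb_status):
    """Bytes of redo log written since the last checkpoint."""
    return _lsn(innodb_status, "Log sequence number") - _lsn(innodb_status, "Last checkpoint at")


sample = """---
LOG
---
Log sequence number 1648298471
Log flushed up to   1648298471
Last checkpoint at  1647991212
"""
print(checkpoint_age(sample))  # 307259
```

If the checkpoint age keeps approaching the combined redo log size (innodb_log_file_size * innodb_log_files_in_group), InnoDB has to flush dirty pages aggressively and IO spikes follow.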
Annual MySQL Users Conference
Presented by Percona Live
The Hyatt Regency Hotel, Santa Clara, CA
April 10th-12th, 2012
Featured Speakers
Mark Callaghan, Facebook
Jeremy Zawodny, Craigslist
Marten Mickos, Eucalyptus Systems
Sarah Novotny, Blue Gecko
Peter Zaitsev, Percona
Baron Schwartz, Percona
The Call for Papers is Now Open!
Visit www.percona.com/live/mysql-conference-2012/
nickolay.ihalainen@percona.com
We're Hiring! www.percona.com/about-us/careers/
www.percona.com/live