26.02.2013 Views

Detecting MySQL IO problems on Linux at different ... - Percona

Detecting MySQL IO problems on Linux at different abstraction layers

Nickolay Ihalainen

Percona Live London 2011


Agenda

● Dataflow layers
● OS tools
● MySQL instrumentation
● Inside InnoDB: story of one insert

www.percona.com


Layers

● Hardware level
  ● On-disk queue (NCQ, TCQ) and cache
  ● RAID controller queues and caches
● OS level
  ● Block device level
  ● IO scheduler
  ● File system
  ● Page cache


Software Layers

● AIO
  ● Kernel AIO (for Windows and Linux with O_DIRECT)
  ● Glibc AIO (not used)
● Normal read/pread and write/pwrite
● Directory operations
● MySQL
  ● Data, InnoDB simulated AIO
  ● Dictionaries (table cache and dictionary cache)
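As a rough illustration of the "normal read/pread and write/pwrite" path, the sketch below writes and reads one page-sized block at an explicit offset, then forces it to disk with fsync. The helper names `write_page` and `read_page` are hypothetical, and the 16 KiB page size is only InnoDB's default; this is not how InnoDB itself is implemented.

```python
import os
import tempfile

PAGE_SIZE = 16384  # InnoDB's default page size; an assumption for this sketch


def write_page(fd, page_no, payload):
    """Write one zero-padded page at its offset and flush it to disk."""
    data = payload.ljust(PAGE_SIZE, b"\0")
    written = os.pwrite(fd, data, page_no * PAGE_SIZE)  # positional write, no seek
    os.fsync(fd)  # force the page cache contents down to the device
    return written


def read_page(fd, page_no):
    """Read one page back with a positional read."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    written = write_page(fd, 3, b"hello")
    page = read_page(fd, 3)
    file_size = os.fstat(fd).st_size

print(written, page.rstrip(b"\0"), file_size)
```

Positional calls let several threads share one file descriptor without coordinating a shared file offset, which is why they show up so often in database syscall traces.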


High-level tools

● vmstat

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 1216856 134096 1882012    0    0     0 26640  714 1215  3  2 88  7
 0  0      0 1216820 134096 1882360    0    0     0     0  501  697  2  0 98  0

● iostat

Device: rrqm/s wrqm/s  r/s  w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.32   0.70 1.26 3.19  0.03  0.08    51.20     0.02  4.22    1.77    5.19  0.42  0.19

● pt-diskstats
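The iostat columns are related to each other, and checking those relations is a quick sanity test when reading the output. The sketch below parses the sda row shown on this slide and estimates the queue length with Little's law (IOPS × await) and the utilization from IOPS × svctm. The function names are hypothetical, and the estimates are approximations, not the exact formulas iostat uses internally.

```python
def parse_iostat_row(header, row):
    """Map iostat column names to float values for one device row."""
    names = header.split()[1:]  # drop the leading "Device:" label
    fields = row.split()
    stats = dict(zip(names, (float(v) for v in fields[1:])))
    stats["device"] = fields[0]
    return stats


def derived_metrics(stats):
    """Estimate queue length and utilization from rate and latency columns."""
    iops = stats["r/s"] + stats["w/s"]
    return {
        "iops": iops,
        # Little's law: requests in flight = arrival rate * time in system (ms -> s)
        "queue_estimate": iops * stats["await"] / 1000.0,
        # busy milliseconds per second, expressed as a percentage
        "util_estimate_pct": iops * stats["svctm"] / 10.0,
    }


HEADER = ("Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
ROW = "sda 0.32 0.70 1.26 3.19 0.03 0.08 51.20 0.02 4.22 1.77 5.19 0.42 0.19"

sda = parse_iostat_row(HEADER, ROW)
metrics = derived_metrics(sda)
print(metrics)
```

For this row both estimates land close to the reported avgqu-sz (0.02) and %util (0.19), which suggests the device is nearly idle.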


MySQL IO tracing features

● Slow log, high verbosity with Percona Server
● InnoDB status: look for pending*
● information_schema.innodb_io_pattern
● pt-mext
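One way to "look for pending*" is to filter the InnoDB status text for lines that mention pending operations with a non-zero counter. The sketch below does that on a small sample; the sample values are made up for illustration, the function name is hypothetical, and the exact wording of the status lines varies between server versions.

```python
import re


def nonzero_pending_lines(status_text):
    """Return status lines that mention 'pending' and carry a non-zero number."""
    hits = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if "pending" not in line.lower():
            continue
        numbers = [int(n) for n in re.findall(r"\d+", line)]
        if any(n > 0 for n in numbers):
            hits.append(line)
    return hits


# Illustrative sample only; the counters are invented for this example.
SAMPLE = """\
Pending normal aio reads: 0, aio writes: 3,
Pending flushes (fsync) log: 1; buffer pool: 0
Pending reads 0
0 pending preads, 0 pending pwrites
"""

busy = nonzero_pending_lines(SAMPLE)
for line in busy:
    print(line)
```

Continuation lines that do not repeat the word "pending" are skipped by this filter, so it is a first-pass check rather than a full parser.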


Debug IO

● PMP (Poor Man's Profiler)
● strace, syscalls per sec and req per sec
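Syscalls per second can be derived from a timestamped strace log. The sketch below assumes the log was written with something like `strace -f -ttt -o file`, so that each line starts with an optional PID followed by an epoch timestamp; the sample lines are invented for illustration and the function name is hypothetical.

```python
import re
from collections import Counter

# Optional PID, epoch seconds with a fractional part, then the syscall name.
LINE_RE = re.compile(r"^(?:\d+\s+)?(\d+)\.\d+\s+(\w+)\(")


def syscalls_per_second(lines):
    """Count syscalls per (second, syscall name); 'resumed' lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Illustrative sample only, not taken from a real server.
SAMPLE = [
    '7273 1319306314.000120 pwrite64(9, "..."..., 512, 1024) = 512',
    "7273 1319306314.000480 fsync(9) = 0",
    '7270 1319306314.700000 pread64(12, "..."..., 16384, 49152) = 16384',
    '7273 1319306315.100000 pwrite64(9, "..."..., 512, 1536) = 512',
    "7270 1319306315.200000 <... pread64 resumed> ) = 16384",
]

rates = syscalls_per_second(SAMPLE)
for (second, name), count in sorted(rates.items()):
    print(second, name, count)
```

Comparing these rates with the query rate over the same interval gives a rough "syscalls per request" figure.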


Block level read load

blktrace /dev/sdg -a issue -a complete -w 3600 -o - | blkiomon -I 10 -h -

sizes read (bytes): num 3203, min 4096, max 262144, avg 20504.3
d2c read (usec): num 3203, min 85, max 36013, avg 2982.8
throughput read (bytes/msec): num 3203, min 147, max 265059, avg 71893.0

sizes histogram (bytes):
     0:     0    1024:     0    2048:     0    4096:   472
  8192:    19   16384:  1719   32768:   929   65536:    30
131072:    37  262144:    10  524288:     1 1048576:     0

d2c histogram (usec):
    64:     0     128:    40     256:  1685     512:   322
  1024:    59    2048:    41    4096:   144    8192:   414
 16384:   459   32768:    52   65536:     1  131072:     0
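The d2c (dispatch-to-completion) histogram is easier to read as a cumulative distribution. The sketch below parses the read histogram from this slide and computes what share of requests finished within a given latency. It assumes each bucket label is the bucket's upper bound, which should be checked against the blkiomon documentation; the function names are hypothetical.

```python
def parse_histogram(text):
    """Turn 'bound: count' pairs into a list of (bound, count) tuples."""
    tokens = text.replace(":", " ").split()
    return [(int(b), int(c)) for b, c in zip(tokens[0::2], tokens[1::2])]


def cumulative_fraction(hist, limit):
    """Fraction of requests in buckets whose bound is <= limit."""
    total = sum(count for _, count in hist)
    within = sum(count for bound, count in hist if bound <= limit)
    return within / total


D2C_READ = """
64: 0 128: 40 256: 1685 512: 322
1024: 59 2048: 41 4096: 144 8192: 414
16384: 459 32768: 52 65536: 1 131072: 0
"""

hist = parse_histogram(D2C_READ)
fast = cumulative_fraction(hist, 512)        # likely served from a cache
slow = 1.0 - cumulative_fraction(hist, 4096) # likely needed a physical seek
print(f"<= 512 usec: {fast:.1%}, > 4096 usec: {slow:.1%}")
```

Under that assumption the distribution is bimodal: roughly two thirds of the reads complete within 512 usec and more than a quarter take longer than 4 ms, so the average of 2982.8 usec describes neither group well.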


Block level write load

blktrace /dev/sdg -a issue -a complete -w 3600 -o - | blkiomon -I 10 -h -

sizes write (bytes): num 1141, min 4096, max 294912, avg 6748.9
d2c write (usec): num 1141, min 192, max 54462, avg 1835.4
throughput write (bytes/msec): num 1141, min 75, max 175021, avg 25573.3

sizes histogram (bytes):
     0:     0    1024:     0    2048:     0    4096:   771
  8192:   210   16384:   228   32768:    15   65536:     7
131072:     8  262144:    10  524288:     1 1048576:     0

d2c histogram (usec):
    64:     0     128:     2     256:   989     512:    87
  1024:    10    2048:     8    4096:     1    8192:    62
 16384:    25   32768:    57   65536:     9  131072:     0


File System level

● perf stat -e 'ext4:*' -p `pgrep -x mysqld`

 1,303 ext4:ext4_mark_inode_dirty
 3,357 ext4:ext4_da_write_begin
 3,357 ext4:ext4_da_write_end
   634 ext4:ext4_da_writepages
 1,328 ext4:ext4_da_write_pages
   634 ext4:ext4_da_writepages_result
   634 ext4:ext4_sync_file_enter
   634 ext4:ext4_sync_file_exit
    10 ext4:ext4_da_reserve_space
   580 ext4:ext4_ext_map_blocks_enter
   580 ext4:ext4_ext_map_blocks_exit

36.567472319 seconds time elapsed
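Raw tracepoint counts are easier to compare once they are converted to per-second rates. The sketch below parses a subset of the perf stat output from this slide and divides each count by the elapsed time; the function name is hypothetical and the regular expressions only cover the line shapes shown here.

```python
import re


def parse_perf_stat(text):
    """Return ({event: count}, elapsed_seconds) from perf stat text output."""
    counts = {}
    elapsed = None
    for line in text.splitlines():
        event = re.match(r"\s*([\d,]+)\s+(\S+:\S+)", line)
        if event:
            counts[event.group(2)] = int(event.group(1).replace(",", ""))
        timing = re.match(r"\s*([\d.]+)\s+seconds time elapsed", line)
        if timing:
            elapsed = float(timing.group(1))
    return counts, elapsed


PERF_OUTPUT = """
 3,357 ext4:ext4_da_write_begin
   634 ext4:ext4_sync_file_enter
    10 ext4:ext4_da_reserve_space
36.567472319 seconds time elapsed
"""

counts, elapsed = parse_perf_stat(PERF_OUTPUT)
rates = {name: count / elapsed for name, count in counts.items()}
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2f}/s")
```

In this sample the sync rate works out to about 17 per second against about 92 buffered writes per second, so roughly one sync for every five writes.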


File System level cont.

● SystemTap, tools like disktop.stp, io_submit.stp, and more


Understand how MySQL & InnoDB work

● Use non-stripped binaries built with -g enabled
● GDB
● Full strace -ff listing
● SystemTap with ubacktrace


Avoid high IO load

● More RAM
● Avoid randomness
● Random primary keys
● Useless updates/inserts
● Redundant indexes and fields
● Decrease number of rows fetched by using correct indexes
● More disks in RAID and/or SSDs
● Always test RAID adapters
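To illustrate why random primary keys are on this list, the sketch below uses a deliberately simplified model: rows are stored in key order with a fixed number of rows per page, and a batch of inserts touches one page per distinct key range. The page capacity, table size and helper name are assumptions for the example, and real B-tree behavior (page splits, fill factor, caching) is ignored.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity for this toy model


def pages_touched(keys, rows_per_page=ROWS_PER_PAGE):
    """Count distinct pages hit when rows are clustered by key."""
    return len({key // rows_per_page for key in keys})


existing_rows = 1_000_000
batch = 1_000

# Auto-increment style keys: every insert lands at the end of the table.
sequential_keys = range(existing_rows, existing_rows + batch)

# Random keys (for example hash-like values): inserts land all over the table.
rng = random.Random(42)
random_keys = [rng.randrange(existing_rows) for _ in range(batch)]

seq_pages = pages_touched(sequential_keys)
rand_pages = pages_touched(random_keys)
print(seq_pages, rand_pages)
```

In this model the sequential batch touches 10 pages while the random batch touches close to one page per row, so each random insert may need its own page read once the working set no longer fits in memory.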


Benchmarking

● Bare-metal disk with dumb RAID
● RAID adapter (stripe size, etc.)
● File system
● Tools
  ● dd, dbench
  ● iozone
  ● sysbench
● Replaying general log/queries from tcpdump
● MySQL replication slave


Story of one INSERT

● Preparation

#!/usr/bin/env stap
# use: innodbio.stp --vp 00001 -d /usr/sbin/mysqld --ldd -x `pgrep -x mysqld` | c++filt | tee -a innodbio.log

# Log timestamp, thread id, function name and user-space backtrace
# for every InnoDB file-level call (os_file*).
probe process("mysqld").function("os_file*") {
    printf("%d %d %s\n", gettimeofday_s(), tid(), probefunc());
    print_ustack(ubacktrace()); printf("\n");
}

# Same output for the InnoDB asynchronous IO entry points (os_aio*).
probe process("mysqld").function("os_aio*") {
    printf("%d %d %s\n", gettimeofday_s(), tid(), probefunc());
    print_ustack(ubacktrace()); printf("\n");
}


Story of one INSERT

● A long way to open table

1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70bc3f5 : buf_read_page_low+0xd5/0x210 [/usr/sbin/mysqld]
0x7f8bd70bc755 : buf_read_page+0x225/0x3e0 [/usr/sbin/mysqld]
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd70a0d69 : btr_cur_search_to_nth_level+0x399/0xf40 [/usr/sbin/mysqld]
0x7f8bd7178464 : btr_pcur_open_on_user_rec+0x64/0x2e0 [/usr/sbin/mysqld]
0x7f8bd70c77b1 : dict_load_table+0x1f1/0x2450 [/usr/sbin/mysqld]
0x7f8bd70c65a0 : dict_table_get+0x160/0x190 [/usr/sbin/mysqld]
0x7f8bd709cd0d : ha_innobase::open(char const*, int, unsigned int)+0x1cd/0x6a0 [/usr/sbin/mysqld]


Story of one INSERT

● A long way to open table

1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70bc3f5 : buf_read_page_low+0xd5/0x210 [/usr/sbin/mysqld]
0x7f8bd70bc755 : buf_read_page+0x225/0x3e0 [/usr/sbin/mysqld]
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd70a0d69 : btr_cur_search_to_nth_level+0x399/0xf40 [/usr/sbin/mysqld]
0x7f8bd7178464 : btr_pcur_open_on_user_rec+0x64/0x2e0 [/usr/sbin/mysqld]
0x7f8bd70ca5ba : dict_load_indexes+0x26a/0x13e0 [/usr/sbin/mysqld]
0x7f8bd70c82d4 : dict_load_table+0xd14/0x2450 [/usr/sbin/mysqld]
0x7f8bd70c65a0 : dict_table_get+0x160/0x190 [/usr/sbin/mysqld]
0x7f8bd709cd0d : ha_innobase::open(char const*, int, unsigned int)+0x1cd/0x6a0
0x7f8bd6fde56d : handler::ha_open(st_table*, char const*, int, int)+0x3d/0x180


Story of one INSERT

● A long way to open table: statistics

1319306314 7273 os_file_create_simple_no_error_handling
0x7f8bd7111650 : os_file_create_simple_no_error_handling+0x0/0x1a0 [/usr/sbin/mysqld]
0x7f8bd70cf8c0 : fil_node_open_file+0x170/0x440 [/usr/sbin/mysqld]
0x7f8bd70cfbe9 : fil_node_prepare_for_io+0x59/0x190 [/usr/sbin/mysqld]
0x7f8bd70d8986 : fil_space_get_size+0xe6/0x140 [/usr/sbin/mysqld]
0x7f8bd70bc5ce : buf_read_page+0x9e/0x3e0 [/usr/sbin/mysqld]
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd7170c0f : btr_get_size+0x1af/0x330 [/usr/sbin/mysqld]
0x7f8bd70c631b : dict_update_statistics_low+0x4b/0x170 [/usr/sbin/mysqld]
0x7f8bd70c6554 : dict_table_get+0x114/0x190 [/usr/sbin/mysqld]
0x7f8bd709cd0d : ha_innobase::open(char const*, int, unsigned int)+0x1cd/0x6a0 [/usr/sbin/mysqld]


Story of one INSERT

● Statistics collection is not finished yet:
● os_file_pread
● os_file_close
● os_file_create
● os_file_lock
● os_aio reads
● Insert buffer operations: ibuf_merge_or_delete_for_page
● Reserved pages: fseg_n_reserved_pages


Story of one INSERT

● Start transaction

1319306314 7273 os_aio
0x7f8bd70b33d7 : buf_page_get_gen+0x147/0xa50 [/usr/sbin/mysqld]
0x7f8bd715f34b : trx_sys_flush_max_trx_id+0x9b/0xe0 [/usr/sbin/mysqld]
0x7f8bd7161e05 : trx_start_low+0xe5/0x1c0 [/usr/sbin/mysqld]
0x7f8bd7161f4f : trx_start+0x6f/0xe0 [/usr/sbin/mysqld]

● Read required pages before transaction start
● Save row ID in header file page: dict_hdr_flush_row_id


Story of one INSERT

● Commit
  ● XA two-phase commit

1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70ffd6b : log_group_write_buf+0x1ab/0x370 [/usr/sbin/mysqld]
0x7f8bd710037e : log_write_up_to+0x44e/0x7d0 [/usr/sbin/mysqld]
0x7f8bd71644c9 : trx_prepare_off_kernel+0x2d9/0x330 [/usr/sbin/mysqld]
0x7f8bd7164589 : trx_prepare_for_mysql+0x69/0x180 [/usr/sbin/mysqld]
0x7f8bd709830b : innobase_xa_prepare(handlerton*, THD*, bool)+0xeb/0x150 [/usr/sbin/mysqld]

● And the same for normal logs


Story of one INSERT

● Periodic operations:
  ● Write dirty pages

1319306314 7273 os_aio
1319306324 7270 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70b69ba : buf_flush_buffered_writes+0x26a/0x620 [/usr/sbin/mysqld]
0x7f8bd70b75f7 : buf_flush_batch+0x157/0x11f0 [/usr/sbin/mysqld]
0x7f8bd714b666 : srv_master_thread+0xbe6/0xc60 [/usr/sbin/mysqld]

● And log_checkpoint is called from the same thread
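The SystemTap traces in this talk share one layout: a header line with timestamp, thread id and probed function, followed by one line per stack frame. A log in that layout can be post-processed to see which code paths issue the IO. The sketch below is one possible way to do it; the function name and the choice of the third frame as the "caller" are assumptions, and the sample is abridged from the traces shown on the slides.

```python
import re
from collections import Counter

HEADER_RE = re.compile(r"^(\d+) (\d+) (\w+)$")
FRAME_RE = re.compile(r"^0x[0-9a-f]+ : (.+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_events(text):
    """Split a trace log into events with ts, tid, func and a stack list."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {"ts": int(header.group(1)), "tid": int(header.group(2)),
                       "func": header.group(3), "stack": []}
            events.append(current)
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            current["stack"].append(frame.group(1).strip())
    return events


# Abridged from the traces above: one page read and one redo log write.
LOG = """
1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70bc3f5 : buf_read_page_low+0xd5/0x210 [/usr/sbin/mysqld]

1319306314 7273 os_aio
0x7f8bd7113b60 : os_aio+0x0/0x550 [/usr/sbin/mysqld]
0x7f8bd70d813d : fil_io+0x1dd/0x4b0 [/usr/sbin/mysqld]
0x7f8bd70ffd6b : log_group_write_buf+0x1ab/0x370 [/usr/sbin/mysqld]
"""

events = parse_events(LOG)
# The frame above os_aio and fil_io tells page reads apart from log writes.
by_caller = Counter(e["stack"][2] for e in events if len(e["stack"]) > 2)
print(by_caller)
```

Grouping on a deeper frame, such as dict_load_table or srv_master_thread, would separate table-open IO from background flushing in the same way.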


nickolay.ihalainen@percona.com
