Click here to download this presentation in PDF format. - Sybase
Assumptions
This is NOT going to be a 'basic' presentation
We will be reviewing and discussing fairly advanced areas of optimizer P&T; you may have seen some of this in the past, but a little review never hurts
• You've worked with optimizer P&T
• You're running ASE 11.9.2 or above
• You understand the basics of optimization
• You've used trace flags 302/310 and optdiag
• You've used the various update statistics syntax available in ASE 11.9.2 and above
• You really want to know about tuning the statistics
There Are Two Kinds of Optimizer Statistics
• Table/index level: describes a table and its index(es)
• Page/row counts, cluster ratios, deleted and forwarded rows
• Some are updated dynamically as DML occurs: page/row counts, deleted rows, forwarded rows, cluster ratios
• Stored in systabstats
• Column level: describes the data to the optimizer
• Histogram (distribution), density values, default selectivity values
• Static; they need to be updated (update statistics) or written directly
• Stored in sysstatistics
• This presentation deals with the column-level statistics
Some Quick Definitions
Range cell density: 0.0037264745412389
Total density: 0.3208892191740000
Range selectivity: default used (0.33)
In between selectivity: default used (0.25)
Histogram for column: "A"
Column datatype: integer
Requested step count: 20
Actual step count: 10
Step Weight Value
1 0.00000000
Statistics On Inner Columns of Composite Indexes
• Think of a composite index as a 3D object: columns with statistics are transparent, those without statistics are opaque
• Columns with statistics give the optimizer a clearer picture of an index; sometimes that is good, sometimes not
• This is a fairly common practice
• It does add maintenance
• update index statistics is the command most commonly used to do this
Statistics On Inner Columns of Composite Indexes cont.
Index on columns E and B; no statistics on column B
select * from TW4
where E = "yes" and b >= 959789065 and id >= 600000 and
F > "May 14, 2002" and A_A = 959000000
Beginning selection of qualifying indexes for table 'TW4',
varno = 0, objectid 464004684.
The table (Allpages) has 1000000 rows, 24098 pages,
Estimated selectivity for E,
selectivity = 0.527436, upper limit = 0.527436.
No statistics available for B,
using the default range selectivity to estimate selectivity.
Estimated selectivity for B,
selectivity = 0.330000.
Statistics On Inner Columns of Composite Indexes cont.
The best qualifying index is 'E_B' (indid 7)
costing 49264 pages, with an estimate of 191
rows to be returned per scan of the table
FINAL PLAN (total cost = 481960):
varno=0 (TW4) indexid=0 ()
path=0xfbccc120 pathtype=sclause
method=NESTED ITERATION
Table: TW4 scan count 1, logical reads: (regular=24098
apf=0 total=24098)
physical reads: (regular=16468 apf=0 total=16468),
apf IOs used=0
Statistics On Inner Columns of Composite Indexes cont.
Statistics are now on column B
Estimated selectivity for E,
selectivity = 0.527436, upper limit = 0.527436.
Estimated selectivity for B,
selectivity = 0.022199, upper limit = 0.074835.
The best qualifying index is 'E_B' (indid 7)
costing 3317 pages, with an estimate of 13 rows to
be returned per scan of the table
FINAL PLAN (total cost = 55108):
varno=0 (TW4) indexid=7 (E_B)
path=0xfbd1da08 pathtype=sclause
method=NESTED ITERATION
Table: TW4 scan count 1, logical
reads: (regular=4070 apf=0 total=4070),
physical reads: (regular=820 apf=0 total=820),
Statistics On Non-Indexed Columns and Joins
Can't help with index selection, but can affect join ordering
• Columns with statistics give the optimizer a clearer picture of the column; no hard-coded assumptions have to be used
• When costing joins on non-indexed columns, having statistics may result in better plans than using the default values
• Without statistics there is no Total density or histogram that the optimizer can use to cost the column in the join
• Yes, in some circumstances histograms can be used in costing joins: if there is a SARG on the joining column and that column is also in the join table, then the SARG from the joining table can be used to filter the join table
• If there is no SARG on the join column or on the joining column, the Total density value (with stats) or the default value (without stats) will be used
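The rules in the bullets above can be written down as a small decision function. This is a sketch of the logic as this deck describes it, not ASE code. The 0.10 default is the join selectivity the 302 output in the following examples reports when the join column has no statistics. Treat it as an assumption for other releases or join shapes. The caller supplies the histogram-based selectivity for an inherited SARG (e.g. 0.003726 for TW4.A = 10). It is not computed here.

```python
from typing import Optional

# Assumption: 0.10 is the value this deck's 302 output shows when the join
# column has no statistics; other releases or cases may differ.
DEFAULT_JOIN_SELECTIVITY = 0.10


def join_selectivity(has_stats: bool,
                     total_density: Optional[float] = None,
                     inherited_sarg_selectivity: Optional[float] = None) -> float:
    """Pick the statistic that drives costing of one join column (simplified model)."""
    if not has_stats:
        # No histogram and no Total density: even an inherited SARG cannot help.
        return DEFAULT_JOIN_SELECTIVITY
    if inherited_sarg_selectivity is not None:
        # The SARG from the joining table was costed against this column's histogram.
        return inherited_sarg_selectivity
    if total_density is None:
        raise ValueError("statistics present but no Total density supplied")
    # No SARG on either join column: fall back to Total density.
    return total_density


print(join_selectivity(True, 0.320889, 0.003726))  # TW4.A joined, with TW1.A = 10
print(join_selectivity(True, 0.320889))            # TW4.A joined, with TW1.B = 10
print(join_selectivity(False))                     # join column without statistics
```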
Statistics On Non-Indexed Columns and Joins cont.
"Inherited" SARG example
select .... from TW1, TW4
where TW1.A = TW4.A and TW1.A = 10
Selecting best index for the JOIN CLAUSE:
TW4.A = TW1.A
TW4.A = 10
Estimated selectivity for a,
selectivity = 0.003726, upper limit = 0.049683.
Histogram values used
select .... from TW1, TW4
where TW1.A = TW4.A and TW1.B = 10
Selecting best index for the JOIN CLAUSE:
TW4.A = TW1.A
Estimated selectivity for a,
selectivity = 0.320889. Total density value used
Statistics On Non-Indexed Columns and Joins - Example
select * from TW1, TW2
where TW1.A = TW2.A and TW1.A = 805975090
A simple join with a SARG on the join column of one table
Table TW2 column A has no statistics; TW1 column A does
Selecting best index for the JOIN CLAUSE: (for TW2.A)
TW2.A = TW1.A
TW2.A = 805975090   (inherited from the SARG on TW1)
But it can't help: there are no stats on TW2.A
Estimated selectivity for A,
selectivity = 0.100000.
The best qualifying access is a table scan,
costing 13384 pages, with an estimate of 50000
rows to be returned per scan of the table,
using no data prefetch (size 2K I/O),
in data cache 'default data cache' (cacheid 0)
with MRU replacement
Join selectivity is 0.100000.
The SARG inherited from the other table doesn't help in this case
Statistics On Non-Indexed Columns and Joins - Example cont.
Without statistics on TW2.A, the plan includes a reformat step
with TW1 as the outer table
FINAL PLAN (total cost = 2855774):
varno=0 (TW1) indexid=2 (A_E_F)
path=0xfbd46800 pathtype=sclause
method=NESTED ITERATION
varno=1 (TW2) indexid=0 ()
path=0xfbd0bb10 pathtype=join
method=REFORMATTING
• Not the best plan, but the optimizer had little to go on
Statistics On Non-Indexed Columns and Joins - Example cont.
• Table TW2 column A now has statistics
• The inherited SARG on TW1.A can now be used to help filter the join on TW2.A
Selecting best index for the JOIN CLAUSE:
TW2.A = TW1.A
TW2.A = 805975090
Estimated selectivity for A,
selectivity = 0.001447, upper limit = 0.052948.
The best qualifying access is a table scan,
costing 13384 pages, with an estimate of 724 rows to be
returned per scan of the table, using no data prefetch
(size 2K I/O), in data cache 'default data cache' (cacheid 0)
with MRU replacement
Join selectivity is 0.001447.
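The two 302 outputs are consistent with a simple "rows = selectivity × table rows" estimate. The deck does not state TW2's row count. The sketch below backs it out from the no-statistics case (50000 rows at selectivity 0.1, which implies about 500,000 rows) and then re-applies the histogram-based selectivity. The result is about 723.5 rows, close to the 724 in the output. The printed selectivity 0.001447 is itself rounded, so an exact match is not expected.

```python
def implied_table_rows(estimated_rows: float, selectivity: float) -> float:
    """Back out a table's row count from a 302 row estimate (inference, not a stored value)."""
    return estimated_rows / selectivity


def estimated_rows(selectivity: float, table_rows: float) -> float:
    """Rows the optimizer expects a scan to return for a given selectivity."""
    return selectivity * table_rows


tw2_rows = implied_table_rows(50_000, 0.100000)  # no statistics on TW2.A
print(f"TW2 rows implied by the 302 output: {tw2_rows:,.0f}")
print(f"With statistics on TW2.A: {estimated_rows(0.001447, tw2_rows):.1f} rows")
```

Going from about 50,000 to about 724 rows flowing into the join is what lets the optimizer put TW2 on the outside and drop the reformat step.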
Statistics On Non-Indexed Columns and Joins - Example cont.
• With statistics on TW2.A, reformatting is not used and the join order has changed
FINAL PLAN (total cost = 1252148):
varno=1 (TW2) indexid=0 ()
path=0xfbd0b800 pathtype=sclause
method=NESTED ITERATION
varno=0 (TW1) indexid=2 (A_E_F)
path=0xfbd46800 pathtype=sclause
method=NESTED ITERATION
The Effects of Changing the Number of Steps (Cells)
• The number of cells (steps) affects SARG costing: as the number of steps changes, so does the costing
• Cell weights and range cell density are used in costing SARGs
• For Equi-SARGs, the cell weight is used as the column's 'upper limit' and the range cell density is used as its 'selectivity', as seen in 302 output
• The result(s) of interpolation are used as the column 'selectivity' for Range SARGs
• Increasing the number of steps narrows the average cell width, so the weight of range cells decreases
• It can also produce more Frequency count cells and thus change the Range cell density value
• More cells means a finer-grained histogram
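The costing rules in the bullets above can be illustrated with a toy histogram. This is a simplified model under stated assumptions, not ASE's implementation:

- Range cells are (low, high] intervals, and Frequency count cells are single values.
- The zero-weight "<" step that ASE writes in front of a spike is folded away.
- Range SARGs are interpolated linearly, which only makes sense for numeric columns.

The histogram, its weights and the range cell density (0.004) are hypothetical. The pattern matches the 302 output above. For TW4.A = 10, the selectivity 0.003726 is the range cell density and the upper limit 0.049683 is a cell weight. For E = "yes", a frequency cell, selectivity and upper limit are equal.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    low: float               # exclusive lower bound of a range cell
    high: float              # inclusive upper bound (the step value)
    weight: float            # fraction of the table's rows in this cell
    frequency: bool = False  # True for a single-value ("=") Frequency count cell


def equi_sarg(cells, range_cell_density, value):
    """(selectivity, upper_limit) for `col = value` in this simplified model."""
    for c in cells:
        if c.frequency and c.high == value:
            return c.weight, c.weight             # spike: cell weight used for both
    for c in cells:
        if not c.frequency and c.low < value <= c.high:
            return range_cell_density, c.weight   # density as selectivity, weight as upper limit
    return 0.0, 0.0                               # outside the histogram


def range_sarg(cells, lo, hi):
    """Selectivity of `col between lo and hi` by linear interpolation."""
    total = 0.0
    for c in cells:
        if c.frequency:
            if lo <= c.high <= hi:
                total += c.weight                 # whole spike falls inside the range
            continue
        overlap = min(hi, c.high) - max(lo, c.low)
        if overlap > 0:
            total += c.weight * overlap / (c.high - c.low)  # covered fraction of the cell
    return total


# Hypothetical four-cell histogram with one spike at 200 (weights sum to 1.0).
cells = [Cell(0, 100, 0.25), Cell(100, 200, 0.25),
         Cell(200, 200, 0.30, frequency=True), Cell(200, 300, 0.20)]
print(equi_sarg(cells, 0.004, 150))  # value in a range cell
print(equi_sarg(cells, 0.004, 200))  # value hits the frequency cell
print(range_sarg(cells, 50, 150))    # half of each of the first two cells
```

Narrower cells (more steps) lower the weight used as the upper limit and shrink the error of the linear interpolation, which is the effect the following slides show.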
The Effects of Changing the Number of Steps (Cells) cont.
Average cell width = # of rows / (# of requested steps - 1)
• Table has 1 million rows:
• 20 requested steps: 1,000,000/19 = 52,632 rows per cell
• 200 requested steps: 1,000,000/199 = 5,025 rows per cell
• What does this mean?
• As you increase the number of steps (cells), they become narrower and represent fewer values
• We'll see that this has an effect on how the optimizer estimates the cost of a SARG
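The formula above is easy to check in code. The sketch assumes, as in the histograms shown in this deck, that step 1 has weight 0 and only marks the lower bound, so only (steps - 1) cells hold rows. The "even weight" is what a cell would carry if rows were spread evenly. For 20 steps it gives 0.052632, which matches the 0.05263200 cell weight in the 20-step histogram on the next slides. For 200 steps it gives about 0.005025, near the 0.00505 to 0.00507 weights shown there.

```python
def average_cell_width(rows: int, requested_steps: int) -> float:
    """Rows per histogram cell, assuming the first step only marks the lower bound."""
    return rows / (requested_steps - 1)


def even_cell_weight(requested_steps: int) -> float:
    """Weight of one cell if rows were spread evenly across the cells."""
    return 1 / (requested_steps - 1)


for steps in (20, 200):
    print(f"{steps:>3} steps: {average_cell_width(1_000_000, steps):,.0f} rows per cell, "
          f"weight ~{even_cell_weight(steps):.6f}")
```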
The Effects of Changing the Number of Steps (Cells) cont.
Changing the number of steps: effects on Equi-SARGs
select A from TW2 where B = 842000000
With 20 cells (steps) in the histogram
Range cell density: 0.0012829768785739
9 0.05263200
• Range cell density decreased because Frequency count cells appeared in the histogram
The Effects of Changing the Number of Steps (Cells) cont.
With 200 cells (steps) in the histogram
Range cell density: 0.0002303825911991
77 0.00507200
The Effects of Changing the Number of Steps (Cells) cont.
Changing the number of steps: effects on Range SARGs
select * from TW2 where B between
825570000 and 830000000
With 20 cells (steps) in the histogram
Range cell density: 0.0012829768785739
9 0.05263200
The Effects of Changing the Number of Steps (Cells) cont.
select * from TW2 where B between
825570000 and 830000000
With 200 cells (steps) in the histogram
Range cell density: 0.0002303825911991
67 0.00505200
Adding Boundary Values To The Histogram
• Changing the boundary values can keep SARG values within the histogram
• This avoids 'out of bounds' costing
• Out of bounds costing usually happens on an atomic column whose histogram is out of date relative to the SARG value(s)
• The optimizer has only two choices for selectivity, 1 or 0, depending on the SARG operator and on which end of the histogram the SARG value falls outside of
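The 1-or-0 rule above can be expressed as a small function. This is a model of the behaviour described in this deck for the releases it covers, not an ASE API. The histogram range for TW1.F used in the example (May 1 2002 to May 16 2002) is an assumption based on the values on the following slides. The two calls reproduce the 0.00 and 1.00 selectivities shown there.

```python
from datetime import date
from typing import Optional


def out_of_bounds_selectivity(op: str, value, hist_min, hist_max) -> Optional[float]:
    """0.0 or 1.0 for a SARG value outside [hist_min, hist_max]; None if it is inside."""
    if hist_min <= value <= hist_max:
        return None                   # in bounds: normal histogram costing applies
    below = value < hist_min
    if op == "=":
        return 0.0                    # optimizer assumes no row matches the value
    if op in (">", ">="):
        return 1.0 if below else 0.0  # every row is above a too-small value
    if op in ("<", "<="):
        return 0.0 if below else 1.0  # every row is below a too-large value
    raise ValueError(f"unsupported operator: {op}")


# Assumed histogram range for TW1.F.
lo, hi = date(2002, 5, 1), date(2002, 5, 16)
print(out_of_bounds_selectivity("=", date(2002, 4, 30), lo, hi))   # F = "April 30, 2002"
print(out_of_bounds_selectivity(">=", date(2002, 4, 30), lo, hi))  # F >= "Apr 30 2002"
```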
Adding Boundary Values To The Histogram cont.
Histogram for column: "F"
Column datatype: datetimn
Requested step count: 20
Actual step count: 20
Step Weight Value
1 0.28396901 < "May 1 2002 12:00:00:000AM"
2 0.04839900 = "May 1 2002 12:00:00:000AM"
20 0.00432500
Adding Boundary Values To The Histogram cont.
Out of bounds costing that uses a 0.00 selectivity
select count(*) from TW1 where F = "April 30, 2002"
Adding Boundary Values To The Histogram cont.
Out of bounds costing that uses a 1.00 selectivity
select count(*) from TW1 where F >= "Apr 30 2002"
> "Apr 30 2002"
"May 16 2002"
Estimated selectivity for F,
selectivity = 1.000000.
Lower bound search value 'Apr 30 2002 12:00:00:000AM' is less
than the smallest value in sysstatistics for this column.
Estimating selectivity of index 'ind_F', indid 6
scan selectivity 1.000000, filter selectivity 1.000000
Search argument selectivity is 1.000000.
Adding Boundary Values To The Histogram cont.
What to do if out of bounds costing is a problem
• It is not always a problem, particularly when a selectivity of 0.000000 is used
• There are two ways to deal with it
• Add a dummy row to the table with a column value that allows the SARG value(s) to fall within the histogram (not always allowed)
• If you do add a dummy row, keep in mind that it will affect the histograms of other columns; be careful with the values you use
• Use optdiag to write the statistics to a file, edit the histogram boundary, and read the file back in. This won't affect the data, but it will extend the histogram to include the SARG value(s)
Removing Statistics Can Affect Query Plans
Sometimes having no statistics is better than having them
This will usually be an issue when very dense columns are involved
Histogram for column: "E"
Step Weight Value
1 0.00000000 < "no"
2 0.47256401 = "no"
3 0.00000000 < "yes"
4 0.52743602 = "yes"
This can also show up when you have 'spikes' (Frequency count cells) in the distribution
Removing Statistics Can Affect Query Plans cont.
select count(*) from TW4
where E = "yes" and C = 825765940
The table…has 1000000 rows, 24098 pages,
Estimated selectivity for E,
selectivity = 0.527436, upper limit = 0.527436.
Estimating selectivity of index 'E_AA_B', indid 6
scan selectivity 0.52743602, filter selectivity 0.527436
527436 rows, 174107 pages
The best qualifying index is 'E_AA_B' (indid 6)
costing 174107 pages, with an estimate of 526 rows
FROM TABLE
TW4
Nested iteration.
Table Scan.
Removing Statistics Can Affect Query Plans cont.
delete statistics TW4(E)
Estimated selectivity for E,
selectivity = 0.100000.
Estimating selectivity of index 'E_AA_B', indid 6
scan selectivity 0.100000, filter selectivity 0.100000
100000 rows, 20584 pages
The best qualifying index is 'E_AA_B' (indid 6)
costing 20584 pages, with an estimate of 92 rows
FROM TABLE
TW4
Nested iteration.
Index : E_AA_B
Forward scan.
Positioning by key.
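The plan flip above follows from the numbers in the two 302 outputs. With the histogram, E = "yes" qualifies about 527,436 rows. After delete statistics, the 0.10 default makes the column look about five times more selective. The sketch compares only the estimated index pages with the 24098-page table scan. That is a simplification: ASE's costing also weighs I/O size, caching and other factors. Whether the resulting index plan is actually faster depends on your data, so test it.

```python
TABLE_ROWS = 1_000_000     # TW4, from the 302 output
TABLE_SCAN_PAGES = 24_098  # pages read by a table scan of TW4


def qualifying_rows(selectivity: float, rows: int = TABLE_ROWS) -> int:
    """Estimated rows matching the SARG on E."""
    return round(selectivity * rows)


def cheaper_access(index_pages: int, table_pages: int = TABLE_SCAN_PAGES) -> str:
    """Simplified choice: compare estimated index pages with a table scan."""
    return "index E_AA_B" if index_pages < table_pages else "table scan"


# (selectivity for E, index pages) taken from the two 302 outputs
cases = {"histogram on E": (0.527436, 174_107),
         "statistics deleted": (0.100000, 20_584)}
for label, (sel, pages) in cases.items():
    print(f"{label}: {qualifying_rows(sel):,} rows, {pages:,} pages -> {cheaper_access(pages)}")
```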
Maintaining Tuned Statistics
Tuned statistics will add to your maintenance
• Any statistical value you write to sysstatistics, via either optdiag or sp_modifystats, will be overwritten by update statistics
• Keep optdiag input files for reuse
• If needed, get an optdiag output file, edit it, and read it in
• Keep scripts that run sp_modifystats
• Rewrite tuned statistics after running any update statistics that affects the column with the modified statistics
Sampling For Update Statistics
New feature in 12.5.0.3
• Can dramatically speed up update statistics
• Reads rows from random pages to build column-level statistics (histogram)
• The percentage of pages to sample can be specified
update statistics table(col) with sampling=10 percent
• Also applies to update index statistics and update all statistics
• Unofficial tests show that a sampling rate of 10% on a 1 million row numeric column reduces the run time of update statistics from 9 minutes to 30 seconds
Sampling For Update Statistics cont.
• Density values are not updated by sampling
• Sampled statistics will vary from those obtained by a 'full scan'
• More variation will appear as the sampling rate decreases
• Test queries against sampled statistics; in most cases you won't see any major changes
• Values may become 'out of bounds'; this will affect the optimizer and is likely to have the greatest effect on atomic columns
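Page-level sampling can be illustrated with a sketch. It reads every row on a random subset of pages and builds a simple equi-height histogram. It then compares that histogram with one built from a full scan. The table, the page layout and the histogram-building rule are hypothetical, and this is not ASE's sampling algorithm. No densities are computed, matching the note that sampling does not update them.

The sketch shows two points from the slides. First, sampled step values differ from full-scan ones. Second, a sample's lowest and highest values can only fall inside the true range, so SARG values near either end of the column can end up 'out of bounds'.

```python
import random


def build_histogram(values, steps):
    """Equi-height step values: `steps` boundaries over the sorted values."""
    ordered = sorted(values)
    n, cells = len(ordered), steps - 1
    return [ordered[min(n - 1, (i * n) // cells)] for i in range(steps)]


def sample_rows(pages, percent, rng):
    """Read every row on a random `percent` of the pages (page-level sampling)."""
    k = max(1, round(len(pages) * percent / 100))
    return [row for page in rng.sample(pages, k) for row in page]


rng = random.Random(42)
# Hypothetical table: 1,000 pages of 50 random integers each.
pages = [[rng.randint(0, 1_000_000) for _ in range(50)] for _ in range(1_000)]
all_rows = [row for page in pages for row in page]

full = build_histogram(all_rows, 20)
sampled_rows = sample_rows(pages, 10, rng)
sampled = build_histogram(sampled_rows, 20)

print(len(all_rows), len(sampled_rows))
print("full scan first/last step:", full[0], full[-1])
print("10% sample first/last step:", sampled[0], sampled[-1])
```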
Where To Get More Information
• The Sybase Customer newsgroups
• http://support.sybase.com/newsgroups
• The Sybase list server
• SYBASE-L@LISTSERV.UCSB.EDU
• The external Sybase FAQ
• http://www.isug.com/Sybase_FAQ/
• Join the ISUG: ISUG Technical Journal, feature requests
• http://www.isug.com
Where To Get More Information cont.
• The latest Performance and Tuning Guide
• Don't be put off by the ASE 12.0 in the title; it covers the 11.9.2 features/functionality too
• http://sybooks.sybase.com/onlinebooks/group-as/asg1200e
• Any "What's New" docs for a new ASE release
• Tech Docs at Sybase Support
• http://techinfo.sybase.com/css/techinfo.nsf/Home
• Upgrade/Migration help page
• http://www.sybase.com/support/techdocs/migration
Sybase Developer Network (SDN)
Additional Resources for Developers/DBAs
• Single point of access to developer software, services, and up-to-date technical information:
• White papers and documentation
• Collaboration with other developers and Sybase engineers
• Code samples and beta programs
• Technical recordings
• Free software
• Join today: www.sybase.com/developer or visit SDN at TechWave's Technology Boardwalk