Assumptions 

This is NOT going to be a ‘Basic’ Presentation 

We will be reviewing and discussing fairly advanced areas of Optimizer P&T; some of this you may have seen in the past, but a little review never hurt 

• You’ve worked with optimizer P&T 

• You’re running ASE 11.9.2 or above 

• You understand the basics of optimization 

• You’ve used Traceons 302/310 and Optdiag 

• You’ve used the various update statistics syntax available in ASE 11.9.2 and above 

• You really want to know about tuning the statistics

There are Two Kinds of Optimizer Statistics 

• Table/Index level - describes a table and its index(es)

  • Page/row counts, cluster ratios, deleted and forwarded rows

  • Some are updated dynamically as DML occurs: page/row counts, deleted rows, forwarded rows, cluster ratios

  • Stored in systabstats

• Column level - describes the data to the optimizer

  • Histogram (distribution), density values, default selectivity values

  • Static, need to be updated or written directly

  • Stored in sysstatistics

• This presentation deals with the column level statistics

Some Quick Definitions 

Range cell density: 0.0037264745412389 

Total density: 0.3208892191740000 

Range selectivity: default used (0.33) 

In between selectivity: default used (0.25) 

Histogram for column: "A"

Column datatype: integer 

Requested step count: 20 

Actual step count: 10 

Step Weight Value 

1 0.00000000

Statistics On Inner Columns of Composite Indexes

• Think of a composite index as a 3D object: columns with statistics are transparent, those without statistics are opaque

• Columns with statistics give the optimizer a clearer picture of an index, which is sometimes good and sometimes not

• This is a fairly common practice

• Does add maintenance

• update index statistics is most commonly used to do this


Statistics On Inner Columns of Composite Indexes cont.

Index on columns E and B – No statistics on column B 

select * from TW4

where E = "yes" and b >= 959789065 and id >= 600000 and

F > "May 14, 2002" and A_A = 959000000

Beginning selection of qualifying indexes for table 'TW4',

varno = 0, objectid 464004684.

The table (Allpages) has 1000000 rows, 24098 pages, 

Estimated selectivity for E, 

selectivity = 0.527436, upper limit = 0.527436. 

No statistics available for B, 

using the default range selectivity to estimate selectivity. 

Estimated selectivity for B, 

selectivity = 0.330000.



The best qualifying index is 'E_B' (indid 7)

costing 49264 pages, with an estimate of 191 

rows to be returned per scan of the table 

FINAL PLAN (total cost = 481960): 

varno=0 (TW4) indexid=0 () 

path=0xfbccc120 pathtype=sclause 

method=NESTED ITERATION 

Table: TW4 scan count 1, logical reads:(regular=24098 

apf=0 total=24098) 

physical reads: (regular=16468 apf=0 total=16468), 

apf IOs used=0



Statistics are now on column B 



Estimated selectivity for B, 


The best qualifying index is 'E_B' (indid 7)

costing 3317 pages, with an estimate of 13 rows to

be returned per scan of the table


varno=0 (TW4) indexid=7 (E_B) 

path=0xfbd1da08 pathtype=sclause 


Table: TW4 scan count 1, logical 

reads:(regular=4070 apf=0 total=4070), 

physical reads: (regular=820 apf=0 total=820),

Statistics On Non-Indexed Columns and Joins 

Statistics on non-indexed columns can’t help with index selection, but they can affect join ordering

• Columns with statistics give the optimizer a clearer picture of the column, so no hard-coded assumptions have to be used

• When costing joins of non-indexed columns, having statistics may result in better plans than using the default values

• Without statistics there is no Total density or histogram that the optimizer can use to cost the column in the join

• In some circumstances histograms can be used in costing joins: if there is a SARG on the join column of one table, the optimizer can apply that SARG to the matching join column of the other table and use its histogram to filter the join

• If there is no SARG on the join column of either table, the Total density value (with stats) or the default value (w/o stats) will be used

Statistics On Non-Indexed Columns and Joins cont.

“Inherited” SARG example 

select ....from TW1, TW4 

where TW1.A = TW4.A and TW1.A = 10 

Selecting best index for the JOIN CLAUSE: 

TW4.A = TW1.A 

TW4.A = 10 

Estimated selectivity for a, 

selectivity = 0.003726, upper limit = 0.049683.

Histogram values used 

select ....from TW1, TW4 

where TW1.A = TW4.A and TW1.B = 10 


TW4.A = TW1.A 

Estimated selectivity for a, 

selectivity = 0.320889.

Total density value used
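The choice between histogram, Total density and default values described above can be sketched in a few lines of Python. This is only an illustration of the decision order, not the optimizer's actual code: the function name and its parameters are hypothetical, and the 0.10 default is the value that shows up in this deck's no-statistics trace output.

```python
from typing import Optional, Tuple

# Default equality selectivity used when a column has no statistics
# (assumption: 0.10, as seen in the no-statistics traces in this deck).
DEFAULT_SELECTIVITY = 0.10


def join_column_selectivity(
    total_density: Optional[float],
    histogram_selectivity: Optional[float] = None,
) -> Tuple[float, str]:
    """Pick the selectivity source for a join column (hypothetical helper).

    total_density: None means the column has no statistics.
    histogram_selectivity: set only when a SARG inherited from the other
    table allows the histogram to be used.
    """
    if total_density is None:
        return DEFAULT_SELECTIVITY, "default"
    if histogram_selectivity is not None:
        return histogram_selectivity, "histogram"
    return total_density, "total density"


# TW1.A = TW4.A and TW1.A = 10: the SARG is inherited by TW4.A
print(join_column_selectivity(0.320889, histogram_selectivity=0.003726))
# TW1.A = TW4.A and TW1.B = 10: no SARG on the join column
print(join_column_selectivity(0.320889))
# No statistics on the join column at all
print(join_column_selectivity(None))
```

The first two calls use the values from the two traces above; the third shows the fallback when the column has no statistics.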


Statistics On Non-Indexed Columns and Joins - Example

select * from TW1, TW2

where TW1.A = TW2.A and TW1.A = 805975090

A simple join with a SARG on the join column of one table 

Table TW2 column A has no statistics, TW1 column A does 

Selecting best index for the JOIN CLAUSE: (for TW2.A) 

TW2.A = TW1.A 

TW2.A = 805975090

(Inherited from the SARG on TW1, but it can’t help: no stats)

Estimated selectivity for A, 

selectivity = 0.100000. 

The best qualifying access is a table scan, 

costing 13384 pages, with an estimate of 50000 

rows to be returned per scan of the table, 

using no data prefetch (size 2K I/O), 

in data cache 'default data cache' (cacheid 0) 

with MRU replacement 

Join selectivity is 0.100000. 

Inherited SARG from other table doesn’t help in this case


Statistics On Non-Indexed Columns and Joins - Example cont.

Without statistics on TW2.A the plan includes a reformat with TW1 as the outer table


varno=0 (TW1) indexid=2 (A_E_F) 

path=0xfbd46800 pathtype=sclause 



path=0xfbd0bb10 pathtype=join 

method=REFORMATTING 

• Not the best plan – but the optimizer had little to go on



• Table TW2 column A now has statistics 

• The inherited SARG on TW1.A can now be used to help filter the join on TW2.A


TW2.A = TW1.A 

TW2.A = 805975090 

Estimated selectivity for A, 


The best qualifying access is a table scan, 

costing 13384 pages, with an estimate of 724 rows to be 

returned per scan of the table, using no data prefetch 

(size 2K I/O), in data cache 'default data cache' (cacheid 

0) with MRU replacement 

Join selectivity is 0.001447.
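The row estimates in the two traces follow from multiplying the table's row count by the join selectivity. A minimal Python sketch of that arithmetic, assuming TW2 has 500,000 rows (the deck does not state this; it is inferred from the 50000-row estimate at selectivity 0.100000):

```python
def estimated_rows(table_rows: int, join_selectivity: float) -> float:
    """Rows expected per scan: table rows times join selectivity."""
    return table_rows * join_selectivity


# Assumption: inferred from "estimate of 50000 rows" at selectivity 0.10,
# not stated anywhere in the deck.
TW2_ROWS = 500_000

without_stats = estimated_rows(TW2_ROWS, 0.100000)  # default selectivity
with_stats = estimated_rows(TW2_ROWS, 0.001447)     # histogram-based

print(f"no statistics on TW2.A:   ~{without_stats:,.1f} rows per scan")
print(f"with statistics on TW2.A: ~{with_stats:,.1f} rows per scan")
```

Under that assumption the second figure comes out at about 723.5, which is consistent with the 724 rows reported in the trace.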



• With statistics on TW2.A, reformatting is not used and the join order has changed



path=0xfbd0b800 pathtype=sclause 


varno=0 (TW1) indexid=2 (A_E_F) 

path=0xfbd46800 pathtype=sclause 

method=NESTED ITERATION

The Effects of Changing the Number of Steps (Cells)

• The number of cells (steps) affects SARG costing: as the number of steps changes, costing does too

• Cell weights and range cell density are used in costing SARGs

  • For Equi-SARGs, cell weight is used as the column’s ‘upper limit’ and Range cell density is used as its ‘selectivity’, as seen in 302 output

  • For Range SARGs, the result(s) of interpolation is used as the column’s ‘selectivity’

• Increasing the number of steps narrows the average cell width, so the weight of Range cells decreases

  • It can also result in more Frequency count cells and thus change the Range cell density value

• More cells means more granular cells

The Effects of Changing the Number of Steps (Cells) cont.

Average cell width = # of rows / (# of requested steps - 1)

• Table has 1 million rows

  • Requested 20 steps: 1,000,000/19 = 52,632 rows per cell

  • Requested 200 steps: 1,000,000/199 = 5,025 rows per cell

• What does this mean?

  • As you increase the number of steps (cells), they become narrower, representing fewer values

  • We’ll see that this has an effect on how the optimizer estimates the cost of a SARG
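The average cell width formula above is easy to check with a short Python sketch. The function names are hypothetical, and the average weight calculation rests on an added assumption: that rows are spread evenly across range cells, so each cell's weight is roughly 1 / (requested steps - 1).

```python
def average_cell_width(total_rows: int, requested_steps: int) -> float:
    """Average rows per histogram cell: rows / (requested steps - 1)."""
    if requested_steps < 2:
        raise ValueError("need at least 2 requested steps")
    return total_rows / (requested_steps - 1)


def average_cell_weight(requested_steps: int) -> float:
    """Average fraction of the table per cell.

    Assumption: rows are spread evenly across range cells.
    """
    if requested_steps < 2:
        raise ValueError("need at least 2 requested steps")
    return 1.0 / (requested_steps - 1)


for steps in (20, 200):
    width = average_cell_width(1_000_000, steps)
    weight = average_cell_weight(steps)
    print(f"{steps:>3} steps: ~{width:,.0f} rows per cell, "
          f"average weight ~{weight:.6f}")
```

With 20 steps the average weight is about 0.0526, which lines up with range cell weights of that size in a 20-step histogram; with 200 steps it drops to about 0.0050.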


The Effects of Changing the Number of Steps (Cells) cont.

Changing the number of steps – effects on Equi-SARGs 

select A from TW2 where B = 842000000 

With 20 cells (steps) in the histogram 


9 0.05263200

• Range cell density decreased because Frequency count cells appeared in the histogram





77 0.00507200



Changing the number of steps – effects on Range SARGs

select * from TW2 where B between 825570000 and 830000000



9 0.05263200



select * from TW2 where B between 825570000 and 830000000



67 0.00505200

Adding Boundary Values To The Histogram 

• Changing the boundary values can keep SARG values within the histogram

• Avoids ‘out of bounds’ costing

  • Out of bounds costing usually happens on an atomic column whose histogram is out of date in relation to the SARG value(s)

  • The optimizer has only two choices for selectivity, 1 or 0, depending on the SARG operator and which end of the histogram the SARG value falls outside of

Adding Boundary Values To The Histogram cont.

Histogram for column: "F"

Column datatype: datetimn 

Requested step count: 20 

Actual step count: 20 


1 0.28396901 < "May 1 2002 12:00:00:000AM" 

2 0.04839900 = "May 1 2002 12:00:00:000AM"

 

20 0.00432500



Out of bounds costing that uses a 0.00 selectivity 

select count(*) from TW1 where F = "April 30, 2002"



Out of bounds costing that uses a 1.00 selectivity 

select count(*) from TW1 where F >= "Apr 30 2002"

> "Apr 30 2002"

"May 16 2002"

Estimated selectivity for F, 


Lower bound search value 'Apr 30 2002 12:00:00:000AM' is less 

than the smallest value in sysstatistics for this column. 

Estimating selectivity of index 'ind_F', indid 6

scan selectivity 1.000000, filter selectivity 1.000000

Search argument selectivity is 1.000000.
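The 0-or-1 behaviour in these two traces can be sketched as a small Python function. This is an illustration only: the function is hypothetical, the upper boundary date is made up because the last histogram cell is not shown, and the results for operators other than the two traced above (= and >=) are inferred from the "which end of the histogram" rule rather than taken from trace output.

```python
from datetime import date
from typing import Optional


def out_of_bounds_selectivity(
    op: str, value: date, low: date, high: date
) -> Optional[float]:
    """Return 0.0 or 1.0 for a SARG value outside [low, high].

    Returns None when the value lies inside the histogram, where normal
    cell-based costing would apply instead.
    """
    if low <= value <= high:
        return None
    if op == "=":
        return 0.0  # an equality outside the histogram matches nothing
    below = value < low
    if op in (">", ">="):
        return 1.0 if below else 0.0  # everything known lies above / below
    if op in ("<", "<="):
        return 0.0 if below else 1.0
    raise ValueError(f"unsupported operator: {op}")


LOW = date(2002, 5, 1)    # lowest boundary in the histogram for F
HIGH = date(2002, 5, 15)  # hypothetical: the upper boundary is not shown

print(out_of_bounds_selectivity("=", date(2002, 4, 30), LOW, HIGH))
print(out_of_bounds_selectivity(">=", date(2002, 4, 30), LOW, HIGH))
```

The two calls mirror the traced queries: the equality SARG gets a selectivity of 0.0 and the >= SARG gets 1.0.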



What to do if out of bounds costing is a problem

• Not always a problem, particularly when a selectivity of 0.000000 is used

• There are two ways to deal with it

  • Add a dummy row to the table with a column value that allows the SARG value(s) to fall within the histogram (not always allowed)

  • If you do add a dummy row, keep in mind that it will affect the histograms of other columns; be careful with the values you use

  • Write a new histogram boundary using optdiag: edit the file and read it back in. This won’t directly affect the data, but it will extend the histogram to include the SARG value(s)

Removing Statistics Can Affect Query Plans

Sometimes no statistics are better than having them

This will usually be an issue when very dense columns are involved

Histogram for column: "E"


1 0.00000000 < "no"

2 0.47256401 = "no"

3 0.00000000 < "yes"

4 0.52743602 = "yes"

This can also show up when you have ‘spikes’ (Frequency count cells) in the distribution

Removing Statistics Can Affect Query Plans cont.

select count(*) from TW4

where E = "yes" and C = 825765940

The table…has 1000000 rows, 24098 pages, 



Estimating selectivity of index 'E_AA_B', indid 6


527436 rows, 174107 pages 

The best qualifying index is 'E_AA_B' (indid 6)

costing 174107 pages, with an estimate of 526 rows 

FROM TABLE 

TW4 

Nested iteration. 

Table Scan.

Removing Statistics Can Affect Query Plans cont.

delete statistics TW4(E) 



Estimating selectivity of index 'E_AA_B', indid 6


100000 rows, 20584 pages 

The best qualifying index is 'E_AA_B' (indid 6)

costing 20584 pages, with an estimate of 92 rows 

FROM TABLE 

TW4 

Nested iteration. 

Index : E_AA_B 

Forward scan. 

Positioning by key.
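The reason the plan flips from a table scan to the index can be sketched in Python. This is a deliberate simplification: it compares only the index cost in pages against the table's page count, whereas the real optimizer costs logical and physical I/O. The helper names are hypothetical; the numbers are taken from the traces above, with 0.10 assumed to be the default selectivity used once the statistics on E are gone (it matches the 100000-row estimate).

```python
TABLE_ROWS = 1_000_000  # from the trace: 1000000 rows, 24098 pages
TABLE_PAGES = 24_098


def rows_qualifying(selectivity: float, table_rows: int = TABLE_ROWS) -> int:
    """Rows the optimizer expects the leading index column to qualify."""
    return round(selectivity * table_rows)


def cheaper_access(index_cost_pages: int, table_pages: int = TABLE_PAGES) -> str:
    """Simplified choice: use the index only if it costs fewer pages."""
    return "index" if index_cost_pages < table_pages else "table scan"


cases = {
    # label: (selectivity used for E, index cost in pages from the trace)
    "with statistics on E": (0.527436, 174_107),
    "after delete statistics": (0.10, 20_584),
}

for label, (selectivity, index_cost) in cases.items():
    print(f"{label}: {rows_qualifying(selectivity):,} rows, "
          f"index costs {index_cost:,} pages -> {cheaper_access(index_cost)}")
```

With the histogram, the "yes" spike makes the index look far more expensive than scanning all 24098 pages; with the default value the index estimate drops below the table scan cost.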

Maintaining Tuned Statistics 

Tuned statistics will add to your maintenance

• Any statistical value you write to sysstatistics, either via optdiag or sp_modifystats, will be overwritten by update statistics

• Keep optdiag input files for reuse

  • If needed, get an optdiag output file, edit it and read it in

• Keep scripts that run sp_modifystats

• Rewrite tuned statistics after running update statistics that affects the column with the modified statistics

Sampling For Update Statistics 

New feature in 12.5.0.3

• Can dramatically speed up the running of update statistics

• Reads rows from random pages to build column level statistics (histogram)

• The percentage of pages to sample can be specified

update statistics table(col) with sampling=10 percent

• Also applies to update index statistics and update all statistics

• Unofficial tests show that a sampling rate of 10% on a 1 million row numeric column reduces the time for update statistics to run from 9 minutes to 30 seconds

Sampling For Update Statistics cont. 

• Density values are not updated by sampling

• Sampled statistics will vary from those obtained by a ‘full scan’

• More variations will appear as the sampling rate decreases

• Test queries against sampled statistics. In most cases you won’t see any major changes

• Values may become ‘out of bounds’, which will affect the optimizer; this is likely to have the greatest effect on atomic columns
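Why sampling can push values ‘out of bounds’ can be illustrated with a small Python simulation. It is only a sketch under stated assumptions: the column data is randomly generated and hypothetical, and the sketch samples individual rows rather than pages as ASE does. The point it shows is general: a sample can never have a wider value range than the full column, so histogram boundaries built from a sample may sit inside the true minimum and maximum.

```python
import random


def bounds(values):
    """Lowest and highest value, i.e. the outer histogram boundaries."""
    return min(values), max(values)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical column data; not taken from the deck's tables.
full_column = [rng.randint(1, 10_000_000) for _ in range(100_000)]

# Assumption: a 10 percent row sample stands in for ASE's page sampling.
sample = rng.sample(full_column, k=len(full_column) // 10)

full_low, full_high = bounds(full_column)
samp_low, samp_high = bounds(sample)

print(f"full scan boundaries: {full_low:,} .. {full_high:,}")
print(f"sampled boundaries:   {samp_low:,} .. {samp_high:,}")
print("values outside the sampled boundaries:",
      sum(1 for v in full_column if v < samp_low or v > samp_high))
```

Any value counted on the last line exists in the table but would be costed as out of bounds against the sampled histogram.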

Where To Get More Information 

• The Sybase Customer newsgroups 

• http://support.sybase.com/newsgroups 

• The Sybase list server 

• SYBASE-L@LISTSERV.UCSB.EDU 

• The external Sybase FAQ 

• http://www.isug.com/Sybase_FAQ/ 

• Join the ISUG, ISUG Technical Journal, feature requests 

• http://www.isug.com

Where To Get More Information 

• The latest Performance and Tuning Guide 

• Don’t be put off by the ASE 12.0 in the title; it covers the 11.9.2 features/functionality too

• http://sybooks.sybase.com/onlinebooks/group-as/asg1200e 

• Any “What’s New” docs for a new ASE release 

• Tech Docs at Sybase Support 

• http://techinfo.sybase.com/css/techinfo.nsf/Home 

• Upgrade/Migration help page 

• http://www.sybase.com/support/techdocs/migration

Sybase Developer Network (SDN) 

Additional Resources for Developers/DBAs 

• Single point of access to developer software, services, and up-to-date technical information:

• White papers and documentation 

• Collaboration with other developers and Sybase engineers 

• Code samples and beta programs 

• Technical recordings 

• Free software 

• Join today: www.sybase.com/developer or visit SDN at TechWave’s Technology Boardwalk
