Simon Ogden recently wrote up an excellent post regarding Out of Range Histogram Adjustments. While he explains in detail how the out of range selectivity is computed, I think we can make it a little bit clearer. I hope Simon doesn’t mind! Image may be NSFW.
Clik here to view.
The output of optdiag shows two unique values in the stored statistics (histogram) for the a table. :
Step Weight Value 1 0.00000000 < "C " 2 0.01000000 = "C " 3 0.00000000 < "N " 4 0.99000001 = "N "
Remember the weight values for the unique values. Think of the weight values as percentage of the table not counting out of range values. So if we have the following query then we could expect 1% of the table to contain that value:
select * from table a where a1 = "C"
Let's move on to out of range values. The general equation to determine the selectivity of a single out of range unique value is:
(1) Image may be NSFW.
Clik here to view.
We can simplify that using a bit of algebra:
(2) Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view. = the number of unique values in the histogram
Image may be NSFW.
Clik here to view. = the number of unique values outside of the histogram
Image may be NSFW.
Clik here to view. = selectivity of a single out of range unique value
Let's assume we've added data to the table but haven't run update statistics yet. The stored statistics won't contain any metrics on the new data. So, ASE will estimate what it thinks the selectivity could be for the values we are searching on.
In the following query, we are searching for T and Y.
select * from a where a1 in ('T', 'Y')
So how does ASE determine the selectivity for T and Y? We can use the selectivity query (2) replacing Image may be NSFW.
Clik here to view. and Image may be NSFW.
Clik here to view. to compute Image may be NSFW.
Clik here to view.:
Image may be NSFW.
Clik here to view.
So, the selectivity for T is 0.1666667 and Y is 0.1666667. Since the sum of the histogram and the out of histogram selectivity (weight) must equal 1, we need to adjust the histogram weight values.
We have two unique values in the histogram: C and N. The revised histogram weight (Image may be NSFW.
Clik here to view.) is the percentage of what is left for in histogram unique values:
(3) Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
As you can see, we have 0.6666667 left over. If you remember from the histogram, the value C composes 1% of the table and the value N 99% of the table. All we have to do is plug those in:
Image may be NSFW.
Clik here to view.
Image may be NSFW.
Clik here to view.
So we get the results that we're looking for:
'C' 0.0066667 'N' 0.6600000 'T' 0.1666667 'Y' 0.1666667