Wednesday, January 27, 2010

How CBO calculates cardinality (Part 2)

This is the continuation of my previous post which you can read here.

column > (<) const
According to the book the cardinality for such a condition can be calculated with the formula below:
Cardinality = num_rows * (needed_range_length / total_range_length + 1 / num_dist*)
* in case if the range is closed (>= or <=)

For example lets assume we have a condition column > 3 and high_value = 12 in that column. In that case the needed_range_length will be high_value – const = 12 – 3 = 9. The total_range_length is being calculated as high_value – low_value from user_tab_col_statistic. In case when we have the “>=” sign, we have have to add 1 / num_distinct.

The formula above is valid for a constant value between the lowest and highest values. In case we use an out-of-range value for the constant the cardinality can be calculated in two ways depending on a condition which is easier to explain on an example:
  1. const < low_value and column < const or const > high_value and column > const
  2. const < low_value and column > const or const > high_value and column < const
In case with the first example, everything is quite obvious, the cardinality will be equal to num_rows. The second case on the other hand is a bit more interesting. Apparently the cardinality of such a query should be (from the humane point of view) equal to 0, but CBO uses the same “wearing off” formula as it does in case with the condition column = const where const is out of range. In other words, for the second example CBO will calculate cardinality as if it’s the equality condition (column = const), which I described in the previous post.

In order to finish the description of inequalities it’s left to cover only the case with bind variables. Here’s the formula for that purpose:
Cardinality = num_rows / num_dist + 1
So it’s the cardinality of condition column = :1 but increased by 1.

column between Am and An
The between condition is an extended inequality condition, but instead of high_value (or low_value depending on the sign) CBO uses Am and An to calculate needed_range_length. Here’s the basic formula for such a condition:
Cardinality = num_rows * ((An - Am) / (high_value low_value) + 1 / num_dist)
It’s valid for An > Am and both An and Am are within the range between low and high values from user_tab_col_statistic. But what if either Am or An is out of the range? The answer is simple (at least for Oracle and above where I tested it). If the range Am-An has common elements with the range low_value - high_value, than CBO calculates the cardinality using the same formula having substituted an out-of-range edge (either An or Am) with corresponding edge value (high_value if An is out of bound or low_value if Am if out bound) from user_tab_col_statistic. In case if these two ranges have nothing in common, Oracle results the cardinality as the condition is column = :1 (equal to num_rows / num_dist).

Now let’s take a look at bind variables. What would be the cardinality for the condition column between :1 and :2? According to Lewis Oracle prior version 10.2 takes 0.0025 of num_rows and uses the rounded result as the cardinality. Oracle does something similar but not exactly:
Cardinality = num_rows / (num_dist * num_dist) + 1
Finally, here’s the formula for Oracle
Cardinality = num_rows / num_dist + 1

column > const AND column <= const
In case with constants both versions ( and calculate cardinality in the same way (at least with the same result). Both of them have realized that no rows should be returned and therefore added the filter NULL is NOT NULL into the plans.

But once you substitute the constant with a bind variable everything turns upside down. Neither of databases realizes that the result should be no rows at all. Moreover, 11g calculates cardinality in usual for it way as num_rows / num_dist + 1, whilst 10g does usual for it thing and calculates the cardinality as num_rows / (num_dist * num_dist) + 1.