Friday, December 11, 2009

How CBO calculates cardinality (Part 1)

I’ve been rereading a Jonathan Lewis’ book about CBO internals and testing the stuff on Oracle 10.2.0.4 and 11.2.0.1 and I've found out that Oracle has changed some basic formulas in cardinality calculation. I'd like to share some updated formulas below.

column = const
First of all I'd like to recap how CBO calculates the cardinality of a query with simple condition when column equals to a constant. As you may know the result depends upon the value of constant, that is whether it's between user_tab_col_statistics' high_value and low_value or not. If it is (as well as if you have a bind variable instead of a constant), the cardinality would be calculated by something similar to the next formula:

          Cardinality = user_tables.num_rows / user_tab_col_statistics.num_distinct

However if it's not then the cardinality descends linearly depending on the "distance" to the range above. A distance can be calculated by next formulas:

          Distance = const - user_tab_col_statistics.high_value + 1 if const  user_tab_col_statistics.high_value or
          Distance = user_tab_col_statistics.low_value - const 1 if const < user_tab_col_statistics.low_value

Now we can figure out the cardinality for any value of the constant outside the range:

          Cardinality = num_rows /num_distinct * (1 - (Distance - num_distinct)  / (high_value low_value))

Pay attention that if you've got the result just a bit higher than 0, CBO probably will apply the ceil function to such a result.

column in (c1, c2, ..., cN)
The cardinality of a query with such a predicate can be calculated (according to the book) by simple summing of cardinalities of every member of the in-list unless the sum reached the num_rows value:

          Cardinality = Cardinality(c1) + Cardinality(c2) + ... + Cardinality(cN) if the sum < num_rows and
          Cardinality = num_rows otherwise

column not in (c1, c2, ..., cN)
This one was tough. The problem is that the cardinality of not-in-list doesn't equal to num_rows minus the cardinality of in-list. Furthermore, I've noticed that the result depends upon the number of members within the parentheses: the more members you have, the less every of them will influence the cardinality. The general formula I've derived looks like this:

          Cardinality = num_rows sum(Cardinality(cN) * power((high_value low_value) / num_distinct , N - 1)) where every Cardinality(cN) whould be calculated by the formulas above

I'd like to mention a thing about that formula. As you can notice the influence of cardinality of every member depends upon its order number within the list (the bigger number, the less influence). It can affect your calculations due to different cardinalities for constants within the high-low range and outside it. The number of tests showed that apparently before final calculation CBO sorts the cardinalities of every member descendingly (don't forget that bind variables in that case are similar to the values within the range).