## Friday, December 11, 2009

### How CBO calculates cardinality (Part 1)

I’ve been rereading a Jonathan Lewis’ book about CBO internals and testing the stuff on Oracle 10.2.0.4 and 11.2.0.1 and I've found out that Oracle has changed some basic formulas in cardinality calculation. I'd like to share some updated formulas below.

column = const
First of all I'd like to recap how CBO calculates the cardinality of a query with simple condition when column equals to a constant. As you may know the result depends upon the value of constant, that is whether it's between user_tab_col_statistics' high_value and low_value or not. If it is (as well as if you have a bind variable instead of a constant), the cardinality would be calculated by something similar to the next formula:

Cardinality = user_tables.num_rows / user_tab_col_statistics.num_distinct

However if it's not then the cardinality descends linearly depending on the "distance" to the range above. A distance can be calculated by next formulas:

Distance = const - user_tab_col_statistics.high_value + 1 if const  user_tab_col_statistics.high_value or
Distance = user_tab_col_statistics.low_value - const 1 if const < user_tab_col_statistics.low_value

Now we can figure out the cardinality for any value of the constant outside the range:

Cardinality = num_rows /num_distinct * (1 - (Distance - num_distinct)  / (high_value low_value))

Pay attention that if you've got the result just a bit higher than 0, CBO probably will apply the ceil function to such a result.

column in (c1, c2, ..., cN)
The cardinality of a query with such a predicate can be calculated (according to the book) by simple summing of cardinalities of every member of the in-list unless the sum reached the num_rows value:

Cardinality = Cardinality(c1) + Cardinality(c2) + ... + Cardinality(cN) if the sum < num_rows and
Cardinality = num_rows otherwise

column not in (c1, c2, ..., cN)
This one was tough. The problem is that the cardinality of not-in-list doesn't equal to num_rows minus the cardinality of in-list. Furthermore, I've noticed that the result depends upon the number of members within the parentheses: the more members you have, the less every of them will influence the cardinality. The general formula I've derived looks like this:

Cardinality = num_rows sum(Cardinality(cN) * power((high_value low_value) / num_distinct , N - 1)) where every Cardinality(cN) whould be calculated by the formulas above

I'd like to mention a thing about that formula. As you can notice the influence of cardinality of every member depends upon its order number within the list (the bigger number, the less influence). It can affect your calculations due to different cardinalities for constants within the high-low range and outside it. The number of tests showed that apparently before final calculation CBO sorts the cardinalities of every member descendingly (don't forget that bind variables in that case are similar to the values within the range).

1. Antonol,

Just came across your post. Nice analysis. Thanks. Do you know how the cardinality is calculated (now) for BETWEEN operator ?

2. Thanks for you interesting.

I'm going to put the BETWEEN operation findings into the second part of the article which I hope to post within the week.

3. Привет спасибо за статью. У меня есть select * from emp where f_name_r2 = 'Анатолий Фёдорович'; колонка f_name_r2 без гистограммы всего в таблице 8928 строк, Num distinct = 42, Num Nulls = 8886. Подскажите пожалуйста как посчитать cardinality?

1. План показывает 1 но если считать по формуле Cardinality = user_tables.num_rows / user_tab_col_statistics.num_distinct получается 213

2. Read num_rows as number of not null rows and everything will come into place.
I didn't try to be pedantic about what I was writing, I was just trying to give a flavour of what could be happening under the hood of CBO in some situations.