
Wednesday, June 12, 2013

SQL Profile for INSERT-ALL

We had a performance issue after our last release with one insert statement – the plan was not the one we were expecting – which we decided to address by creating a SQL profile from the same statement's "good" plan we had in the UAT environment. We hit a problem though – the profile wasn't kicking in, and our DBA suggested that the reason was that the SQL we used as the source of the outline information was missing the actual outline information. Oracle keeps the outline information in V$SQL_PLAN.OTHER_XML, in an outline_data section. I decided to check whether there were any other SQLs missing the outline_data section:
SQL> select count(*)
  2  from   (select p.*
  3                ,value(d) info
  4          from   dba_hist_sql_plan p
  5                ,table(xmlsequence(extract(xmltype(p.other_xml),
  6                                           '/other_xml/outline_data')))(+) d
  7          where  other_xml is not null)
  8  where  info is null;

  COUNT(*)
----------
        82

SQL>
So there were a few of those. What caught my eye, though, when I checked the actual query plans, were the operations that were missing the outlines:
SQL> select unique operation
  2  from   (select p.*
  3                ,value(d) info
  4          from   dba_hist_sql_plan p
  5                ,table(xmlsequence(extract(xmltype(p.other_xml),
  6                                           '/other_xml/outline_data')))(+) d
  7          where  other_xml is not null)
  8  where  info is null;

OPERATION
------------------------------
LOAD TABLE CONVENTIONAL
INDEX BUILD
MULTI-TABLE INSERT
PX COORDINATOR

SQL>
Notice MULTI-TABLE INSERT (which was exactly our case) – for some reason Oracle does not create (or populate) the outline section when you are doing a multi-table insert, at least not on the 11.2.0.3 I'm using. I've also checked whether there are any multi-table inserts which do have outline_data and, as expected, found none.
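For completeness, the check for multi-table inserts that do have an outline is just a small variation on the query above (a sketch; it only adds a filter on the plan operation and flips the NULL test):
select count(*)
from   (select p.*
              ,value(d) info
        from   dba_hist_sql_plan p
              ,table(xmlsequence(extract(xmltype(p.other_xml),
                                         '/other_xml/outline_data')))(+) d
        where  other_xml is not null
        and    operation = 'MULTI-TABLE INSERT')
where  info is not null;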

Wednesday, December 12, 2012

11g Group By Selectivity Improvements

Oracle 11g (at least 11.2.0.3, where I'm testing it) has brought us an interesting improvement to the selectivity of the GROUP BY operation. Here's a little demo:
SQL> CREATE TABLE t1 AS 
  2    SELECT LEVEL AS id1, 
  3           MOD(LEVEL, 20) fil1, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

SQL> CREATE TABLE t2 AS 
  2    SELECT LEVEL AS id2, 
  3           MOD(LEVEL, 5) fil2, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

REM <<< Gather statistics without histograms >>>
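The statistics were gathered without histograms, roughly along these lines (just a sketch, assuming plain DBMS_STATS calls with METHOD_OPT set so that no histograms are created):
exec dbms_stats.gather_table_stats(user, 'T1', method_opt => 'FOR ALL COLUMNS SIZE 1')
exec dbms_stats.gather_table_stats(user, 'T2', method_opt => 'FOR ALL COLUMNS SIZE 1')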
At this point we've got two tables with roughly 10k rows each. The columns of interest are fil1 and fil2, with selectivities of 1/20 and 1/5 respectively. Now let's run the following query:
SQL> set autotrace traceonly 
SQL> SELECT 
  2   t1.id1 
  3    FROM t1, 
  4         t2 
  5   WHERE t2.id2 = t1.id1 
  6     AND t1.fil1 = 1 
  7     AND t2.fil2 = 1 
  8   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
As you can see above, Oracle has estimated the HASH GROUP BY as returning 125 rows, even though in reality it returns 500 (which is also the estimate you would get in, say, 10g).
What I think is happening in this particular case(1) (and I verified it empirically by playing around with the selectivities of the fil1 and fil2 columns) is that Oracle applies the selectivities of fil1 and fil2 to the result of the HASH JOIN (id=2) in the following way:
GROUP BY CARD = JOIN CARD * SEL(t1.fil1) / SEL(t2.fil2) 
and if we put numbers in place it would be:
GROUP BY CARD = 500 * 1/20 / (1/5) = 500 / 20 * 5 = 125 
If that's the case, I'm not sure what the rationale behind it is. What I do know is that this optimisation can be disabled by setting _optimizer_improve_selectivity to false:
SQL> SELECT 
  2  --+ OPT_PARAM('_optimizer_improve_selectivity' 'false') 
  3   t1.id1 
  4    FROM t1, 
  5         t2 
  6   WHERE t2.id2 = t1.id1 
  7     AND t1.fil1 = 1 
  8     AND t2.fil2 = 1 
  9   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
I'm not sure, though, what else is switched off in this case. Also, if you've got primary keys on your id columns, the optimisation does not kick in:
SQL> alter table t1 add constraint pk_t1 primary key (id1);

Table altered.

SQL> alter table t2 add constraint pk_t2 primary key (id2);

Table altered.

SQL> SELECT t1.id1 
  2    FROM t1, 
  3         t2 
  4   WHERE t2.id2 = t1.id1 
  5     AND t1.fil1 = 1 
  6     AND t2.fil2 = 1 
  7   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
Obviously this feature can be (and in the case of one of my queries was) a problem when you've got a complex query with a few massive GROUP BY clauses leading to a significant drop in the cardinality estimate, and as a result to a suboptimal join method, especially when you are grouping by more than one column. The existence of filters is a mandatory condition for the optimisation to kick in.
Finally, I want to mention bug #12591379 on MOS, which says the cardinality of a GROUP BY may drop severely if it follows a HASH JOIN with an OR condition. The bug is said to be fixed in version 12.1.

(1) I believe my assumption is oversimplified and the actual algorithm is more sophisticated; at the very least, if you use a value for t2.fil2 that is outside its min-max range, Oracle produces a different estimate.

Saturday, October 6, 2012

XPath Query Rewrite

In this entry I want to share my experience of tracking down an interesting problem arising from an upgrade to Oracle 11g.
A few days after migrating a database to Oracle 11.2.0.3 I happened to check the alert log and found the following:

DDE: Problem Key 'ORA 4030' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes

ORA-4030 happens when a server process fails to allocate a portion of PGA memory.
I checked the application and the ETL jobs but nothing seemed to be failing. Therefore I decided to check the PGA memory used by different sessions:

SELECT *
  FROM v$process p,
       v$session s
 WHERE p.addr = s.paddr
 ORDER BY p.pga_alloc_mem DESC;


The first row I saw had 4293618712 bytes in the PGA_ALLOC_MEM column. That process had allocated 4 GB of PGA memory, which is the maximum it could get out of Linux. Using the session information I isolated the code causing the issue. It turned out to be a massive (500+ line) SQL statement heavily utilising the XMLElement and XMLForest functions.

A quick check on Metalink revealed bug #13477790, which matched our situation (and was later confirmed by Oracle Support to be the culprit). It said the problem was precisely due to the usage of XML functions. What I couldn't understand from the description was how Oracle had managed to break code that was working perfectly on 10g.

So I decided to reproduce the problem. I isolated the code and, when I ran it, noticed that Oracle never even started returning rows from the failing query. So I decided to trace the CBO.

What I found in the 10053 trace file was lots of attempts to transform the query before the process died with ORA-4030.
In parallel I was googling what had changed in 11g in terms of XDB, and for general information about XDB (as I hadn't worked with it), and came across the NO_XML_QUERY_REWRITE hint. It turned out XDB has a feature called XPath rewrite, where Oracle rewrites a query to optimise XML query execution. In my case no rewrite could be useful, as we don't use XDB per se – we just construct XML from plain tables and upload it as a CLOB. So I tried the hint, and it helped.
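Just to illustrate where the hint goes, here is a stripped-down sketch of the kind of statement we had (the table and column names are made up for the example):
SELECT /*+ NO_XML_QUERY_REWRITE */
       XMLElement("order",
                  XMLForest(o.order_id   AS "id",
                            o.order_date AS "created")).getClobVal() AS order_xml
  FROM orders o;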

Eventually we applied a patch to fix the problem permanently.



Monday, February 22, 2010

Oracle Range Join Cardinality Latest Changes

In this post I'd like to describe the latest changes in the cardinality calculation for range joins of tables. All the changes were checked on Oracle 11.2.0.1. The version of Oracle 10g which I refer to throughout is 10.2.0.4. The ideas of how it works in Oracle 10g were taken from the book "Cost-Based Oracle Fundamentals" by Jonathan Lewis.

t2.join1 > t1.join1
Let's take a look at the following query:
SELECT t1.v1, t2.v1
  FROM t1, t2
 WHERE t1.filter = 1 -- 25 unique values
   AND t2.join1 > t1.join1 -- 40 and 30 unique values correspondingly
   AND t2.filter = 1 -- 50 unique values
;
This is an example from J. Lewis' book. The comments show the number of unique values in the corresponding columns. Both tables have 10,000 rows, so after filtering t2 has 200 rows and t1 has 400 rows. The t2.join1 column has 40 unique values from 1 to 40, and the t1.join1 column correspondingly has 30 unique values from 1 to 30. According to the book, Oracle 10g uses a fixed 5% selectivity to calculate the cardinality of this query. Here's the execution plan:
------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  | 
------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |      |  4000 |   132K|   155 | 
|   1 |  MERGE JOIN         |      |  4000 |   132K|   155 | 
|   2 |   SORT JOIN         |      |   200 |  3400 |    77 | 
|*  3 |    TABLE ACCESS FULL| T2   |   200 |  3400 |    30 | 
|*  4 |   SORT JOIN         |      |   400 |  6800 |    78 | 
|*  5 |    TABLE ACCESS FULL| T1   |   400 |  6800 |    30 | 
------------------------------------------------------------ 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
   3 - filter("T2"."FILTER"=1) 
   4 - access(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
       filter(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
   5 - filter("T1"."FILTER"=1)
As you can see, the cardinality is 4000, which is equal to 200 * 400 * 0.05. Now let’s take a look at the execution plan for the same query executed in Oracle 11g:
------------------------------------------------------------ 
| Id  | Operation           | Name | Rows  | Bytes | Cost  | 
------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |      | 49000 |  1626K|    82 | 
|   1 |  MERGE JOIN         |      | 49000 |  1626K|    82 | 
|   2 |   SORT JOIN         |      |   200 |  3400 |    38 | 
|*  3 |    TABLE ACCESS FULL| T2   |   200 |  3400 |    28 | 
|*  4 |   SORT JOIN         |      |   400 |  6800 |    39 | 
|*  5 |    TABLE ACCESS FULL| T1   |   400 |  6800 |    28 | 
------------------------------------------------------------ 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
   3 - filter("T2"."FILTER"=1) 
   4 - access(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
       filter(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
   5 - filter("T1"."FILTER"=1)
As you can see, Oracle 11g thinks 49,000 rows are going to be returned, which is way more than 10g's estimate.
Let me explain how I think Oracle calculates this number, and the cardinality of such queries in general.

Boring Math
Let's take a look at the picture. It shows two lines which represent the unique values in columns t2.join1 (left bold vertical line) and t1.join1 (right bold vertical line). You can also notice that the high_value of t2.join1 is greater than the high_value of t1.join1, that the low_value of t2.join1 is greater than the low_value of t1.join1, and finally that the low_value of t2.join1 is lower than the high_value of t1.join1. The horizontal dashed lines, drawn through the high_values and low_values, mark out three sectors: A, B, and C. This is the most general case of overlap between low_values and high_values I could come up with, as all three sectors are present. If you slide the two lines relative to each other you can end up with only two adjacent sectors, or even just one. As you might guess, the cardinality of every sector is calculated separately.

Sector A
Logically, for every t2 value in sector A, every value from t1.join1 should be picked out:
Cardinality of A = (high_value(t2) – high_value(t1)) * num_rows(t1) * num_rows(t2) / num_distinct(t2)
Of course it’s relevant only if the condition high_value(t2) – high_value(t1) > 0 is met.

Sector B
For simplicity's sake, I've split the calculation of the cardinality below sector A into two different sectors, even though that's not quite right. Why not? Because essentially what's left is to count the cardinality for the single segment high_value(t2) – low_value(t2). On the other hand, intersecting that segment with t1's line of values we get two sectors: one with common values (sector B) and one without (sector C). Apparently, for each t2.join1 value we should count some number of t1.join1 values from sector B plus all the t1.join1 values from sector C.
Let's count B's cardinality as if there were no sector C at all (i.e. the number of t1.join1 values in sector B matched by the t2.join1 values between high_value(t2) and low_value(t2)). In that case, for every value of t2.join1 we should pick out all the values of t1.join1 that are less than the current t2.join1 value. For instance, if high_value(t1) = 5 and low_value(t2) = 1, then for t2.join1 = 5 the matching t1.join1 values are 4, 3, 2, and 1 (four of them). For t2.join1 = 4 there is one fewer t1.join1 value (three of them), and so forth. Ultimately, we have a descending number of t1 values for t2's values, with one fewer at each step. The total number of t1.join1 values can be calculated by the formula for the sum of an arithmetic progression:
Sn = ((An + A1) / 2) * n
where An is the nth element of the progression, and A1 is the first element. In our case A1 always equals 1, while An = high_value(t1) – low_value(t2). Note that here we are interested in the NUMBER of values between the high and low values rather than in the exact values – we need to know how many times every value will be counted during the final cardinality estimation. Correspondingly, n can be calculated as:
n = high_value(t1) – low_value(t2)
Knowing all of the above, we can calculate the cardinality for sector B:
Cardinality of B = (num_rows(t2) / num_distinct(t2)) * (num_rows(t1) / num_distinct(t1)) * (high_value(t1) – low_value(t2)) * (high_value(t1) – low_value(t2) + 1) / 2
that is, the number of rows per value of t2.join1 in that sector, multiplied by the number of rows per value of t1.join1, multiplied by the number of such "entries", which is the sum of the progression discussed above.

Sector C
If that sector exists, then for every t2.join1 value from sector B you should also include all the values of t1.join1 which are below the low_value of t2:
Cardinality of C = (low_value(t2) – low_value(t1)) * (high_value(t1) – low_value(t2) + 1) * num_rows(t2) / num_distinct(t2) * num_rows(t1) / num_distinct(t1)
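As a quick sanity check before assembling the final formula, plugging the numbers from the example at the top of the post into the sector formulas gives (40 - 30) * 400 * 200 / 40 = 20,000 rows for sector A, (200 / 40) * (400 / 30) * 29 * 30 / 2 = 29,000 rows for sector B, and 0 for sector C (the low_values coincide), which together matches the 49,000 rows in the 11g plan above.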

The Final Cardinality
Finally, let’s assemble the cardinality formula:
Cardinality of Join = Cardinality of A + Cardinality of B + Cardinality of C
To save space, let's derive the selectivity formula by dividing the sum by num_rows(t2) * num_rows(t1):
Selectivity of Join = (high_value(t2) – high_value(t1)) / num_distinct(t2) + (1 / num_distinct(t2)) * (1 / num_distinct(t1)) * (high_value(t1) – low_value(t2)) * (high_value(t1) – low_value(t2) + 1) / 2 + (low_value(t2) – low_value(t1)) * (high_value(t1) – low_value(t2) + 1) * (1 / num_distinct(t2)) * (1 / num_distinct(t1))
After beautifying and generalizing we can get the final formula:
Cardinality of join = Filtered cardinality (t1) * Filtered Cardinality (t2) * Selectivity of Join
Selectivity of Join = (A + (B + C) / num_distinct(t1)) / num_distinct(t2)
A = high_value(t2) – high_value(t1), where high_value(t2) > high_value(t1). Otherwise A = 0
B = (least(high_value(t2), high_value(t1)) – greatest(low_value(t1), low_value(t2))) * (least(high_value(t2), high_value(t1)) – greatest(low_value(t1), low_value(t2)) + 1) / 2
C = (greatest(low_value(t1), low_value(t2)) – least(low_value(t1), low_value(t2))) * (least(high_value(t2), high_value(t1)) – greatest(low_value(t1), low_value(t2)) + 1), where low_value(t2) > low_value(t1). Otherwise C = 0
t2 is the table on the left side of the ">" sign, and t1 is the table on the right side, correspondingly.

The Example
Let's get back to the example at the beginning of the article and calculate the cardinality manually:
A = 40 – 30 = 10
B = (30 – 1) * (30 – 1 + 1) / 2 = 29 * 30 / 2 = 435
C = 0
Selectivity of Join = (10 + 435 / 30) / 40 = 0.6125
Cardinality of Join = 200 * 400 * 0.6125 = 49 000
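If you want to experiment with the formula, the whole calculation can be packed into a quick query. This is just a sketch with the statistics from the example hard-coded; the GREATEST calls take care of the "otherwise 0" clauses:
WITH stats AS (
  SELECT 200 card_t2, 400 card_t1,   -- filtered cardinalities of t2 and t1
         40  ndv_t2,  30  ndv_t1,    -- num_distinct of the join columns
         40  high_t2, 30  high_t1,   -- high_value of the join columns
         1   low_t2,  1   low_t1     -- low_value of the join columns
    FROM dual
)
SELECT card_t1 * card_t2 *
       ( GREATEST(high_t2 - high_t1, 0)                                      -- A
         + ( (LEAST(high_t2, high_t1) - GREATEST(low_t1, low_t2))
             * (LEAST(high_t2, high_t1) - GREATEST(low_t1, low_t2) + 1) / 2  -- B
           + GREATEST(low_t2 - low_t1, 0)
             * (LEAST(high_t2, high_t1) - GREATEST(low_t1, low_t2) + 1)      -- C
           ) / ndv_t1
       ) / ndv_t2 AS join_cardinality
  FROM stats;
It returns 49,000 for the numbers above, and 51,000 if you add 1 to the sector A term to model the ">=" case discussed next.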

t2.join1 >= t1.join1
I can't finish this article without bringing up the ">=" range join. Essentially, the addition of the equality sign should add num_rows(t2) / num_distinct(t2) * num_rows(t1) / num_distinct(t1) rows to the final cardinality. If you don't want to forget about it, you can change the formula and add 1 to A when you have the equality sign. For instance, let's apply it to the test query:
A = 40 – 30 + 1 = 11
B = … = 435
C = 0
Selectivity of Join = (11 + 435 / 30) / 40 = 0.6375
Cardinality of Join = 200 * 400 * 0.6375 = 51 000
Here’s the execution plan:
------------------------------------------------------------- 
| Id  | Operation            | Name | Rows  | Bytes | Cost  | 
------------------------------------------------------------- 
|   0 | SELECT STATEMENT     |      | 51000 |  1693K|    83 | 
|   1 |  WINDOW BUFFER       |      | 51000 |  1693K|    83 |
|   2 |   MERGE JOIN         |      | 51000 |  1693K|    83 |
|   3 |    SORT JOIN         |      |   200 |  3400 |    38 | 
|*  4 |     TABLE ACCESS FULL| T2   |   200 |  3400 |    28 | 
|*  5 |    SORT JOIN         |      |   400 |  6800 |    39 | 
|*  6 |     TABLE ACCESS FULL| T1   |   400 |  6800 |    28 | 
------------------------------------------------------------- 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
   4 - filter("T2"."FILTER"=1) 
   5 - access(INTERNAL_FUNCTION("T2"."JOIN1")>=INTERNAL_FUNCTION("T1"."JOIN1")) 
       filter(INTERNAL_FUNCTION("T2"."JOIN1")>=INTERNAL_FUNCTION("T1"."JOIN1")) 
   6 - filter("T1"."FILTER"=1)

That is all I wanted to cover in this article. In the next one I am going to cover the BETWEEN operation.

Wednesday, January 27, 2010

How CBO calculates cardinality (Part 2)


This is the continuation of my previous post which you can read here.

column > (<) const
According to the book the cardinality for such a condition can be calculated with the formula below:
Cardinality = num_rows * (needed_range_length / total_range_length + 1 / num_dist*)
* only when the range is closed (>= or <=)

For example, let's assume we have a condition column > 3 and high_value = 12 for that column. In that case the needed_range_length will be high_value – const = 12 – 3 = 9. The total_range_length is calculated as high_value – low_value from user_tab_col_statistics. When we have the ">=" sign, we also have to add 1 / num_distinct.
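As a quick worked example with made-up numbers: with num_rows = 1000, num_distinct = 12, low_value = 1 and high_value = 12, the predicate column > 3 would be estimated as 1000 * (12 - 3) / (12 - 1) ≈ 818 rows, while column >= 3 would get 1000 * ((12 - 3) / (12 - 1) + 1/12) ≈ 901 rows.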

The formula above is valid for a constant value between the lowest and highest values. If we use an out-of-range value for the constant, the cardinality can be calculated in one of two ways, depending on a condition which is easier to explain with an example:
  1. const < low_value and column > const, or const > high_value and column < const
  2. const < low_value and column < const, or const > high_value and column > const
In the first case everything is quite obvious: the cardinality will be equal to num_rows. The second case, on the other hand, is a bit more interesting. Intuitively the cardinality of such a query should (from the human point of view) be 0, but the CBO uses the same "fading" formula as it does for the condition column = const where const is out of range. In other words, for the second case the CBO calculates the cardinality as if it were the equality condition (column = const), which I described in the previous post.

To finish the description of inequalities, the only case left to cover is bind variables. Here's the formula for that purpose:
Cardinality = num_rows / num_dist + 1
So it's the cardinality of the condition column = :1, increased by 1.

column between Am and An
The between condition is an extended inequality condition, but instead of high_value (or low_value depending on the sign) CBO uses Am and An to calculate needed_range_length. Here’s the basic formula for such a condition:
Cardinality = num_rows * ((An – Am) / (high_value – low_value) + 1 / num_dist)
It's valid when An > Am and both An and Am are within the range between the low and high values from user_tab_col_statistics. But what if either Am or An is out of that range? The answer is simple (at least for Oracle 10.2.0.4 and above, where I tested it). If the range Am–An has common elements with the range low_value – high_value, then the CBO calculates the cardinality using the same formula, having substituted the out-of-range edge (either An or Am) with the corresponding edge value (high_value if An is out of bounds, or low_value if Am is out of bounds) from user_tab_col_statistics. If the two ranges have nothing in common, Oracle computes the cardinality as if the condition were column = :1 (equal to num_rows / num_dist).
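A quick worked example with made-up numbers: with num_rows = 1000, num_dist = 12, low_value = 1 and high_value = 12, the predicate column between 3 and 6 gives 1000 * ((6 - 3) / (12 - 1) + 1/12) ≈ 356 rows; column between 10 and 20 is treated as between 10 and 12, i.e. 1000 * ((12 - 10) / (12 - 1) + 1/12) ≈ 265 rows; and column between 20 and 30 falls back to num_rows / num_dist ≈ 83 rows.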

Now let's take a look at bind variables. What would the cardinality be for the condition column between :1 and :2? According to Lewis, Oracle prior to version 10.2 takes 0.0025 of num_rows and uses the rounded result as the cardinality. Oracle 10.2.0.4 does something similar but not quite the same:
Cardinality = num_rows / (num_dist * num_dist) + 1
Finally, here’s the formula for Oracle 11.2.0.1:
Cardinality = num_rows / num_dist + 1

column > const AND column <= const
With constants, both versions (10.2.0.4 and 11.2.0.1) calculate the cardinality the same way (or at least with the same result). Both of them realise that no rows can be returned and therefore add the filter NULL IS NOT NULL to the plans.

But once you substitute the constants with bind variables everything turns upside down. Neither database realises that no rows at all should be returned. Moreover, 11g calculates the cardinality in its usual way as num_rows / num_dist + 1, whilst 10g does its usual thing and calculates it as num_rows / (num_dist * num_dist) + 1.

Friday, December 11, 2009

How CBO calculates cardinality (Part 1)

I've been rereading Jonathan Lewis' book about CBO internals and testing the material on Oracle 10.2.0.4 and 11.2.0.1, and I've found that Oracle has changed some of the basic formulas in cardinality calculation. I'd like to share some updated formulas below.

column = const
First of all, I'd like to recap how the CBO calculates the cardinality of a query with a simple condition where a column equals a constant. As you may know, the result depends on the value of the constant, namely whether it is between the high_value and low_value from user_tab_col_statistics or not. If it is (as well as when you have a bind variable instead of a constant), the cardinality is calculated by something similar to the following formula:

          Cardinality = user_tables.num_rows / user_tab_col_statistics.num_distinct

However, if it's not, then the cardinality descends linearly depending on the "distance" from the range above. The distance can be calculated by the following formulas:

          Distance = const – user_tab_col_statistics.high_value + 1 if const > user_tab_col_statistics.high_value, or
          Distance = user_tab_col_statistics.low_value – const + 1 if const < user_tab_col_statistics.low_value

Now we can figure out the cardinality for any value of the constant outside the range:

          Cardinality = num_rows / num_distinct * (1 – (Distance – num_distinct) / (high_value – low_value))

Note that if the result comes out just a bit higher than 0, the CBO will probably apply the ceil function to it.
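As a quick worked example, simply plugging made-up numbers into the formula above: with num_rows = 1000, num_distinct = 10, low_value = 1, high_value = 100 and the predicate column = 120, the distance is 120 - 100 + 1 = 21, and the estimate comes out as 1000 / 10 * (1 - (21 - 10) / (100 - 1)) ≈ 89 rows.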

column in (c1, c2, ..., cN)
The cardinality of a query with such a predicate can be calculated (according to the book) by simply summing the cardinalities of every member of the in-list, unless the sum reaches num_rows:

          Cardinality = Cardinality(c1) + Cardinality(c2) + ... + Cardinality(cN) if the sum < num_rows and
          Cardinality = num_rows otherwise

column not in (c1, c2, ..., cN)
This one was tough. The problem is that the cardinality of a not-in-list doesn't equal num_rows minus the cardinality of the in-list. Furthermore, I've noticed that the result depends on the number of members within the parentheses: the more members you have, the less each of them influences the cardinality. The general formula I've derived looks like this:

          Cardinality = num_rows – sum(Cardinality(cN) * power((high_value – low_value) / num_distinct, N – 1)), where every Cardinality(cN) should be calculated by the formulas above

I'd like to mention a thing about that formula. As you can notice the influence of cardinality of every member depends upon its order number within the list (the bigger number, the less influence). It can affect your calculations due to different cardinalities for constants within the high-low range and outside it. The number of tests showed that apparently before final calculation CBO sorts the cardinalities of every member descendingly (don't forget that bind variables in that case are similar to the values within the range).