
Friday, January 24, 2014

Insert Statement Re-starts on Cursor Invalidation

Here’s a little problem I recently stumbled across while writing a piece of ETL functionality. The problem is that Oracle restarts an insert statement over and over when certain criteria are met. Although the full list of conditions is known only to Oracle engineers, here are the ones under which I was able to deterministically reproduce the issue on Oracle 11.2.0.3 (Linux x64):

  • The destination table (the one being inserted into) is partitioned
  • The source is a join of at least two tables, one of which is partitioned (although I believe it doesn’t have to be)
  • The cursor gets invalidated while the join is still running

The last point requires a bit of elaboration. In my testbed I managed to reproduce it on a two-table join, and the cursor had to be invalidated while Oracle was still reading the first table. This is the reason why the tables are largish.

If the conditions above are met and I issue a DDL statement that invalidates the cursor while Oracle is reading the first table, my insert statement restarts. If I invalidate the cursor again, it restarts again, and that can continue till the cows come home.

Here’s a test script. First let’s prepare the tables:

create table t1 partition by range (dt) interval (interval'1'day) subpartition by list (slot)
subpartition template (subpartition sp1 values (1)) (partition p1 values less than (date '2014-01-01'))
as
with gen as (select null from dual connect by level <= 1e3)
select rownum as id, date '2014-01-10' as dt, 1 as slot, rpad(trunc(rownum / 10), 500) as key, rpad('x', 1000) padding from gen, gen where rownum <= 1e6;

create table t2 as
with gen as (select null from dual connect by level <= 1e3)
select rownum as id, rpad(rownum, 500) as key, rpad('x', 1000) padding from gen, gen where rownum <= 5e5;

create table t3 partition by range(dt) interval(interval'1'day) (partition values less than (date '2013-12-14'))
as select trunc(sysdate) as dt, t2.* from t2 where 1=0;

create table t4 as select * from t3 where 1 = 0;

The first two tables will be the source of data for the insert statement. T3 will be the destination, and T4 an alternative destination which is not partitioned.

Now open two sessions and write down the SID of the first one. In the first session prepare, but don’t execute, the following insert statement:

insert all into t3 (dt, id, key, padding) values (dt, id, key, padding)
  select /*+ monitor*/t1.dt, t1.id, t1.key, t1.padding from t1,t2
  where dt = date '2014-01-10' and slot = 1 and t2.key = t1.key;

In the second session prepare but again don’t execute the following block:

declare
  pname varchar2(30);
begin
  select partition_name into pname from user_tab_partitions where table_name = 'T1' and partition_name != 'P1';
  for j in 2..1000 loop
   execute immediate utl_lms.format_message('alter table t1 modify partition %s add subpartition %s values (%s)'
    ,pname, pname||'_SP'||j, to_char(j));
   dbms_lock.sleep(5);
  end loop;
end;
/

That block adds a new empty sub-partition to the current partition, which in turn invalidates all cursors referencing the table in any way. It does this over and over again at a 5-second interval, so the cursor gets invalidated many times before Oracle can finish reading T2 (the first row source in the join).

Run the insert statement in the first session and then, within a few seconds, run the PL/SQL block in the second session.

Finally, you can sit and monitor v$sql_monitor using the following query, having replaced <YOURSID> with the SID you wrote down:

select key, status, sid, sql_id from v$sql_monitor where (sid, sql_id) in (select sid, sql_id from v$session where sid=<YOURSID>);

Depending on how many resources you have available to run the query, over time you should see something like this:

SQL> select invalidations from v$sql where sql_id = (select sql_id from v$session where sid = 328);

INVALIDATIONS
-------------
      2

SQL> select key, status, sid, sql_id from v$sql_monitor where sql_id = (select sql_id from v$session where sid
= 328);

    KEY STATUS           SID SQL_ID
---------- ------------------- ---------- -------------
4.7245E+10 EXECUTING         328 6245n79k73xf7
1.5720E+12 DONE            328 6245n79k73xf7
3.2169E+12 DONE            328 6245n79k73xf7

SQL>

If, however, you replace the destination table with T4, no restarts happen (even though the cursor is still being invalidated).
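
For completeness, the alternative run is simply the same statement re-pointed at the non-partitioned table, for example:

insert all into t4 (dt, id, key, padding) values (dt, id, key, padding)
  select /*+ monitor*/t1.dt, t1.id, t1.key, t1.padding from t1,t2
  where dt = date '2014-01-10' and slot = 1 and t2.key = t1.key;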

Oracle says the behaviour is expected if the insert is being done on a partitioned table (Doc ID 1462003.1).

Wednesday, December 12, 2012

11g Group By Selectivity Improvements

Oracle 11g (at least 11.2.0.3, where I'm testing it) has brought us an interesting improvement to the selectivity of the group by operation. Here's a little demo:
SQL> CREATE TABLE t1 AS 
  2    SELECT LEVEL AS id1, 
  3           MOD(LEVEL, 20) fil1, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

SQL> CREATE TABLE t2 AS 
  2    SELECT LEVEL AS id2, 
  3           MOD(LEVEL, 5) fil2, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

REM <<< Gather statistics without histograms >>>
At this point we've got two tables with 10k rows each. The columns of interest are fil1 and fil2, with selectivities of 1/20 and 1/5 respectively. Now let's run a query as follows:
SQL> set autotrace traceonly 
SQL> SELECT 
  2   t1.id1 
  3    FROM t1, 
  4         t2 
  5   WHERE t2.id2 = t1.id1 
  6     AND t1.fil1 = 1 
  7     AND t2.fil2 = 1 
  8   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
As you can see above, Oracle has estimated the HASH GROUP BY as returning 125 rows even though in reality it is 500 (which is the estimate you would get in, say, 10g).
What I think is happening in this particular case(1) (and I verified it empirically by playing around with the selectivities of the fil1 and fil2 columns) is that Oracle applies the selectivities of the fil1 and fil2 columns to the result of the HASH JOIN (id=2) in the following way:
GROUP BY CARD = JOIN CARD * SEL(t1.fil1) / SEL(t2.fil2) 
and if we put numbers in place it would be:
GROUP BY CARD = 500 * 1/20 / (1/5) = 500 / 20 * 5 = 125 
If that's the case, I'm not sure what the rationale behind it is. What I do know is that this optimisation can be disabled by setting _optimizer_improve_selectivity to false:
SQL> SELECT 
  2  --+ OPT_PARAM('_optimizer_improve_selectivity' 'false') 
  3   t1.id1 
  4    FROM t1, 
  5         t2 
  6   WHERE t2.id2 = t1.id1 
  7     AND t1.fil1 = 1 
  8     AND t2.fil2 = 1 
  9   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
I'm not sure, though, what else is being switched off in this case. Also, if you've got primary keys on your id columns, the optimisation does not kick in:
SQL> alter table t1 add constraint pk_t1 primary key (id1);

Table altered.

SQL> alter table t2 add constraint pk_t2 primary key (id2);

Table altered.

SQL> SELECT t1.id1 
  2    FROM t1, 
  3         t2 
  4   WHERE t2.id2 = t1.id1 
  5     AND t1.fil1 = 1 
  6     AND t2.fil2 = 1 
  7   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
Obviously this feature can be (and in the case of one of my queries was) a problem when you've got a complex query with a few massive group by clauses leading to a significant drop in the cardinality estimate and, as a result, to a suboptimal join method being picked, especially when you are grouping by more than one column. The existence of filters is a mandatory condition for the optimisation to kick in.
Finally, I want to mention bug #12591379 on MOS, which says the cardinality of a GROUP BY may drop severely if it follows a HASH JOIN with an OR condition. The bug is said to be fixed in version 12.1.

(1) I believe my assumption is oversimplified and the actual algorithm is more sophisticated; at the very least, if you use a value for t2.fil2 that is outside its min-max range, Oracle produces a different estimate.

Saturday, October 6, 2012

XPath Query Rewrite

In this entry I want to share my experience of tracking down an interesting problem arising from an upgrade to Oracle 11g.
A few days after migrating a database to Oracle 11.2.0.3 I happened to check the alert log and found the following:

DDE: Problem Key 'ORA 4030' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes

ORA-4030 happens when a server process fails to allocate a portion of PGA memory.
I checked the application and the ETL jobs but nothing seemed to be failing. Therefore I decided to check the PGA memory used by different sessions:

SELECT *
  FROM v$process p,
       v$session s
 WHERE p.addr = s.paddr
 ORDER BY p.pga_alloc_mem DESC;


The first line I saw had 4293618712 bytes in the PGA_ALLOC_MEM column. That process had allocated 4 GB of PGA memory, which is the maximum it could get out of Linux. Using the session information I isolated the code causing the issue. It turned out to be a massive (500+ line) SQL statement heavily utilising the XMLElement and XMLForest functions.

A quick check on Metalink revealed bug #13477790 matching our situation (later confirmed by Oracle Support to be the culprit). It said the problem was indeed due to the usage of XML functions. What I couldn't understand from the description was how Oracle managed to break code that had been working perfectly on 10g.

So I decided to reproduce the problem. I isolated the code and, when I ran it, I noticed that Oracle did not even start returning rows from the failing query. Ergo I decided to trace the CBO.

What I found in the 10053 trace file was lots of attempts to transform the query before the session died with ORA-4030.
In parallel I was googling what had changed in 11g in terms of XDB, plus general information about XDB (as I hadn't worked with it), and came across the hint NO_XML_QUERY_REWRITE. It turned out that XDB has a feature called XPath rewrite, whereby Oracle rewrites a query to optimise XML query execution. In my case no rewriting could be of any use, as we don't use XDB per se: we just construct XML from plain tables and upload it as a CLOB. So I tried the hint, and it helped.
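
The failing statement itself is far too big to reproduce here, but the shape of the workaround is just the hint at the top of the query; the table and column names below are made up purely for illustration:

SELECT /*+ NO_XML_QUERY_REWRITE */
       XMLElement("order",
         XMLForest(o.order_id    AS "id",
                   o.customer_id AS "customer",
                   o.order_date  AS "order_date")).getclobval() AS order_xml
  FROM orders o;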

Eventually we applied a patch to fix the problem permanently.



Friday, August 27, 2010

Data Change Tracking via the Streams

Recently I got an assignment to implement a facility that would capture changes in a table as soon as they appear there and propagate them further on through EMS to the subscribers. Here I’d like to describe the way we implemented such a mechanism.

We considered a bunch of solutions that looked useful, like Change Notification, Change Data Capture, a trigger-based mechanism, etc., but decided to settle on Streams as it offered the best balance of speed, maintainability, and resource consumption.

In a nutshell


The capture process catches the changes satisfying the basic table filter rules and pushes them down to the apply process, which uses a manually written DML handler. The handler takes an LCR, extracts the necessary information (such as the PK ID), piles it up into a user-defined queue, and simply discards the LCR without any further processing. With more complicated structures (which is the case for me), I put the caught changes into an interim cache collection and publish them into the queue using a pre-commit handler (also manually written).

The paragraph above might seem a bit of a mouthful for a person who isn’t really into Streams, so the example below should make it clearer (if you’re interested).

Preparation


First of all, the schema you want the Streams code to run under must be granted Streams administrator privileges:
BEGIN
  dbms_streams_auth.grant_admin_privilege(grantee => 'SCHEMA_NAME');
END;
/
Secondly, the user must be granted the DBA role explicitly. Once that’s done, we can start setting up the test environment.
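Assuming the schema is called STREAMS_ADMIN, as in the rest of this post, that is simply:
GRANT DBA TO streams_admin;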

Now let’s create the test table whose changes we’re going to catch:
CREATE TABLE t1
(
  id      NUMBER(15) CONSTRAINT pk_t1 PRIMARY KEY,
  f1      NUMBER(15) NOT NULL,
  padding VARCHAR2(500)
);
I’m going to show you how to catch all the unique values of the f1 column in the table above. I will publish all the f1 values into the queue created below:
BEGIN
  dbms_aqadm.create_queue_table(queue_table        => 'ct_t1_tbl',
                                queue_payload_type => 'sys.aq$_jms_text_message',
                                multiple_consumers => FALSE);
  dbms_aqadm.create_queue(queue_name  => 'ct_t1',
                          queue_table => 'ct_t1_tbl');
  dbms_aqadm.start_queue(queue_name => 'ct_t1');

  COMMIT;
END;
/
The messages from that queue can be picked up from within either Oracle or Java code.
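
Just as a rough sketch (not part of the original setup; the waiting and error handling are stripped down to a minimum), an Oracle-side consumer might look like this:
DECLARE
  l_dequeue_options  dbms_aq.dequeue_options_t;
  l_msg_props        dbms_aq.message_properties_t;
  l_msg              sys.aq$_jms_text_message;
  l_msg_id           RAW(16);
  l_text             VARCHAR2(4000);
BEGIN
  l_dequeue_options.wait := 10; -- wait up to 10 seconds for a message
  dbms_aq.dequeue(queue_name         => 'CT_T1',
                  dequeue_options    => l_dequeue_options,
                  message_properties => l_msg_props,
                  payload            => l_msg,
                  msgid              => l_msg_id);
  l_msg.get_text(l_text);
  dbms_output.put_line('entity = ' || l_msg.get_string_property('entity') || ': ' || l_text);
  COMMIT;
END;
/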

The Queue


Streams uses a (buffered) queue to pass the changes between capture and apply. Let’s create it:
BEGIN
  dbms_streams_adm.set_up_queue(queue_table => 'STRM_QT',
                                queue_name  => 'STRM_Q',
                                queue_user  => 'STREAMS_ADMIN', -- put here your user
                                COMMENT     => 'The Streams queue');
END;
/

The Capture

Before we create the capture, we need to instantiate the table:
BEGIN
  dbms_capture_adm.prepare_table_instantiation(table_name => 'T1');
END;
/
The step above is necessary: Oracle scans the table and jots down the SCN starting from which it can capture the changes. It will also automatically add a supplemental log group for the PK. To check whether the table is capture-ready, examine the dictionary view dba_capture_prepared_tables.

As we’re going to capture the f1 column, which isn’t part of the PK, we have to manually add a supplemental log group for that column so that it appears in the redo log every time DML occurs against the table:
ALTER TABLE T1 ADD SUPPLEMENTAL LOG GROUP GR1 (F1) ALWAYS;

To check if the column is being logged, examine the views user_log_groups and user_log_group_columns.
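
For example, a couple of quick sanity checks (ordinary dictionary queries, nothing Streams-specific) might look like this:
SELECT table_name, scn, timestamp
  FROM dba_capture_prepared_tables;

SELECT log_group_name, log_group_type, always
  FROM user_log_groups
 WHERE table_name = 'T1';

SELECT log_group_name, column_name
  FROM user_log_group_columns
 WHERE table_name = 'T1';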

Now we can create the capture, having of course built the LogMiner dictionary first:
DECLARE
  l_first_scn NUMBER;
BEGIN
  dbms_capture_adm.build(l_first_scn);
  dbms_capture_adm.create_capture(queue_name                => 'STRM_Q',
                                  capture_name              => 'CPT01',
                                  start_scn                 => l_first_scn,
                                  first_scn                 => l_first_scn,
                                  checkpoint_retention_time => 1);
  dbms_streams_adm.add_table_rules(table_name   => 'T1',
                                         streams_type => 'CAPTURE',
                                         streams_name => 'CPT01',
                                         queue_name   => 'STRM_Q',
                                         include_dml  => TRUE,
                                         include_ddl  => FALSE);
END;
/

The DML and Pre-Commit Handlers


The whole of the custom logic is contained in the following package:
CREATE OR REPLACE PACKAGE strms IS
  PROCEDURE dml_handler(p_lcr IN sys.anydata);
  PROCEDURE precommit_handler(commit_number IN NUMBER);
END strms;
/

CREATE OR REPLACE PACKAGE BODY strms IS

  -- the cache structure for LCRs
  TYPE t_cache IS TABLE OF VARCHAR2(1) INDEX BY PLS_INTEGER;
  l_cache t_cache;

  ---------------------------------------------------------------------------------------------------

  PROCEDURE publish2aq
  (
    p_entity_id IN NUMBER,
    p_msg       IN VARCHAR2
  ) IS
    l_msg           sys.aq$_jms_text_message;
    l_queue_options dbms_aq.enqueue_options_t;
    l_msg_props     dbms_aq.message_properties_t;
    l_queue_name    VARCHAR2(64) := 'CT_T1'; -- the user-defined queue created earlier
    msg_id          RAW(16);
  BEGIN
    -- Ensure what is sent will be a JMS message
    l_msg := sys.aq$_jms_text_message.construct();
  
    l_msg.set_string_property('entity', p_entity_id);
    l_msg.set_text(p_msg);
  
    dbms_aq.enqueue(queue_name         => l_queue_name,
                    enqueue_options    => l_queue_options,
                    message_properties => l_msg_props,
                    payload            => l_msg,
                    msgid              => msg_id);
  END publish2aq;

  ---------------------------------------------------------------------------------------------------

  PROCEDURE dml_handler(p_lcr IN sys.anydata) IS
    lcr sys.lcr$_row_record;
    rc  PLS_INTEGER;
  
    l_owner           VARCHAR2(30);
    l_object          VARCHAR2(30);
    l_entity          NUMBER(15);
    l_id              NUMBER(15);
    l_table_id        NUMBER(15);
    l_hierarchy_level NUMBER(2);
    l_operation       VARCHAR2(1);
    res               PLS_INTEGER;
  BEGIN
    rc := p_lcr.getobject(lcr);
  
    l_operation := substr(lcr.get_command_type(), 1, 1);
    l_owner     := lcr.get_object_owner();
    l_object    := lcr.get_object_name();
  
    -- if the handler has fired for another object, just return
    IF l_owner != 'STREAMS_ADMIN' OR l_object != 'T1' THEN
      RETURN;
    END IF;
  
    res := sys.anydata.getnumber(lcr.get_value(value_type  => CASE l_operation WHEN 'D' THEN 'OLD' ELSE 'NEW' END,
                                               column_name => 'F1'),
                                 l_id);
  
    -- Putting into the cache
    IF NOT l_cache.exists(l_id) THEN
      -- in my case I store the operation performed on the row. If several operations were
      -- performed against one row within a transaction, I calculated the resulting operation.
      -- This example isn't supposed to be that complicated, so I haven't included such code
      l_cache(l_id) := l_operation;
    END IF;
  
  END dml_handler;

  ---------------------------------------------------------------------------------------------------

  PROCEDURE precommit_handler(commit_number IN NUMBER) IS
    idx PLS_INTEGER;
  BEGIN
    -- If there's nothing to publish, just return
    IF l_cache.count = 0 THEN
      RETURN;
    END IF;
  
    idx := l_cache.first;
    WHILE idx IS NOT NULL LOOP
      -- May be any other publishing way
      publish2aq(idx, idx);
      idx := l_cache.next(idx);
    END LOOP;
  
    l_cache.delete;
  
  END precommit_handler;

END strms;
/

The Apply


Now let’s create the apply process and tie it to the handlers:
BEGIN
  dbms_apply_adm.create_apply(queue_name     => 'STRM_Q',
                              apply_name     => 'APL01',
                              apply_user     => 'STREAMS_ADMIN',
                              apply_captured => TRUE);
  dbms_apply_adm.alter_apply(apply_name        => 'APL01',
                             precommit_handler => 'STREAMS_ADMIN.strms.PRECOMMIT_HANDLER');
  dbms_apply_adm.set_parameter(apply_name => 'APL01',
                               parameter  => 'disable_on_error',
                               VALUE      => 'N');
  FOR i IN 1 .. 4 LOOP
    dbms_apply_adm.set_dml_handler(object_name    => 'T1',
                                   object_type    => 'TABLE',
                                   operation_name => CASE i WHEN 1 THEN 'INSERT'
                                                            WHEN 2 THEN 'UPDATE'
                                                            WHEN 3 THEN 'DELETE'
                                                            ELSE 'LOB_UPDATE' END,
                                   user_procedure => 'STREAMS_ADMIN.strms.DML_HANDLER',
                                   assemble_lobs  => TRUE);
  END LOOP;
END;
/
All that is left is to start the Streams processes:
BEGIN
  dbms_capture_adm.start_capture(capture_name => 'CPT01');
  dbms_apply_adm.start_apply(apply_name => 'APL01');
END;
/

Monday, February 22, 2010

Oracle Range Join Cardinality Latest Changes

In this post I’d like to describe the latest changes in cardinality calculation for range joins of tables. All the changes were checked on Oracle 11.2.0.1. The version of Oracle 10g which I refer to throughout the post is 10.2.0.4. The ideas of how it works in Oracle 10g were taken from the book “Cost-Based Oracle Fundamentals” by Jonathan Lewis.

t2.join1 > t1.join1
Let’s take a glance at the following query:
SELECT t1.v1, t2.v1
  FROM t1, t2
 WHERE t1.filter = 1 -- 25 unique values
   AND t2.join1 > t1.join1 -- 40 and 30 unique values correspondingly
   AND t2.filter = 1 -- 50 unique values
;
This is an example from J. Lewis’ book. In the comments you can see the numbers of unique values in the corresponding columns. Both tables have 10 000 rows, so after filtering t2 has 200 rows and t1 has 400 rows. The t2.join1 column has 40 unique values from 1 to 40, and the t1.join1 column correspondingly has 30 unique values from 1 to 30. According to the book, Oracle 10g uses a fixed 5% selectivity to calculate the cardinality of this query. Here’s the execution plan:
------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  | 
------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |      |  4000 |   132K|   155 | 
|   1 |  MERGE JOIN         |      |  4000 |   132K|   155 | 
|   2 |   SORT JOIN         |      |   200 |  3400 |    77 | 
|*  3 |    TABLE ACCESS FULL| T2   |   200 |  3400 |    30 | 
|*  4 |   SORT JOIN         |      |   400 |  6800 |    78 | 
|*  5 |    TABLE ACCESS FULL| T1   |   400 |  6800 |    30 | 
------------------------------------------------------------ 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
   3 - filter("T2"."FILTER"=1) 
   4 - access(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
       filter(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
   5 - filter("T1"."FILTER"=1)
As you can see, the cardinality is 4000, which is equal to 200 * 400 * 0.05. Now let’s take a look at the execution plan for the same query executed in Oracle 11g:
------------------------------------------------------------ 
| Id  | Operation           | Name | Rows  | Bytes | Cost  | 
------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |      | 49000 |  1626K|    82 | 
|   1 |  MERGE JOIN         |      | 49000 |  1626K|    82 | 
|   2 |   SORT JOIN         |      |   200 |  3400 |    38 | 
|*  3 |    TABLE ACCESS FULL| T2   |   200 |  3400 |    28 | 
|*  4 |   SORT JOIN         |      |   400 |  6800 |    39 | 
|*  5 |    TABLE ACCESS FULL| T1   |   400 |  6800 |    28 | 
------------------------------------------------------------ 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
   3 - filter("T2"."FILTER"=1) 
   4 - access(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
       filter(INTERNAL_FUNCTION("T2"."JOIN1")>INTERNAL_FUNCTION("T1"."JOIN1")) 
   5 - filter("T1"."FILTER"=1)
As you can see, Oracle 11g thinks 49 000 rows are going to be returned, which is far more than 10g’s estimate.
Let me explain how I think Oracle calculates this number and the cardinality for such queries in general.

Boring Math
Let’s take a look at the picture. You can see two lines there which represent the unique values in the columns t2.join1 (left bold vertical line) and t1.join1 (right bold vertical line). You can also notice that the high_value of t2.join1 is greater than the high_value of t1.join1, that the low_value of t2.join1 is greater than the low_value of t1.join1, and finally that the low_value of t2.join1 is lower than the high_value of t1.join1. The horizontal dashed lines, drawn through the high_values and low_values, mark out three sectors: A, B, and C. This is the most general case of correlation between low_values and high_values I could make up, as all three segments are present. If you move those two lines against each other you can end up with only two adjacent sectors, or even just one of them. As you can foresee, the cardinality for every sector is calculated separately.

Sector A
Logically, for every t2.join1 value in sector A, every value of t1.join1 qualifies:
Cardinality of A = (high_value(t2) – high_value(t1)) * num_rows(t1) * num_rows(t2) / num_distinct(t2)
Of course it’s relevant only if the condition high_value(t2) – high_value(t1) > 0 is met.

Sector B
For simplicity’s sake, I’ve split the calculation of the cardinality below sector A into two different sectors, even though it’s not quite right. Why it’s not: essentially, what is left is to count the cardinality for the single segment high_value(t2) – low_value(t2). On the other hand, intersecting that segment with t1’s line of values gives us two sectors: one with common values (sector B) and one without (sector C). So, for each t2.join1 value, some number of t1.join1 values from sector B should be counted, plus all the t1.join1 values from sector C.
Let’s count B’s cardinality as if there were no sector C at all (i.e. the number of t1.join1 values in sector B matched by the t2.join1 values between high_value(t2) and low_value(t2)). In that case, for every value of t2.join1 all the values of t1.join1 that are less than the current t2.join1 value should be picked out. For instance, if high_value(t1) = 5 and low_value(t2) = 1, then for t2.join1 = 5 the matching t1.join1 values are 4, 3, 2, and 1 (four of them). For t2.join1 = 4 there is one fewer t1.join1 value (three of them), and so forth. Ultimately, we have a descending number of t1 values matching t2’s values, with one fewer on each step. The whole number of t1.join1 values can be calculated by the formula for the sum of an arithmetic progression:
Sn = ((An + A1) / 2) * n
where An is the nth element of the progression and A1 is the first element. In our case A1 always equals 1, while An = high_value(t1) – low_value(t2). Note that here we are interested in the NUMBER of values between the high and low values rather than in the exact values: we need to know how many times every value will be counted during the final cardinality estimation. Correspondingly, n can be calculated this way:
n = high_value(t1) – low_value(t2)
Knowing all of the above, we can calculate the cardinality for sector B:
Cardinality on B = (num_rows(t2) / num_distinct(t2)) * (num_rows(t1) / num_distinct(t1)) * (high_value(t1) – low_value(t2)) * (high_value(t1) – low_value(t2) + 1) / 2
that is, the number of rows per value of t2.join1 for that sector, multiplied by the number of rows per value of t1.join1, multiplied by the number of such “entries”, which is the sum of the progression discussed above.

Sector C
If that sector exists, then for every t2.join1 value from sector B you should also include all the values of t1.join1 which are below the low_value of t2:
Cardinality of C = (low_value(t2) – low_value(t1)) * (high_value(t1) – low_value(t2) + 1) * num_rows(t2) / num_distinct(t2) * num_rows(t1) / num_distinct(t1)

The Final Cardinality
Finally, let’s assemble the cardinality formula:
Cardinality of Join = Cardinality of A + Cardinality of B + Cardinality of C
To save space, let’s derive the selectivity formula dividing the sum by num_rows(t2) * num_rows(t1):
Selectivity of Join = (high_value(t2) – high_value(t1)) / num_distinct(t2) + (1 / num_distinct(t2)) * (1 / num_distinct(t1)) * (high_value(t1) – low_value(t2)) * (high_value(t1) – low_value(t2) + 1) / 2 + (low_value(t2) – low_value(t1)) * (high_value(t1) – low_value(t2) + 1) * (1 / num_distinct(t2)) * (1 / num_distinct(t1))
After beautifying and generalizing we can get the final formula:
Cardinality of join = Filtered cardinality (t1) * Filtered Cardinality (t2) * Selectivity of Join
Selectivity of Join = (A + (B + C) / num_distinct(t1)) / num_distinct(t2)
A = high_value(t2) – high_value(t1), where high_value(t2) > high_value(t1). Otherwise A = 0
B = (least(high_value(t2), high_value(t1)) – greatest(low_value(t1), low_value(t2))) * (least(high_value(t2), high_value(t1)) – greatest(low_value(t1), low_value(t2)) + 1) / 2
C = (greatest(low_value(t1), low_value(t2)) – least(low_value(t1), low_value(t2))) * (least(high_value(t2), high_value(t1)) – greatest(low_value(t1), low_value(t2)) + 1), where low_value(t2) > low_value(t1). Otherwise C = 0
t2 is the table on the left side of the “>” sign, and t1 is the table on the right side correspondingly.

The Example
Let’s get back to the example at the beginning of the article and perform a manual calculation of the cardinality:
A = 40 – 30 = 10
B = (30 – 1) * (30 – 1 + 1) / 2 = 29 * 30 / 2 = 435
C = 0
Selectivity of Join = (10 + 435 / 30) / 40 = 0.6125
Cardinality of Join = 200 * 400 * 0.6125 = 49 000
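
The same arithmetic can be scripted as a sanity check. The block below is just the formula above with the statistics of the test case hard-coded; it is not how the optimizer itself computes it:

declare
  -- statistics of the test case (hard-coded for clarity)
  nd1   number := 30;  -- num_distinct of t1.join1
  nd2   number := 40;  -- num_distinct of t2.join1
  hv1   number := 30;  -- high_value of t1.join1
  hv2   number := 40;  -- high_value of t2.join1
  lv1   number := 1;   -- low_value of t1.join1
  lv2   number := 1;   -- low_value of t2.join1
  card1 number := 400; -- filtered cardinality of t1
  card2 number := 200; -- filtered cardinality of t2
  a     number;
  b     number;
  c     number;
  sel   number;
begin
  a := case when hv2 > hv1 then hv2 - hv1 else 0 end;
  b := (least(hv2, hv1) - greatest(lv1, lv2)) * (least(hv2, hv1) - greatest(lv1, lv2) + 1) / 2;
  c := case when lv2 > lv1
            then (greatest(lv1, lv2) - least(lv1, lv2)) * (least(hv2, hv1) - greatest(lv1, lv2) + 1)
            else 0 end;
  sel := (a + (b + c) / nd1) / nd2;
  dbms_output.put_line('selectivity = ' || sel || ', cardinality = ' || round(card1 * card2 * sel));
end;
/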

t2.join1 >= t1.join1
I can’t finish this article without bringing up the “>=” range join. Essentially, the addition of the equality sign should add num_rows(t2) / num_distinct(t2) * num_rows(t1) / num_distinct(t1) rows to the final cardinality. If you don’t want to forget about it, you can change the formula and add 1 to A whenever you have the equality sign. For instance, let’s apply it to the test query:
A = 40 – 30 + 1 = 11
B = … = 435
C = 0
Selectivity of Join = (11 + 435 / 30) / 40 = 0.6375
Cardinality of Join = 200 * 400 * 0.6375 = 51 000
Here’s the execution plan:
------------------------------------------------------------- 
| Id  | Operation            | Name | Rows  | Bytes | Cost  | 
------------------------------------------------------------- 
|   0 | SELECT STATEMENT     |      | 51000 |  1693K|    83 | 
|   1 |  WINDOW BUFFER       |      | 51000 |  1693K|    83 | 
|   2 |   MERGE JOIN         |      | 51000 |  1693K|    83 | 
|   3 |    SORT JOIN         |      |   200 |  3400 |    38 | 
|*  4 |     TABLE ACCESS FULL| T2   |   200 |  3400 |    28 | 
|*  5 |    SORT JOIN         |      |   400 |  6800 |    39 | 
|*  6 |     TABLE ACCESS FULL| T1   |   400 |  6800 |    28 | 
------------------------------------------------------------- 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
   4 - filter("T2"."FILTER"=1) 
   5 - access(INTERNAL_FUNCTION("T2"."JOIN1")>=INTERNAL_FUNCTION("T1"."JOIN1")) 
       filter(INTERNAL_FUNCTION("T2"."JOIN1")>=INTERNAL_FUNCTION("T1"."JOIN1")) 
   6 - filter("T1"."FILTER"=1)

That is all I wanted to cover in this article. In the next one I am going to cover the between operation.

Wednesday, January 27, 2010

How CBO calculates cardinality (Part 2)


This is the continuation of my previous post which you can read here.

column > (<) const
According to the book the cardinality for such a condition can be calculated with the formula below:
Cardinality = num_rows * (needed_range_length / total_range_length + 1 / num_dist*)
* in case the range is closed (>= or <=)

For example, let's assume we have a condition column > 3 and high_value = 12 in that column. In that case the needed_range_length will be high_value – const = 12 – 3 = 9. The total_range_length is calculated as high_value – low_value from user_tab_col_statistics. In case we have the “>=” sign, we have to add 1 / num_dist.
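
For instance, plugging made-up statistics of num_rows = 10000, num_dist = 100 and low_value = 2 into the formula (so total_range_length = 12 – 2 = 10):
Cardinality (column > 3) = 10000 * (9 / 10) = 9000
Cardinality (column >= 3) = 10000 * (9 / 10 + 1 / 100) = 9100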

The formula above is valid for a constant value between the lowest and highest values. If we use an out-of-range value for the constant, the cardinality can be calculated in two ways depending on a condition which is easier to explain with an example:
  1. const < low_value and column < const or const > high_value and column > const
  2. const < low_value and column > const or const > high_value and column < const
In the first case everything is quite obvious: the cardinality will be equal to num_rows. The second case, on the other hand, is a bit more interesting. Apparently the cardinality of such a query should (from a human point of view) be equal to 0, but the CBO uses the same “wearing off” formula as it does for the condition column = const where const is out of range. In other words, for the second example the CBO will calculate the cardinality as if it were the equality condition (column = const), which I described in the previous post.

In order to finish the description of inequalities, it only remains to cover the case of bind variables. Here's the formula for that:
Cardinality = num_rows / num_dist + 1
So it’s the cardinality of condition column = :1 but increased by 1.

column between Am and An
The between condition is an extended inequality condition, but instead of high_value (or low_value, depending on the sign) the CBO uses Am and An to calculate needed_range_length. Here's the basic formula for such a condition:
Cardinality = num_rows * ((An - Am) / (high_value - low_value) + 1 / num_dist)
It's valid when An > Am and both An and Am are within the range between the low and high values from user_tab_col_statistics. But what if either Am or An is out of the range? The answer is simple (at least for Oracle 10.2.0.4 and above, where I tested it). If the range Am-An has common elements with the range low_value - high_value, then the CBO calculates the cardinality using the same formula, having substituted the out-of-range edge (either An or Am) with the corresponding edge value (high_value if An is out of bounds, or low_value if Am is out of bounds) from user_tab_col_statistics. If the two ranges have nothing in common, Oracle reports the cardinality as if the condition were column = :1 (equal to num_rows / num_dist).

Now let's take a look at bind variables. What would be the cardinality for the condition column between :1 and :2? According to Lewis, Oracle prior to version 10.2 takes 0.0025 of num_rows and uses the rounded result as the cardinality. Oracle 10.2.0.4 does something similar but not quite the same:
Cardinality = num_rows / (num_dist * num_dist) + 1
Finally, here’s the formula for Oracle 11.2.0.1:
Cardinality = num_rows / num_dist + 1

column > const AND column <= const
With constants, both versions (10.2.0.4 and 11.2.0.1) calculate the cardinality in the same way (at least with the same result). Both of them realise that no rows should be returned and therefore add the filter NULL IS NOT NULL to the plans.

But once you substitute the constant with a bind variable, everything turns upside down. Neither database realises that the result should be no rows at all. Moreover, 11g calculates the cardinality in its usual way as num_rows / num_dist + 1, whilst 10g does its usual thing and calculates the cardinality as num_rows / (num_dist * num_dist) + 1.

Friday, December 11, 2009

How CBO calculates cardinality (Part 1)

I've been rereading Jonathan Lewis' book about CBO internals and testing the material on Oracle 10.2.0.4 and 11.2.0.1, and I've found that Oracle has changed some basic formulas in cardinality calculation. I'd like to share some updated formulas below.

column = const
First of all I'd like to recap how the CBO calculates the cardinality of a query with a simple condition where a column equals a constant. As you may know, the result depends upon the value of the constant, that is, whether it's between user_tab_col_statistics' high_value and low_value or not. If it is (as well as if you have a bind variable instead of a constant), the cardinality is calculated by something similar to the following formula:

          Cardinality = user_tables.num_rows / user_tab_col_statistics.num_distinct
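
For instance, with made-up statistics of num_rows = 10000 and num_distinct = 20, the condition column = 5 (with 5 inside the low/high range) would be estimated as 10000 / 20 = 500 rows.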

However, if it's not, then the cardinality descends linearly depending on the "distance" to the range above. The distance can be calculated by the following formulas:

          Distance = const - user_tab_col_statistics.high_value + 1 if const > user_tab_col_statistics.high_value or
          Distance = user_tab_col_statistics.low_value - const + 1 if const < user_tab_col_statistics.low_value

Now we can figure out the cardinality for any value of the constant outside the range:

          Cardinality = num_rows / num_distinct * (1 - (Distance - num_distinct) / (high_value - low_value))

Note that if you've got a result just a bit higher than 0, the CBO will probably apply the ceil function to it.

column in (c1, c2, ..., cN)
The cardinality of a query with such a predicate can be calculated (according to the book) by simply summing the cardinalities of every member of the in-list, unless the sum reaches the num_rows value:

          Cardinality = Cardinality(c1) + Cardinality(c2) + ... + Cardinality(cN) if the sum < num_rows and
          Cardinality = num_rows otherwise
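
With the same made-up statistics as above (num_rows = 10000, num_distinct = 20) and all three constants inside the low/high range, this gives, for example:

          Cardinality of column in (1, 2, 3) = 3 * 10000 / 20 = 1500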

column not in (c1, c2, ..., cN)
This one was tough. The problem is that the cardinality of a not-in-list doesn't equal num_rows minus the cardinality of the in-list. Furthermore, I've noticed that the result depends upon the number of members within the parentheses: the more members you have, the less each of them influences the cardinality. The general formula I've derived looks like this:

          Cardinality = num_rows - sum(Cardinality(cN) * power((high_value - low_value) / num_distinct , N - 1)) where every Cardinality(cN) would be calculated by the formulas above

I'd like to mention one thing about that formula. As you can notice, the influence of each member's cardinality depends upon its order number within the list (the bigger the number, the less the influence). This can affect your calculations because of the different cardinalities for constants within the high-low range and outside it. A number of tests showed that, apparently, before the final calculation the CBO sorts the cardinalities of the members in descending order (don't forget that bind variables in that case behave like values within the range).