Friday, January 24, 2014

Insert Statement Re-starts on Cursor Invalidation

Here’s a little problem I recently stumbled across while writing a piece of ETL functionality. The problem is that Oracle restarts an insert statement over and over when certain criteria are met. Although the full list of conditions is known only to Oracle engineers, here are the ones under which I was able to deterministically reproduce the issue on Oracle 11.2.0.3 (Linux x64):

  • The destination table (the one being inserted into) is partitioned
  • The source is a join of at least two tables, one of which is partitioned (though I believe it doesn’t have to be)
  • The cursor gets invalidated while the join is still running

Now the last bit requires a little elaboration. In my testbed I managed to reproduce it on a two-table join, and the cursor had to be invalidated while Oracle was still reading the first table. This is the reason why the tables are largish.

If the conditions above are met and I issue a DDL leading to cursor invalidation while Oracle is reading the first table, my insert statement restarts. If I invalidate the cursor again, it restarts again, and that can be continued till the cows come home.

Here’s a test script. First let’s prepare the tables:

create table t1 partition by range (dt) interval (interval'1'day) subpartition by list (slot)
subpartition template (subpartition sp1 values (1)) (partition p1 values less than (date '2014-01-01'))
as
with gen as (select null from dual connect by level <= 1e3)
select rownum as id, date '2014-01-10' as dt, 1 as slot, rpad(trunc(rownum / 10), 500) as key, rpad('x', 1000) padding from gen, gen where rownum <= 1e6;

create table t2 as
with gen as (select null from dual connect by level <= 1e3)
select rownum as id, rpad(rownum, 500) as key, rpad('x', 1000) padding from gen, gen where rownum <= 5e5;

create table t3 partition by range(dt) interval(interval'1'day) (partition values less than (date '2013-12-14'))
as select trunc(sysdate) as dt, t2.* from t2 where 1=0;

create table t4 as select * from t3 where 1 = 0;

T1 and T2 are our source of data in the insert statement. T3 is the destination, and T4 is an alternative destination which is not partitioned.

Now open two sessions and write down the SID of the first one. In the first session, prepare but don’t execute the following insert statement:

insert all into t3 (dt, id, key, padding) values (dt, id, key, padding)
  select /*+ monitor*/t1.dt, t1.id, t1.key, t1.padding from t1,t2
  where dt = date '2014-01-10' and slot = 1 and t2.key = t1.key;

In the second session prepare but again don’t execute the following block:

declare
  pname varchar2(30);
begin
  select partition_name into pname from user_tab_partitions where table_name = 'T1' and partition_name != 'P1';
  for j in 2..1000 loop
    execute immediate utl_lms.format_message('alter table t1 modify partition %s add subpartition %s values (%s)'
      ,pname, pname||'_SP'||j, to_char(j));
    dbms_lock.sleep(5);
  end loop;
end;
/

That block adds a new empty sub-partition to the current partition, which in turn invalidates all the cursors referencing the table in any way. It does so over and over again at a five-second interval, so the cursor gets invalidated many times before the statement can finish reading T2 (the first row source in the join).

Run the insert statement in the first session and, while it is running, run the PL/SQL block in the second session.

Finally, you can sit and monitor v$sql_monitor using the following query, having replaced <YOURSID> with the SID you wrote down:

select key, status, sid, sql_id from v$sql_monitor where (sid, sql_id) in (select sid, sql_id from v$session where sid=<YOURSID>);

Depending on the resources you've got to run the query, you should see something like this over time:

SQL> select invalidations from v$sql where sql_id = (select sql_id from v$session where sid = 328);

INVALIDATIONS
-------------
            2

SQL> select key, status, sid, sql_id from v$sql_monitor where sql_id = (select sql_id from v$session where sid
= 328);

       KEY STATUS                     SID SQL_ID
---------- ------------------- ---------- -------------
4.7245E+10 EXECUTING                  328 6245n79k73xf7
1.5720E+12 DONE                       328 6245n79k73xf7
3.2169E+12 DONE                       328 6245n79k73xf7

SQL>

If, however, you replace the destination table with T4, no restarts will happen (even though the cursor is still being invalidated).
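
For reference, that variant is simply the same statement pointed at T4:

insert all into t4 (dt, id, key, padding) values (dt, id, key, padding)
  select /*+ monitor*/t1.dt, t1.id, t1.key, t1.padding from t1,t2
  where dt = date '2014-01-10' and slot = 1 and t2.key = t1.key;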

Oracle says the behaviour is expected if the insert is being done on a partitioned table (Doc ID 1462003.1).

Wednesday, June 12, 2013

SQL Profile for INSERT-ALL

We’ve had a performance issue with one insert statement after our last release – the plan was not the one we were expecting – which we decided to address by creating a SQL profile from the same statement’s “good” plan we’d got in the UAT environment. We hit a problem though: the profile wasn’t kicking in, and our DBA suggested that the reason was that the SQL we used as the source of the outline was itself missing the actual outline information. Oracle keeps the outline information in V$SQL_PLAN.OTHER_XML, in an outline_data section. I decided to check whether there were any other SQLs missing the outline_data section:
SQL> select count(*)
  2  from   (select p.*
  3                ,value(d) info
  4          from   dba_hist_sql_plan p
  5                ,table(xmlsequence(extract(xmltype(p.other_xml),
  6                                           '/other_xml/outline_data')))(+) d
  7          where  other_xml is not null)
  8  where  info is null;

  COUNT(*)
----------
        82

SQL>
So there were a few of those. What caught my eye, though, when I checked the actual query plans, was the set of operations which were missing the outlines:
SQL> select unique operation
  2  from   (select p.*
  3                ,value(d) info
  4          from   dba_hist_sql_plan p
  5                ,table(xmlsequence(extract(xmltype(p.other_xml),
  6                                           '/other_xml/outline_data')))(+) d
  7          where  other_xml is not null)
  8  where  info is null;

OPERATION
------------------------------
LOAD TABLE CONVENTIONAL
INDEX BUILD
MULTI-TABLE INSERT
PX COORDINATOR

SQL>
Notice the multi-table insert (which was exactly our case) – for some reason Oracle does not create (or populate) the outline section when you are doing a multi-table insert, at least not on the 11.2.0.3 I’m using. I also checked whether there were any multi-table inserts which did have the outline_data section and, as expected, found none.
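
That last check was along these lines (the same outline_data extraction as above, restricted to multi-table inserts):

select count(*)
from   (select p.operation
              ,value(d) info
        from   dba_hist_sql_plan p
              ,table(xmlsequence(extract(xmltype(p.other_xml),
                                         '/other_xml/outline_data')))(+) d
        where  other_xml is not null)
where  operation = 'MULTI-TABLE INSERT'
and    info is not null;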

Thursday, April 4, 2013

Why is my index being ignored post hardware upgrade?

Here's a nice case study of how better hardware can make existing (poor) code perform worse. Consider the following query:
select /*+ parallel*/ * from t1 where hname = 'RARE VALUE' and id = 2;
The query plan on a slow machine looks like this:
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     2 |    44 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     2 |    44 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |     2 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
and on the fast like this:
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     2 |    44 |     4   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |     2 |    44 |     4   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |     2 |    44 |     4   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       |     2 |    44 |     4   (0)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------
The difference that matters between the two machines is the number of CPUs: on the old one it is 16, and on the new one 80 (with HT enabled). Therefore the default degree of parallelism on the new machine is 2 (parallel_threads_per_cpu) * 80 (cpu_count) = 160, which reduces the cost of a full scan of the table to a value small enough to "beat" the index access, in terms of cost of course. In reality it wasn't as fast as an index access in my case, alas.
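If you want to sanity-check that arithmetic on your own system, both inputs are ordinary initialisation parameters; a minimal query:
select name, value
  from v$parameter
 where name in ('cpu_count', 'parallel_threads_per_cpu');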
In case you want to reproduce, here's a script to create the T1 table:
create table t1 as
   select ceil(dbms_random.value(1, 100000)) as id
         ,case 
             when mod(rownum, 1000) = 0 then cast('RARE VALUE' as varchar2(100))
             when mod(rownum, 3) = 0    then cast('COMMON VALUE 1' as varchar2(100))
             when mod(rownum, 3) = 1    then cast('COMMON VALUE 2' as varchar2(100))
             when mod(rownum, 3) = 2    then cast('COMMON VALUE 3' as varchar2(100))
          end hname
         ,'x' value
   from dual connect by level <= 500000;

create index t1_idx on t1 (id, hname) pctfree 50;
The plans above were created on Oracle 11.2.0.3 on Linux x64.
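If you need the index plan back before fixing the code properly, one blunt workaround is to hint the index explicitly (a sketch, not a recommendation):
select /*+ index(t1 t1_idx) */ * from t1 where hname = 'RARE VALUE' and id = 2;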

Wednesday, December 12, 2012

11g Group By Selectivity Improvements

Oracle 11g (at least 11.2.0.3 where I'm testing it) has brought us an interesting improvement to the selectivity of the GROUP BY operation. Here's a little demo:
SQL> CREATE TABLE t1 AS 
  2    SELECT LEVEL AS id1, 
  3           MOD(LEVEL, 20) fil1, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

SQL> CREATE TABLE t2 AS 
  2    SELECT LEVEL AS id2, 
  3           MOD(LEVEL, 5) fil2, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

REM <<< Gather statistics without histograms >>>
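
A minimal version of that step might look like this ('for all columns size 1' is what suppresses the histograms):
BEGIN
  dbms_stats.gather_table_stats(USER, 'T1', method_opt => 'for all columns size 1');
  dbms_stats.gather_table_stats(USER, 'T2', method_opt => 'for all columns size 1');
END;
/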
At this point we've got two tables with 10k rows each. The columns of interest are fil1 and fil2, with selectivities of 1/20 and 1/5 respectively. Now let's run a query as follows:
SQL> set autotrace traceonly 
SQL> SELECT 
  2   t1.id1 
  3    FROM t1, 
  4         t2 
  5   WHERE t2.id2 = t1.id1 
  6     AND t1.fil1 = 1 
  7     AND t2.fil2 = 1 
  8   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
As you can see above, Oracle has estimated the HASH GROUP BY as returning 125 rows, even though in reality it is 500 (which is the estimate you would get in, say, 10g).
What I think is happening in this particular case(1) (and I empirically verified it by playing around with the selectivities of the fil1 and fil2 columns) is that Oracle applies the selectivities of fil1 and fil2 to the result of the HASH JOIN (id=2) in the following way:
GROUP BY CARD = JOIN CARD * SEL(t1.fil1) / SEL(t2.fil2) 
and if we put the numbers in place it would be:
GROUP BY CARD = 500 * 1/20 / (1/5) = 500 / 20 * 5 = 125 
If that's the case, I'm not sure what the rationale behind it is. What I do know is that this optimisation can be disabled by setting _optimizer_improve_selectivity to false:
SQL> SELECT 
  2  --+ OPT_PARAM('_optimizer_improve_selectivity' 'false') 
  3   t1.id1 
  4    FROM t1, 
  5         t2 
  6   WHERE t2.id2 = t1.id1 
  7     AND t1.fil1 = 1 
  8     AND t2.fil2 = 1 
  9   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
I'm not sure, though, what else is being switched off in this case. Also, if you've got primary keys on your id columns, the optimisation does not kick in:
SQL> alter table t1 add constraint pk_t1 primary key (id1);

Table altered.

SQL> alter table t2 add constraint pk_t2 primary key (id2);

Table altered.

SQL> SELECT t1.id1 
  2    FROM t1, 
  3         t2 
  4   WHERE t2.id2 = t1.id1 
  5     AND t1.fil1 = 1 
  6     AND t2.fil2 = 1 
  7   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
Obviously this feature can be (and in the case of one of my queries was) a problem when you've got a complex query with a few massive GROUP BY clauses leading to a significant drop in cardinality estimation, and as a result to picking a suboptimal join method, especially when you are grouping by more than one column. The existence of filters is a mandatory condition for the optimisation to kick in.
Finally, I want to mention bug #12591379 on MOS, which says the cardinality of a GROUP BY may drop severely if it follows a HASH JOIN with an OR condition. The bug is said to be fixed in version 12.1.

(1) I believe my assumption is oversimplified and the actual algorithm is more sophisticated: at the very least, if you use a value of t2.fil2 which is outside of its min-max bounds, Oracle makes a different projection.

Wednesday, October 10, 2012

SQL Loader's Append

I've recently bumped into a peculiarity of SQL*Loader, particularly its APPEND mode. SQL*Loader's LOAD APPEND works in the same way as a DML INSERT /*+ APPEND*/, that is, it makes Oracle allocate new blocks above the HWM, right? Well, not quite.
Here's a little demo on 10g R2 on Solaris:
SQL> create table t1
  2  (
  3    id  number(12),
  4    padding varchar2(255)
  5  );

Table created.

SQL> INSERT /*+ APPEND*/
  2  INTO t1
  3    SELECT rownum,
  4           rpad('x', 255)
  5      FROM dual
  6    CONNECT BY LEVEL <= 1000;

1000 rows created.

SQL> commit;

Commit complete.

SQL> delete from t1;

1000 rows deleted.

SQL> commit;

Commit complete.

SQL> set serveroutput on
SQL> DECLARE
  2    free_blks NUMBER;
  3  BEGIN
  4    dbms_space.free_blocks(USER, 'T1', 'TABLE', 0, free_blks);
  5    dbms_output.put_line(free_blks);
  6  END;
  7  /
19                                                                            

PL/SQL procedure successfully completed.
So far I've created a small table, populated it with a thousand rows, then deleted all the data and committed the transaction. Then I checked the number of free blocks in the freelist - it is 19.

Now let's load some data via SQL Loader using a control file as follows:
OPTIONS(readsize=20000000, bindsize=20000000, rows=8000)
LOAD DATA
INFILE *
APPEND INTO TABLE T1
FIELDS TERMINATED BY "|" optionally enclosed by '"' TRAILING NULLCOLS
(
ID,
PADDING
)
BEGINDATA
1|"x     <<254 whitespaces="whitespaces">>"
...
I've removed the data section to save space, but you get the idea. Now let's run it:
SQL> host sqlldr userid=user/pwd@db control=ldr.ctl log=ldr.log

SQL> DECLARE
  2    free_blks NUMBER;
  3  BEGIN
  4    dbms_space.free_blocks(USER, 'T1', 'TABLE', 0, free_blks);
  5    dbms_output.put_line(free_blks);
  6  END;
  7  /
1                                                                              

PL/SQL procedure successfully completed.

SQL> BEGIN
  2    dbms_stats.gather_table_stats(ownname => USER, tabname => 'T1',
  3                                  cascade => TRUE,
  4                                  method_opt => 'for all columns size 1');
  5  END;
  6  /

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'T1';

    BLOCKS                                                                    
----------                                                                     
        19                                                                     
As you can see, we've still got the same 19 blocks in the table, and only 1 of them is now listed as free in the free list. In other words, SQL*Loader's APPEND reused the blocks on the free list rather than allocating new ones above the HWM: in the control file, APPEND merely means "add rows to a non-empty table"; a direct-path load has to be requested separately.
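
For comparison, a direct-path load is requested via SQL*Loader's standard DIRECT command-line option; invoked like this, the rows should go into new blocks above the HWM, leaving the free list untouched:
SQL> host sqlldr userid=user/pwd@db control=ldr.ctl log=ldr.log direct=true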

Saturday, October 6, 2012

XPath Query Rewrite

In this entry I want to share my experience of tracking down an interesting problem arising from an upgrade to Oracle 11g.
A few days after migrating a database to Oracle 11.2.0.3, I happened to check the alert log and found the following:

DDE: Problem Key 'ORA 4030' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes

ORA-4030 happens when a server process fails to allocate a portion of PGA memory.
I checked the application and the ETL jobs but nothing seemed to be failing. Therefore I decided to check the PGA memory used by different sessions:

SELECT *
  FROM v$process p,
       v$session s
 WHERE p.addr = s.paddr
 ORDER BY p.pga_alloc_mem DESC;


The first line I saw had 4293618712 bytes in the PGA_ALLOC_MEM column. That process had allocated 4GB of PGA memory, which is the maximum it could get out of Linux. Using the session information I isolated the code causing the issue. It turned out to be a massive (500+ lines) SQL statement heavily utilising the XMLElement and XMLForest functions.
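
The isolation step was nothing fancy; a query along these lines does the job (a sketch, with :sid being the session identified above):

SELECT s.sid, s.serial#, q.sql_fulltext
  FROM v$session s,
       v$sqlarea q
 WHERE q.sql_id = s.sql_id
   AND s.sid = :sid;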

A quick check on Metalink revealed bug #13477790 matching our situation (later confirmed by Oracle Support to be the culprit). It said the problem was due precisely to the usage of XML functions. What I couldn't understand from the description was how Oracle had managed to break code that was working perfectly on 10g.

So I decided to reproduce the problem. I isolated the code, and when I ran it I noticed that Oracle did not even start returning rows from the failing query. Ergo, I decided to trace the CBO.

What I found in the 10053 trace file was lots of attempts to transform the query before the process died with ORA-4030.
In parallel I was googling what had changed in 11g in terms of XDB, along with general information about XDB (as I hadn't worked with it before), and found the NO_XML_QUERY_REWRITE hint. Digging further, I learned that XDB has a thing called XPath rewrite, whereby Oracle rewrites a query to optimise XML query execution. In my case no rewriting could be useful, as we don't use XDB per se: we just construct XML from plain tables and upload it as a CLOB. So I tried the hint, and it helped.
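
To illustrate where the hint goes, here is a sketch only, with hypothetical table and column names but the same XMLElement/XMLForest shape as our statement:

SELECT /*+ NO_XML_QUERY_REWRITE */
       XMLElement("employee",
                  XMLForest(e.id AS "id", e.name AS "name")).getClobVal() AS xml_doc
  FROM employees e;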

Eventually we applied a patch to fix the problem permanently.



Friday, September 14, 2012

ANSI SQL Bug

I’ve never been a fan of using ANSI SQL syntax in Oracle, for various reasons. Here is one more for the list.
Have a look at the following query:
 SELECT dummy 
  FROM dual t1, 
       dual t2, 
       dual t3 
 WHERE t2.dummy = t1.dummy 
   AND t3.dummy = t1.dummy; 
If you run it you will get ORA-00918: column ambiguously defined – and indeed, which table's DUMMY is Oracle supposed to return? Let's rewrite the query using ANSI SQL:
 SELECT dummy 
  FROM dual t1 
  JOIN dual t2 
    ON (t1.dummy = t2.dummy) 
  JOIN dual t3 
    ON (t1.dummy = t3.dummy);
If you run it on Oracle 10.2.0.4 (or older), you will see the following output:
 SQL> SELECT dummy 
  2    FROM dual t1 
  3    JOIN dual t2 
  4      ON (t1.dummy = t2.dummy) 
  5    JOIN dual t3 
  6      ON (t1.dummy = t3.dummy);

D 
- 
X

SQL>
Oracle executes it without any errors. So which DUMMY has it actually taken, then? Let's have a look at the Unparsed Query section of the 10053 trace to find the answer (after some formatting):

SQL:******* UNPARSED QUERY IS ******* 
SELECT "T1"."DUMMY" "QCSJ_C000000000300000",
       "T2"."DUMMY" "QCSJ_C000000000300001" 
  FROM "SYS"."DUAL" "T1",
       "SYS"."DUAL" "T2" 
 WHERE "T1"."DUMMY"="T2"."DUMMY"

Query block (0x2aaaac888538) unchanged 
Query block (0x2aaaac887918) before join elimination: 
SQL:******* UNPARSED QUERY IS ******* 
SELECT "from$_subquery$_003"."QCSJ_C000000000300000_0" "QCSJ_C000000000300000",
       "from$_subquery$_003"."QCSJ_C000000000300001_1" "QCSJ_C000000000300001",
       "T3"."DUMMY" "DUMMY" 
  FROM  (SELECT "T1"."DUMMY" "QCSJ_C000000000300000_0",
                "T2"."DUMMY" "QCSJ_C000000000300001_1"
           FROM "SYS"."DUAL" "T1",
                "SYS"."DUAL" "T2" 
          WHERE "T1"."DUMMY"="T2"."DUMMY") "from$_subquery$_003",
        "SYS"."DUAL" "T3" 
 WHERE "from$_subquery$_003"."QCSJ_C000000000300000_0"="T3"."DUMMY"
Apparently Oracle joins the first two tables and combines them into an inline view (the first query block above). At this point it must take all the rows from both tables, because it doesn't know which of them will be used later on for further joins or in the result set of columns. Ergo, in case some columns might be (and in my case are) named identically, Oracle gives them unique aliases.
Then it joins the third table (by the way, that's why the error doesn't happen with two tables) – see the second query block in the snippet. Because this is the last table to be joined and all the previous tables' columns have got unique aliases, Oracle doesn't give unique aliases to the T3 table's columns. Finally, it restricts the result set column list as requested by the user; at this point we've got only one DUMMY column, the one from table T3.
This behaviour looks like a really nasty bug, as it conceals errors during development. I bumped into it (like many others) when migrating to Oracle 11g, where the bug is fixed and an ANSI query like this starts failing. A quick search on Metalink revealed note ID 835701.1, saying it's been fixed in 10.2.0.5 and onwards.