Wednesday, June 12, 2013

SQL Profile for INSERT-ALL

We’ve had a performance issue after our last release with one insert statement – the plan was not the one we were expecting – which we decided to address by creating a SQL profile from the same statement’s “good” plan we’ve got in UAT environment. We’ve hit a problem though – the profile wasn’t kicking in, and our DBA suggested that the reason for that was that the SQL we used as the source of the outline information was missing the actual outline information. Oracle keeps the outline information in V$SQL_PLAN.OTHER_XML in a outline_data section. I decided to check out whether there were any other SQLs which missing outline_data section:
SQL> select count(*)
  2  from   (select p.*
  3                ,value(d) info
  4          from   dba_hist_sql_plan p
  5                ,table(xmlsequence(extract(xmltype(p.other_xml),
  6                                           '/other_xml/outline_data')))(+) d
  7          where  other_xml is not null)
  8  where  info is null;

  COUNT(*)
----------
        82

SQL>
So there were a few of those. What caught my eye though when I checked the actual query plans was the operations which were missing the outlines:
SQL> select unique operation
  2  from   (select p.*
  3                ,value(d) info
  4          from   dba_hist_sql_plan p
  5                ,table(xmlsequence(extract(xmltype(p.other_xml),
  6                                           '/other_xml/outline_data')))(+) d
  7          where  other_xml is not null)
  8  where  info is null;

OPERATION
------------------------------
LOAD TABLE CONVENTIONAL
INDEX BUILD
MULTI-TABLE INSERT
PX COORDINATOR

SQL>
Notice multi-table insert (which was exactly our case) – for some reason Oracle is not creating (or populating) the outline section when you are doing multi-table insert, at least not on 11.2.0.3 I’m using. I’ve also checked whether there are any multi-table inserts which have got outline_data and as expected have found none.

Thursday, April 4, 2013

Why my index is being ignored post hardware upgrade?

Here's a nice case study how a better hardware can make the existing (poor) code perform worse off. Consider the following query:
select /*+ parallel*/ * from t1 where hname = 'RARE VALUE' and id = 2;
The query plan on a slow machine looks like this:
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     2 |    44 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     2 |    44 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |     2 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
and on the fast like this:
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     2 |    44 |     4   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |     2 |    44 |     4   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |     2 |    44 |     4   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       |     2 |    44 |     4   (0)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------
The difference that matters between the two machines is the number of CPUs. On the old one it is 16 and on the new one 80 (with HT enabled). Therefore the default degree of parallelism on the new machine is 2 (parallel_threads_per_cpu) * 80 (cpu_count) = 160, which reduces the cost of full scan of the table to a value small enough to "beat" an index access, in terms of cost of course. In reality it wasn't as fast as an index access in my case, alas.
In case you want to reproduce, here's a script to create the T1 table:
create table t1 as
   select ceil(dbms_random.value(1, 100000)) as id
         ,case 
             when mod(rownum, 1000) = 0 then cast('RARE VALUE' as varchar2(100))
             when mod(rownum, 3) = 0    then cast('COMMON VALUE 1' as varchar2(100))
             when mod(rownum, 3) = 1    then cast('COMMON VALUE 2' as varchar2(100))
             when mod(rownum, 3) = 2    then cast('COMMON VALUE 3' as varchar2(100))
          end hname
         ,'x' value
   from dual connect by level <= 500000;

<>

create index t1_idx on t1 (id, hname) pctfree 50;
The plans above were created on Oracle 11.2.0.3 on Linux x64