Wednesday, December 12, 2012

11g Group By Selectivity Improvements

Oracle 11g (at least 11.2.0.3, where I'm testing it) has brought us an interesting change to the selectivity estimate of a GROUP BY operation. Here's a little demo:
SQL> CREATE TABLE t1 AS 
  2    SELECT LEVEL AS id1, 
  3           MOD(LEVEL, 20) fil1, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

SQL> CREATE TABLE t2 AS 
  2    SELECT LEVEL AS id2, 
  3           MOD(LEVEL, 5) fil2, 
  4           rpad('x', 1000) padding 
  5      FROM dual 
  6    CONNECT BY LEVEL < 10000 
  7  ;

Table created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'T1', method_opt => 'FOR ALL COLUMNS SIZE 1')

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'T2', method_opt => 'FOR ALL COLUMNS SIZE 1')

PL/SQL procedure successfully completed.
At this point we've got two tables with just under 10,000 rows each (CONNECT BY LEVEL < 10000 generates 9,999). The columns of interest are fil1 and fil2, with selectivities of 1/20 and 1/5 respectively. Now let's run a query as follows:
SQL> set autotrace traceonly 
SQL> SELECT 
  2   t1.id1 
  3    FROM t1, 
  4         t2 
  5   WHERE t2.id2 = t1.id1 
  6     AND t1.fil1 = 1 
  7     AND t2.fil2 = 1 
  8   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   125 |  1750 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
As you can see above, Oracle has estimated the HASH GROUP BY as returning 125 rows, even though in reality it returns 500 (which is also the estimate you would get in, say, 10g).
What I think is happening in this particular case(1) (and I confirmed it empirically by playing around with the selectivities of the fil1 and fil2 columns) is that Oracle applies the selectivities of fil1 and fil2 to the result of the HASH JOIN (Id=2) in the following way:
GROUP BY CARD = JOIN CARD * SEL(t1.fil1) / SEL(t2.fil2) 
and if we put the numbers in, it becomes:
GROUP BY CARD = 500 * 1/20 / (1/5) = 500 / 20 * 5 = 125 
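As a back-of-envelope check, the conjectured formula (and the filtered table cardinalities shown in the plan) can be reproduced with a few lines of arithmetic. This is only a sketch of my guess above, not Oracle's actual algorithm; the join cardinality of 500 is taken straight from the plan:

```python
# Cardinality arithmetic from the demo above.
num_rows = 9999      # CONNECT BY LEVEL < 10000 generates 9,999 rows
sel_fil1 = 1 / 20    # fil1 = MOD(LEVEL, 20): 20 distinct values
sel_fil2 = 1 / 5     # fil2 = MOD(LEVEL, 5): 5 distinct values

# Filtered full-scan estimates, matching plan lines Id=3 and Id=4
t1_card = round(num_rows * sel_fil1)   # -> 500
t2_card = round(num_rows * sel_fil2)   # -> 2000

join_card = 500      # HASH JOIN estimate taken from the plan (Id=2)

# Conjectured 11g formula:
# GROUP BY CARD = JOIN CARD * SEL(t1.fil1) / SEL(t2.fil2)
group_by_card = round(join_card * sel_fil1 / sel_fil2)
print(t1_card, t2_card, group_by_card)   # 500 2000 125
```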
If that's the case, I'm not sure what the rationale behind it is. What I do know is that this optimisation can be disabled by setting _optimizer_improve_selectivity to false:
SQL> SELECT 
  2  --+ OPT_PARAM('_optimizer_improve_selectivity' 'false') 
  3   t1.id1 
  4    FROM t1, 
  5         t2 
  6   WHERE t2.id2 = t1.id1 
  7     AND t1.fil1 = 1 
  8     AND t2.fil2 = 1 
  9   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
I'm not sure, though, what else gets switched off along with it. Also, if you've got primary keys on your id columns, the optimisation does not kick in:
SQL> alter table t1 add constraint pk_t1 primary key (id1);

Table altered.

SQL> alter table t2 add constraint pk_t2 primary key (id2);

Table altered.

SQL> SELECT t1.id1 
  2    FROM t1, 
  3         t2 
  4   WHERE t2.id2 = t1.id1 
  5     AND t1.fil1 = 1 
  6     AND t2.fil2 = 1 
  7   GROUP BY t1.id1;

500 rows selected.


Execution Plan 
---------------------------------------------------------- 
Plan hash value: 3226881135

---------------------------------------------------------------------------- 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
---------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|   1 |  HASH GROUP BY      |      |   500 |  7000 |   478   (1)| 00:00:07 | 
|*  2 |   HASH JOIN         |      |   500 |  7000 |   477   (1)| 00:00:07 | 
|*  3 |    TABLE ACCESS FULL| T1   |   500 |  3500 |   238   (0)| 00:00:04 | 
|*  4 |    TABLE ACCESS FULL| T2   |  2000 | 14000 |   238   (0)| 00:00:04 | 
----------------------------------------------------------------------------
Obviously this feature can be a problem (and in the case of one of my queries, it was) when you've got a complex query with one or a few large GROUP BY clauses: the cardinality estimate can drop significantly, leading the optimiser to pick a suboptimal join method, especially when you are grouping by more than one column. Note that the presence of filter predicates is a mandatory condition for the optimisation to kick in.
Finally, I want to mention bug #12591379 on MOS, which says the cardinality estimate of a GROUP BY may drop severely if it follows a HASH JOIN with an OR condition. The bug is reported as fixed in version 12.1.

(1) I believe my assumption is oversimplified and the actual algorithm is more sophisticated: at the very least, if you use a value for t2.fil2 that is outside its min-max column statistics, Oracle produces a different estimate.