Friday, September 14, 2012

ANSI SQL Bug

I’ve never been a fan of using ANSI SQL syntax in Oracle for different reasons. Here is one more to the list.
Have a look at the following query:
 SELECT dummy 
  FROM dual t1, 
       dual t2, 
       dual t3 
 WHERE t2.dummy = t1.dummy 
   AND t3.dummy = t1.dummy; 
If you run it you will get ORA-00918: column ambiguously defined, cause indeed DUMMY from which table Oracle is supposed to return? Let's rewrite the query using the ANSI SQL:
 SELECT dummy 
  FROM dual t1 
  JOIN dual t2 
    ON (t1.dummy = t2.dummy) 
  JOIN dual t3 
    ON (t1.dummy = t3.dummy);
If you run it on Oracle 10.2.0.4 (or older), you will see the following output:
 SQL> SELECT dummy 
  2    FROM dual t1 
  3    JOIN dual t2 
  4      ON (t1.dummy = t2.dummy) 
  5    JOIN dual t3 
  6      ON (t1.dummy = t3.dummy);

D 
- 
X

SQL>
Oracle does execute it without any errors. So which DUMMY has it actually taken then? Let's have a look at the Unparsed Query section of 10053 trace to find the answer (after some formatting):

SQL:******* UNPARSED QUERY IS ******* 
SELECT "T1"."DUMMY" "QCSJ_C000000000300000",
       "T2"."DUMMY" "QCSJ_C000000000300001" 
  FROM "SYS"."DUAL" "T1",
       "SYS"."DUAL" "T2" 
 WHERE "T1"."DUMMY"="T2"."DUMMY"

Query block (0x2aaaac888538) unchanged 
Query block (0x2aaaac887918) before join elimination: 
SQL:******* UNPARSED QUERY IS ******* 
SELECT "from$_subquery$_003"."QCSJ_C000000000300000_0" "QCSJ_C000000000300000",
       "from$_subquery$_003"."QCSJ_C000000000300001_1" "QCSJ_C000000000300001",
       "T3"."DUMMY" "DUMMY" 
  FROM  (SELECT "T1"."DUMMY" "QCSJ_C000000000300000_0",
                "T2"."DUMMY" "QCSJ_C000000000300001_1"
           FROM "SYS"."DUAL" "T1",
                "SYS"."DUAL" "T2" 
          WHERE "T1"."DUMMY"="T2"."DUMMY") "from$_subquery$_003",
        "SYS"."DUAL" "T3" 
 WHERE "from$_subquery$_003"."QCSJ_C000000000300000_0"="T3"."DUMMY"
Apparently Oracle joins first two tables and combins them into an inline view (first query block above). At this point it must take all the rows from both tables cause it doesn’t know which of them would be used later on for further join(s) or in the result set of columns. Ergo, in case some columns might be (and in my case are) named identically, Oracle gives them unique aliases.
Then it joins the third table (btw that’s why with two tabs the error doesn’t happen) (see the second query block in the snippet). Because this is the last table to be joined and all the previous tables have got unique aliases, Oracle doesn’t give unique aliases to t3 table’s columns. Finally it restricts the result set columns list as requested by user. At this point we've got only one dummy column, from table t3.
This behavior looks like a really nasty bug as it conceals errors in development. I bumped into it (like many others) when migrated to Oracle 11g where the bug is fixed and the ansi query like this starts failing. A quick search on metalink revealed a note ID 835701.1 saying that it's been fixed in 10.2.0.5 and afterwards.

3 comments:

  1. If it's all inner joins then why should it throw the ORA-00918? The projected values will be the same regardless of the underlying table chosen. There is no ambiguity.

    It would obviously be a problem if outer joins were involved where each table could end up projecting different values.

    Or am I missing something really basic here?

    ReplyDelete
    Replies
    1. Logically you are right, but as far as I know Oracle can't (yet) come to such a conclusion.
      Regardless of that, here's I hope a better example when even with inner join the problem is clear:

      SQL>
      create table t1
      (
      id number,
      vc varchar2(255)
      );

      Table created
      create table t2
      (
      id number,
      type number,
      val varchar2(255)
      );

      Table created
      insert into t1 values (1, rpad('x', 255));

      1 row inserted
      insert all
      into t2 values (1, 1, 'VALUE OF ONE TYPE')
      into t2 values (1, 2, 'VALUE OF ANOTHER TYPE')
      select null from dual;

      2 rows inserted
      commit;

      Commit complete
      SELECT id,
      val
      FROM t1
      JOIN t2
      ON t1.id = t2.id
      AND t2.type = 1
      JOIN t2 t3
      ON t3.id = t1.id
      AND t3.type = 2;

      ID VAL
      ---------- ------------------
      1 VALUE OF ANOTHER TYPE

      SQL>

      Delete
    2. That example's just plain wrong and certainly a bug! :-)

      But you've picked up on my underlying point which was that in situations where there is no ambiguity, where (bugs aside) ORA-00918 is currently thrown, there is no reason Oracle shouldn't be able to accept without explicit aliases.

      One could argue that being explicit is better (I'd tend to agree). It would most likely make the parser more complex to spot these cases for little benefit.

      Delete