关于exists、in、not in、not exists的一些讨论(不错)
本帖最后由 oraunix 于 2010-11-16 09:02 编辑大家借鉴一下,我们经常的需要关注这方面的执行计划。
使用exists、not eixsts不能用到semi-join、anti-join的一些情况讨论
在Oralce中我们经常会使用exists、not exists来进行一些表连接操作,在普通情况下Oracle的
执行计划如果使用到semi-join、anti-join方式来进行表连接的话,速度一般很快,否则可能很慢(这是通常情况,有很多的例外,具体情况还要看执行计划和开销)。通过使用提示hash_sj hash_aj也可使执行计划走semi-join、anti-join。
但是在有些情况下,即使增加了提示,或者对表进行了分析后,执行计划也不走semi-join、anti-join
方式,下面就是对这些情况的讨论。
附注:本文中的例子均是在Oracle9.2.0.6 for linux中测试的。
1、在where条件中多个exists、not exists是or的关系
SQL> create table a1 as select * from dba_objects where rownum <= 10000;
SQL> create table a2 as
select * from (select t.*,rownum rd from dba_objects t) p where rd>5000 and rd<= 15000;
SQL> create table a3 as
select * from (select t.*,rownum rd from dba_objects t) p where rd>7500 and rd<= 17500;
SQL> analyze table a1 compute statistics;
SQL> analyze table a2 compute statistics;
SQL> analyze table a3 compute statistics;
SQL> set autotrace on exp
SQL> set timing on
----- exists or exists 情况
SQL> select a1.owner,count(*)
2 from a1
3where exists
4 (select /*+ hash_sj(a1 a2) */* from a2
5 where a2.object_id=a1.object_id)
6 or
7 exists
8 (select /*+ hash_sj(a1 a2) */* from a3
9 where a3.object_id=a1.object_id)
10group by a1.owner;
OWNER COUNT(*)
------------------------------ ----------
SYS 5000
已用时间: 00: 00: 26.66
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=7)
1 0 SORT (GROUP BY) (Cost=18 Card=1 Bytes=7)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=975 Bytes=6825)
4 2 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4)
5 2 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=1 Bytes=4)
执行计划中显然没有用到semi-jobin,这时候只有改写SQL,通过union all的方式来达到使用到
semi-join的目的。
SQL> select x.owner,count(*)
2from (select * from a1
3 where exists
4 (select /*+ hash_sj(a1 a2)*/* from a2
5 where a2.object_id= a1.object_id )
6 union all
7 select * from a1
8 where not exists
9 (select /*+ hash_aj(a1 p)*/* from a2 p
10 where p.object_id=a1.object_id)
11 and exists
12 (select /*+ hash_sj(a1 a3)*/* from a3
13 where a3.object_id=a1.object_id)
14 ) x
15group by x.owner;
OWNER COUNT(*)
------------------------------ ----------
SYS 5000
已用时间:00: 00: 00.31
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=90 Card=10001 Bytes=
170017)
1 0 SORT (GROUP BY) (Cost=90 Card=10001 Bytes=170017)
2 1 VIEW (Cost=88 Card=10001 Bytes=170017)
3 2 UNION-ALL
4 3 HASH JOIN (SEMI) (Cost=35 Card=10000 Bytes=200000)
5 4 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000)
6 4 VIEW OF 'VW_SQ_2' (Cost=17 Card=10000 Bytes=130000)
7 6 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000Bytes=40000)
8 3 HASH JOIN (ANTI) (Cost=53 Card=1 Bytes=24)
9 8 HASH JOIN (SEMI) (Cost=35 Card=10000 Bytes=200000)
10 9 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000)
11 9 VIEW OF 'VW_SQ_1' (Cost=17 Card=10000 Bytes=130000)
12 11 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=10000 Bytes=40000)
13 8 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
----- not exists or not exists 情况
SQL> select a1.owner,count(*)
2 from a1
3where not exists
4 (select /*+ hash_sj(a1 a2) */* from a2
5 where a2.object_id=a1.object_id)
6 or
7 not exists
8 (select /*+ hash_sj(a1 a2) */* from a3
9 where a3.object_id=a1.object_id)
10group by a1.owner;
OWNER COUNT(*)
------------------------------ ----------
SYS 7500
已用时间:00: 00: 20.72
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=7)
1 0 SORT (GROUP BY) (Cost=18 Card=1 Bytes=7)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=975 Bytes=6825)
4 2 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4)
5 2 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=1 Bytes=4)
SQL> select x.owner,count(*)
2from (select * from a1
3 where not exists
4 (select /*+ hash_sj(a1 a2)*/* from a2
5 where a2.object_id= a1.object_id )
6 union all
7 select * from a1
8 where exists
9 (select /*+ hash_aj(a1 p)*/* from a2 p
10 where p.object_id=a1.object_id)
11 and not exists
12 (select /*+ hash_sj(a1 a3)*/* from a3
13 where a3.object_id=a1.object_id)
14 ) x
15group by x.owner;
OWNER COUNT(*)
------------------------------ ----------
SYS 7500
已用时间:00: 00: 00.31
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=89 Card=2 Bytes=34)
1 0 SORT (GROUP BY) (Cost=89 Card=2 Bytes=34)
2 1 VIEW (Cost=88 Card=2 Bytes=34)
3 2 UNION-ALL
4 3 HASH JOIN (ANTI) (Cost=35 Card=1 Bytes=11)
5 4 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000)
6 4 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
7 3 HASH JOIN (SEMI) (Cost=53 Card=1 Bytes=24)
8 7 HASH JOIN (ANTI) (Cost=35 Card=1 Bytes=11)
9 8 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000)
10 8 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=10000 Bytes=40000)
11 7 VIEW OF 'VW_SQ_1' (Cost=17 Card=10000 Bytes=130000)
12 11 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
----- not exists or exists 情况
SQL> select a1.owner,count(*)
2 from a1
3wherenot exists
4 (select /*+ hash_sj(a1 a2) */* from a2
5 where a2.object_id=a1.object_id)
6 orexists
7 (select /*+ hash_sj(a1 a3) */* from a3
8 where a3.object_id=a1.object_id)
9group by a1.owner;
OWNER COUNT(*)
------------------------------ ----------
SYS 7500
已用时间:00: 00: 20.57
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=7)
1 0 SORT (GROUP BY) (Cost=18 Card=1 Bytes=7)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=975 Bytes=6825)
4 2 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4)
5 2 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=1 Bytes=4)
SQL> select x.owner,count(*)
2from (select * from a1
3 where not exists
4 (select /*+ hash_sj(a1 a2)*/* from a2
5 where a2.object_id= a1.object_id )
6 union all
7 select * from a1
8 where exists
9 (select /*+ hash_aj(a1 p)*/* from a2 p
10 where p.object_id=a1.object_id)
11 and exists
12 (select /*+ hash_sj(a1 a3)*/* from a3
13 where a3.object_id=a1.object_id)
14 ) x
15group by x.owner;
OWNER COUNT(*)
------------------------------ ----------
SYS 7500
已用时间:00: 00: 00.32
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=90 Card=10001 Bytes=
170017)
1 0 SORT (GROUP BY) (Cost=90 Card=10001 Bytes=170017)
2 1 VIEW (Cost=88 Card=10001 Bytes=170017)
3 2 UNION-ALL
4 3 HASH JOIN (ANTI) (Cost=35 Card=1 Bytes=11)
5 4 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000)
6 4 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
7 3 HASH JOIN (SEMI) (Cost=53 Card=10000 Bytes=330000)
8 7 HASH JOIN (SEMI) (Cost=35 Card=10000 Bytes=200000)
9 8 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000)
10 8 VIEW OF 'VW_SQ_2' (Cost=17 Card=10000 Bytes=130000)
11 10 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=10000 Bytes=40000)
12 7 VIEW OF 'VW_SQ_1' (Cost=17 Card=10000 Bytes=130000)
13 12 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
从上面的例子可以看出速度的提高是非常明显的,当然对于or的情况越多,修改后的SQL也就越复杂,不过相对于效率
的提高来说还是很值得的。
2、如果子句中含有group by这种排序操作,使用in来代替exists操作会大大提高效率,示例如下:
修改一下示例表:
SQL> insert into a2 select * from a2;
SQL> commit;
<1> 使用exists操作,无论添加什么提示都不能使其走semi-join,耗费大量时间
SQL> select count(*) from a1 a
2 where exists
3 ( select /*+ hash_sj(a b) */*
4 from ( select object_id from a2
5 group by object_id having count(*) > 1
6 ) b
7 where a.object_id=b.object_id
8 );
COUNT(*)
----------
5000
已用时间:00: 00: 46.22
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=500 Bytes=2000)
4 2 FILTER
5 4 SORT (GROUP BY NOSORT) (Cost=17 Card=1 Bytes=4)
6 5 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4)
<2> 将exists改为in句型,就可以用到semi-join,效率大大提高
SQL> select count(*) from a1 a where object_id in
2( select object_idfrom a2 b
3 group byobject_id having count(*) > 1
4);
COUNT(*)
----------
5000
已用时间:00: 00: 00.90
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (SEMI) (Cost=37 Card=500 Bytes=8500)
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=40000)
4 2 VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500)
5 4 FILTER
6 5 SORT (GROUP BY) (Cost=19 Card=500 Bytes=2000)
7 6 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
对于not exists情况也是同样的,需要将语句改写为not in,但是要加上提示以及关联的列要有非空的限制
才能使其走anti-join的执行计划。
SQL> select count(*) from a1 a
2 where not exists
3 ( select /*+ hash_aj(a b)*/* from
4 ( select object_id from a2
5 group by object_id having count(*) > 1
6 ) b
7 where a.object_id=b.object_id
8 );
COUNT(*)
----------
5000
已用时间:00: 00: 46.47
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=500 Bytes=2000)
4 2 FILTER
5 4 SORT (GROUP BY NOSORT) (Cost=17 Card=1 Bytes=4)
6 5 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4)
SQL> select count(*) from a1 a
2 where object_id is not null
3 and object_id not in
4 ( select object_idfrom a2 b
5 where object_id is not null
6 group byobject_id having count(*) > 1
7 );
COUNT(*)
----------
5000
已用时间:00: 00: 00.75
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (ANTI) (Cost=37 Card=9500 Bytes=161500)
3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=40000)
4 2 VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500)
5 4 FILTER
6 5 SORT (GROUP BY) (Cost=19 Card=500 Bytes=2000)
7 6 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000)
页:
[1]