oraunix 发表于 2010-11-11 18:06:06

oracle执行计划中的Semi-Join和Anti-Join(5)

本帖最后由 oraunix 于 2010-11-11 19:45 编辑

The NOT EXISTS and NOT IN Constructs

In this section we will discuss how Oracle evaluates NOT EXISTS and NOT IN clauses, prerequisites for using Oracle’s anti-join access paths, and hints that influence anti-join query optimization. Then we’ll look at a few examples of how anti-join access paths can be used to make some queries more efficient.

How Oracle Evaluates NOT EXISTS and NOT IN Clauses

NOT EXISTS and NOT IN clauses are used to retrieve rows from a table for which no match is found in another table. In the early days, Oracle would use the subquery of a NOT EXISTS or NOT IN clause as a filter. For each candidate row retrieved by the main body of the query, Oracle would execute the subquery. If the subquery retrieved one or more rows, then the candidate row would be discarded. Conceptually this approach seems similar to a nested loops anti-join.

Somewhere around Oracle 7.3, the merge anti-join and hash anti-join access paths were added. These enabled Oracle to leverage the merge and hash join algorithms when performing anti-joins. However, Oracle would only use these algorithms if the always_anti_join instance parameter was set appropriately or if a hint was added to the query. In Oracle 8i this still appears to be true. Oracle 9i, however, does away with the always_anti_join instance parameter and instead the optimizer evaluates the costs of the different algorithms and chooses the one with the lowest cost.
//not in和not exists也是根据第一个表的数据,在第二个表中进行匹配,只不过是一旦有一个以上的匹配,第一个表的数据就不会显示。

In an earlier section of this paper we discussed when the EXISTS clause should be used versus when the IN clause should be used. The two are functionally equivalent, and actively choosing one over the other is usually done in the hopes that this will help the optimizer choose a better execution plan. When talking about NOT EXISTS and NOT IN, however, it is important to note that these two constructs are not functionally equivalent—there is a significant difference between the two in how null values are handled.
不同于in和exists(两者相同),not in和not exists在处理null值上不同。
If the subquery of a NOT IN clause returns at least one null value, then the NOT IN predicate evaluates to false. The NOT EXISTS construct, meanwhile, concerns itself only with whether a row is returned or not, and thus does not do anything differently if a null value is returned. This may seem a bit abstract, so let’s clarify the point with an example.
如果not in返回至少一个null,那么not in就返回flase,第一个表的数据行就不会显示。
对于not exists,只是关心是否返回行,因此返回的无论是null还是非null,都是true,只要有数据返回。

Suppose you want a list of empty departments—departments that have no employees. You could write this query with a NOT EXISTS clause or a NOT IN clause. We saw the NOT EXISTS case earlier:
显示没有员工的部门信息。

      SELECT   D.deptno, D.dname
      FROM   dept D
      WHERE    NOT EXISTS
               (
               SELECT 1
               FROM   emp E
               WHEREE.deptno = D.deptno
               )
      ORDER BY D.deptno;
Written with a NOT IN clause, the query becomes:

      SELECT   D.deptno, D.dname
      FROM   dept D
      WHERE    D.deptno NOT IN
               (
               SELECT E.deptno
               FROM   emp E
               )
      ORDER BY D.deptno;
这是一个无关子查询,如果子查询中有null返回,那么这个查询的返回总是空,不会返回任何数据。
但是not exists可以正确返回没有职员的部门。
因此在使用not in的时候一定要注意。
The queries look like they would be functionally equivalent, but if you are using the EMP and DEPT tables from the SCOTT schema of an Oracle 9i database, then the two queries are in fact different. Insert a row into the EMP table with a null DEPTNO, run the two versions of the query, and you will see the difference. The NOT EXISTS version will show you that department 40 has no employees. However, the NOT IN version of the query will say there are no empty departments. Why? Because the subquery of the NOT IN clause retrieves a row with a null value, causing the NOT IN clause to always evaluate to false. As strange as this NOT IN nuance may seem, it is fully documented in Metalink document 28934.1.


The difference between NOT EXISTS and NOT IN with respect to handling of nulls is an important one—not just from the standpoint of differing functionality, but also from a performance perspective as well. To evaluate a NOT IN subquery that is capable of returning nulls, Oracle cannot scan an index because an implicit NVL() function must be applied and this would defeat the index. (Null values aren’t stored in single-column non-unique indexes, anyway.) We’ll see the impact of this implicit NVL() function in the examples later in this section.
使用not in还是not exists,不仅仅是功能上不同,而且在性能上也有很大的不同。
NOT IN会骗过子查询中所涉及表的索引。

If the subquery of a NOT IN clause is physically incapable of returning a null value (because the columns involved have NOT NULL constraints or because predicates have been added to the WHERE clause requiring non-null values), then the query with the NOT IN clause will return the same results as a query with an equivalent NOT EXISTS clause—and indexes won’t be defeated in the process.
//如果子查询中的列上有not null约束,或者查询中有where子句(去除了null),那么not in可以走索引,效果和not exists效果一样。这一点很重要。
If we wanted to write the query that shows empty departments with a NOT IN clause that gave the same results as the NOT EXISTS version, we would need to add a predicate to the subquery that rules out null values. The resulting query is:

      SELECT   D.deptno, D.dname
      FROM   dept D
      WHERE    D.deptno NOT IN
               (
               SELECT E.deptno
               FROM   emp E
               WHEREE.deptno IS NOT NULL
               )
      ORDER BY D.deptno;
To choose whether to use NOT EXISTS or NOT IN in a query, therefore, you must first consider how you want null values to be handled. If you need the presence of a null value in the subquery to cause the expression to always evaluate to false, then you need to use the NOT IN construct. Otherwise you have a choice between NOT EXISTS and NOT IN (with the extra WHERE predicate if the subquery would otherwise be capable of returning a null value).
确定是使用not exists还是使用not in?关键是看你要怎样处理null值。
如果对于null值,你总是希望返回false,那么应该使用not in。
否则,就可以使用not exists,或者使用not in(需要使用where去除null;或者加上not null约束)。

If you don’t need the special null value semantics of NOT IN, then your choice of whether to use NOT EXISTS or NOT IN hinges on what sort of access path you think will be best for the query. In a conventional join, the nested loops algorithm is desirable when the predicates on the first table are very selective and the join columns in the second table are selectively indexed. The merge and hash join algorithms, meanwhile, are more desirable when predicates are not very selective or the join columns in the second table are not selectively indexed. A similar rule of thumb applies to anti-joins. That is to say, merge and hash anti-joins are most valuable in situations where the first table has a lot of candidate rows or the join columns are not selectively indexed.如果你不需要关注null值的处理。那么你就应该关注是nested还是hash,或者sort merge。
对于第一个表的谓词列,选择性很高,子查询的表的连接列上有选择性很高的索引的时候,系统会优先选择nested loop。
第一个表的谓词选择性很低,第二个表中的连接列的选择性很低,那么系统只能选择hash或者merge。

If you think your query is going to be better off with a merge or hash anti-join than a nested loops anti-join or a filter operation, then you might want to code your query with a NOT IN clause instead of a NOT EXISTS clause. Whenever the Oracle documentation discusses anti-join access paths, it always seems to do so in the context of the NOT IN construct with no mention of NOT EXISTS. While sometimes Oracle 9i will choose an anti-join access path for a query with a NOT EXISTS clause, it does seem that Oracle 9i might be more likely to do so if NOT IN is used instead of NOT EXISTS. In Oracle 8i, I believe the query optimizer will never choose an anti-join access path for a query with a NOT EXISTS clause (even if always_anti_join is set and a hint appears in the subquery).
如果我们不想使用filter、nested loops anti-join,那么我们可以多考虑使用not in,而不是not exists。但是有时候,oracle可能会更加喜欢not exists。
总之是出了问题要看执行计划。

Meanwhile, if you don’t think your query will benefit from a merge or hash anti-join, then you might be best off with the good old filter technique instead of the nested loops anti-join access path. It appears that, at least in Oracle 9i, the nested loops anti-join might not be as efficient as it could be. In the example of the query for empty departments that we have been discussing, the filter technique will rule out a department as soon as the first employee in that department is located. The nested loops anti-join, meanwhile, appears to locate all employees in the department before ruling it out. This is unnecessary work. Whether a department has one employee or 100, it does not qualify as an empty department. So Oracle should cut to the chase and move on to the next department as soon as the first employee is found. The filter technique does just that, but the nested loops anti-join (at least in the cases I have seen) does not. The second example in this section will demonstrate this.作者在这个地方描述了关于filter。
通常我们不想使用filter,但是有可能filer的效果更好,因为filter在处理子查询的时候,当碰到第一个匹配的结果的时候,就结束这一次的子查询,进行下一个。因此效果可能也挺好,具体还是要看执行统计信息。
如果执行计划不是很准,就进行trace,然后tkprof,这一个很好的习惯,特别是在具体分析sql语句的时候。
我们在c/s的时候经常使用,在b/s中,照样可用。

The upshot is that you must first consider how null values should be handled when deciding whether to use NOT EXISTS or NOT IN. If you need the special semantics provided by NOT IN, then your decision has been made. Otherwise, you should next consider whether or not the query might benefit from a merge or hash anti-join. If so, then you probably ought to choose NOT IN. If you decide to go with NOT IN but do not want the expression to evaluate to false if a null value is found, then make sure the subquery cannot return a null value. If there is no chance that the query will benefit from a merge or hash anti-join and the special semantics of NOT IN are not desired, then you might want to select the NOT EXISTS construct so that there is a better chance Oracle will perform an efficient filter instead of an inefficient nested loops anti-join.

1、必须首先确认业务上需要如何处理null值,然后确定使用not in还是not exists。
2、下一步就是这个子查询是否受益于merge、hash join,那么你可能需要使用not in。
3、如果你希望使用not in,但是不希望因为null的问题导致结果是false,那么你要确认返回的子查询结果没有null值。
4、如果感觉不能受益于merge、hash join,而且也不需要not in的特殊语义,那么可以选择使用not exists。
这个时候,oracle可能会选择使用效率更高的filter,而不是nested loops anti-join。


blueseine 发表于 2011-2-28 10:41:10

nice job ~

blueseine 发表于 2011-2-28 10:43:24

在10.2.0.1和9.2.0.7中 上述的not in 和not exists没有出现不一致的问题,要是老师可以给点版本的例子就更好了。譬如不走index的时候。
感谢您对这两种join的阐述。
页: [1]
查看完整版本: oracle执行计划中的Semi-Join和Anti-Join(5)