oracle执行计划中的Semi-Join和Anti-Join(4)

oraunix

本帖最后由 oraunix 于 2010-11-11 16:52 编辑

Semi-Join Example #2

I recently worked on a project where applying the concepts presented in this section brought the execution time for a key query in my client’s application down from 46 minutes to 1.1 seconds. A screen in the application allowed a user to enter a project owner and up to five participants, and a list of assignments involving all of the specified people would be displayed. The users found that the response time was tolerable if they only selected one participant, but it went downhill from there as they added more participants. The original version of the query looked like this:

      SELECT DISTINCT A.name, A.code, A.description,
                        A.item_id, A.assignment_id, FI.string0, FI.string1
      FROM    relationships R, assignments A, format_items FI,
               relationships R1, relationships R2, relationships R3,
               relationships R4, relationships R5
      WHERE R.user_id = 134546
      AND    R.account_id = 134545
      AND    R.type_code = 0
      AND    A.item_id = R.item_id
      AND    FI.item_id = A.item_id
      AND    R1.item_id = A.item_id AND R1.status = 5 AND R1.user_id = 137279
      AND    R2.item_id = A.item_id AND R2.status = 5 AND R2.user_id = 134555
      AND    R3.item_id = A.item_id AND R3.status = 5 AND R3.user_id = 134546
      AND    R4.item_id = A.item_id AND R4.status = 5 AND R4.user_id = 137355
      AND    R5.item_id = A.item_id AND R5.status = 5 AND R5.user_id = 134556
      ORDER BY A.name ASC;
(I have changed the names of tables and columns, stripped away extraneous columns and predicates, and eliminated bind variables in order to both protect my client’s confidentiality and make the example easier to understand.) The execution plan looked like this:

      Rows    Row Source Operation
      -------  ---------------------------------------------------
         642  SORT UNIQUE (cr=23520269 r=34 w=0 time=2759937104 us)
      64339104 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=23520269 r=34 w=0 time=1838051782 us)
      95184881 NESTED LOOPS  (cr=7842642 r=23 w=0 time=907238095 us)
      2710288    NESTED LOOPS  (cr=2266544 r=23 w=0 time=103840003 us)
      317688    NESTED LOOPS  (cr=484734 r=11 w=0 time=23494451 us)
      50952    NESTED LOOPS  (cr=43280 r=10 w=0 time=2688237 us)
         4146       NESTED LOOPS  (cr=19016 r=3 w=0 time=988374 us)
         1831       NESTED LOOPS  (cr=13353 r=0 w=0 time=608296 us)
         4121       HASH JOIN  (cr=7395 r=0 w=0 time=399488 us)
         2046          TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=7211 r=0 w=0 time=299181 us)
      17788          INDEX RANGE SCAN RELATIONSHIPS_N3 (cr=71 r=0 w=0 time=81158 us)(object id 34528)
         3634          TABLE ACCESS FULL ASSIGNMENTS (cr=184 r=0 w=0 time=25536 us)
         1831       TABLE ACCESS BY INDEX ROWID FORMAT_ITEMS (cr=5958 r=0 w=0 time=163252 us)
         1831          INDEX RANGE SCAN FORMAT_ITEMS_N1 (cr=4127 r=0 w=0 time=115113 us)(object id 34316)
         4146       TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5663 r=3 w=0 time=349554 us)
         4264       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=3678 r=0 w=0 time=224390 us)(object id 34345)
      50952       TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=24264 r=7 w=0 time=1538051 us)
      70976       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=8428 r=0 w=0 time=630831 us)(object id 34345)
      317688    TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=441454 r=1 w=0 time=19584539 us)
      1631032       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=108302 r=0 w=0 time=7213989 us)(object id 34345)
      2710288    TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=1781810 r=12 w=0 time=72015925 us)
      4237104    INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=643298 r=1 w=0 time=29018764 us)(object id 34345)
      92474592    INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=5576098 r=0 w=0 time=503165776 us)(object id 34345)
It is no wonder that this query took so long when five participants were selected. With each join to the RELATIONSHIPS table, the Cartesian product gets more and more out of hand. By the final sort to eliminate the duplicates, we have over 64 million rows to reckon with.
上面的执行计划，无论是hash还是nested，都是一个糟糕的执行计划，join了太多的表，返回了太多的行，sort部分一定是处理了非常多的数据，势必效率会非常的差。如果发生mulipass的排序，那么效果将会非常的差。
返回的数据非常多，被过滤的数据也非常多，那么更可以考虑是否可以使用半连接。

This query struck me right away as a good candidate for a nested loops semi-join. We want a list of assignments that meet certain criteria. Whether a specific user has one qualifying relationship record or 10, it makes no difference. We are only concerned with the existence of at least one qualifying relationship record for each of the specified users.

Rewriting the query with EXISTS clauses to semi-join to the RELATIONSHIPS table five times is pretty easy. The challenge here is that the query does have a DISTINCT operator, and this defeats Oracle’s semi-join access paths. Not to worry—an inline view with a NO_MERGE hint saved the day. The revised query looked like:
//因为半连接返回的数据行，不能保证distinct，这个地方还是需要加上distinct，这和上面的例子不同。加上了distinct，就限制了半连接的使用，因此我们使用了hints，使得主查询和子查询不进行合并。
      SELECT /*+ NO_MERGE (M) */
               DISTINCT M.name, M.code, M.description,
                        M.item_id, M.assignment_id, M.string0, M.string1
      FROM    (
               SELECT A.name, A.code, A.description,
                     A.item_id, A.assignment_id, FI.string0, FI.string1
               FROM relationships R, assignments A, format_items FI
               WHERE  R.user_id = 134546
               AND R.account_id = 134545
               AND R.type_code = 0
               AND A.item_id = R.item_id
               AND FI.item_id = A.item_id
               AND EXISTS
                     (SELECT 1 FROM relationships R1
                     WHERE R1.item_id = A.item_id AND R1.status = 5
                     AND R1.user_id = 137279)    //连接一个表，使用exists。
               AND EXISTS
                     (SELECT 1 FROM relationships R2
                     WHERE R2.item_id = A.item_id AND R2.status = 5
                     AND R2.user_id = 134555)
               AND EXISTS
                     (SELECT 1 FROM relationships R3
                     WHERE R3.item_id = A.item_id AND R3.status = 5
                     AND R3.user_id = 134546)
               AND EXISTS
                     (SELECT 1 FROM relationships R4
                     WHERE R4.item_id = A.item_id AND R4.status = 5
                     AND R4.user_id = 137355)
               AND EXISTS
                     (SELECT 1 FROM relationships R5
                     WHERE R5.item_id = A.item_id AND R5.status = 5
                     AND R5.user_id = 134556)
               ) M
      ORDER BY M.name ASC;
The execution plan was as follows:

      Rows    Row Source Operation
      -------  ---------------------------------------------------
         642  SORT UNIQUE (cr=36315 r=89 w=0 time=1107054 us)
         1300 VIEW  (cr=36315 r=89 w=0 time=1085116 us)
         1300 NESTED LOOPS SEMI (cr=36315 r=89 w=0 time=1082232 us)
         1314    NESTED LOOPS SEMI (cr=32385 r=89 w=0 time=1002330 us)
         1314    NESTED LOOPS SEMI (cr=28261 r=89 w=0 time=904654 us)
         1314    NESTED LOOPS SEMI (cr=22822 r=89 w=0 time=737705 us)
         1322       NESTED LOOPS SEMI (cr=18730 r=89 w=0 time=651196 us)
         1831       NESTED LOOPS  (cr=13353 r=89 w=0 time=530670 us)
         4121       HASH JOIN  (cr=7395 r=89 w=0 time=347584 us)
         2046          TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=7211 r=0 w=0 time=186820 us)
      17788          INDEX RANGE SCAN RELATIONSHIPS_N3 (cr=71 r=0 w=0 time=43770 us)(object id 34528)
         3634          TABLE ACCESS FULL ASSIGNMENTS (cr=184 r=89 w=0 time=91899 us)
         1831       TABLE ACCESS BY INDEX ROWID FORMAT_ITEMS (cr=5958 r=0 w=0 time=141416 us)
         1831          INDEX RANGE SCAN FORMAT_ITEMS_N1 (cr=4127 r=0 w=0 time=100207 us)(object id 34316)
         1322       TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5377 r=0 w=0 time=92046 us)
         2472       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=3664 r=0 w=0 time=61077 us)(object id 34345)
         1314       TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=4092 r=0 w=0 time=63478 us)
         1582       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2647 r=0 w=0 time=40433 us)(object id 34345)
         1314    TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5439 r=0 w=0 time=133620 us)
      11011       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2639 r=0 w=0 time=65312 us)(object id 34345)
         1314    TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=4124 r=0 w=0 time=75657 us)
         2234    INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2630 r=0 w=0 time=49373 us)(object id 34345)
         1300    TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=3930 r=0 w=0 time=58651 us)
         1300    INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2630 r=0 w=0 time=38832 us)(object id 34345)
The execution plan starts out the same as before, with a hash join between RELATIONSHIPS and ASSIGNMENTS followed by a nested loops join to FORMAT_ITEMS. At this point there are 1831 rows in the candidate result set. Whereas the original query does five conventional joins to the RELATIONSHIPS table and the 1831 rows balloon to over 64 million, the revised query does semi-joins instead and whittles the candidate rows down to 1300. Joining 1300 rows is a lot faster than joining 64 million rows, so you can see how the semi-join access path really speeds up this query.

The NO_MERGE hint and the inline view (aliased as “M”) causes Oracle to separate the DISTINCT operation from the bulk of the query. This is how we get around the fact that a DISTINCT operation defeats semi-join access paths. In the second version of the query, Oracle executes the query as if the DISTINCT keyword were not there. Then, as a final step, it applies the DISTINCT operation to the interim result set that it has stored in memory (the output of the VIEW operator).

If you are wondering why the NO_MERGE hint is necessary, recall what we discussed earlier about the heuristic Oracle uses when deciding whether or not to merge a subquery into the main body of a query. Oracle assumes that merging is a good thing and will do it whenever possible. Thus without the NO_MERGE hint Oracle would merge the inline view into the DISTINCT operation (eliminating the inline view), and the semi-join access path would be defeated by the presence of the DISTINCT operation.

This example demonstrates the value of the semi-join access paths. Your gains may not be as spectacular as we have seen here, but hopefully you can see from this discussion how proper use of semi-joins can make a certain class of queries much more efficient. This example also underscores the importance of knowing how to properly use semi-joins. If you did not remember that semi-join access paths are defeated by the DISTINCT operation, you might have changed the example query to use EXISTS clauses and wondered why the query still ran slowly.

P3 · 发表于 2011-2-27 22:31:51

nice job ~

图文精华

oracle执行计划中的Semi-Join和Anti-Join(4)