本帖最后由 oraunix 于 2010-11-11 16:52 编辑
Semi-Join Example #2
I recently worked on a project where applying the concepts presented in this section brought the execution time for a key query in my client’s application down from 46 minutes to 1.1 seconds. A screen in the application allowed a user to enter a project owner and up to five participants, and a list of assignments involving all of the specified people would be displayed. The users found that the response time was tolerable if they only selected one participant, but it went downhill from there as they added more participants. The original version of the query looked like this:
SELECT DISTINCT A.name, A.code, A.description,
A.item_id, A.assignment_id, FI.string0, FI.string1
FROM relationships R, assignments A, format_items FI,
relationships R1, relationships R2, relationships R3,
relationships R4, relationships R5
WHERE R.user_id = 134546
AND R.account_id = 134545
AND R.type_code = 0
AND A.item_id = R.item_id
AND FI.item_id = A.item_id
AND R1.item_id = A.item_id AND R1.status = 5 AND R1.user_id = 137279
AND R2.item_id = A.item_id AND R2.status = 5 AND R2.user_id = 134555
AND R3.item_id = A.item_id AND R3.status = 5 AND R3.user_id = 134546
AND R4.item_id = A.item_id AND R4.status = 5 AND R4.user_id = 137355
AND R5.item_id = A.item_id AND R5.status = 5 AND R5.user_id = 134556
ORDER BY A.name ASC;
(I have changed the names of tables and columns, stripped away extraneous columns and predicates, and eliminated bind variables in order to both protect my client’s confidentiality and make the example easier to understand.) The execution plan looked like this:
Rows Row Source Operation
------- ---------------------------------------------------
642 SORT UNIQUE (cr=23520269 r=34 w=0 time=2759937104 us)
64339104 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=23520269 r=34 w=0 time=1838051782 us)
95184881 NESTED LOOPS (cr=7842642 r=23 w=0 time=907238095 us)
2710288 NESTED LOOPS (cr=2266544 r=23 w=0 time=103840003 us)
317688 NESTED LOOPS (cr=484734 r=11 w=0 time=23494451 us)
50952 NESTED LOOPS (cr=43280 r=10 w=0 time=2688237 us)
4146 NESTED LOOPS (cr=19016 r=3 w=0 time=988374 us)
1831 NESTED LOOPS (cr=13353 r=0 w=0 time=608296 us)
4121 HASH JOIN (cr=7395 r=0 w=0 time=399488 us)
2046 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=7211 r=0 w=0 time=299181 us)
17788 INDEX RANGE SCAN RELATIONSHIPS_N3 (cr=71 r=0 w=0 time=81158 us)(object id 34528)
3634 TABLE ACCESS FULL ASSIGNMENTS (cr=184 r=0 w=0 time=25536 us)
1831 TABLE ACCESS BY INDEX ROWID FORMAT_ITEMS (cr=5958 r=0 w=0 time=163252 us)
1831 INDEX RANGE SCAN FORMAT_ITEMS_N1 (cr=4127 r=0 w=0 time=115113 us)(object id 34316)
4146 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5663 r=3 w=0 time=349554 us)
4264 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=3678 r=0 w=0 time=224390 us)(object id 34345)
50952 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=24264 r=7 w=0 time=1538051 us)
70976 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=8428 r=0 w=0 time=630831 us)(object id 34345)
317688 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=441454 r=1 w=0 time=19584539 us)
1631032 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=108302 r=0 w=0 time=7213989 us)(object id 34345)
2710288 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=1781810 r=12 w=0 time=72015925 us)
4237104 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=643298 r=1 w=0 time=29018764 us)(object id 34345)
92474592 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=5576098 r=0 w=0 time=503165776 us)(object id 34345)
It is no wonder that this query took so long when five participants were selected. With each join to the RELATIONSHIPS table, the Cartesian product gets more and more out of hand. By the final sort to eliminate the duplicates, we have over 64 million rows to reckon with.
上面的执行计划,无论是hash还是nested,都是一个糟糕的执行计划,join了太多的表,返回了太多的行,sort部分一定是处理了非常多的数据,势必效率会非常的差。如果发生mulipass的排序,那么效果将会非常的差。
返回的数据非常多,被过滤的数据也非常多,那么更可以考虑是否可以使用半连接。
This query struck me right away as a good candidate for a nested loops semi-join. We want a list of assignments that meet certain criteria. Whether a specific user has one qualifying relationship record or 10, it makes no difference. We are only concerned with the existence of at least one qualifying relationship record for each of the specified users.
Rewriting the query with EXISTS clauses to semi-join to the RELATIONSHIPS table five times is pretty easy. The challenge here is that the query does have a DISTINCT operator, and this defeats Oracle’s semi-join access paths. Not to worry—an inline view with a NO_MERGE hint saved the day. The revised query looked like:
//因为半连接返回的数据行,不能保证distinct,这个地方还是需要加上distinct,这和上面的例子不同。加上了distinct,就限制了半连接的使用,因此我们使用了hints,使得主查询和子查询不进行合并。
SELECT /*+ NO_MERGE (M) */
DISTINCT M.name, M.code, M.description,
M.item_id, M.assignment_id, M.string0, M.string1
FROM (
SELECT A.name, A.code, A.description,
A.item_id, A.assignment_id, FI.string0, FI.string1
FROM relationships R, assignments A, format_items FI
WHERE R.user_id = 134546
AND R.account_id = 134545
AND R.type_code = 0
AND A.item_id = R.item_id
AND FI.item_id = A.item_id
AND EXISTS
(SELECT 1 FROM relationships R1
WHERE R1.item_id = A.item_id AND R1.status = 5
AND R1.user_id = 137279) //连接一个表,使用exists。
AND EXISTS
(SELECT 1 FROM relationships R2
WHERE R2.item_id = A.item_id AND R2.status = 5
AND R2.user_id = 134555)
AND EXISTS
(SELECT 1 FROM relationships R3
WHERE R3.item_id = A.item_id AND R3.status = 5
AND R3.user_id = 134546)
AND EXISTS
(SELECT 1 FROM relationships R4
WHERE R4.item_id = A.item_id AND R4.status = 5
AND R4.user_id = 137355)
AND EXISTS
(SELECT 1 FROM relationships R5
WHERE R5.item_id = A.item_id AND R5.status = 5
AND R5.user_id = 134556)
) M
ORDER BY M.name ASC;
The execution plan was as follows:
Rows Row Source Operation
------- ---------------------------------------------------
642 SORT UNIQUE (cr=36315 r=89 w=0 time=1107054 us)
1300 VIEW (cr=36315 r=89 w=0 time=1085116 us)
1300 NESTED LOOPS SEMI (cr=36315 r=89 w=0 time=1082232 us)
1314 NESTED LOOPS SEMI (cr=32385 r=89 w=0 time=1002330 us)
1314 NESTED LOOPS SEMI (cr=28261 r=89 w=0 time=904654 us)
1314 NESTED LOOPS SEMI (cr=22822 r=89 w=0 time=737705 us)
1322 NESTED LOOPS SEMI (cr=18730 r=89 w=0 time=651196 us)
1831 NESTED LOOPS (cr=13353 r=89 w=0 time=530670 us)
4121 HASH JOIN (cr=7395 r=89 w=0 time=347584 us)
2046 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=7211 r=0 w=0 time=186820 us)
17788 INDEX RANGE SCAN RELATIONSHIPS_N3 (cr=71 r=0 w=0 time=43770 us)(object id 34528)
3634 TABLE ACCESS FULL ASSIGNMENTS (cr=184 r=89 w=0 time=91899 us)
1831 TABLE ACCESS BY INDEX ROWID FORMAT_ITEMS (cr=5958 r=0 w=0 time=141416 us)
1831 INDEX RANGE SCAN FORMAT_ITEMS_N1 (cr=4127 r=0 w=0 time=100207 us)(object id 34316)
1322 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5377 r=0 w=0 time=92046 us)
2472 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=3664 r=0 w=0 time=61077 us)(object id 34345)
1314 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=4092 r=0 w=0 time=63478 us)
1582 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2647 r=0 w=0 time=40433 us)(object id 34345)
1314 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5439 r=0 w=0 time=133620 us)
11011 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2639 r=0 w=0 time=65312 us)(object id 34345)
1314 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=4124 r=0 w=0 time=75657 us)
2234 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2630 r=0 w=0 time=49373 us)(object id 34345)
1300 TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=3930 r=0 w=0 time=58651 us)
1300 INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2630 r=0 w=0 time=38832 us)(object id 34345)
The execution plan starts out the same as before, with a hash join between RELATIONSHIPS and ASSIGNMENTS followed by a nested loops join to FORMAT_ITEMS. At this point there are 1831 rows in the candidate result set. Whereas the original query does five conventional joins to the RELATIONSHIPS table and the 1831 rows balloon to over 64 million, the revised query does semi-joins instead and whittles the candidate rows down to 1300. Joining 1300 rows is a lot faster than joining 64 million rows, so you can see how the semi-join access path really speeds up this query.
The NO_MERGE hint and the inline view (aliased as “M”) causes Oracle to separate the DISTINCT operation from the bulk of the query. This is how we get around the fact that a DISTINCT operation defeats semi-join access paths. In the second version of the query, Oracle executes the query as if the DISTINCT keyword were not there. Then, as a final step, it applies the DISTINCT operation to the interim result set that it has stored in memory (the output of the VIEW operator).
If you are wondering why the NO_MERGE hint is necessary, recall what we discussed earlier about the heuristic Oracle uses when deciding whether or not to merge a subquery into the main body of a query. Oracle assumes that merging is a good thing and will do it whenever possible. Thus without the NO_MERGE hint Oracle would merge the inline view into the DISTINCT operation (eliminating the inline view), and the semi-join access path would be defeated by the presence of the DISTINCT operation.
This example demonstrates the value of the semi-join access paths. Your gains may not be as spectacular as we have seen here, but hopefully you can see from this discussion how proper use of semi-joins can make a certain class of queries much more efficient. This example also underscores the importance of knowing how to properly use semi-joins. If you did not remember that semi-join access paths are defeated by the DISTINCT operation, you might have changed the example query to use EXISTS clauses and wondered why the query still ran slowly.
|
|