<rt id="bn8ez"></rt>
<label id="bn8ez"></label>

  • <span id="bn8ez"></span>

    <label id="bn8ez"><meter id="bn8ez"></meter></label>

    Decode360's Blog

    業精于勤而荒于嬉 QQ:150355677 MSN:decode360@hotmail.com

      BlogJava :: 首頁 :: 新隨筆 :: 聯系 ::  :: 管理 ::
      302 隨筆 :: 26 文章 :: 82 評論 :: 0 Trackbacks

    ??? Tom的這篇文章介紹了Oracle在不同的情況下,使用RBO和CBO對執行計劃的影響。以及在CBO下,對于TABLE的不同大小而產生的變化。在最后還介紹了一下使用現有執行計劃對于條件過濾之后記錄數的多少而形成的不同情況,也回答了我以前的一些困擾。但是具體到底多少記錄占比會使CBO發生變化還是不得而知。另外也可以看一下關于DBMS_STATS包的應用方法,這個包還是很有用的,就是比較復雜一些。

    On Joins and Query Plans ?
    ??? By Tom Kyte

    Our technologist optimizes joins and explains changing plans.

    I know some of the rules about joins and that join 2 here is better than 1. Are there other rules about joins that I should know, particularly which are efficient to use and which are not?

    				1) SELECT distinct D.deptno, D.dname
        FROM dept D, emp E
            WHERE E.deptno = D.deptno
            ORDER BY D.deptno;
    
    2) SELECT D.deptno, D.dname
        FROM dept D
            WHERE EXISTS
                     (
                     SELECT 1
                     FROM emp E
                     WHERE E.deptno = D.deptno
                     )
            ORDER BY D.deptno;
    
    		

    In general, you should phrase the queries in the manner that says it best. If one set of joins were particularly efficient to use in all cases, Oracle would not have implemented the rest of them!

    In general, you use a join when you need data from more than one table in the ultimate SELECT list. Here you need data only from DEPT, so it is unlikely that you would consider a join. You would instead consider using either WHERE EXISTS or WHERE IN. (The cost-based optimizer [CBO] sees them as more or less equivalent—you might, too.)

    I would write the query as either

    				select deptno, dname
      from dept
     where exists 
       ( select NULL 
           from emp 
          where emp.deptno 
             = dept.deptno )
     order by deptno;
    
    		

    or

    				select deptno, dname
      from dept
     where deptno in 
    ( select deptno 
        from emp )
     order by deptno;
    
    		

    In both cases, the optimizer would employ either a semi join, which you cannot specify but the optimizer may perform (it stops joining after the first hit), or an index probe using NESTED LOOPS into EMP to test for row existence.

    This example demonstrates, in general, what happens. When the optimizer detects that DEPT is small and EMP is large, it index-probes the large EMP table for each row in DEPT. On the other hand, if it deems DEPT to be large, the optimizer will do a pure semi join in bulk, with no indexes. I'm using a copy of the EMP and DEPT tables for the test:

    				SQL> create table emp as
      2  select * from scott.emp;
    Table created.
     
    SQL> create table dept as
      2  select * from scott.dept;
    Table created.
     
    SQL> create index emp_deptno_idx
      2  on emp(deptno);
    Index created.
    
    		

    Then I set the first representative statistics. I use SET_TABLE_STATS to make EMP appear large, with 1,000,000 rows, and DEPT appear small, with only 100 rows. Additionally, I tell the optimizer about the index I envision having on EMP(DEPTNO). Because Oracle Database 10g Release 2 computes statistics on an index creation by default, I start by removing any existing statistics on this index and putting in place statistics that might be more representative:

    				
    						SQL> begin
      2   dbms_stats.set_table_stats
      3   ( user,
      4    'EMP',
      5    numrows => 1000000,
      6    numblks => 100000 );
      7   dbms_stats.delete_index_stats
      8   ( user,
      9    'EMP_DEPTNO_IDX' );
     10   dbms_stats.set_index_stats
     11   ( user,
     12    'EMP_DEPTNO_IDX',
     13     numrows => 1000000,
     14     numdist => 10000,
     15     numlblks =>10000 );
     16   dbms_stats.set_column_stats
     17   ( user,
     18    'EMP',
     19    'DEPTNO',
     20     DISTCNT => 10000 );
     21   dbms_stats.set_table_stats
     22   ( user,
     23    'DEPT',
     24     numrows=> 100,
     25     numblks => 100 );
     26  end;
     27  /
    				
    		

    Now I'm ready to see what the optimizer decides to do. I use AUTOTRACE to review the query plans generated for two queries, either of which would be a correct way to ask this question. I'm showing two queries to demonstrate that when developers use the CBO, they don't have to decide on the best way to ask a question—the optimizer recognizes certain constructs and picks the best path possible. This is in contrast to the old rule-based optimizer (RBO), which would view WHERE IN and WHERE EXISTS very differently.

    When using the CBO with a large EMP and a small DEPT, Oracle Database might use a plan such as the one in Listing 1. The optimizer opts to read every row in DEPT and then performs an index range scan—the index probe into the EMP table—to see if the row exists in the other table. Because I have relatively few index probes to perform (about 100 is what the optimizer thinks) and because the EMP table is large, the optimizer uses this plan in both the WHERE IN and WHERE EXISTS cases.

    Code Listing 1: CBO plan with large EMP and small DEPT

    				
    						SQL> set autotrace traceonly explain
    SQL> select deptno, dname
      2    from dept where deptno  in
      3    ( select deptno
      4        from emp )
      5   order by deptno;
     
    Execution Plan
    -----------------------------------------------------------
    Plan hash value: 3383088615
     
    -----------------------------------------------------------
    |  Id |	Operation	| Name	        |   Rows | Bytes  |
    -----------------------------------------------------------
    |   0 |SELECT STATEMENT	|		|   100	 |  3500  |
    |   1 |SORT ORDER BY	|		|   100	 |  3500  |
    |   2 |NESTED LOOPS SEMI|		|   100	 |  3500  | 
    |   3 |TABLE ACCESS FULL| DEPT	        |   100	 |  2200  | 
    |*  4 |INDEX RANGE SCAN	| EMP_DEPTNO_IDX|  1000K |    12M |
    -----------------------------------------------------------
     
    Predicate Information (identified by operation id):
    -----------------------------------------------------------
     
       4 - access("DEPTNO"="DEPTNO")
     
    SQL> select deptno, dname
      2    from dept where exists
      3    ( select null
      4        from emp
      5       where emp.deptno =
      6            dept.deptno )
      7   order by deptno;
     
    Execution Plan
    ------------------------------------------------------------
    Plan hash value: 3383088615
     
    ------------------------------------------------------------
    |  Id |	Operation	 | Name	         |   Rows | Bytes  |
    ------------------------------------------------------------
    |   0 |	SELECT STATEMENT |	         |   100  |  3500  |
    |   1 | SORT ORDER BY	 |		 |   100  |  3500  |
    |   2 | NESTED LOOPS SEMI|		 |   100  |  3500  |
    |   3 | TABLE ACCESS FULL| DEPT	         |   100  |  2200  |
    |*  4 | INDEX RANGE SCAN | EMP_DEPTNO_IDX|  1000K |    12M |
    ------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ------------------------------------------------------------
     
       4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
    				
    		

    Compare that with the RBO plan, shown in Listing 2. When using the RBO, the developer must ask, "What size will EMP and DEPT be?" and, depending on the answer, choose either WHERE IN or WHERE EXISTS to get an "optimal" plan. The RBO comes up with two different plans based solely on how the question was asked, not on how large the amount of data was. This is one of the major advantages of the CBO over the RBO: Developers don't have to know how to write the best query in all cases—which is not to say that they don't need to learn SQL! They just do not have to figure out minutiae such as where they should use WHERE IN versus WHERE EXISTS.

    Code Listing 2: RBO plan

    				
    						SQL> select /*+ RULE */
      2    deptno, dname
      3    from dept where deptno  in
      4    ( select deptno
      5        from emp )
      6   order by deptno;
     
    Execution Plan
    -------------------------------------
    Plan hash value: 1932208647
     
    -------------------------------------
    |  Id |	Operation	 | Name	    |
    -------------------------------------
    |   0 | SELECT STATEMENT |	    |
    |   1 | MERGE JOIN       |	    |
    |   2 | SORT JOIN	 |	    |
    |   3 | TABLE ACCESS FULL| DEPT     |
    |*  4 | SORT JOIN	 |	    |
    |   5 | VIEW		 | VW_NSO_1 |			
    |   6 | SORT UNIQUE	 |          |
    |   7 | TABLE ACCESS FULL| EMP	    |
    -------------------------------------
     
    Predicate Information (identified by operation id):
    -------------------------------------
     
       4 - access("DEPTNO"="$nso_col_1")
           filter("DEPTNO"="$nso_col_1")
     
    Note
    ----------
       - rule based optimizer used (consider using cbo)
     
    SQL> select /*+ RULE */
      2    deptno, dname
      3    from dept where exists
      4    ( select null
      5        from emp
      6       where emp.deptno =
      7            dept.deptno )
      8   order by deptno;
     
    Execution Plan
    ----------------------------------------
    Plan hash value: 4109416194
     
    ----------------------------------------
    | Id | Operation	| Name	       |
    ----------------------------------------
    |  0 | SELECT STATEMENT	|	       |
    |  1 | SORT ORDER BY	|	       |
    |* 2 | FILTER		|	       |
    |  3 | TABLE ACCESS FULL| DEPT 	       |
    |* 4 | INDEX RANGE SCAN	| EMP_DEPTNO_ID|
    ----------------------------------------
     
    Predicate Information (identified by operation id):
    ----------------------------------------
     
       2 - filter( EXISTS (SELECT 0 FROM "EMP" "EMP" WHERE
                    "EMP"."DEPTNO"=:B1))
       4 - access("EMP"."DEPTNO"=:B1)
    				
    		

    Now, back to the CBO. As the amount of data in DEPT increases—and the optimizer decides that it will have to perform significantly more index probes into the EMP table—the query plans change. They go from using NESTED LOOPS and INDEX RANGE SCANS to more-efficient bulk operations. To see this, I tell the optimizer that DEPT is much larger now

    				SQL> begin
      2     dbms_stats.set_table_stats
      3     ( user,
      4      'DEPT',
      5       numrows=> 100000,
      6       numblks => 10000 );
      7  end;
      8  /
    
    		

    and then rerun the queries and review the resulting plans, as shown in Listing 3. As you can see, the optimizer, in both cases, now chooses to perform full scans and a nice big hash semi join—a join that stops after the first match—instead of using an index probe, because it would have to use the index 100,000 times.

    Code Listing 3: CBO plan with larger DEPT

    				
    						SQL> set autotrace traceonly explain
    SQL> select deptno, dname
      2    from dept where deptno  in
      3    ( select deptno
      4        from emp )
      5   order by deptno;
     
    Execution Plan
    -----------------------------------------------------------------
    Plan hash value: 3127359958
    
    -----------------------------------------------------------------
    | Id |	Operation	    | Name	     |	 Rows |  Bytes	|	
    -----------------------------------------------------------------
    |  0 |	SELECT STATEMENT    |		     |   100K |  3417K	|	
    |  1 |  SORT ORDER BY	    |		     |   100K |  3417K	|
    |* 2 |  HASH JOIN SEMI	    |		     |   100K |  3417K	|
    |  3 |  TABLE ACCESS FULL   | DEPT	     |   100K |  2148K	|	
    |  4 |  INDEX FAST FULL SCAN| EMP_DEPTNO_IDX |   1000K|    12M	|
    -----------------------------------------------------------------
    Predicate Information (identified by operation id):
    -----------------------------------------------------------------
     
       2 - access("DEPTNO"="DEPTNO")
     
    SQL> select deptno, dname
      2    from dept where exists
      3    ( select null
      4        from emp
      5       where emp.deptno =
      6            dept.deptno )
      7   order by deptno;
     
    Execution Plan
    ---------------------------------------------------------------
    Plan hash value: 3127359958
     
    ---------------------------------------------------------------
    | Id |	Operation	    | Name	    |	Rows |   Bytes|	
    ---------------------------------------------------------------
    |  0 |	SELECT STATEMENT    |		    |   100K |  3417K |	
    |  1 |  SORT ORDER BY	    |		    |   100K |  3417K | 
    |* 2 |  HASH JOIN SEMI	    |		    |   100K |  3417K | 
    |  3 |  TABLE ACCESS FULL   | DEPT	    |   100K |  2148K |	
    |  4 |  INDEX FAST FULL SCAN| EMP_DEPTNO_IDX|   1000K|    12M |	
    ---------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------------------
     
       2 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
    				
    		

    Another important thing to note about these examples is that the RBO never performed a semi join, as the CBO did. The RBO has a very limited set of access paths and join operations available—it simply doesn't know how to do a semi join and won't use that join operation. Instead it performs operations such as the SORT DISTINCT it did in Listing 2, which is much less efficient than the cost-based plan in this case. Another reason to abandon the RBO in favor of the CBO!

    Why Does My Plan Change?

    I have noticed that gathering statistics against tables where none of the data has changed can cause the query plans against those tables to change. For example, one day I gather statistics, run some queries, and save the query plans in a report. The next day, I gather statistics (but I have not modified any of the data in any table) and I discover that the plans are different. What could cause that? The new query plans were better, but I would like to understand how this could be.

    This change is likely due to the default value of the METHOD_OPT parameter used with DBMS_STATS. In Oracle Database 10g, the METHOD_OPT parameter defaults to a SIZE AUTO. After you ran a query, the database remembered the predicates and updated a dictionary table, SYS.COL_USAGE$. Then, the next time you ran DBMS_STATS to gather statistics on these tables, DBMS_STATS queried that table to find out what columns should have histograms collected automatically, based on past query workload. It looked at your predicates and said, "Hmm, these columns are candidates for histograms based on the queries the end users have been running."

    You can see this easily with a relatively small example. I start out by creating a small table with some skewed data in a column named ID and gathering the default set of statistics on it. The data in the ID column is such that for the values 0 through 4, about 20 percent of the table will be retrieved, but for the value 99, only a single row will be returned. I'm using this exaggerated example just to see the plans change:

    				SQL> create table t
      2  as
      3  select mod(rownum,5) id, a.*
      4    from all_objects a;
    Table created.
    SQL> update t
      2     set id = 99
      3   where rownum = 1;
    1 row updated.
     
    SQL> create index t_idx on t(id);
    Index created.
     
    SQL> begin
      2   dbms_stats.gather_table_stats
      3   ( user, 'T' );
      4  end;
      5  /
    
    SQL> select column_name, count(*)
      2    from user_tab_histograms
      3   where table_name = 'T'
      4     and column_name = 'ID'
      5   group by column_name;
     
    COLUMN_NAME   COUNT(*)
    -----------   ------------
    ID                  2
    
    		

    So, right now, the ID column doesn't have complete histograms, by default—just two buckets, as shown by the query against USER_TAB_HISTOGRAMS. To understand the skewed nature of the data, I need a histogram with more than just two buckets; the two entries in USER_TAB_HISTOGRAMS tell the optimizer only the high and low values right now.

    The optimizer knows the high value (99), the low value (0), the number of distinct values (6 in this case), and the number of rows in my table T (50,119 when I tested). Given those facts, the optimizer will believe that WHERE ID = <value> will return about 50,119/6, or 8,353 rows. Sure enough, when I query with ID=1 or ID=99, I observe the results in Listing 4.

    Code Listing 4: CBO plans when ID=1 and ID=99

    				SQL> set autotrace traceonly explain
    SQL> select *
      2    from t
      3   where id = 1;
     
    ----------------------------------------------------------------------------
    |  Id |	Operation	 | Name	|  Rows	|  Bytes|   Cost (%CPU)	|  Time	   |
    ----------------------------------------------------------------------------
    |   0 |	SELECT STATEMENT |	|  8353	|   783K|   163	(2)	| 00:00:02 |
    |*  1 | TABLE ACCESS FULL| T	|  8353	|   783K|   163	(2)	| 00:00:02 |
    ----------------------------------------------------------------------------
     
    SQL> select *
      2    from t
      3   where id = 99;
     
    Execution Plan
    ----------------------------------------------------------------------------
    Plan hash value: 1601196873
     
    ----------------------------------------------------------------------------
    |  Id |	Operation	 | Name	|   Rows |  Bytes|  Cost (%CPU)	| Time	   |
    ----------------------------------------------------------------------------
    |   0 |	SELECT STATEMENT |	|   8353 |   783K|   163 (2)	| 00:00:02 |
    |*  1 | TABLE ACCESS FULL| T	|   8353 |   783K|   163 (2)	| 00:00:02 |
    ----------------------------------------------------------------------------
    
    		

    Now, I immediately gather statistics, using the same command as before:

    				SQL> begin
      2   dbms_stats.gather_table_stats
      3   ( user, 'T' );
      4  end;
      5  /
    SQL> select column_name, count(*)
      2    from user_tab_histograms
      3   where table_name = 'T'
      4     and column_name = 'ID'
      5   group by column_name;
     
    COLUMN_NAME   COUNT(*)
    ------------  -----------
    ID                  5
    
    		

    Note, however, that I have more than two buckets for my histograms. DBMS_STATS, using the AUTO setting, gathered more information here. If I were to query the SYS.COL_USAGE$ table right now, I would discover that a new row was added, indicating that I have some queries in my system that use equality predicates against this particular database column. That is the "magic" that caused DBMS_STATS to change how it gathered statistics and caused a dramatic change in my resulting query plans, as shown in Listing 5.

    Code Listing 5: New CBO plans when ID=1 and ID=99

    				
    						
    								SQL> select *
      2    from t;
      3   where id = 1;
     
    Execution Plan
    ----------------------------------------------------------------------------
    Plan hash value: 1601196873
    
    ----------------------------------------------------------------------------
    | Id |	Operation	 | Name	|  Rows	|  Bytes|  Cost (%CPU)	| Time	   |
    ----------------------------------------------------------------------------
    |  0 |	SELECT STATEMENT |	|  10260|   961K|   164	(2)	| 00:00:02 |
    |* 1 |  TABLE ACCESS FULL| T	|  10260|   961K|   164	(2)	| 00:00:02 |
    ----------------------------------------------------------------------------
     
    SQL> select *
      2    from t
      3   where id = 99;
     
    Execution Plan
    ----------------------------------------------------------------------------------------
    Plan hash value: 470836197
     
    ----------------------------------------------------------------------------------------
    | Id |  Operation	           | Name    | Rows |  Bytes|  Cost (%CPU)  |  Time    |
    ----------------------------------------------------------------------------------------
    |  0 |  SELECT STATEMENT	   |	     | 1    |    96	|   2	(0) | 00:00:01 |
    |  1 |  TABLE ACCESS BY INDEX ROWID|	T    | 1    |    96	|   2	(0) | 00:00:01 |
    |* 2 |  INDEX RANGE SCAN	   |	T_IDX| 1    |   	|   1	(0) | 00:00:01 |
    ----------------------------------------------------------------------------------------
    				
    		

    Note how the cardinalities are very different—8,353 changed to either 10,260 or 1. The resulting change in estimated cardinality led to a change in the overall cost of the first query plan and resulted in an entirely different (but better) query plan in the second case.

    This is a significant fact for a DBA to be aware of. As the queries submitted by your end users change over time, the AUTO feature of statistics gathering could change what statistics are gathered over time as well. Unless you understand this feature, enabled by default in Oracle Database 10g, it can look as if inexplicable magic is happening in your database.


    Tom Kyte (thomas.kyte@oracle.com) has worked for Oracle since 1993. He is a vice president in the Oracle Public Sector group and the author of Expert Oracle Database Architecture: 9 i and 10 g Programming Techniques and Solutions (Apress, 2005) and Effective Oracle by Design (Oracle Press, 2003), among others.




    -The End-

    posted on 2009-03-28 21:48 decode360-3 閱讀(239) 評論(0)  編輯  收藏 所屬分類: Oracle
    主站蜘蛛池模板: 精品无码一区二区三区亚洲桃色| 国产成人亚洲综合无| 无码一区二区三区免费视频| 免费的黄色网页在线免费观看| 亚洲国产精品久久久天堂| 男男AV纯肉无码免费播放无码 | 国产成人+综合亚洲+天堂| 久久精品国产亚洲综合色| 无码人妻一区二区三区免费手机| 色妞www精品视频免费看| 亚洲精品国产成人专区| 国产成人免费福利网站| 亚洲国产综合人成综合网站| 国产永久免费高清在线| 亚洲AV无码专区在线电影成人| 亚洲色大成网站www永久一区| 国产成人午夜精品免费视频| 一级毛片在线免费视频| 亚洲一区中文字幕在线观看| 国产成人精品日本亚洲专区 | 美女无遮挡拍拍拍免费视频| 亚洲国产成人九九综合| 久久亚洲中文字幕精品一区四 | 国产精品成人四虎免费视频| 久9这里精品免费视频| 黄色网址在线免费观看| 国产成人精品日本亚洲直接| 久久精品国产亚洲| 亚洲欧洲日产国码高潮αv| 国产三级在线观看免费| 四虎国产精品永久免费网址| 免费国产高清毛不卡片基地| 亚洲av成人一区二区三区| 久久精品国产亚洲av成人| 亚洲日本中文字幕一区二区三区| 成人毛片免费在线观看| 0588影视手机免费看片| 黄色片免费在线观看| 一级毛片试看60分钟免费播放| 亚洲中文无码mv| 亚洲六月丁香六月婷婷色伊人 |