SQL Server: Tips for Tuning SQL Statements


        Compiled by www.InnovateDigital.com


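      Using examples and query plans, this article shows several effective techniques for improving query performance on Microsoft SQL Server. Programming offers many small tips and tricks, and knowing them expands the tools you have for performance tuning. All examples in this section use Microsoft SHOWPLAN_ALL output, because it is more compact and shows the typical information. (Sybase query plans are essentially the same, although they may include some additional information.) Most examples are based either on the pubs database or on the standard system tables. We greatly expanded the pubs tables used here, adding tens of thousands of rows to many of them.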

      Subquery Optimization


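       A good rule of thumb is to replace subqueries with joins wherever possible. The optimizer can sometimes "flatten" a subquery automatically, replacing it with a regular join or an outer join, but it does not always succeed. An explicit join gives the optimizer more options for choosing the table order and finding the best possible plan. When you tune a particular query, check whether eliminating its subqueries makes a significant difference.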

      Example

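      The following queries list each store in the pubs database that has sales in big_sales, together with the total quantity of books it sold. Both queries return the same result set, but the first uses a correlated subquery and the second uses a join. Compare the query plans that Microsoft SQL Server produces for each.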

      SUBQUERY SOLUTION
      ----------------------
      SELECT st.stor_name AS 'Store',
             ISNULL((SELECT SUM(bs.qty)
                     FROM big_sales AS bs
                     WHERE bs.stor_id = st.stor_id), 0) AS 'Books Sold'
      FROM stores AS st
      WHERE st.stor_id IN
            (SELECT DISTINCT stor_id
             FROM big_sales)

      JOIN SOLUTION
      ----------------------
      SELECT st.stor_name AS 'Store',
             SUM(bs.qty) AS 'Books Sold'
      FROM stores AS st
      JOIN big_sales AS bs
        ON bs.stor_id = st.stor_id
      WHERE st.stor_id IN
            (SELECT DISTINCT stor_id
             FROM big_sales)
      GROUP BY st.stor_name

      SUBQUERY SOLUTION
      ----------------------
      SQL Server parse and compile time:
          CPU time = 28 ms
          elapsed time = 28 ms
      SQL Server Execution Times:
          CPU time = 145 ms
          elapsed time = 145 ms
      Table 'big_sales'. Scan count 14, logical reads 1884, physical reads 0, read-ahead reads 0.
      Table 'stores'. Scan count 12, logical reads 24, physical reads 0, read-ahead reads 0.

      JOIN SOLUTION
      ----------------------
      SQL Server parse and compile time:
          CPU time = 50 ms
          elapsed time = 54 ms
      SQL Server Execution Times:
          CPU time = 109 ms
          elapsed time = 109 ms
      Table 'big_sales'. Scan count 14, logical reads 966, physical reads 0, read-ahead reads 0.
      Table 'stores'. Scan count 12, logical reads 24, physical reads 0, read-ahead reads 0.

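      Without digging any deeper, we can see that the join is faster in both CPU time and total elapsed time, and that it needs only about half the logical reads of the subquery solution (966 versus 1,884 on big_sales). Both queries also return the same result set, although in a different order: the join plan sorts by stor_name to perform its GROUP BY, so its rows happen to come back in alphabetical order. That order is a side effect of the plan, not a guarantee; add an explicit ORDER BY whenever the order matters.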

      JOIN SOLUTION (rows come back sorted by store name):

      Store                                 Books Sold
      ------------------------------------------------
      Barnum's                                  154125
      Bookbeat                                  518080
      Doc-U-Mat: Quality Laundry and Books      581130
      Eric the Read Books                        76931
      Fricative Bookshop                        259060
      News & Brews                              161090

      (6 row(s) affected)

      SUBQUERY SOLUTION (rows come back in stor_id order):

      Store                                 Books Sold
      ------------------------------------------------
      Eric the Read Books                        76931
      Barnum's                                  154125
      News & Brews                              161090
      Doc-U-Mat: Quality Laundry and Books      581130
      Fricative Bookshop                        259060
      Bookbeat                                  518080

      (6 row(s) affected)

      Here is the query plan for the subquery solution:

      |--Compute Scalar(DEFINE:([Expr1006]=isnull([Expr1004], 0)))
           |--Nested Loops(Left Outer Join, OUTER REFERENCES:([st].[stor_id]))
                |--Nested Loops(Inner Join, OUTER REFERENCES:([big_sales].[stor_id]))
                |    |--Stream Aggregate(GROUP BY:([big_sales].[stor_id]))
                |    |    |--Clustered Index Scan(OBJECT:([pubs].[dbo].[big_sales].[UPKCL_big_sales]), ORDERED FORWARD)
                |    |--Clustered Index Seek(OBJECT:([pubs].[dbo].[stores].[UPK_storeid] AS [st]),
                |         SEEK:([st].[stor_id]=[big_sales].[stor_id]) ORDERED FORWARD)
                |--Stream Aggregate(DEFINE:([Expr1004]=SUM([bs].[qty])))
                     |--Clustered Index Seek(OBJECT:([pubs].[dbo].[big_sales].[UPKCL_big_sales] AS [bs]),
                          SEEK:([bs].[stor_id]=[st].[stor_id]) ORDERED FORWARD)

      By contrast, the join solution produces this plan:

      |--Stream Aggregate(GROUP BY:([st].[stor_name]) DEFINE:([Expr1004]=SUM([partialagg1005])))
           |--Sort(ORDER BY:([st].[stor_name] ASC))
                |--Nested Loops(Left Semi Join, OUTER REFERENCES:([st].[stor_id]))
                     |--Nested Loops(Inner Join, OUTER REFERENCES:([bs].[stor_id]))
                     |    |--Stream Aggregate(GROUP BY:([bs].[stor_id]) DEFINE:([partialagg1005]=SUM([bs].[qty])))
                     |    |    |--Clustered Index Scan(OBJECT:([pubs].[dbo].[big_sales].[UPKCL_big_sales] AS [bs]), ORDERED FORWARD)
                     |    |--Clustered Index Seek(OBJECT:([pubs].[dbo].[stores].[UPK_storeid] AS [st]),
                     |         SEEK:([st].[stor_id]=[bs].[stor_id]) ORDERED FORWARD)
                     |--Clustered Index Seek(OBJECT:([pubs].[dbo].[big_sales].[UPKCL_big_sales]),
                          SEEK:([big_sales].[stor_id]=[st].[stor_id]) ORDERED FORWARD)


      The join is the more efficient solution. It does not need the extra stream aggregate that the subquery solution runs once per store to sum the big_sales.qty column.
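To confirm that a subquery-to-join rewrite like this one preserves the results, it helps to run both forms side by side on the same data. The minimal sketch below uses Python's built-in sqlite3 module purely as a runnable stand-in for SQL Server, so its plans and timings say nothing about SQL Server. The miniature stores and big_sales tables, their rows, and store 9999 are hypothetical, and SQLite's IFNULL replaces T-SQL's ISNULL. Both result sets are sorted before comparison because the two forms do not return rows in the same order.

```python
import sqlite3

# Hypothetical miniature versions of pubs' stores and big_sales tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (stor_id TEXT PRIMARY KEY, stor_name TEXT);
    CREATE TABLE big_sales (stor_id TEXT, ord_num TEXT, qty INTEGER);
    INSERT INTO stores VALUES ('6380', 'Eric the Read Books'),
                              ('7066', 'Barnum''s'),
                              ('9999', 'Store With No Sales');
    INSERT INTO big_sales VALUES ('6380', 'A1', 5), ('6380', 'A2', 3),
                                 ('7066', 'B1', 50);
""")

# Subquery form: a correlated SUM per store, plus an IN filter.
SUBQUERY_SQL = """
    SELECT st.stor_name,
           IFNULL((SELECT SUM(bs.qty) FROM big_sales AS bs
                   WHERE bs.stor_id = st.stor_id), 0) AS books_sold
    FROM stores AS st
    WHERE st.stor_id IN (SELECT DISTINCT stor_id FROM big_sales)
"""

# Join form: one inner join followed by GROUP BY.
JOIN_SQL = """
    SELECT st.stor_name, SUM(bs.qty) AS books_sold
    FROM stores AS st
    JOIN big_sales AS bs ON bs.stor_id = st.stor_id
    GROUP BY st.stor_name
"""

# Sort both results so that differing row order does not matter.
subquery_rows = sorted(conn.execute(SUBQUERY_SQL).fetchall())
join_rows = sorted(conn.execute(JOIN_SQL).fetchall())
print(subquery_rows)
print(join_rows)
```

The join version here omits the IN filter: an inner join already drops stores that have no rows in big_sales, which is why the made-up store 9999 appears in neither result.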



 

      UNION vs UNION ALL


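      Use UNION ALL instead of UNION whenever possible. The difference is that UNION has the side effect of eliminating duplicate rows, which typically requires sorting or grouping the results, while UNION ALL does none of this work. Producing a result without duplicates usually requires building a temporary worktable, sorting all the rows in it, and removing the duplicates before output. (If you display the query plan for a SELECT DISTINCT query, you will find a stream aggregate that consumes 30% or more of the resources used to process the query.) Use UNION when you know you need duplicate elimination. But if you expect no duplicate rows in the result set, use UNION ALL: it simply selects from one table or join, then selects from another, and appends the second result to the bottom of the first. UNION ALL requires no worktable and no sort (unless something else in the query calls for one), so in most cases it is more efficient. One potentially dangerous problem is that UNION can flood tempdb with huge temporary worktables. This can happen when you expect a large result set from a UNION query.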
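The semantic difference, and the extra work UNION does to remove duplicates, can be seen on a toy dataset. This is a sketch that uses Python's sqlite3 module as a stand-in for SQL Server, with hypothetical table contents. SQLite words its plans differently from SHOWPLAN, but for UNION it reports a temporary B-tree used to eliminate duplicates, which the UNION ALL plan does not need.

```python
import sqlite3

# Hypothetical stand-ins for pubs' sales and big_sales; store 6380 appears in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (stor_id TEXT);
    CREATE TABLE big_sales (stor_id TEXT);
    INSERT INTO sales VALUES ('6380'), ('7066');
    INSERT INTO big_sales VALUES ('6380'), ('6380'), ('8042');
""")

UNION_SQL = "SELECT stor_id FROM big_sales UNION SELECT stor_id FROM sales"
UNION_ALL_SQL = "SELECT stor_id FROM big_sales UNION ALL SELECT stor_id FROM sales"


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


union_rows = sorted(r[0] for r in conn.execute(UNION_SQL))          # duplicates removed
union_all_rows = sorted(r[0] for r in conn.execute(UNION_ALL_SQL))  # every row kept
print(union_rows, plan(UNION_SQL))
print(union_all_rows, plan(UNION_ALL_SQL))
```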

      Example


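      The following queries select the ID of every store in the pubs sales table, and also the ID of every store in the big_sales table, to which we added more than 70,000 rows. The only difference between the two solutions is UNION versus UNION ALL, but adding the ALL keyword makes a major difference to the plan. The first solution needs stream aggregates and a merge step to remove duplicates before it returns the result set to the client. The second query is more efficient, especially on large tables. In this example both queries return the same distinct store IDs, although in a different order, and UNION ALL also keeps every duplicate. In our tests, two temporary tables were involved; your results may differ slightly.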
      UNION SOLUTION
      -----------------------
      SELECT stor_id FROM big_sales
      UNION
      SELECT stor_id FROM sales

      |--Merge Join(Union)
           |--Stream Aggregate(GROUP BY:([big_sales].[stor_id]))
           |    |--Clustered Index Scan(OBJECT:([pubs].[dbo].[big_sales].[UPKCL_big_sales]), ORDERED FORWARD)
           |--Stream Aggregate(GROUP BY:([sales].[stor_id]))
                |--Clustered Index Scan(OBJECT:([pubs].[dbo].[sales].[UPKCL_sales]), ORDERED FORWARD)

      Table 'sales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
      Table 'big_sales'. Scan count 1, logical reads 463, physical reads 0, read-ahead reads 0.

      UNION ALL SOLUTION
      -----------------------
      SELECT stor_id FROM big_sales
      UNION ALL
      SELECT stor_id FROM sales

      |--Concatenation
           |--Index Scan(OBJECT:([pubs].[dbo].[big_sales].[ndx_sales_ttlID]))
           |--Index Scan(OBJECT:([pubs].[dbo].[sales].[titleidind]))

      Table 'sales'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.
      Table 'big_sales'. Scan count 1, logical reads 224, physical reads 0, read-ahead reads 0.


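      Although the two result sets are interchangeable for the purposes of this example, you can see that the UNION ALL statement consumes about half the resources of the UNION statement (225 versus 465 logical reads in total). So anticipate your result set, and when you are sure it contains no duplicates, use the UNION ALL clause.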



      Functions and Expressions That Suppress Indexes


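      When you apply a built-in function or an expression to an indexed column, the optimizer cannot seek on that column's index; at best it scans the index or the table. Wherever possible, rewrite such conditions so that the indexed column stands alone, outside any expression.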

      Example

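      Help SQL Server by removing any expressions around indexed numeric columns. The following query selects a single row from the jobs table by the unique key of its unique clustered index. If you apply an expression to this column, the index can no longer be used for a seek. But once you change the condition 'job_id-2=0' to 'job_id=2', the optimizer performs a seek on the clustered index.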


      QUERY WITH SUPPRESSED INDEX
      -----------------------
      SELECT *
      FROM jobs
      WHERE (job_id-2) = 0

      |--Clustered Index Scan(OBJECT:([pubs].[dbo].[jobs].[PK__jobs__117F9D94]),
           WHERE:(Convert([jobs].[job_id])-2=0))

      OPTIMIZED QUERY USING INDEX
      -----------------------
      SELECT *
      FROM jobs
      WHERE job_id = 2

      |--Clustered Index Seek(OBJECT:([pubs].[dbo].[jobs].[PK__jobs__117F9D94]),
           SEEK:([jobs].[job_id]=Convert([@1])) ORDERED FORWARD)

      Note that the SEEK in the optimized query is much more efficient than the SCAN in the previous query.
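The same effect can be reproduced outside SQL Server. In this sketch, Python's sqlite3 module stands in for SQL Server and a hypothetical 14-row jobs table stands in for pubs..jobs, with job_id declared INTEGER PRIMARY KEY so that it plays the role of the clustered key. SQLite's EXPLAIN QUERY PLAN reports a SCAN when the key is wrapped in an expression and a SEARCH when it stands alone, mirroring the Clustered Index Scan versus Clustered Index Seek above.

```python
import sqlite3

# SQLite stand-in for pubs..jobs; INTEGER PRIMARY KEY plays the clustered key's role.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, job_desc TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?)",
                 [(i, f"job {i}") for i in range(1, 15)])

SUPPRESSED_SQL = "SELECT * FROM jobs WHERE (job_id - 2) = 0"  # key hidden in an expression
OPTIMIZED_SQL = "SELECT * FROM jobs WHERE job_id = 2"          # bare key column


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


print(plan(SUPPRESSED_SQL))  # a full SCAN of jobs
print(plan(OPTIMIZED_SQL))   # a SEARCH on the primary key
print(conn.execute(SUPPRESSED_SQL).fetchall(), conn.execute(OPTIMIZED_SQL).fetchall())
```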


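      The table below lists several kinds of queries that prevent an index on a column from being used, together with rewrites that let the index be used for better performance.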



      QUERY WITH SUPPRESSED INDEX
      ---------------------------------------
      DECLARE @job_id VARCHAR(5)
      SELECT @job_id = '2'
      SELECT *
      FROM jobs
      WHERE CONVERT(VARCHAR(5), job_id) = @job_id

      OPTIMIZED QUERY USING INDEX
      ---------------------------------------
      DECLARE @job_id VARCHAR(5)
      SELECT @job_id = '2'
      SELECT *
      FROM jobs
      WHERE job_id = CONVERT(SMALLINT, @job_id)

      -------------------------------

      QUERY WITH SUPPRESSED INDEX
      ---------------------------------------
      SELECT *
      FROM authors
      WHERE au_fname + ' ' + au_lname = 'Johnson White'

      OPTIMIZED QUERY USING INDEX
      ---------------------------------------
      SELECT *
      FROM authors
      WHERE au_fname = 'Johnson'
        AND au_lname = 'White'

      -------------------------------

      QUERY WITH SUPPRESSED INDEX
      ---------------------------------------
      SELECT *
      FROM authors
      WHERE SUBSTRING(au_lname, 1, 2) = 'Wh'

      OPTIMIZED QUERY USING INDEX
      ---------------------------------------
      SELECT *
      FROM authors
      WHERE au_lname LIKE 'Wh%'

      -------------------------------

      QUERY WITH SUPPRESSED INDEX
      ---------------------------------------
      CREATE INDEX employee_hire_date
      ON employee (hire_date)
      GO
      -- Get all employees hired
      -- in the 1st quarter of 1990:
      SELECT *
      FROM employee
      WHERE DATEPART(year, hire_date) = 1990
        AND DATEPART(quarter, hire_date) = 1

      OPTIMIZED QUERY USING INDEX
      ---------------------------------------
      CREATE INDEX employee_hire_date
      ON employee (hire_date)
      GO
      -- Get all employees hired
      -- in the 1st quarter of 1990
      -- ('YYYYMMDD' literals are read the same way under any DATEFORMAT setting):
      SELECT *
      FROM employee
      WHERE hire_date >= '19900101'
        AND hire_date < '19900401'

      -------------------------------

      QUERY WITH SUPPRESSED INDEX
      ---------------------------------------
      -- Suppose that hire_date may
      -- contain a time other than 12 AM.
      -- Who was hired on 2/21/1990?
      -- (Style 101 produces mm/dd/yyyy with leading zeros.)
      SELECT *
      FROM employee
      WHERE CONVERT(CHAR(10), hire_date, 101) = '02/21/1990'

      OPTIMIZED QUERY USING INDEX
      ---------------------------------------
      -- Suppose that hire_date may
      -- contain a time other than 12 AM.
      -- Who was hired on 2/21/1990?
      SELECT *
      FROM employee
      WHERE hire_date >= '19900221'
        AND hire_date < '19900222'
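The date rewrites above follow one pattern: replace a condition on a computed part of the date with a half-open range [start, end) on the bare column. The sketch below assumes hire_date is stored as ISO-8601 text in SQLite (standing in for SQL Server's DATETIME), uses substr as a stand-in for DATEPART, and introduces a hypothetical helper, quarter_bounds, that computes the range for a calendar quarter. Both predicates return the same rows, but only the range predicate lets SQLite search the employee_hire_date index.

```python
import sqlite3
from datetime import date


def quarter_bounds(year, quarter):
    """Return a half-open [start, end) pair of ISO dates for a calendar quarter.

    Hypothetical helper for this sketch: (1990, 1) -> ('1990-01-01', '1990-04-01').
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    start = date(year, 3 * quarter - 2, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, 3 * quarter + 1, 1)
    return start.isoformat(), end.isoformat()


# SQLite stand-in for pubs..employee; hire_date holds ISO-8601 text with a time part.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id TEXT, hire_date TEXT)")
conn.execute("CREATE INDEX employee_hire_date ON employee (hire_date)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("E1", "1989-12-31 17:00:00"),
    ("E2", "1990-01-01 00:00:00"),
    ("E3", "1990-03-31 23:59:59"),
    ("E4", "1990-04-01 00:00:00"),
])

# Function applied to the indexed column (like DATEPART): the index cannot be searched.
FUNCTION_SQL = ("SELECT emp_id FROM employee "
                "WHERE substr(hire_date, 1, 7) BETWEEN '1990-01' AND '1990-03'")
# Bare column compared with a half-open range: the index can be searched.
RANGE_SQL = "SELECT emp_id FROM employee WHERE hire_date >= ? AND hire_date < ?"


def plan(sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)


bounds = quarter_bounds(1990, 1)
function_ids = sorted(r[0] for r in conn.execute(FUNCTION_SQL))
range_ids = sorted(r[0] for r in conn.execute(RANGE_SQL, bounds))
print(function_ids, plan(FUNCTION_SQL))
print(range_ids, plan(RANGE_SQL, bounds))
```

Because the upper bound is exclusive, a row stamped 1990-03-31 23:59:59 is included while one stamped 1990-04-01 00:00:00 is not, which is why the T-SQL rewrites compare with < against the next day rather than <= against the last one.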




      SET NOCOUNT ON


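      The speedup that SET NOCOUNT ON gives T-SQL code puzzles many SQL Server developers and database administrators. You may have noticed that successful queries return a system message reporting the number of rows affected. In many cases you don't need this information. The SET NOCOUNT ON command suppresses these messages for all subsequent statements in your session until you issue SET NOCOUNT OFF.
      This option does more than tidy up the output. It reduces the amount of information passed from the server to the client, which lowers network traffic and improves the overall response time of your transactions. The time needed to pass a single message is negligible, but consider a script that runs queries in a loop and sends several kilobytes of useless messages to the user.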

      As an example, the following file contains a T-SQL batch that inserts 9,999 rows into the big_sales table.


      -- Assumes the existence of a table called BIG_SALES, a copy of pubs..sales
      -- Run once with SET NOCOUNT OFF and once with SET NOCOUNT ON to compare timings.
      SET NOCOUNT ON
      DECLARE @separator  VARCHAR(25),
              @message    VARCHAR(25),
              @counter    INT,
              @ord_nbr    VARCHAR(20),
              @order_date DATETIME,
              @store_nbr  CHAR(4),      -- stor_id is CHAR(4) in pubs
              @qty_sold   INT,
              @terms      VARCHAR(12),
              @title      CHAR(6),
              @starttime  DATETIME
      SET @starttime = GETDATE()
      SELECT @counter = 0,
             @separator = REPLICATE('-', 25)
      WHILE @counter < 9999
      BEGIN
          SET @counter = @counter + 1
          SET @ord_nbr = 'Y' + CAST(@counter AS VARCHAR(5))
          SET @order_date = DATEADD(hour, (@counter * 8), 'Jan 01 1999')
          -- Spread the rows across the six pubs stores
          SET @store_nbr =
              CASE WHEN @counter < 999 THEN '6380'
                   WHEN @counter BETWEEN 1000 AND 2999 THEN '7066'
                   WHEN @counter BETWEEN 3000 AND 3999 THEN '7067'
                   WHEN @counter BETWEEN 4000 AND 6999 THEN '7131'
                   WHEN @counter BETWEEN 7000 AND 7999 THEN '7896'
                   WHEN @counter BETWEEN 8000 AND 9999 THEN '8042'
                   ELSE '6380'
              END
          SET @qty_sold =
              CASE WHEN @counter BETWEEN 0 AND 2999 THEN 11
                   WHEN @counter BETWEEN 3000 AND 5999 THEN 23
                   ELSE 37
              END
          SET @terms =
              CASE WHEN @counter BETWEEN 0 AND 2999 THEN 'Net 30'
                   WHEN @counter BETWEEN 3000 AND 5999 THEN 'Net 60'
                   ELSE 'On Invoice'
              END
          -- SET @title = (SELECT title_id FROM big_sales WHERE qty = (SELECT MAX(qty)
          --               FROM big_sales))
          SET @title =
              CASE WHEN @counter < 999 THEN 'MC2222'
                   WHEN @counter BETWEEN 1000 AND 1999 THEN 'MC2222'
                   WHEN @counter BETWEEN 2000 AND 3999 THEN 'MC3026'
                   WHEN @counter BETWEEN 4000 AND 5999 THEN 'PS2106'
                   WHEN @counter BETWEEN 6000 AND 6999 THEN 'PS7777'
                   WHEN @counter BETWEEN 7000 AND 7999 THEN 'TC3218'
                   ELSE 'PS1372'
              END
          -- PRINT @separator
          -- SELECT @message = STR( @counter, 10 ) -- + STR( SQRT( CONVERT( FLOAT,
          --     @counter ) ), 10, 4 )
          -- PRINT @message
          BEGIN TRAN
          INSERT INTO [pubs].[dbo].[big_sales]([stor_id], [ord_num], [ord_date],
                 [qty], [payterms], [title_id])
          VALUES(@store_nbr, CAST(@ord_nbr AS CHAR(5)), @order_date, @qty_sold,
                 @terms, @title)
          COMMIT TRAN
      END
      -- Elapsed time of the whole loop, in milliseconds
      SET @message = CAST(DATEDIFF(ms, @starttime, GETDATE()) AS VARCHAR(20))
      PRINT @message
      /*
      TRUNCATE TABLE big_sales
      INSERT INTO big_sales
      SELECT * FROM sales
      SELECT title_id, SUM(qty)
      FROM big_sales
      GROUP BY title_id
      ORDER BY SUM(qty)
      SELECT * FROM big_sales
      */


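      When the batch was run with SET NOCOUNT OFF, the elapsed time was 5,176 milliseconds. When it was run with SET NOCOUNT ON, the elapsed time was 1,620 milliseconds. If you don't need the row-count information in your output, consider adding SET NOCOUNT ON at the beginning of every stored procedure and script.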
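To put the two timings in perspective, the short calculation below restates them per row. It uses only the figures reported above and assumes the entire difference comes from the row-count message that each INSERT returns while NOCOUNT is OFF; it is a single measurement, so treat the result as indicative rather than as a constant.

```python
# Elapsed times reported above for the 9,999-row insert batch.
ROWS = 9_999
NOCOUNT_OFF_MS = 5_176
NOCOUNT_ON_MS = 1_620

saved_ms = NOCOUNT_OFF_MS - NOCOUNT_ON_MS   # total elapsed time saved
per_row_ms = saved_ms / ROWS                # saving per INSERT (one row-count message each)
speedup = NOCOUNT_OFF_MS / NOCOUNT_ON_MS    # how many times faster the ON run was
reduction = saved_ms / NOCOUNT_OFF_MS       # fraction of the OFF run's time removed

print(f"saved {saved_ms} ms in total, about {per_row_ms:.3f} ms per row")
print(f"{speedup:.2f}x faster, {reduction:.1%} less elapsed time")
```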

      TOP and SET ROWCOUNT


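      The TOP clause of a SELECT statement limits the number of rows returned by that single query, while SET ROWCOUNT limits the number of rows affected by all subsequent queries. In many programming tasks these commands are highly efficient.
      SET ROWCOUNT sets the maximum number of rows that can be affected by SELECT, INSERT, UPDATE, or DELETE statements. The setting takes effect as soon as the command executes and affects only the current session. To remove the limit, execute SET ROWCOUNT 0.
      Some practical tasks are programmed more efficiently with TOP or SET ROWCOUNT than with standard SQL commands. Let's demonstrate with a few examples: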


      TOP n

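      One of the most popular queries in almost any database asks for the first N items in a list. In the pubs database, for example, we can look for the five best-selling titles. Compare three solutions: one using TOP, one using SET ROWCOUNT, and one using ANSI SQL.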


      Pure ANSI SQL:
      SELECT title, ytd_sales
      FROM titles a
      WHERE (SELECT COUNT(*)
             FROM titles b
             WHERE b.ytd_sales > a.ytd_sales
            ) < 5
      ORDER BY ytd_sales DESC

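      This pure ANSI SQL solution executes a correlated subquery that can be very inefficient, especially in this example, where no index supports ytd_sales. In addition, this standard SQL command does not filter out NULL values in ytd_sales, and it does not handle ties among multiple titles.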
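The NULL and tie problems are easy to reproduce. This sketch uses Python's sqlite3 module as a stand-in for SQL Server, with LIMIT 5 playing the role of TOP 5 (SQLite has no TOP); the titles and ytd_sales values are hypothetical, including one NULL and a tie at 4095. The correlated-subquery version returns seven rows, including the NULL row and both tied titles, while the LIMIT version returns exactly five.

```python
import sqlite3

# Hypothetical titles data: one NULL ytd_sales and a tie at 4095.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT, ytd_sales INTEGER)")
conn.executemany("INSERT INTO titles VALUES (?, ?)", [
    ("A", 22246), ("B", 18722), ("C", 15096), ("D", 8780),
    ("E", 4095), ("F", 4095), ("G", 2045), ("H", None),
])

# Pure ANSI SQL: keep a row if fewer than five rows have higher ytd_sales.
# For the NULL row the comparison is unknown, COUNT(*) is 0, so it is kept too.
ANSI_SQL = """
    SELECT title, ytd_sales
    FROM titles AS a
    WHERE (SELECT COUNT(*) FROM titles AS b
           WHERE b.ytd_sales > a.ytd_sales) < 5
    ORDER BY ytd_sales DESC
"""
# SQLite has no TOP; LIMIT 5 is its counterpart of SELECT TOP 5.
LIMIT_SQL = "SELECT title, ytd_sales FROM titles ORDER BY ytd_sales DESC LIMIT 5"

ansi_rows = conn.execute(ANSI_SQL).fetchall()
limit_rows = conn.execute(LIMIT_SQL).fetchall()
print(len(ansi_rows), ansi_rows)
print(len(limit_rows), limit_rows)
```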

 


      Using SET ROWCOUNT:
      SET ROWCOUNT 5
      SELECT title, ytd_sales
      FROM titles
      ORDER BY ytd_sales DESC
      SET ROWCOUNT 0

      Using TOP n:
      SELECT TOP 5 title, ytd_sales
      FROM titles
      ORDER BY ytd_sales DESC

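      The second solution uses SET ROWCOUNT to stop the SELECT query, while the third uses TOP n to stop once it has found the first five rows. In both cases, the ORDER BY clause forces the whole table to be sorted before any results are returned. The query plans for the two queries are practically identical. However, the key advantage of TOP over SET ROWCOUNT is that SET ROWCOUNT must process the worktable required by the ORDER BY clause, while TOP does not.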


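      On a large table, we can create an index on ytd_sales to avoid the sort. The query will then use that index to find the first five rows and stop. Compare this with the first solution, which scans the entire table and runs a correlated subquery for every row. On small tables the performance difference is small, but on a large table the first solution can take hours, while the other two take seconds.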
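Whether an index removes the sort can be checked in the plan. In the sketch below, SQLite (via Python's sqlite3 module) stands in for SQL Server, LIMIT 5 stands in for TOP 5, and the index name ix_titles_ytd_sales is hypothetical. The index also includes title so that it covers the query; with a non-covering index, SQLite might still prefer a table scan plus a sort. Before the index exists, the plan needs a temporary B-tree for ORDER BY; afterwards, SQLite reads the index in descending order and can stop after five entries.

```python
import sqlite3

# SQLite stand-in for pubs..titles (left empty: only the plans are examined here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT, ytd_sales INTEGER)")

# LIMIT 5 is SQLite's counterpart of SELECT TOP 5.
TOP5_SQL = "SELECT title, ytd_sales FROM titles ORDER BY ytd_sales DESC LIMIT 5"


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(TOP5_SQL)  # full scan plus a temporary B-tree for the sort

# Hypothetical covering index: ytd_sales first so entries are already in sort order,
# title included so the query never has to touch the table itself.
conn.execute("CREATE INDEX ix_titles_ytd_sales ON titles (ytd_sales, title)")
plan_with_index = plan(TOP5_SQL)     # index read in descending order, no sort step

print(plan_without_index)
print(plan_with_index)
```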


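      When deciding what a query needs to return, consider whether you need only a few of its rows; if so, the TOP clause can save a great deal of time.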