February 2012
M T W T F S S
« Aug    
 12345
6789101112
13141516171819
20212223242526
272829  

connect by的性能分析

Kaya 发表于 os2ora.com

对上一篇文章里提到两条语句
select distinct begin_date, end_date, begin_date + level -1 day  from test connect by begin_date + level -1 <= end_date order by 1,2,3;

select one_date
  from (select start_date + level – 1 one_date
          from (select min(start_date) start_date, max(end_date) end_date
                  from test) test
        connect by start_date + level – 1 < = end_date ) all_date,
       test
where one_date between start_date and end_date;

它们都实现相同的功能,即: 当一个表里面有两个列,分别为begin_date和end_date时,如何针对这个表的每一行,用SQL返回begin_date和end_date之间的所有连续日期。

首先,我们关注第一条语句在没有distinct的情况下返回的行数。

会使问题简单化,假设对于表里的每行数据,end_date – begin_date + 1 的值都是一样的,设为 n 。假设test里总共有 m 行(m>1)。
那么,这时返回的行数是 m * ( power(m,n) – 1 )/( m – 1 ) 。
(
这是一个等比数列,公式为
1 +  m +  power(m,2) + ….  power(m,n-1) = ( 1 – power (m,n)) / (1 – m),
)
这些行可以用一系列的树加以形象的表示,如下所示:

connectby

当 n 和 m 很大时,会使结果集变得非常之大,为 power(m,n).

因此,下面语句在数据量变大的情况下,性能将急剧下降!
select distinct begin_date, end_date, begin_date + level -1 day  from test connect by begin_date + level -1 <= end_date order by 1,2,3;

对于语句2,我们关注以下子查询所返回的行数:
select start_date + level – 1 one_date
          from (select min(start_date) start_date, max(end_date) end_date
                  from test) test
        connect by start_date + level – 1 < = end_date

这时,表实质上只有1行时,存贮的是原来表的最大值和最小值。也就是说,这时 m = 1 。

当m=1时,情况变得不同了。这时,返回的行数刚好就是 n 。
如果以树加以表示的话,这时上面的图形已经退化成一棵只有一个支干的树。

image

因此,当原来的查询m 和 n都很大时,返回的行数为 将为n。

从上面的表述可清晰地看出,语句2在大数据量的情形下将能体现出良好的性能。

从另一方面看,出现distinct的地方,或许就是性能调整应该关注的地方了。

5 comments to connect by的性能分析

  • Yefeng

    谢谢,受教了

  • zhang yong hui

    你好, 看了你这篇文章很有启发, 我想出了另一种sql的写法:
    with abc as
    (SELECT /*+ materialize */ max(end_date) ed,min(begin_date) bd ,max(end_date)-min(begin_date) days FROM test)
    select (select bd from abc)+level from dual connect by level<(select days from abc);

    经过测试, 这个sql的性能比前两个都要好。

    • admin

      嗯,一般情况下,碰到这种问题时,很多人可能有类似的想法,就是用dual来实现。例如为了实现生成一个连续的整数列表,如1,2,3…100,可以简单的:

      1
      
      SELECT level FROM dual connect BY level<100;

      这里当然可以用这种思路,加上一个with 子句就得到了你提到的这种写法。

      不过性能我觉得是不相上下的,你说的性能比前两个好不知是基于什么标准?执行时间?执行计划?

      从执行计划看,我其实还是更倾向于我原来的第二种写法:

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      38
      39
      40
      41
      42
      43
      44
      45
      46
      47
      48
      49
      50
      51
      52
      53
      54
      55
      56
      57
      58
      59
      60
      61
      62
      63
      64
      65
      66
      67
      
      SQL> SET autot trace exp
      SQL> SELECT begin_date + level - 1 one_date
        2  FROM (SELECT min(begin_date) begin_date, max(end_date) end_date
        3  FROM test) test
        4  connect BY begin_date + level - 1 < = end_date
        5  ;
      Elapsed: 00:00:00.00
       
      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 928680046
       
      -------------------------------------------------------------------------------------
      | Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
      -------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT             |      |     1 |    18 |     3   (0)| 00:00:01 |
      |*  1 |  CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
      |   2 |   VIEW                       |      |     1 |    18 |     3   (0)| 00:00:01 |
      |   3 |    SORT AGGREGATE            |      |     1 |    18 |            |          |
      |   4 |     TABLE ACCESS STORAGE FULL| TEST |     2 |    36 |     3   (0)| 00:00:01 |
      -------------------------------------------------------------------------------------
       
      Predicate Information (IDENTIFIED BY operation id):
      ---------------------------------------------------
       
         1 - filter("END_DATE">=INTERNAL_FUNCTION("BEGIN_DATE")+LEVEL-1)
       
      Note
      -----
         - dynamic sampling used FOR this statement
       
      SQL> WITH abc AS
        2  (SELECT /*+ materialize */ max(end_date) ed,min(begin_date) bd ,max(end_date)-min(begin_date) days FROM test)
        3  SELECT (SELECT bd FROM abc)+level FROM dual connect BY level<(SELECT days FROM abc);
      Elapsed: 00:00:00.00
       
      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 2352707181
       
      -------------------------------------------------------------------------------------------------------------
      | Id  | Operation                     | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
      -------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT              |                             |     1 |       |     5   (0)| 00:00:01 |
      |   1 |  VIEW                         |                             |     1 |     9 |     2   (0)| 00:00:01 |
      |   2 |   TABLE ACCESS FULL           | SYS_TEMP_0FD9D6602_21C39D13 |     1 |    18 |     2   (0)| 00:00:01 |
      |   3 |  TEMP TABLE TRANSFORMATION    |                             |       |       |            |          |
      |   4 |   LOAD AS SELECT              | DUAL                        |       |       |            |          |
      |   5 |    SORT AGGREGATE             |                             |     1 |    18 |            |          |
      |   6 |     TABLE ACCESS STORAGE FULL | TEST                        |     2 |    36 |     3   (0)| 00:00:01 |
      |*  7 |   CONNECT BY WITHOUT FILTERING|                             |       |       |            |          |
      |   8 |    FAST DUAL                  |                             |     1 |       |     2   (0)| 00:00:01 |
      |   9 |    VIEW                       |                             |     1 |     6 |     2   (0)| 00:00:01 |
      |  10 |     TABLE ACCESS FULL         | SYS_TEMP_0FD9D6602_21C39D13 |     1 |    18 |     2   (0)| 00:00:01 |
      -------------------------------------------------------------------------------------------------------------
       
      Predicate Information (IDENTIFIED BY operation id):
      ---------------------------------------------------
       
         7 - filter(LEVEL< (SELECT "DAYS" FROM  (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0" "ED","C1"
                    "BD",INTERNAL_FUNCTION("C2") "DAYS" FROM "SYS"."SYS_TEMP_0FD9D6602_21C39D13" "T1") "ABC"))
       
      Note
      -----
         - dynamic sampling used FOR this statement
       
      SQL>
  • [...] 最后从http://www.os2ora.com/how-to-return-continuous-dates-between-two-dates/发现一个同样的案例,OS2提供了一个非常不错的方法:关于connect by的性能分析详见http://www.os2ora.com/connect-by-performance-tunning/ [...]

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">

您也可以使用微博账号登陆