1、11. connect by 用法总结 .2一、树查询(递归查询) 2二、列转行 sys_connect_by_path()42.分析函数总结 .61.分析函数(OVER) 72.分析函数 2(Rank, Dense_rank, row_number) 93.分析函数 3(Top/Bottom N、First/Last、NTile) 94.窗口函数 .115.报表函数 .1421.connect by 用法总结一、树查询(递归查询 )1.作用对于 oracle 进行简单树查询(递归查询)列转行2.基本语法select . from where :过滤条件,用于对返回的所有记录进行过滤。star
2、t with :查询结果重起始根结点的限定条件。connect by ; :连接条件1)例子:select num1,num2 from tablestart with num2 = 1008 connect by num2 = prior num1 ;2)解释:start with:用来标识哪个节点作为查找树型结构的根节点。若该子句被省略,则表示所有满足查询条件的行作为根节点。prior: 位置很重要(自我总结,和父在一起 则自底向上,即查父 和子在一起 则自顶向下 查子)例子原始数据 num1 为父 num2 为子3看下面的图 1. CONNECT_BY_ROOT 返回当前节点的最顶端节点
3、。2. CONNECT_BY_ISLEAF 判断是否为叶子节点,是 1,不是 0。3. LEVEL 伪列表示节点深度。4. SYS_CONNECT_BY_PATH 函数显示详细路径,并用“/”分隔。4二、列转行 sys_connect_by_path()这个函数使用之前必须先建立一个树,否则无用sys_connect_by_path(字段名, 2 个字段之间的连接符号)with tmp_a as (select 1 a,0 p from dualunion all select 2,1 from dualunion all select 3,1 from dualunion all selec
4、t 4,3 from dualunion all select 5,2 from dualunion all select 6,5 from dual)- 子全部显示 根子 level 代表级别select a,p,sys_connect_by_path(a,-),level from tmp_astart with a = 1connect by p = prior a5- 2 和 2 的所有下级去掉 根子 (开始就要去掉)select a,p,sys_connect_by_path(a,-) from tmp_astart with p = 1 and a 子 (connect 时去掉)s
5、elect a,p,sys_connect_by_path(a,-) from tmp_astart with a = 1connect by p = prior a and p 子 (where 中去掉)select a,p,sys_connect_by_path(a,-) from tmp_awhere p 子with tmp_tab as (select 中国 s,null b from dual union all select 广东 s,中国 b from dual union all select 湖南 s,中国 b from dual union all select 衡阳 s,
6、湖南 b from dual union all select 广州 s,广东 b from dual union all select 衡东 s,衡阳 b from dual )select max(sys_connect_by_path(s,/) from tmp_tabstart with s = 湖南connect by prior s = b 62.分析函数总结一、统计方面:Sum( ) Over (Partition by Order by )Sum( ) Over (Partition by Order by Rows Between Preceding And Followin
7、g)Sum( ) Over (Partition by Order by Rows Between Preceding And Current Row)Sum( ) Over (Partition by Order by Range Between Interval Day PrecedingAnd Interval Day Following )二、排列方面:Rank() Over (Partition by Order by Nulls First/Last)Dense_rank() Over (Patition by Order by Nulls First/Last)Row_numbe
8、r() Over (Partitionby Order by Nulls First/Last)Ntile( ) Over (Partition by Order by )三、最大值/最小值查找方面:Min( )/Max( ) Keep (Dense_rank First/Last Partition by Order by )四、首记录/末记录查找方面:First_value / Last_value(Sum( ) Over (Patition by Order by Rows Between Preceding And Following )五、相邻记录之间比较方面:Lag(Sum( ),
9、 1) Over(Patition by Order by )71.分析函数(OVER)一.分析函数语法:FUNCTION_NAME(,.)OVER()例:sum(sal) over (partition by deptno order by ename) new_aliassum:函数名(sal):参数 03 个参数 可以是表达式Over:关键字partition by :(可选)分区order by :(可选)LAG 和 LEAD 需,AVG 不需要,如果使用排序的开窗函数时,必须加 1)FUNCTION 子句26 个分析函数,按功能分 5 类分析函数分类1.等级(ranking)函数:
10、用于寻找前 N 种查询2.开窗(windowing)函数:用于计算不同的累计,如 SUM,COUNT,AVG,MIN,MAX 等,作用于数据的一个窗口上3.制表(reporting)函数:与开窗函数同名,作用于一个分区或一组上的所有列(制表与开窗的区别:制表的 OVER 语句上少一个 ORDER BY 子句)4.LAG,LEAD 函数: 可在结果集中向前或向后检索值,为了避免数据的自连接,它们是非常用用的.5.VAR_POP,VAR_SAMP,STDEV_POPE 及线性的衰减函数:计算任何未排序分区的统计值2)PARTITION 子句分组3)ORDER BY 子句分析函数中 ORDER BY
11、 的存在将添加一个默认的开窗子句,这意味着计算中所使用的行的集合是当前分区中当前行和前面所有行,没有 ORDER BY 时,默认的窗口是全部的分区。在 Order by 子句后可以添加 nulls last,如:order by comm desc nulls last 表示排序时忽略 comm 列为空的行. 二、分析函数简单实例:按区域查找 2001 年度订单总额占区域订单总额 20%以上的客户【1】测试数据:8SQL select * from orders_tmp;CUST_NBR REGION_ID SALESPERSON_ID YEAR MONTH TOT_ORDERS TOT_SA
12、LES- - - - - - -11 7 11 2001 7 2 122044 5 4 2001 10 2 378027 6 7 2001 2 3 375010 6 8 2001 1 2 2169110 6 7 2001 2 3 42624【2】测试语句:SQL select o.cust_nbr customer,2 o.region_id region,3 sum(o.tot_sales) cust_sales,4 sum(sum(o.tot_sales) over(partition by o.region_id) region_sales5 from orders_tmp o6 whe
13、re o.year = 20017 group by o.region_id, o.cust_nbr;CUSTOMER REGION CUST_SALES REGION_SALES- - - -4 5 37802 378027 6 3750 6806510 6 64315 6806511 7 12204 122043、分析函数 OVER 解析:请注意上面的绿色高亮部分,group by 的意图很明显:将数据按区域 ID,客户进行分组。假如我们只需要统计每个区域每个客户的订单总额,那么我们只需要 group by o.region_id,o.cust_nbr 就够了。但我们还想在每一行显示该客户
14、所在区域的订单总额,这一点和前面的不同:需要在前面分组的基础上按区域累加。group by 和 sum 是无法做到这一点的(因为聚集操作的级别不一样,前者是对一个客户,后者是对一批客户 )。这就是 over 函数的作用了!它的作用是告诉 SQL 引擎:按区域对数据进行分区,然后累积每个区域每个客户的订单总额(sum(sum(o.tot_sales)。9最终语句SQL select all_sales.*,2 100 * round(cust_sales / region_sales, 2) | % Percent3 from (select o.cust_nbr customer,4 o.re
15、gion_id region,5 sum(o.tot_sales) cust_sales,6 sum(sum(o.tot_sales) over(partition by o.region_id) region_sales7 from orders_tmp o8 where o.year = 20019 group by o.region_id, o.cust_nbr) all_sales10 where all_sales.cust_sales all_sales.region_sales * 0.2;CUSTOMER REGION CUST_SALES REGION_SALES PERCE
16、NT- - - - -4 5 37802 37802 100%10 6 64315 68065 94%11 7 12204 12204 100%2.分析函数 2(Rank, Dense_rank, row_number)遇到相同的数据时用 Rank, Dense_rank, row_number 排名策略,他们之间的区别 直接看例子SQL select region_id, customer_id, sum(customer_sales) total,2 rank() over(order by sum(customer_sales) desc) rank,3 dense_rank() ove
17、r(order by sum(customer_sales) desc) dense_rank,4 row_number() over(order by sum(customer_sales) desc) row_number5 from user_order6 group by region_id, customer_id;REGION_ID CUSTOMER_ID TOTAL RANK DENSE_RANK ROW_NUMBER- - - - - -8 18 1253840 11 11 115 2 1224992 12 12 129 23 1224992 12 12 139 24 1224
18、992 12 12 1410 30 1216858 15 13 15 103.分析函数 3(Top/Bottom N、First/Last、NTile)一、Top/Bottom N 查询:找出所有订单总额排名前 3 的大客户:SQL select *SQL from (select region_id,SQL customer_id,SQL sum(customer_sales) cust_total,SQL rank() over(order by sum(customer_sales) desc NULLS LAST) rankSQL from user_orderSQL group by
19、 region_id, customer_id)SQL where rank select min(customer_id)2 keep (dense_rank first order by sum(customer_sales) desc) first,3 min(customer_id)4 keep (dense_rank last order by sum(customer_sales) desc) last5 from user_order6 group by customer_id;FIRST LAST- -31 1Min:函数的作用是用于当存在多个 First/Last 情况下保证
20、返回唯一的记录。(去掉会出错)Keep:从上面的结果我们已经知道 Oracle 对排名的结果只“保留”2 条数据,这就是 keep 的作用。告诉 Oracle 只保留符合 keep 条件的记录。dense_rank:是告诉 Oracle 排列的策略,first/last 则告诉最终筛选的条件。11三、按层次查询(NTile)找出订单总额排名前 1/5 的客户。SQL select region_id,2 customer_id,3 ntile(5) over(order by sum(customer_sales) desc) til4 from user_order5 group by re
21、gion_id, customer_id;REGION_ID CUSTOMER_ID TILE- - -10 31 19 25 110 26 16 6 1 8 18 25 2 29 23 36 9 37 11 35 3 46 8 48 16 46 7 510 29 55 1 5Ntil 函数为各个记录在记录集中的排名计算比例,所有的记录分成 5 个等级,假如只需要前 1/5 的记录则只需要截取 TILE 的值为 1 的记录就可以了。假如需要排名前 25%的记录(也就是 1/4)那么我们只需要设置 ntile(4)就可以了。4.窗口函数需要随着遍历记录集的每一条记录的同时进行统计。也即是说:统计
22、不止发生一次,而是发生多次。统计不至发生在记录集形成后,而是发生在记录集形成的过程中。一、窗口函数示例全统计:需求:列出每月的订单总额以及全年的订单总额12rows between . preceding and . following。SQL select month,2 sum(tot_sales) month_sales,3 sum(sum(tot_sales) over (order by month4 rows between unbounded preceding and unbounded following) total_sales5 from orders6 group by
23、 month;MONTH MONTH_SALES TOTAL_SALES- - -1 610697 63077662 428676 63077663 637031 63077664 541146 63077665 592935 63077666 501485 6307766 .(后面的剩了)unbounded preceding and unbouned following:针对当前所有记录的前一条、后一条记录,也就是表中的所有记录。Preceding 前面如果是 1,不是从第 1 条记录开始的意思,而是指当前记录的前一条记录。preceding 前面的修饰符是告诉窗口函数执行时参考的记录数,
24、如同 unbounded 就是告诉 oracle 不管当前记录是第几条,只要前面有多少条记录,都列入统计的范围。二、窗口函数进阶滚动统计(累积/均值):需求:列出每月的订单总额以及截至到当前月的订单总额。也就是说 3 月份要显示当月的订单总额和 1,2,3 月份订单总额的和,依此类推。需要在统计第 N 月的订单总额时,还要再统计这 N 个月来的订单总额之和。curreent rowSQL select month,2 sum(tot_sales) month_sales,3 sum(sum(tot_sales) over(order by month4 rows between unbound
25、ed preceding and current row) current_total_sal13es5 from orders6 group by month;MONTH MONTH_SALES CURRENT_TOTAL_SALES- - -1 610697 6106972 428676 10393733 637031 1676404三、窗口函数进阶根据时间范围统计统计了当天销售额和五天内的评价销售额:select trunc(order_dt) day,sum(sale_price) daily_sales,avg(sum(sale_price) over (order by trunc
26、(order_dt)range between interval 2 day preceding and interval 2 day following) five_day_avgfrom cust_orderwhere sale_price is not null and order_dt between to_date(01-jul-2001,dd-mon-yyyy)and to_date(31-jul-2001,dd-mon-yyyy)为了对指定范围进行统计,Oracle 使用关键字 range、interval 来指定一个范围。上面的例子告诉 Oracle 查找当前日期的前 2 天,
27、后 2 天范围内的记录,并统计其销售平均值。四、窗口函数进阶first_value/last_value:first_value、last_value ,用于在窗口记录集中查找第一条记录和最后一条记录。需要显示当前月、上一个月、后一个月的销售情况,以及每 3 个月的销售平均值select month,first_value(sum(tot_sales) over (order by month rows between 1 preceding and 1 following) prev_month,sum(tot_sales) monthly_sales,last_value(sum(tot_
28、sales) over (order by month rows between 1 preceding and 1 following) next_month,avg(sum(tot_sales) over (order by month 14rows between 1 preceding and 1 following) rolling_avgfrom orderswhere year = 2001 and region_id = 6group by monthorder by month;rows between 1 preceding and 1 following:当前记录的前一条
29、、后一条范围内查找并统计,而 first_value 和 last_value 在这 3 条记录中至分别找出第一条、第三条记录五、窗口函数进阶比较相邻记录:我们想每次显示当月的销售额和上个月的销售额,leg 函数类似于 preceding 和 following 子句,它能够通过和当前记录的相对位置而被应用,在比较同一个相邻的记录集内两条相邻记录的时候特别有用。select month, sum(tot_sales) monthly_sales,lag(sum(tot_sales), 1) over (order by month) prev_month_salesfrom orderswhe
30、re year = 2001and region_id = 6group by monthorder by month;5.报表函数一、报表函数简介sum(sum(tot_sales) over (order by month rows between unbounded preceding and unbounded following)来统计全年的订单总额,这个函数会在记录集形成的过程中,每检索一条记录就执行一次,它总共执行了 12 次。这是非常费时的。实际上我们还有更简便的方法:SQL select month,2 sum(tot_sales) month_sales,3 sum(sum
31、(tot_sales) over(order by month4 rows between unbounded preceding and unbounded following) win_sales,5 sum(sum(tot_sales) over() rpt_sales6 from orders7 group by month;二、RATIO_TO_REPORT 函数:15需求:列出上一年度每个月的销售总额、年底销售额以及每个月的销售额占全年总销售额的比例:方法:select all_sales.*,100 * round(cust_sales / region_sales, 2) |
32、% Percentfrom (select o.cust_nbr customer,o.region_id region,sum(o.tot_sales) cust_sales,sum(sum(o.tot_sales) over(partition by o.region_id) region_salesfrom orders_tmp owhere o.year = 2001group by o.region_id, o.cust_nbr) all_saleswhere all_sales.cust_sales all_sales.region_sales * 0.2;方法:select re
33、gion_id, salesperson_id, sum(tot_sales) sp_sales,round(sum(tot_sales) / sum(sum(tot_sales) over (partition by region_id), 2) percent_of_regionfrom orderswhere year = 2001group by region_id, salesperson_idorder by region_id, salesperson_id;方法select region_id, salesperson_id, sum(tot_sales) sp_sales,round(ratio_to_report(sum(tot_sales) over (partition by region_id), 2) sp_ratiofrom orderswhere year = 2001group by region_id, salesperson_idorder by region_id, salesperson_id;Oracle 提供的 Ratio_to_report 函数允许我们计算每条记录在其对应记录集或其子集中所占的比例。