当前位置:Gxlcms > 数据库问题 > Analytic Functions in Oracle

Analytic Functions in Oracle

时间:2021-07-01 10:21:17 帮助过:10人阅读

Overview and Introduction
How Analytic Functions Work
The Syntax
Examples
Calculate a running Total
Top-N Queries
    Example 1
    Example 2
Windows
    Range Windows
    Compute average salary for defined range
    Row Windows
    Accessing Rows Around Your Current Row
LAG
LEAD
Determine the First Value / Last Value of a Group
Crosstab or Pivot Queries
ROLLUP and RANK Examples
CUBE
Grouping Functions:
    Grouping_ID
    GROUP_ID
    Grouping SETS
More Examples with EMP Table

Overview

Analytic Functions are designed to address such problems as "Calculate a running total", "Find percentages within a group", "Top-N queries", "Compute a moving average" and many more. 
Most of these problems can be solved using standard PL/SQL, however the performance is often not what it should be. 
Analytic Functions add extensions to the SQL language that not only make these operations easier to code; they make them faster than could be achieved with pure SQL or PL/SQL.

How Analytic Functions Work ?

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions (like select AVG(sal) from emp) in that they return multiple rows for each group.
Analytic functions also operate on subsets of rows, similar to aggregate functions in GROUP BY queries, but they do not reduce the number of rows returned by the query.
The group of rows is called a window and is defined by the analytic clause. For each row, a "sliding" window of rows is defined. The window determines the range of rows used to perform the calculations for the "current row". Window sizes can be based on either a physical number of rows or a logical interval such as time.
Select MAX() OVER ()
The OVER() statement signals a start of an Analytic function. That is what differentiates an Analytical Function from a regular Oracle SQL function

Select MAX() OVER(partition by field1).
The portioning clause is used to setup the group of data that the Analytic function would be applied to.

Select MAX() OVER(Partition by field order by)
Order by specify the order of the window in the group by statement. The Order by clause is a keyword in the Oracle Analytic syntax that is requirement for using some Analytic functions
Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

Example:
SELECT empno, deptno, sal,
       AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM   emp;
    EMPNO     DEPTNO        SAL AVG_DEPT_SAL
--------- ---------- ---------- ------------
     7782         10       2450   2916.66667
     7839         10       5000   2916.66667
     7934         10       1300   2916.66667
     7566         20       2975         2175
     7902         20       3000         2175
     7876         20       1100         2175
     7369         20        800         2175
     7788         20       3000         2175
     7521         30       1250   1566.66667
     7844         30       1500   1566.66667
     7499         30       1600   1566.66667
     7900         30        950   1566.66667
     7698         30       2850   1566.66667
     7654         30       1250   1566.66667
This time AVG is an analytic function, operating on the group of rows defined by the contents of the OVER clause. This group of rows is known as a window, which is why analytic functions are sometimes referred to as window[ing] functions. Notice how the AVG function is still reporting the departmental average, like it did in the GROUP BY query, but the result is present in each row, rather than reducing the total number of rows returned. This is because analytic functions are performed on a result set after all join, WHERE, GROUP BY and HAVING clauses are complete, but before the final ORDER BY operation is performed.

The Syntax

There are some variations in the syntax of the individual analytic functions, but the basic syntax for an analytic function is as follows.

analytic_function([ arguments ]) OVER (analytic_clause)

The analytic_clause breaks down into the following optional elements.

[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]

The sub-elements of the analytic_clause each have their own syntax diagrams. Rather than repeat the syntax diagrams, the following sections describe what each section of the analytic_clause is used for.

So here is the FULL sentence:

Analytic-Function(<Argument>,<Argument>,...)
OVER (
  <Query-Partition-Clause>
  <Order-By-Clause>
  <Windowing-Clause>
)

  • Analytic-Functions
    Specify the name of an analytic function, Oracle actually provides many analytic functions such as AVGCORRCOVAR_POPCOVAR_SAMPCOUNTCUME_DIST,DENSE_RANKFIRSTFIRST_VALUELAGLASTLAST_VALUELEADMAXMINNTILEPERCENT_RANKPERCENTILE_CONTPERCENTILE_DISCRANKRATIO_TO_REPORT,STDDEVSTDDEV_POPSTDDEV_SAMPSUMVAR_POPVAR_SAMPVARIANCE.

  • Arguments
    Analytic functions take 0 to 3 arguments.

  • Query-Partition-Clause 
    The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words "partition" and "group" are used synonymously here. The analytic functions are applied to each group independently, they are reset for each group. If the query_partition_clause is omitted, the whole result set is treated as a single partition. 
    Example: The following query uses an empty OVER clause, so the average presented is based on all the rows of the result set.

    SELECT empno, deptno, sal,
           AVG(sal) OVER () AS avg_sal
    FROM   emp;

         EMPNO     DEPTNO        SAL    AVG_SAL
    ---------- ---------- ---------- ----------
          7369         20        800 2073.21429
          7499         30       1600 2073.21429
          7521         30       1250 2073.21429
          7566         20       2975 2073.21429
          7654         30       1250 2073.21429
          7698         30       2850 2073.21429
          7782         10       2450 2073.21429
          7788         20       3000 2073.21429
          7839         10       5000 2073.21429
          7844         30       1500 2073.21429
          7876         20       1100 2073.21429
          7900         30        950 2073.21429
          7902         20       3000 2073.21429
          7934         10       1300 2073.21429

    If we change the OVER clause to include a query_partition_clause based on the department, the averages presented are specifically for the department the employee belongs too.

    SELECT empno, deptno, sal,
           AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
    FROM   emp;

        EMPNO     DEPTNO        SAL AVG_DEPT_SAL
    --------- ---------- ---------- ------------
         7782         10       2450   2916.66667
         7839         10       5000   2916.66667
         7934         10       1300   2916.66667
         7566         20       2975         2175
         7902         20       3000         2175
         7876         20       1100         2175
         7369         20        800         2175
         7788         20       3000         2175
         7521         30       1250   1566.66667
         7844         30       1500   1566.66667
         7499         30       1600   1566.66667
         7900         30        950   1566.66667
         7698         30       2850   1566.66667
         7654         30       1250   1566.66667

  • Order-By-Clause
    The order_by_clause is used to order rows, or siblings, within a partition. So if an analytic function is sensitive to the order of the siblings in a partition you should include an order_by_clause. The following query uses the FIRST_VALUE function to return the first salary reported in each department. Notice we have partitioned the result set by the department, but there is no order_by_clause.

    SELECT empno, deptno, sal, 
           FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno) AS first_sal_in_dept
    FROM   emp;

         EMPNO     DEPTNO        SAL FIRST_SAL_IN_DEPT
    ---------- ---------- ---------- -----------------
          7782         10       2450              2450
          7839         10       5000              2450
          7934         10       1300              2450
          7566         20       2975              2975
          7902         20       3000              2975
          7876         20       1100              2975
          7369         20        800              2975
          7788         20       3000              2975
          7521         30       1250              1250
          7844         30       1500              1250
          7499         30       1600              1250
          7900         30        950              1250
          7698         30       2850              1250
          7654         30       1250              1250

    Now compare the values of the FIRST_SAL_IN_DEPT column when we include an order_by_clause to order the siblings by ascending salary.

    SELECT empno, deptno, sal, 
           FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY sal ASC NULLS LAST) AS first_val_in_dept
    FROM   emp;

         EMPNO     DEPTNO        SAL FIRST_VAL_IN_DEPT
    ---------- ---------- ---------- -----------------
          7934         10       1300              1300
          7782         10       2450              1300
          7839         10       5000              1300
          7369         20        800               800
          7876         20       1100               800
          7566         20       2975               800
          7788         20       3000               800
          7902         20       3000               800
          7900         30        950               950
          7654         30       1250               950
          7521         30       1250               950
          7844         30       1500               950
          7499         30       1600               950
          7698         30       2850               950

    In this case the "ASC NULLS LAST" keywords are unnecessary as ASC is the default for an order_by_clause and NULLS LAST is the default for ASC orders. When ordering by DESC, the default is NULLS FIRST.

    It is important to understand how the order_by_clause affects display order. The order_by_clause is guaranteed to affect the order of the rows as they are processed by the analytic function, but it may not always affect the display order. As a result, you must always use a conventional ORDER BY clause in the query if display order is important. Do not rely on any implicit ordering done by the analytic function. Remember, the conventional ORDER BY clause is performed after the analytic processing, so it will always take precedence.

  •  Windowing-Clause
    The windowing_clause gives some analytic functions a further degree of control over this window within the current partition. The windowing_clause is an extension of the order_by_clause and as such, it can only be used if an order_by_clause is present. The windowing_clause has two basic forms.
    RANGE BETWEEN start_point AND end_point
    ROWS BETWEEN start_point AND end_point

    Possible values for "start_point" and "end_point" are:

    • UNBOUNDED PRECEDING : The window starts at the first row of the partition. Only available for start points.
    • UNBOUNDED FOLLOWING : The window ends at the last row of the partition. ONly available for end points.
    • CURRENT ROW : The window starts or ends at the current row. Can be used as start or end point.
    • value_expr PRECEDING : An physical or logical offset before the current row using a constant or expression that evaluates to a positive numerical value. When used with RANGE, it can also be an interval literal if the order_by_clause uses a DATE column.
    • value_expr FOLLOWING : As above, but an offset after the current row.

    For analytic functions that support the windowing_clause, the default action is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". The following query is similar to one used previously to report the employee salary and average department salary, but now we have included an order_by_clause so we also get the default windowing_clause. Notice how the average salary is now calculated using only the employees from the same department up to and including the current row.

    SELECT empno, deptno, sal, 
           AVG(sal) OVER (PARTITION BY deptno ORDER BY sal) AS avg_dept_sal_sofar
    FROM   emp;

       EMPNO     DEPTNO        SAL AVG_DEPT_SAL_SOFAR
    -------- ---------- ---------- ------------------
        7934         10       1300               1300
        7782         10       2450               1875
        7839         10       5000         2916.66667
        7369         20        800                800
        7876         20       1100                950
        7566         20       2975               1625
        7788         20       3000               2175
        7902         20       3000               2175
        7900         30        950                950
        7654         30       1250               1150
        7521         30       1250               1150
        7844         30       1500             1237.5
        7499         30       1600               1310
        7698         30       2850         1566.66667

    The following query shows one method for accessing data from previous and following rows within the current row using the windowing_clause. This can also be accomplished with LAG and LEAD.
    SELECT empno, deptno, sal, 
           FIRST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sal,
           LAST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sal
    FROM   emp;
        EMPNO     DEPTNO        SAL PREVIOUS_SAL   NEXT_SAL
    --------- ---------- ---------- ------------ ----------
         7369         20        800          800        950
         7900         30        950          800       1100
         7876         20       1100          950       1250
         7521         30       1250         1100       1250
         7654         30       1250         1250       1300
         7934         10       1300         1250       1500
         7844         30       1500         1300       1600
         7499         30       1600         1500       2450
         7782         10       2450         1600       2850
         7698         30       2850         2450       2975
         7566         20       2975         2850       3000
         7788         20       3000         2975       3000
         7902         20       3000         3000       5000
         7839         10       5000         3000       5000

    More information on windows can be found here.


Analytic Function Examples
Simple Example
Select * from ( Select cust_name, sum(clm_amt)clm_amt
                  from customer
                  group by cust_name
                  order by clm_amt desc desc ) v
              )
    and rownum < 11;

Simple Example result
CUST_NAME                      CLM_AMT
------------------------------ -----------
XYZ                            100,000,000
Lexus Corp                      80,000,000
First America                   60,000,000
Yelp                            78,000,000
ABC                             75,000,000
Omega Int.                      74,000,000
S Corp                          70,000,000
Acme                            25,000,000
Sun Enterprise                  23,000,000
Film studio                     17,000,000

Analytic Version
select cust_name, SUM(clm_amt) OVER (partition by cust_name) clm_amt;

In the following example we‘ll show GROUPING SETS (Listing 1), GROUP BY ROLLUP (Listing 2), and GROUP BY CUBE (Listing 3) to see what we get with each. We‘ll use the standard SCOTT.EMP table to do this. The first query will show us the sum of salaries by DEPTNO and by JOB. We need to use the GROUPING_ID function to determine what aggregation each row represents. It might not be obvious why we would need this in general from the example, but consider what would happen if DEPTNO or JOB were NULLABLE. There would be no way to distinguish the detail row from the aggregated row.

The GROUPING_ID function returns a 0 or 1 when given a single column. (In this case, it works just like the GROUPING function.) If the return value is 0, indicating a detail record value for that particular column, then the column in question was not aggregated over (was not collapsed). If the function returns 1, then the column in question was aggregated over—any aggregates in the SELECT list would have been computed over that entire column‘s set of values. GROUPING_ID differs from GROUPING, in that you can send a list of columns and the GROUPING_ID function will treat the list as bits and return a decimal number. That means that the call to GROUPING_ID(a,b,c)might return any number between 0 and 7, because different 0/1 combinations are returned. Given that fact, we can use a CASE statement in the query to see if the row is a detail row for DEPTNO, for JOB, for neither, or for both.

Using GROUPING SETS in Listing 1, we asked for GROUP BY only on DEPTNO and then only on JOB. So, that one query was like running the following query

 

select deptno, null, sum(sal) 
from emp group by deptno
union all
select null, job, sum(sal)
from emp group by job;

Code Listing 1: Using GROUPING_ID with GROUPING SETS

 

select deptno, job, sum(sal),
grouping_id(deptno) gid_d,
grouping_id(job) gid_j,
grouping_id(deptno,job) gid_dj,
bin_to_num(grouping_id(deptno),grouping_id(job)) b2n,
case when grouping_id(deptno,job) = 0
then ‘Dtl both‘
when grouping_id(deptno,job) = 1
then ‘Agg over job‘
when grouping_id(deptno,job) = 2
then ‘Agg over deptno‘
when grouping_id(deptno,job) = 3
then ‘Agg over both‘
end what
from emp
group by grouping sets( (deptno), (job) );

DEPTNO JOB SUM(SAL) GID_D GID_J GID_DJ B2N WHAT
________ ____________ ___________ _______ ________ ________ ____ __________________
10 8750 0 1 1 1 Agg over job
20 0875 0 1 1 1 Agg over job
30 9400 0 1 1 1 Agg over job
ANALYST 6000 1 0 2 2 Agg over deptno
CLERK 4150 1 0 2 2 Agg over deptno
MANAGER 8275 1 0 2 2 Agg over deptno
PRESIDENT 5000 1 0 2 2 Agg over deptno
SALESMAN 5600 1 0 2 2 Agg over deptno

. . . but without having to make two passes on the EMP table, as would be the case with the UNION ALL.

In looking at the columns involved in the query in Listing 1, we can see that the function GROUPING(column_name) shows us when a column is aggregated over or preserved as a detail record. When GROUPING(deptno) = 0, DEPTNO is preserved in the output. When it is 1, it is aggregated over. However, we have two columns in this set we are aggregating by, for a total of four possible 0/1 combinations. (In this query, only two are possible.) Using the GROUPING_ID function on this vector of columns, we can easily see what each row represents. I‘ve also included the alternative, more verbose way to accomplish this—the BIN_TO_NUM() function, to which we can send a list of 0s and 1s and get back a decimal number as well. I‘m pretty sure you‘ll agree that GROUPING_ID(c1,c2,c3) is easier than the corresponding BIN_TO_NUM call with three GROUPING calls.

In Listing 2, we take a look at GROUP BY ROLLUP. A rollup by the two columns DEPTNO and JOB will produce

 

  1. Detail records by DEPTNO, JOB (sum of SAL for each DEPTNO/JOB combination).
  2. A summary record for each DEPTNO over JOB (like a subtotal).
  3. A summary record over DEPTNO and JOB—a single aggregate for the entire result. Listing 2 shows the query and the results.

Code Listing 2: Using GROUPING_ID with GROUP BY ROLLUP

 

select deptno, job, sum(sal),
grouping_id(deptno) gid_d,
grouping_id(job) gid_j,
case when grouping_id(deptno,job) = 0
then ‘Dtl both‘
when grouping_id(deptno,job) = 1
then ‘Agg over job‘
when grouping_id(deptno,job) = 2
then ‘Agg over deptno‘
when grouping_id(deptno,job) = 3
then ‘Agg over both‘
end what
from emp
group by rollup( deptno, job );

DEPTNO JOB SUM(SAL) GID_D GID_J WHAT
________ _____________ ___________ _______ _______ ________________
10 CLERK 1300 0 0 Dtl both
10 MANAGER 2450 0 0 Dtl both
10 PRESIDENT 5000 0 0 Dtl both
10 8750 0 1 Agg over job
20 CLERK 1900 0 0 Dtl both
20 ANALYST 6000 0 0 Dtl both
20 MANAGER 2975 0 0 Dtl both
20 10875 0 1 Agg over job
30 CLERK 950 0 0 Dtl both
30 MANAGER 2850 0 0 Dtl both
30 SALESMAN 5600 0 0 Dtl both
30 9400 0 1 Agg over job
29025 1 1 Agg over both

A rollup is sort of like a running total report, and GROUPING_ID tells us when the rollups happened. So the data is sorted by DEPTNO, JOB, and we have subtotals by DEPTNO (aggregated over JOB) and by DEPTNO, JOB (aggregated over both) along with the details by DEPTNO/JOB.

As you can see in Listing 2, the GROUPING_ID function was useful in telling us when we were dealing with a rolled-up record and the level of detail we could expect in that record.

Last, we‘ll look at GROUP BY CUBE. CUBE is similar to ROLLUP, in that you get the same three record types as shown in Listing 2 but also get all possible aggregations. CUBE grouping by DEPTNO and JOB will give you records by all of the following:

 

  1. DEPTNO and JOB
  2. DEPTNO over JOB
  3. JOB over DEPTNO
  4. A single total aggregate

You get every possible aggregate. Listing 3 shows the syntax and output and how to use the GROUPING_ID function to see what the level of detail is for each row. It is interesting to note that GROUP BY CUBE produces a superset of the rows we observed in the first query (in Listing 1). You could use GROUPING_ID with CUBE to generate the same result set as the original grouping sets query. That is, adding

 

having (grouping_id(deptno,job)=2  
or (grouping_id(deptno,job)=1

. . . to the GROUP BY CUBE query would cause it to be the logical equivalent of the GROUPING SETS query. But you shouldn‘t do that! If you need only some of the aggregates, use GROUPING SETS to get just the ones you need computed and avoid computing the others altogether. It would be fair to say that GROUPING_ID doesn‘t avoid multiple grouping functions but GROUPING SETS does. However, GROUPING_ID plays an important role in seeing what data is what.

Code Listing 3: Using GROUPING_ID with GROUP BY CUBE

 

select deptno, job, sum(sal),
grouping_id(deptno) gid_d,
grouping_id(job) gid_j,
case when grouping_id(deptno,job) = 0
then ‘Dtl both‘
when grouping_id(deptno,job) = 1
then ‘Agg over job‘
when grouping_id(deptno,job) = 2
then ‘Agg over deptno‘
when grouping_id(deptno,job) = 3
then ‘Agg over both‘
end what
from emp
group by cube( deptno, job )
order by grouping_id(deptno,job) ;


DEPTNO JOB SUM(SAL) GID_D GID_J WHAT
_________ _____________ ___________ _______ _______ ___________________
10 CLERK 1300 0 0 Dtl both
10 MANAGER 2450 0 0 Dtl both
10 PRESIDENT 5000 0 0 Dtl both
20 CLERK 1900 0 0 Dtl both
30 CLERK 950 0 0 Dtl both
30 SALESMAN 5600 0 0 Dtl both
30 MANAGER 2850 0 0 Dtl both
20 MANAGER 2975 0 0 Dtl both
20 ANALYST 6000 0 0 Dtl both
10 8750 0 1 Agg over job
20 10875 0 1 Agg over job
30 9400 0 1 Agg over job
CLERK 4150 1 0 Agg over deptno
ANALYST 6000 1 0 Agg over deptno
MANAGER 8275 1 0 Agg over deptno
PRESIDENT 5000 1 0 Agg over deptno
SALESMAN 5600 1 0 Agg over deptno
29025 1 1 Agg over both

 

 

Example: Calculate a running Total

This example shows the cumulative salary within a departement row by row, with each row including a summation of the prior rows salary.

set autotrace traceonly explain
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
column seq format 999

SELECT ename "Ename", deptno "Deptno", sal "Sal",
  SUM(sal) OVER (ORDER BY deptno, ename) "Running Total",
  SUM(SAL) OVER (PARTITION BY deptno
                 ORDER BY ename) "Dept Total",
  ROW_NUMBER() 
    OVER (PARTITION BY deptno ORDER BY ENAME) "Seq"
FROM emp
ORDER BY deptno, ename
/

Ename  Deptno    Sal Running Total Dept Total  Seq
------ ------ ------ ------------- ---------- ----
CLARK      10   2450          2450       2450    1
KING            5000          7450       7450    2
MILLER          1300          8750       8750    3

ADAMS      20   1100          9850       1100    1
FORD            3000         12850       4100    2
JONES           2975         15825       7075    3
SCOTT           3000         18825      10075    4
SMITH            800         19625      10875    5

ALLEN      30   1600         21225       1600    1
BLAKE           2850         24075       4450    2
JAMES            950         25025       5400    3
MARTIN          1250         26275       6650    4
TURNER          1500         27775       8150    5
WARD            1250         29025       9400    6

Execution Plan
---------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 WINDOW (SORT)
2 1 TABLE ACCESS (FULL) OF ‘EMP‘
Statistics
---------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
1658 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed

The example shows how to calculate a "Running Total" for the entire query. This is done using the entire ordered result set, via SUM(sal) OVER (ORDER BY deptno, ename).
Further, we were able to compute a running total within each department, a total that would be reset at the beginning of the next department. The PARTITION BY deptno in that SUM(sal) caused this to happen, a partitioning clause was specified in the query in order to break the data up into groups.
The ROW_NUMBER() function is used to sequentially number the rows returned in each group, according to our ordering criteria (a "Seq" column was added to in order to display this position).
The execution plan shows, that the whole query is very well performed with only 3 consistent gets, this can never be accomplished with standard SQL or even PL/SQL.

Top-N Queries

How can we get the Top-N records by some set of fields ?
Prior to having access to these analytic functions, questions of this nature were extremely difficult to answer. There are some problems with Top-N queries however; mostly in the way people phrase them. It is something to be careful about when designing reports. Consider this seemingly sensible request:
I would like the top three paid sales reps by department

Using the "traditional approach you can perform:
select *  from 
(your_query)
 where rownum <= 10;

The problem with this question is that it is ambiguous because of repeated values, there might be four people who all make the same salary, what should we do then ?
Let‘s look at three examples, all use the well known table EMP.

Example 1

Let‘s look at what ROW_NUMBER can do. Here is an example query using ROW_NUMBER to assign an increasing number to each row in the EMP table after sorting by SAL DESC:

 

select ename, sal,
row_number() over (order by sal desc) rn
from emp
order by sal desc;

ENAME SAL RN
----- ---- --
KING 5000 1
FORD 3000 2
SCOTT 3000 3
JONES 2975 4
.
.
.
JAMES 950 13
SMITH 800 14

I can apply a predicate to ROW_NUMBER after it is assigned. For example

 

select * from (select ename, sal,
row_number() over (order by sal desc) rn
from emp)
where rn <= 3
order by sal desc;

ENAME SAL RN
----- ---- --
KING 5000 1
SCOTT 3000 2
FORD 3000 3

So, that demonstrates how to perform a top-n query by using ROW_NUMBER and also points out a general issue with top-n queries. If you look at that result, you see two rows with the value 3000. What if, in the EMP table, three people, instead of just two, had a salary of 3000? The result obtained by the above query would be ambiguous—I would get three records, but the records I retrieved would be somewhat random. We will analyze that on the example 2:

Another Example:
Sort the sales people by salary from greatest to least. Give the first three rows. If there are less then three people in a department, this will return less than three records.

set autotrace on explain
break on deptno skip 1

SELECT * FROM (SELECT deptno, ename, sal, ROW_NUMBER()
                 OVER (PARTITION BY deptno 
                          ORDER BY sal DESC)
                 Top3 FROM emp)
  WHERE Top3 <= 3
;

    DEPTNO ENAME             SAL       TOP3
---------- ---------- ---------- ----------
        10 KING             5000          1
           CLARK            2450          2
           MILLER           1300          3

        20 SCOTT            3000          1
           FORD             3000          2
           JONES            2975          3

        30 BLAKE            2850          1
           ALLEN            1600          2
           TURNER           1500          3

This query works by sorting each partition (or group, which is the deptno), in a descending order, based on the salary column and then assigning a sequential row number to each row in the group as it is processed. The use of a WHERE clause after doing this to get just the first three rows in each partition. 

 

Example 2

Bearing this in mind, I can use other analytic functions to remove the ambiguity from example 1. They will do so, but the analytic functions might return more than n rows. In my opinion, when the attribute I order by is not unique, I want my query to return all of the relevant records—not just the first narbitrary ones. To that end, I can use the RANK and DENSE_RANK analytic functions. Let‘s take a look at what they do:

 

select ename,sal,
row_number() over (order by sal desc)rn,
rank() over (order by sal desc)rnk,
dense_rank() over (order by sal desc)drnk
from emp
order by sal desc;

ENAME SAL RN RNK DRNK
----- ---- -- --- ----
KING 5000 1 1 1
FORD 3000 2 2 2
SCOTT 3000 3 2 2
JONES 2975 4 4 3
BLAKE 2850 5 5 4
CLARK 2450 6 6 5
.
.
.

The main things to note here are the following:

 

  • ROW_NUMBER assigns contiguous, unique numbers from 1..N to a result set.

 

RANK does not assign unique numbers—FORD and SCOTT tied for second place—nor does it assign contiguous numbers. No record was assigned the value of 3, because two people tied for second place, and no one came in third, according to RANK.

 

DENSE_RANK, like RANK, does not assign unique numbers, but it does assign contiguous numbers. Even though two records tied for second place, there is a third-place record.

You can use RANK and DENSE_RANK in the same way you would use ROW_NUMBER to restrict the number of rows returned, but obviously you‘ll get subtly different results. For example

 

select * from (select ename,sal,
dense_rank() over (order by sal desc)drnk
from emp)
where drnk <= 3
order by sal desc;

ENAME SAL DRNK
----- ---- ----
KING 5000 1
SCOTT 3000 2
FORD 3000 2
JONES 2975 3

That query returns "the set of people who make the top three salaries," which is likely the desired result. Getting the first three records from EMP after sorting by SAL is rather arbitrary, because using exactly the same set of data, simply inserted in different orders, you could observe different result sets with ROW_NUMBER (because SAL is not unique). Using DENSE_RANK, however, I don‘t get precisely three records but, instead, a repeatable (deterministic) result set. And I suspect that I retrieve the set the end user really meant to retrieve—the set of people making the top three salaries.


Another Example:
Give me the set of sales people who make the top 3 salaries - that is, find the set of distinct salary amounts, sort them, take the largest three, and give me everyone who makes one of those values.

SELECT * FROM (SELECT deptno, ename, sal,
                 DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal desc) 
                 TopN FROM emp)
   WHERE TopN <= 3
   ORDER BY deptno, sal DESC;


    DEPTNO ENAME             SAL       TOPN
---------- ---------- ---------- ----------
        10 KING             5000          1
           CLARK            2450          2
           MILLER           1300          3

        20 SCOTT            3000          1  <--- !
           FORD             3000          1  <--- !

           JONES            2975          2
           ADAMS            1100          3

        30 BLAKE            2850          1
           ALLEN            1600          2
        30 TURNER           1500          3

Here the DENSE_RANK function was used to get the top three salaries. We assigned the dense rank to the salary column and sorted it in a descending order.
The DENSE_RANK function computes the rank of a row in an ordered group of rows. The ranks are consecutive integers beginning with 1. The largest rank value is the number of unique values returned by the query. Rank values are not skipped in the event of ties. Rows with equal values for the ranking criteria receive the same rank.
The DENSE_RANK function does not skip numbers and will assign the same number to those rows with the same value. Hence, after the result set is built in the inline view, we can simply select all of the rows with a dense rank of three or less, this gives us everyone who makes the top three salaries by department number.

Windows

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. The default window is an anchored window that simply starts at the first row of a group an continues to the current row.
We can set up windows based on two criteria: RANGES of data values or ROWS offset from the current row. It can be said, that the existance of an ORDER BY in an analytic function will add a default window clause of RANGE UNBOUNDED PRECEDING. That says to get all rows in our partition that came before us as specified by the ORDER BY clause.
Let‘s look at an example with a sliding window within a group and compute the sum of the current row‘s SAL column plus the previous 2 rows in that group. If we need a report that shows the sum of the current employee‘s salary with the preceding two salaries within a departement, it would look like this.

break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999

SELECT deptno "Deptno", ename "Ename", sal "Sal",
  SUM(SAL) OVER (PARTITION BY deptno
                 ORDER BY ename
                 ROWS 2 PRECEDING) "Sliding Total"
FROM emp
ORDER BY deptno, ename;

Deptno Ename     Sal Sliding Total
------ ------ ------ -------------
    10 CLARK    2450          2450
       KING     5000          7450
       MILLER   1300          8750

    20 ADAMS    1100          1100
       FORD     3000          4100
       JONES    2975          7075  ^
       SCOTT    3000          8975  |
       SMITH     800          6775  \-- Sliding Window


    30 ALLEN    1600          1600
       BLAKE    2850          4450
       JAMES     950          5400
      

人气教程排行