Keeping the first and last in your result with analytic functions in your query

Today I’m going to show you how to use the KEEP, FIRST and LAST analytic functions in your query.

FIRST and LAST can be used in analytic functions that operate on a set of values that rank as the FIRST or LAST with respect to a given sorting.

When would you need these functions?
When you need a value from the first or last row of a sorted group, but the needed value is not the the same as the sort key.
The FIRST and LAST functions eliminate the need for self-joins or views and thus gives you a better performance.

The KEEP function qualifies an aggregate_function, indicating that only the FIRST or LAST values of aggregate_function will be returned.

(aggregate_function) KEEP (query_partitioning_clause)
e.g.: MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired

Let’s look at this query:

HR>SELECT d.department_name, e.last_name, e.hire_date
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
    ORDER BY department_name, hire_date;

DEPARTMENT LAST_NAME  HIRE_DATE
---------- ---------- ----------
Accounting Higgins    07/06/1994
Accounting Gietz      07/06/1994
Administra Whalen     17/09/1987
Executive  King       17/06/1987
Executive  Kochhar    21/09/1989
Executive  De Haan    13/01/1993
Finance    Faviet     16/08/1994
Finance    Greenberg  17/08/1994
...
Finance    Popp       07/12/1999
Human Reso Mavris     07/06/1994
IT         Hunold     03/01/1990
IT         Ernst      21/05/1991
IT         Austin     25/06/1997
IT         Pataballa  05/02/1998
IT         Lorentz    07/02/1999
Marketing  Hartstein  17/02/1996
Marketing  Fay        17/08/1997
Public Rel Baer       07/06/1994
Purchasing Raphaely   07/12/1994
...
Purchasing Himuro     15/11/1998
Purchasing Colmenares 10/08/1999
Sales      King       30/01/1996
Sales      Sully      04/03/1996
...
Sales      Ande       24/03/2000
Sales      Banda      21/04/2000
Sales      Kumar      21/04/2000
Shipping   Kaufling   01/05/1995
Shipping   Ladwig     14/07/1995
Shipping   Rajs       17/10/1995
...
Shipping   Geoni      03/02/2000
Shipping   Philtanker 06/02/2000
Shipping   Markle     08/03/2000
106 rows selected.

(I have removed rows from the result for readability)

From this result you would like to know for each department the names of the persons that are hired first and last.
You add to the query:
MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired

For each row in your result:
-partition your data per department
-sort it on hiredate
-DENSE_RANK FIRST will aggregate over only those rows with the minimum (FIRST)dense rank
-KEEP will use the MIN(hire_date) function only over these results and not the complete results of your query

HR>SELECT d.department_name, e.last_name, e.hire_date
          ,MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired
          ,MAX(hire_date) KEEP ( DENSE_RANK LAST  ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS LastHired
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
    ORDER BY department_name, hire_date;

DEPARTMENT LAST_NAME  HIRE_DATE  FIRSTHIRED LASTHIRED
---------- ---------- ---------- ---------- ----------
Accounting Gietz      07/06/1994 07/06/1994 07/06/1994
Accounting Higgins    07/06/1994 07/06/1994 07/06/1994
Administra Whalen     17/09/1987 17/09/1987 17/09/1987
Executive  King       17/06/1987 17/06/1987 13/01/1993
Executive  Kochhar    21/09/1989 17/06/1987 13/01/1993
Executive  De Haan    13/01/1993 17/06/1987 13/01/1993
Finance    Faviet     16/08/1994 16/08/1994 07/12/1999
Finance    Greenberg  17/08/1994 16/08/1994 07/12/1999
...
Finance    Popp       07/12/1999 16/08/1994 07/12/1999
Human Reso Mavris     07/06/1994 07/06/1994 07/06/1994
IT         Hunold     03/01/1990 03/01/1990 07/02/1999
IT         Ernst      21/05/1991 03/01/1990 07/02/1999
IT         Austin     25/06/1997 03/01/1990 07/02/1999
IT         Pataballa  05/02/1998 03/01/1990 07/02/1999
IT         Lorentz    07/02/1999 03/01/1990 07/02/1999
Marketing  Hartstein  17/02/1996 17/02/1996 17/08/1997
Marketing  Fay        17/08/1997 17/02/1996 17/08/1997
Public Rel Baer       07/06/1994 07/06/1994 07/06/1994
Purchasing Raphaely   07/12/1994 07/12/1994 10/08/1999
...
Purchasing Himuro     15/11/1998 07/12/1994 10/08/1999
Purchasing Colmenares 10/08/1999 07/12/1994 10/08/1999
Sales      King       30/01/1996 30/01/1996 21/04/2000
Sales      Sully      04/03/1996 30/01/1996 21/04/2000
...
Sales      Ande       24/03/2000 30/01/1996 21/04/2000
Sales      Banda      21/04/2000 30/01/1996 21/04/2000
Sales      Kumar      21/04/2000 30/01/1996 21/04/2000
Shipping   Kaufling   01/05/1995 01/05/1995 08/03/2000
Shipping   Ladwig     14/07/1995 01/05/1995 08/03/2000
...
Shipping   Geoni      03/02/2000 01/05/1995 08/03/2000
Shipping   Philtanker 06/02/2000 01/05/1995 08/03/2000
Shipping   Markle     08/03/2000 01/05/1995 08/03/2000
106 rows selected.

(I have removed rows from the result for readability)

Again we have 106 rows and we can see the first and last hiredate for each department and how this relates to to the hire_date for the individual employee.
Now we will filter only the result we want from the previous query:

HR>SELECT department_name, last_name, hire_date
     FROM (
       SELECT d.department_name, e.last_name, e.salary,e.hire_date
              ,MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired
              ,MAX(hire_date) KEEP ( DENSE_RANK LAST  ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS LastHired
         FROM employees e, departments d
        WHERE e.department_id = d.department_id
         )
   WHERE (hire_date = FirstHired OR hire_date = LastHired)
   ORDER BY department_name, hire_date;

DEPARTMENT LAST_NAME  HIRE_DATE
---------- ---------- ----------
Accounting Higgins    07/06/1994
Accounting Gietz      07/06/1994
Administra Whalen     17/09/1987
Executive  King       17/06/1987
Executive  De Haan    13/01/1993
Finance    Faviet     16/08/1994
Finance    Popp       07/12/1999
Human Reso Mavris     07/06/1994
IT         Hunold     03/01/1990
IT         Lorentz    07/02/1999
Marketing  Hartstein  17/02/1996
Marketing  Fay        17/08/1997
Public Rel Baer       07/06/1994
Purchasing Raphaely   07/12/1994
Purchasing Colmenares 10/08/1999
Sales      King       30/01/1996
Sales      Banda      21/04/2000
Sales      Kumar      21/04/2000
Shipping   Kaufling   01/05/1995
Shipping   Markle     08/03/2000
20 rows selected.

Related posts

Tags: , ,

2 Responses to “Keeping the first and last in your result with analytic functions in your query”

  1. skulikou says:

    agree with Phil

  2. Phil says:

    I don’t think that the example used actually demonstrates the feature at all, in that I think it would give exactly the same results without the KEEP clause. The analytic function it shows is…
    MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired
    but I think the following would always yield the same…
    MIN(hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired

Leave a Reply