Archive for the ‘Oracle’ Category

Set user variable in SQL*Plus based on current user

Friday, October 8th, 2010

Sometimes you want to set user variables in a SQL*Plus script that depends on the environment you are working on.

For example; if you are deploying scripts in a development environment and the same scripts on a production environment you would like to set some variables that are used in stored procedures or packages, installation scripts etc.

Instead of prompting for these variables it is easier to set the variables in a script.

This script will do the following:

  • first find the name of the current user
  • then in PL/SQL, set different values based on the current user
  • then convert the bind variable to a user variable
  • and define your user variable in the SQL*Plus environment.
HR>

VARIABLE vAppId VARCHAR2(50);
DECLARE
    vUser VARCHAR2(50);
BEGIN

    SELECT USER INTO vUser FROM DUAL;

    IF vUser = 'SCOTT' THEN
        :vAppId := '1234';
    ELSIF vUser = 'HR'  THEN
        :vAppId := '5678';
    ELSE
        :vAppId := '';
        DBMS_OUTPUT.PUT_LINE('No applicationId is set');
    END IF;
END;
/
COLUMN vappidcol new_value APPLICATION_ID noprint
SELECT :vAppId vappidcol FROM DUAL;

HR>

Now you can view your new set variable:

HR>define

DEFINE _DATE           = "08/10/2010 19:24:52" (CHAR)
DEFINE _USER           = "HR" (CHAR)
DEFINE _SQLPLUS_RELEASE = "1002000100" (CHAR)
DEFINE _EDITOR           = "vim" (CHAR)
DEFINE _O_RELEASE      = "1002000400" (CHAR)
DEFINE APPLICATION_ID  = "5678" (CHAR)

HR>

How to control changes in your database

Tuesday, June 15th, 2010

Database schema’s need version control as much as application development. Database schema’s: tables, triggers, columns, data types, packages, functions, stored procedures, views, and so on, are not static and will change to accommodate the varying needs of your organization

If you work in a development team or a DBA team it is vital that nobody make ad-hoc changes and that all changes are documented. This boils down to one simple rule: “do NOT apply any scripts  in your DB  if they are not from source control.”
You can’t really enforce  this rule to people, it simply comes down to discipline of all team members.

Here are some guidelines help you control your changes:

  1. Create a baseline schema. This is your starting point for changes to your database.
  2. Use a  single authoritative source for your schema and code (read: source control). Your baseline schema will be revision “0″.
  3. Do not use a shared database server for development work. All members should run their own development database. This forces the team to use all code from source control. Developers cannot overwrite each others changes and development will be faster.
  4. For all changes other then a development environment; use a log for all changes! This can be a log file or a table in the database.
  5. Use atomic change scripts for all changes in your database. Use a date format for the filename. e.g.: change_2010_05_28-001.sql. Of course these scripts are in version control.
  6. All change scripts have an undo variant that will undo the change. undo_2010_05_28-001.sql.
  7. For all DDL changes other then in a development environment; Log DDL changes with a trigger and store these events in a table.
  8. Unit test your stored code! All changes made in your packages, functions or stored procedures are checked for validity and correctness.

How do you control your changes in your schema’s?

Looking back with the analytic function LAG

Tuesday, June 8th, 2010

In this article I will show you how to use the LAG analytic function to look back in your SQL results.

LAG gives you access to another row at a given physical offset prior to that position.
Think about this for a while… you do not need a self join!
On large result sets, this is a major performance compared to using self joins!

Let’s have a look at the salaries of the IT department…

HR>SELECT e.last_name, e.salary
  FROM employees e, departments d
 WHERE department_name = 'IT'
   AND e.department_id = d.department_id
 ORDER BY salary;

LAST_NAME   SALARY
---------- -------
Lorentz       4200
Austin        4800
Pataballa     4800
Ernst         6000
Hunold        9000
5 rows selected

Now for each row I would like to see the value of the previous salary:

HR>SELECT e.last_name, e.salary
      ,LAG(salary, 1, 0) OVER ( ORDER BY salary) AS prev_sal
  FROM employees e, departments d
 WHERE department_name = 'IT'
   AND e.department_id = d.department_id
 ORDER BY salary;

LAST_NAME   SALARY   PREV_SAL
---------- ------- ----------
Lorentz       4200      0
Austin        4800   4200
Pataballa     4800   4800
Ernst         6000   4800
Hunold        9000   6000
5 rows selected.

The default offset is 1 (the value from one row back)
Optionally, you can add a default value.
e.g.: LAG(salary, 1, 0) means the salary from the previous row, if no value is known the value is 0

What I’m really interested in is the difference between the salaries:

HR>SELECT e.last_name, e.salary
      ,(salary - LAG(salary, 1, 0) OVER ( ORDER BY salary)) AS diff
  FROM employees e, departments d
 WHERE department_name = 'IT'
   AND e.department_id = d.department_id
 ORDER BY salary;

LAST_NAME   SALARY   DIFF
---------- ------- ----------
Lorentz       4200   4200
Austin        4800    600
Pataballa     4800      0
Ernst         6000   1200
Hunold        9000   3000
5 rows selected.

Notice the first DIFF value, this is 4200 because the default value is 0
We can change this and have a better looking result:

HR>SELECT e.last_name, e.salary
      ,(salary - LAG(salary, 1, 4200) OVER ( ORDER BY salary)) AS diff
  FROM employees e, departments d
 WHERE department_name = 'IT'
   AND e.department_id = d.department_id
 ORDER BY salary;

LAST_NAME   SALARY   DIFF
---------- ------- ----------
Lorentz       4200      0
Austin        4800    600
Pataballa     4800      0
Ernst         6000   1200
Hunold        9000   3000
5 rows selected.

Keeping the first and last in your result with analytic functions in your query

Thursday, June 3rd, 2010

Today I’m going to show you how to use the KEEP, FIRST and LAST analytic functions in your query.

FIRST and LAST can be used in analytic functions that operate on a set of values that rank as the FIRST or LAST with respect to a given sorting.

When would you need these functions?
When you need a value from the first or last row of a sorted group, but the needed value is not the the same as the sort key.
The FIRST and LAST functions eliminate the need for self-joins or views and thus gives you a better performance.

The KEEP function qualifies an aggregate_function, indicating that only the FIRST or LAST values of aggregate_function will be returned.

(aggregate_function) KEEP (query_partitioning_clause)
e.g.: MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired

Let’s look at this query:

HR>SELECT d.department_name, e.last_name, e.hire_date
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
    ORDER BY department_name, hire_date;

DEPARTMENT LAST_NAME  HIRE_DATE
---------- ---------- ----------
Accounting Higgins    07/06/1994
Accounting Gietz      07/06/1994
Administra Whalen     17/09/1987
Executive  King       17/06/1987
Executive  Kochhar    21/09/1989
Executive  De Haan    13/01/1993
Finance    Faviet     16/08/1994
Finance    Greenberg  17/08/1994
...
Finance    Popp       07/12/1999
Human Reso Mavris     07/06/1994
IT         Hunold     03/01/1990
IT         Ernst      21/05/1991
IT         Austin     25/06/1997
IT         Pataballa  05/02/1998
IT         Lorentz    07/02/1999
Marketing  Hartstein  17/02/1996
Marketing  Fay        17/08/1997
Public Rel Baer       07/06/1994
Purchasing Raphaely   07/12/1994
...
Purchasing Himuro     15/11/1998
Purchasing Colmenares 10/08/1999
Sales      King       30/01/1996
Sales      Sully      04/03/1996
...
Sales      Ande       24/03/2000
Sales      Banda      21/04/2000
Sales      Kumar      21/04/2000
Shipping   Kaufling   01/05/1995
Shipping   Ladwig     14/07/1995
Shipping   Rajs       17/10/1995
...
Shipping   Geoni      03/02/2000
Shipping   Philtanker 06/02/2000
Shipping   Markle     08/03/2000
106 rows selected.

(I have removed rows from the result for readability)

From this result you would like to know for each department the names of the persons that are hired first and last.
You add to the query:
MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired

For each row in your result:
-partition your data per department
-sort it on hiredate
-DENSE_RANK FIRST will aggregate over only those rows with the minimum (FIRST)dense rank
-KEEP will use the MIN(hire_date) function only over these results and not the complete results of your query

HR>SELECT d.department_name, e.last_name, e.hire_date
          ,MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired
          ,MAX(hire_date) KEEP ( DENSE_RANK LAST  ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS LastHired
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
    ORDER BY department_name, hire_date;

DEPARTMENT LAST_NAME  HIRE_DATE  FIRSTHIRED LASTHIRED
---------- ---------- ---------- ---------- ----------
Accounting Gietz      07/06/1994 07/06/1994 07/06/1994
Accounting Higgins    07/06/1994 07/06/1994 07/06/1994
Administra Whalen     17/09/1987 17/09/1987 17/09/1987
Executive  King       17/06/1987 17/06/1987 13/01/1993
Executive  Kochhar    21/09/1989 17/06/1987 13/01/1993
Executive  De Haan    13/01/1993 17/06/1987 13/01/1993
Finance    Faviet     16/08/1994 16/08/1994 07/12/1999
Finance    Greenberg  17/08/1994 16/08/1994 07/12/1999
...
Finance    Popp       07/12/1999 16/08/1994 07/12/1999
Human Reso Mavris     07/06/1994 07/06/1994 07/06/1994
IT         Hunold     03/01/1990 03/01/1990 07/02/1999
IT         Ernst      21/05/1991 03/01/1990 07/02/1999
IT         Austin     25/06/1997 03/01/1990 07/02/1999
IT         Pataballa  05/02/1998 03/01/1990 07/02/1999
IT         Lorentz    07/02/1999 03/01/1990 07/02/1999
Marketing  Hartstein  17/02/1996 17/02/1996 17/08/1997
Marketing  Fay        17/08/1997 17/02/1996 17/08/1997
Public Rel Baer       07/06/1994 07/06/1994 07/06/1994
Purchasing Raphaely   07/12/1994 07/12/1994 10/08/1999
...
Purchasing Himuro     15/11/1998 07/12/1994 10/08/1999
Purchasing Colmenares 10/08/1999 07/12/1994 10/08/1999
Sales      King       30/01/1996 30/01/1996 21/04/2000
Sales      Sully      04/03/1996 30/01/1996 21/04/2000
...
Sales      Ande       24/03/2000 30/01/1996 21/04/2000
Sales      Banda      21/04/2000 30/01/1996 21/04/2000
Sales      Kumar      21/04/2000 30/01/1996 21/04/2000
Shipping   Kaufling   01/05/1995 01/05/1995 08/03/2000
Shipping   Ladwig     14/07/1995 01/05/1995 08/03/2000
...
Shipping   Geoni      03/02/2000 01/05/1995 08/03/2000
Shipping   Philtanker 06/02/2000 01/05/1995 08/03/2000
Shipping   Markle     08/03/2000 01/05/1995 08/03/2000
106 rows selected.

(I have removed rows from the result for readability)

Again we have 106 rows and we can see the first and last hiredate for each department and how this relates to to the hire_date for the individual employee.
Now we will filter only the result we want from the previous query:

HR>SELECT department_name, last_name, hire_date
     FROM (
       SELECT d.department_name, e.last_name, e.salary,e.hire_date
              ,MIN(hire_date) KEEP ( DENSE_RANK FIRST ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS FirstHired
              ,MAX(hire_date) KEEP ( DENSE_RANK LAST  ORDER BY hire_date) OVER ( PARTITION BY e.department_id) AS LastHired
         FROM employees e, departments d
        WHERE e.department_id = d.department_id
         )
   WHERE (hire_date = FirstHired OR hire_date = LastHired)
   ORDER BY department_name, hire_date;

DEPARTMENT LAST_NAME  HIRE_DATE
---------- ---------- ----------
Accounting Higgins    07/06/1994
Accounting Gietz      07/06/1994
Administra Whalen     17/09/1987
Executive  King       17/06/1987
Executive  De Haan    13/01/1993
Finance    Faviet     16/08/1994
Finance    Popp       07/12/1999
Human Reso Mavris     07/06/1994
IT         Hunold     03/01/1990
IT         Lorentz    07/02/1999
Marketing  Hartstein  17/02/1996
Marketing  Fay        17/08/1997
Public Rel Baer       07/06/1994
Purchasing Raphaely   07/12/1994
Purchasing Colmenares 10/08/1999
Sales      King       30/01/1996
Sales      Banda      21/04/2000
Sales      Kumar      21/04/2000
Shipping   Kaufling   01/05/1995
Shipping   Markle     08/03/2000
20 rows selected.

Ranking your results: using RANK, DENSE_RANK and ROW_NUMBER

Tuesday, June 1st, 2010

In this article I’m going to explain the use of the analytic functions RANK, DENSE_RANK and ROW_NUMBER to rank the results of your query.

Consider the following query:

HR> UPDATE employees SET salary = 2900  WHERE employee_id = 117;
1 row updated.

HR>SELECT d.department_name, e.last_name, e.salary
  FROM employees e, departments d
 WHERE e.department_id = d.department_id
   AND d.department_id IN ('20','30', '40')
 ORDER BY d.department_name, e.salary, e.last_name;

DEPARTMENT LAST_NAME   SALARY
---------- ---------- -------
Human Reso Mavris        6500
Marketing  Fay           6000
Marketing  Hartstein    13000
Purchasing Colmenares    2500
Purchasing Himuro        2600
Purchasing Baida         2900
Purchasing Tobias        2900
Purchasing Khoo          3100
Purchasing Raphaely     11000
9 rows selected.
HR>

Now from this result you want to rank the salaries per department.
Which 3 persons in each department earns the most money?

Let’s use the ROW_NUMBER function…

HR>SELECT d.department_name, e.last_name, e.salary
          ,ROW_NUMBER() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS rownumber
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
      AND d.department_id IN ('20','30', '40')
    ORDER BY d.department_name, e.salary, e.last_name, rownumber;

DEPARTMENT LAST_NAME   SALARY  ROWNUMBER
---------- ---------- ------- ----------
Human Reso Mavris        6500          1
Marketing  Fay           6000          1
Marketing  Hartstein    13000          2
Purchasing Colmenares    2500          1
Purchasing Himuro        2600          2
Purchasing Baida         2900          3
Purchasing Tobias        2900          4
Purchasing Khoo          3100          5
Purchasing Raphaely     11000          6
9 rows selected.

As you can see we just added one line, the rest of the query is unchanged.
One thing to remember is that the analytic function is parsed after the SELECT and WHERE clause, but before the ORDER BY clause.

What happend here?
For the result of your query you PARTITIONED it by department_id and ordered the results for each department.
And then OVER this subset of your data, the ROW_NUMBER function is executed, resulting in a rownumbering foreach item in the subset of data.

Notice in the RowNum result that all numbers are unique for a department. Even items with the same salary will get a different rownumber because the position in the result is different.

What if we want to give a same ranking to equal salaries?

HR>SELECT d.department_name, e.last_name, e.salary
          ,RANK() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS rank
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
      AND d.department_id IN ('20','30', '40')
    ORDER BY d.department_name, e.salary, e.last_name, rank;

DEPARTMENT LAST_NAME   SALARY  RANK
---------- ---------- ------- -----
Human Reso Mavris        6500     1
Marketing  Fay           6000     1
Marketing  Hartstein    13000     2
Purchasing Colmenares    2500     1
Purchasing Himuro        2600     2
Purchasing Baida         2900     3
Purchasing Tobias        2900     3
Purchasing Khoo          3100     5
Purchasing Raphaely     11000     6
9 rows selected.

We replaced the ROW_NUMBER function with the RANK function.
Notice we have 2x a rank=3, rank=4 is missing and we have a rank=5.

What if we want no gaps in the numbering?

HR>SELECT d.department_name, e.last_name, e.salary
          ,DENSE_RANK() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS drank
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
      AND d.department_id IN ('20','30', '40')
    ORDER BY d.department_name, e.salary, e.last_name, drank;

DEPARTMENT LAST_NAME   SALARY DRANK
---------- ---------- ------- -----
Human Reso Mavris        6500     1
Marketing  Fay           6000     1
Marketing  Hartstein    13000     2
Purchasing Colmenares    2500     1
Purchasing Himuro        2600     2
Purchasing Baida         2900     3
Purchasing Tobias        2900     3
Purchasing Khoo          3100     4
Purchasing Raphaely     11000     5
9 rows selected.

The DENSE_RANK function prevents the gap in the ranking results.
To compare the results:

HR>SELECT d.department_name, e.last_name, e.salary
          ,ROW_NUMBER() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS rownumber
          ,RANK() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS rank
          ,DENSE_RANK() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS drank
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
      AND d.department_id IN ('20','30', '40')
    ORDER BY d.department_name, e.salary, e.last_name, rank;

DEPARTMENT LAST_NAME   SALARY  ROWNUMBER  RANK DRANK
---------- ---------- ------- ---------- ----- -----
Human Reso Mavris        6500          1     1     1
Marketing  Fay           6000          1     1     1
Marketing  Hartstein    13000          2     2     2
Purchasing Colmenares    2500          1     1     1
Purchasing Himuro        2600          2     2     2
Purchasing Baida         2900          3     3     3
Purchasing Tobias        2900          4     3     3
Purchasing Khoo          3100          5     5     4
Purchasing Raphaely     11000          6     6     5
9 rows selected.

Going back to the requested results: to select only the top 3 salaries per department:

HR>SELECT * FROM (
       SELECT d.department_name, e.last_name, e.salary
              ,RANK() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS rank
         FROM employees e, departments d
        WHERE e.department_id = d.department_id
          AND d.department_id IN ('20','30', '40')
        ORDER BY d.department_name, e.salary, e.last_name, rank
        )
    WHERE rank <=3;

DEPARTMENT LAST_NAME   SALARY  RANK
---------- ---------- ------- -----
Human Reso Mavris        6500     1
Marketing  Fay           6000     1
Marketing  Hartstein    13000     2
Purchasing Colmenares    2500     1
Purchasing Himuro        2600     2
Purchasing Baida         2900     3
Purchasing Tobias        2900     3
7 rows selected.