Passing WHERE Clauses to the DBMS using SAS

This feature has been used by Carol Cobb (Information Systems) to create very efficient table extracts in SAS. The information was obtained from the SAS online web manual. Any questions concerning this code should be directed to directory@ufl.edu.

The DBKEY= data set option should be used in contexts in which you want to join a large DBMS table and a relatively small SAS data set. If these conditions are not satisfied, the options will not improve performance and, in some cases, they might impactperformance negatively.

Here is an example of how performance is improved by using this option:

data keyvalues;
deptno=30; output;
deptno=10; output;
run;

libname dblib oracle user=testuser password=testpass path='myorapath';

proc sql;
select bigtab.deptno, bigtab.loc
from dblib.dept bigtab,
keyvalues smallds
where bigtab.deptno = smallds.deptno;
quit;

In this example, the SQL procedure retrieves all the rows in the large ORACLE tableDEPT and applies the WHERE clause during SQL procedure processing in SAS. Processing can be both CPU-intensive and I/O-intensive, if MYTABLE is large. Use the DBKEY= option with the previous example and compare performance:

proc sql;
select bigtab.deptno, bigtab.loc
from dblib.dept(dbkey=deptno) bigtab,
keyvalues smallds
where bigtab.deptno = smallds.deptno;
quit;

In this example, the DBKEY= option instructs the SQL procedure to pass the WHERE clause to the SAS/ACCESS engine in a form similar to where deptno=host-variable. The engine then passes this optimized query to the DBMS server. The host-variable is substituted, one at a time, with DEPTNO values from the observations in the SAS data file KEYVALUES. As a result, only rows that match the WHERE clause are retrieved from the DBMS. Without this option, PROC SQL retrieves all the rows from the DEPTtable.

The SQL statement that is created by the SAS/ACCESS engine and passed to the DBMS is similar to the following;

select deptno, loc
from bigtab.deptno
where deptno=:hostvariable;

The host-variable takes the value of the DEPTNO variable from the SAS data fileKEYVALUES. The number of SELECT statements issued is equal to the number of rows in the data file. Therefore, for improved performance, the SAS data file should contain relatively fewer rows than the DBMS table to which it is being joined.

The DBKEY= option can also be used in a SAS DATA step, with the KEY= option in the SET statement, to improve the performance of joins. Specify a value of KEY=DBKEYin this situation. The following DATA step creates a new data file by joining the data file KEYVALUES with the DBMS table MYTABLE. The variable DEPTNO is used with the DBKEY= option to cause a WHERE clause to be issued by the SAS/ACCESS engine.

data sasuser.new;
set sasuser.keyvalues;
set dblib.mytable(dbkey=deptno) key=dbkey;
run;