How Data Flows When You Use SAS Files

Introduction

To tune applications that access data concurrently, you need to understand how data is read and written in the different types of SAS library members that can be accessed through a server.

Remember that an application cannot run any faster when it accesses data through a server than when it accesses data directly. This might seem obvious, but it is easy to blame an application's sluggish performance on the server without ever testing the application against directly accessed data. For many applications, the difference in performance between direct access and access through a server is not large. Whenever you develop a new application, verify that it runs acceptably while accessing its data directly before you add a server to its data access.

SAS Data Files

When a SAS session reads from a SAS data file that is accessed directly:

The procedure or DATA step requests an observation from the engine.
The engine requests the SAS host interface to read the page of the data file that contains the observation.
The engine extracts the observation from the page and returns it to the procedure.

When a SAS session updates or adds to a SAS data file that is accessed directly:

The procedure calls the engine to replace or add the observation.
The engine replaces or adds the observation in the page.
The engine calls the host interface to write the updated or new page to disk.
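
The direct-access read and update steps above can be pictured with a small toy model. This is only a sketch for illustration, not SAS's implementation: the class names ToyHostInterface and ToyEngine, the fixed number of observations per page, and the choice to read a page before updating it are all assumptions made for the example.

```python
class ToyHostInterface:
    """Hypothetical stand-in for the host interface: stores pages, counts I/O."""

    def __init__(self, pages):
        self.pages = [list(p) for p in pages]
        self.page_reads = 0
        self.page_writes = 0

    def read_page(self, page_no):
        self.page_reads += 1
        return list(self.pages[page_no])

    def write_page(self, page_no, page):
        self.page_writes += 1
        self.pages[page_no] = list(page)


class ToyEngine:
    """Hypothetical engine: maps an observation number to a page and a slot."""

    def __init__(self, host, obs_per_page):
        self.host = host
        self.obs_per_page = obs_per_page  # assumed fixed for this sketch

    def read_obs(self, obs_no):
        # Ask the host interface for the page that holds the observation,
        # then extract the observation from that page.
        page_no, slot = divmod(obs_no, self.obs_per_page)
        page = self.host.read_page(page_no)
        return page[slot]

    def replace_obs(self, obs_no, value):
        # Replace the observation in its page, then write the page back.
        # Reading the page first is an assumption of this toy model.
        page_no, slot = divmod(obs_no, self.obs_per_page)
        page = self.host.read_page(page_no)
        page[slot] = value
        self.host.write_page(page_no, page)


if __name__ == "__main__":
    host = ToyHostInterface([["a", "b"], ["c", "d"]])
    engine = ToyEngine(host, obs_per_page=2)
    print(engine.read_obs(2))
    engine.replace_obs(3, "D")
    print(host.pages, host.page_reads, host.page_writes)
```

The point of the model is that the unit of disk I/O is the page, while the unit the procedure sees is the observation.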

When a SAS session reads from a SAS data file that is accessed through a server:

The procedure or DATA step requests the observation from the REMOTE engine.
The REMOTE engine determines whether the requested observation is already available in its transmission buffer in the user's SAS session. If the observation is available, it is returned to the procedure.
If the observation is not already available in the user's SAS session, the REMOTE engine sends a message to the server to get a buffer full of observations, including the observation requested by the procedure.
The server fills the transmission buffer by requesting one or more observations from the engine that accesses the data file in the server's SAS session.
For each observation, the engine in the server's session requests the SAS host interface to read the page of the data file that contains the observation.
The engine in the server's SAS session extracts each observation from its page and returns it to the server.
After filling the transmission buffer, the server sends the buffer to the REMOTE engine.
The REMOTE engine extracts the selected observation from the transmission buffer and returns it to the procedure or DATA step.
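
The effect of the transmission buffer in the steps above can be sketched as follows. This is a simplified model under stated assumptions, not the actual REMOTE engine: ToyReadServer, ToyRemoteReader, and the buffer size expressed as a count of observations are hypothetical names and simplifications.

```python
class ToyReadServer:
    """Hypothetical server: returns a buffer full of observations per message."""

    def __init__(self, observations, buffer_size):
        self.observations = list(observations)
        self.buffer_size = buffer_size  # assumed to be a count of observations
        self.messages = 0

    def fetch(self, start):
        # One message from the user's session yields one filled buffer.
        self.messages += 1
        return self.observations[start:start + self.buffer_size]


class ToyRemoteReader:
    """Hypothetical REMOTE engine on the user's side, with a local buffer."""

    def __init__(self, server):
        self.server = server
        self.buffer = []
        self.buffer_start = 0

    def read_obs(self, obs_no):
        offset = obs_no - self.buffer_start
        if not (0 <= offset < len(self.buffer)):
            # Not in the local buffer: ask the server for a new buffer that
            # begins with the requested observation.
            self.buffer = self.server.fetch(obs_no)
            self.buffer_start = obs_no
            offset = 0
        return self.buffer[offset]


if __name__ == "__main__":
    server = ToyReadServer(range(10), buffer_size=4)
    reader = ToyRemoteReader(server)
    values = [reader.read_obs(i) for i in range(10)]
    print(values, server.messages)
```

In this model, reading 10 observations sequentially with a buffer that holds 4 takes 3 messages to the server instead of 10, which is why sequential reads through a server can stay close to direct-access performance.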

When a SAS session updates or adds to a SAS data file that is accessed through a server:

The procedure calls the REMOTE engine to replace or add the observation.
The REMOTE engine replaces the observation in its transmission buffer or adds the observation to its transmission buffer.
If the data file is open for update access, the REMOTE engine sends a message to the server that carries the new or updated observation and requests that it be updated in or added to the data file.
If the data file is open for output access, the REMOTE engine adds observations to its transmission buffer until the buffer is full. After the transmission buffer is full, the REMOTE engine sends it to the server.
The server requests the engine that accesses the library in the server's SAS session to replace the observation in the data file or add the observation(s) to the data file.
The engine in the server's SAS session replaces or adds each observation by updating and creating pages in the data file.
The engine requests the SAS host interface to write each updated and new page to the data file.
The engine in the server's SAS session returns to the server.
The server replies to the REMOTE engine indicating that the updated or new observation has been stored in the data file.
The REMOTE engine returns to the procedure.
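
The difference between update access and output access in the steps above can be sketched with a toy writer. The names ToyWriteServer and ToyRemoteWriter are hypothetical, the buffer size is expressed as a count of observations for simplicity, and sending a partially filled buffer at close is an assumption of this sketch rather than something stated above.

```python
class ToyWriteServer:
    """Hypothetical server: stores whatever observations a message carries."""

    def __init__(self):
        self.data = []
        self.messages = 0

    def store(self, observations):
        self.messages += 1
        self.data.extend(observations)
        return True  # stands in for the reply to the REMOTE engine


class ToyRemoteWriter:
    """Hypothetical REMOTE engine for a data file opened for update or output."""

    def __init__(self, server, mode, buffer_size):
        if mode not in ("update", "output"):
            raise ValueError("mode must be 'update' or 'output'")
        self.server = server
        self.mode = mode
        self.buffer_size = buffer_size
        self.buffer = []

    def add_obs(self, obs):
        self.buffer.append(obs)
        # Update access: every observation is sent immediately.
        # Output access: observations accumulate until the buffer is full.
        if self.mode == "update" or len(self.buffer) >= self.buffer_size:
            self._send()

    def _send(self):
        if self.buffer:
            self.server.store(self.buffer)
            self.buffer = []

    def close(self):
        # Assumption: any partially filled buffer is sent at close.
        self._send()


if __name__ == "__main__":
    for mode in ("update", "output"):
        server = ToyWriteServer()
        writer = ToyRemoteWriter(server, mode, buffer_size=4)
        for i in range(10):
            writer.add_obs(i)
        writer.close()
        print(mode, server.messages, server.data)
```

With 10 observations and a buffer that holds 4, the toy model sends 10 messages under update access and 3 under output access, while the data stored on the server is the same in both cases.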

SAS Data Views

The flow of data when a SAS data view is processed can be complex because a view is a set of instructions that tells how to select and combine data from one or more sources.

A SAS data view can be interpreted in a user's SAS session or a server's SAS session. When a view is interpreted in a user's SAS session, the view file and none, some, or all of the data read by the view can be accessed through a server. When a view is interpreted in a server's SAS session, the view file and all of the data read by the view must be accessed by the server.

There are three types of SAS views:

PROC SQL views, which are interpreted by the SQL engine
SAS/ACCESS views, which are interpreted by SAS/ACCESS interface engines
DATA step views, which are interpreted by the DATA step view engine

A view created by the SQL procedure can read SAS data sets (SAS data files and any kind of SAS data view).

When a SAS/ACCESS view engine is used in a multi-user server's session, the view engine can read only from the database; it cannot update the database. The flow of data is one-way: from the database to the interface engine to the server to the user.

A DATA step view can, like a PROC SQL view, combine data from SAS data files and SAS data views. In addition, DATA step views can include sophisticated calculations and read data from external files. A DATA step view can produce data exclusively by calculation, without reading any data.

SAS Catalogs

SAS catalogs are containers for many different types of entries, and the data in each type of entry is accessed in a pattern unique to the entry type. As it does with the observations in SAS data sets, the REMOTE engine combines the records in a catalog entry into groups. Records are combined only for INPUT opens; OUTPUT and UPDATE opens transmit one record at a time.
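
As a rough illustration of what this grouping means for message traffic, the following sketch counts transmissions for a catalog entry under each open mode. The function name messages_needed and the records_per_group parameter are hypothetical; the actual group size depends on the entry and the transmission buffer and is not specified here.

```python
import math


def messages_needed(record_count, open_mode, records_per_group):
    """Estimate transmissions for a catalog entry (toy model only)."""
    if open_mode == "INPUT":
        # Records are combined into groups for INPUT opens.
        return math.ceil(record_count / records_per_group)
    if open_mode in ("OUTPUT", "UPDATE"):
        # One record per transmission for OUTPUT and UPDATE opens.
        return record_count
    raise ValueError("open_mode must be INPUT, OUTPUT, or UPDATE")


if __name__ == "__main__":
    for mode in ("INPUT", "OUTPUT", "UPDATE"):
        print(mode, messages_needed(10, mode, records_per_group=4))
```

Under these assumptions, an application that mostly reads catalog entries benefits from grouping, while one that writes or updates many records pays for one transmission per record.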