Compression is a process that reduces the number of
bytes that are required to represent each table row. In a compressed
file, each row is a variable-length record. In an uncompressed file,
each row is a fixed-length record. Compressed tables contain an internal
index that maps each row number to a disk address so that the application
can access data by row number. This internal index is transparent
to the user. Compressed tables have the same access capabilities as
uncompressed tables.
Here are some advantages of compressing a file:
- reduced storage requirements for the file
- fewer I/O operations necessary to read from or write to the data during processing
Here are some disadvantages of compressing a file:
- More CPU resources are required to read a compressed file because of the overhead of uncompressing each observation.
- The resulting file size might increase rather than decrease. This can happen, for example, when rows are short or contain little repeated data, so that the per-row compression overhead exceeds the savings.
These are the types of compression that you can specify:
- CHAR to use the RLE (Run Length Encoding) compression algorithm, which works best for character data.
- BINARY to use the RDC (Ross Data Compression) algorithm, which is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data.
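As a minimal sketch of how these two values might be requested in SAS code, the following DATA steps use the COMPRESS= data set option. The output table names are hypothetical, and SASHELP.CLASS is used only as a convenient stand-in for source data.

```sas
/* Hypothetical output table names; SASHELP.CLASS is only a stand-in source. */

/* CHAR requests RLE compression, which works best for character data. */
data work.class_char (compress=char);
   set sashelp.class;
run;

/* BINARY requests RDC compression, which suits larger blocks of binary data. */
data work.class_binary (compress=binary);
   set sashelp.class;
run;
```

On a table as small as this one, compression might not reduce the file size at all, so a realistic comparison needs a table with many rows and wide records.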
You can compress these types of tables:
- all tables that are created during a SAS session. Besides specifying SAS system options on the command line or inside a SAS program with the OPTIONS statement, you can use SAS Data Integration Studio to set system options. For example, you can use the System Options field to set the COMPRESS= system option on a table loader transformation. (A table loader transformation generates or retrieves code that puts data into a specified target table.)
  Note: You cannot specify compression for an SPD Engine data library.
- an individual table. In SAS Data Integration Studio, SAS tables have a Compressed option that is available from the table properties dialog box. To use CHAR compression, you select YES. To use BINARY compression, you select Binary.
For SPD Engine tables and third-party relational database tables, you
can use the Table Options field in the table properties dialog box to
specify the COMPRESS= option.
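The session-wide and per-table settings described above can be sketched in SAS code as follows. This is an illustration only: the output table names are hypothetical, and SASHELP.CLASS stands in for real source data.

```sas
/* Session-wide default: tables created after this statement are compressed. */
options compress=yes;

/* No data set option here, so the table inherits the system option setting. */
data work.all_compressed;
   set sashelp.class;
run;

/* The COMPRESS= data set option overrides the system option for one table. */
data work.not_compressed (compress=no);
   set sashelp.class;
run;
```

Setting the system option is convenient when most tables in a job benefit from compression, while the data set option lets you exempt individual tables where the overhead is not worthwhile.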
Note: The SPD Engine compresses the data component (DPF) file by blocks
as the engine is creating the file. (The data component file stores
partitions for an SPD Engine table.) To specify the number of
observations that you want to store in a compressed block, you use the
IOBLOCKSIZE= table option in addition to the COMPRESS= table option.
For example, in the Table Options field in the table properties dialog
box, you might enter COMPRESS=YES IOBLOCKSIZE=10000.
The default block size is 4096 (4K).
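For comparison, the same pair of table options might appear in SAS code roughly as shown here. The library path and both table names are hypothetical placeholders, not values from this document.

```sas
/* Hypothetical path; SPDE selects the SPD Engine on the LIBNAME statement. */
libname spdlib spde '/data/spde';

/* Compress the table and store 10000 observations in each compressed block. */
/* WORK.ORDERS is an assumed, pre-existing source table.                     */
data spdlib.orders (compress=yes ioblocksize=10000);
   set work.orders;
run;
```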
When you create a compressed table, SAS records in the log the
percentage of reduction that is obtained by compressing the file. SAS
obtains the compression percentage by comparing the size of the
compressed file with the size of an uncompressed file of the same page
size and record count. After a file is compressed, the setting is a
permanent attribute of the file, which means that to change the setting,
you must re-create the file. For example, to uncompress a file, in SAS
Data Integration Studio, select Default (NO) for the Compressed option
in the table properties dialog box for a SAS table.
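Outside SAS Data Integration Studio, re-creating a file without compression can be sketched as a DATA step that copies the compressed table with COMPRESS=NO. Both table names are hypothetical, and the PROC CONTENTS step is just one way to confirm the resulting attribute.

```sas
/* WORK.ORDERS_COMPRESSED is an assumed, pre-existing compressed table. */
/* Copying it with COMPRESS=NO re-creates the data as an uncompressed file. */
data work.orders_plain (compress=no);
   set work.orders_compressed;
run;

/* The Compressed attribute appears in the PROC CONTENTS output. */
proc contents data=work.orders_plain;
run;
```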
For more information about compression, see SAS Data Set Options: Reference.