Glossary
- block
-
a group of observations in a data set. Using
blocks enables thread-enabled applications to read, write, and process
observations faster than if the observations were delivered individually.
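As an illustration of the idea, the following minimal Python sketch (not SAS code, and not how the SPD Engine is implemented) groups observations into fixed-size blocks so that a consumer handles one block per call instead of one observation per call. The name `iter_blocks` and the block size of 4 are hypothetical.

```python
from itertools import islice

def iter_blocks(observations, block_size):
    """Yield successive lists of up to block_size observations."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    it = iter(observations)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return  # input exhausted
        yield block

# Ten observations delivered as blocks of four; the last block is shorter.
blocks = list(iter_blocks(range(10), 4))
```

Each block can then be handed to a separate worker, which is the pattern that thread-enabled applications rely on.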
- compound WHERE expression
-
a WHERE expression that contains more than one
operator, as in WHERE X=1 and Y>3.
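The selection logic of the example above can be sketched in Python. This is only an analogue of the WHERE expression, not SAS syntax; the sample rows and the function name `where` are made up for illustration.

```python
# Hypothetical observations with the variables X and Y.
rows = [
    {"X": 1, "Y": 5},
    {"X": 1, "Y": 2},
    {"X": 2, "Y": 9},
]

def where(row):
    # Python analogue of: WHERE X=1 and Y>3
    # Both comparisons must hold for the observation to be selected.
    return row["X"] == 1 and row["Y"] > 3

selected = [row for row in rows if where(row)]
```

Only the first row satisfies both conditions, so it is the only observation selected.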
- controller
-
a computer component that manages the interaction
between the computer and a peripheral device such as a disk or a RAID.
For example, a controller manages data I/O between a CPU and a disk
drive. A computer can contain many controllers. A single CPU can command
more than one controller, and a single controller can command multiple
disks.
- CPU-bound application
-
an application whose performance is constrained
by the speed at which computations can be performed on the data. Multiple
CPUs and threading technology can alleviate this problem.
- data partition
-
a physical file that contains data and that is
part of the collection of physical files that make up the data component
of a SAS Scalable Performance Data Engine data set.
- I/O-bound application
-
an application whose performance is constrained
by the speed at which data can be delivered for processing. Multiple
CPUs, partitioned I/O, threading technology, RAID (redundant array
of independent disks) technology, or a combination of these can alleviate
this problem.
- light-weight process thread
-
a single-threaded subprocess that is created and
controlled independently, usually with operating system calls. Multiple
light-weight process threads can be active at one time on symmetric
multiprocessing (SMP) hardware or in thread-enabled operating systems.
- parallel I/O
-
a method of input and output that takes advantage
of multiple CPUs and multiple controllers, with multiple disks per
controller, to read or write data in independent threads.
- parallel processing
-
a method of processing that divides a large job
into several smaller jobs that can be executed in parallel on multiple
CPUs.
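The divide-and-combine structure described here can be sketched in Python with the standard `concurrent.futures` module. The helper names `split` and `parallel_sum` are hypothetical. Note that in CPython, threads show the structure of the technique but do not speed up pure-Python computation because of the global interpreter lock; this sketch is about the shape of the work, not about measured performance.

```python
from concurrent.futures import ThreadPoolExecutor

def split(values, parts):
    """Divide values into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(values), parts)
    slices, start = [], 0
    for i in range(parts):
        # The first `extra` slices each take one additional element.
        end = start + size + (1 if i < extra else 0)
        slices.append(values[start:end])
        start = end
    return slices

def parallel_sum(values, workers=4):
    # Each worker sums one slice (a smaller job); partial sums are combined.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, split(values, workers)))
    return sum(partials)

total = parallel_sum(list(range(1000)))
```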
- partition
-
part or all of a logical file that spans devices
or directories. In the SPD Engine, a partition is one physical file.
Data files, index files, and metadata files can all be partitioned,
resulting in data partitions, index partitions, and metadata partitions,
respectively. Partitioning a file can improve performance for very
large data sets.
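To make the idea concrete, the following Python sketch plans how a logical file of a given size could be divided into fixed-size partitions spread across several storage paths. Everything here is an assumption for illustration: the function `plan_partitions`, the paths, and the simple cycling placement are hypothetical and do not describe how the SPD Engine actually places partitions.

```python
import math

def plan_partitions(total_bytes, partition_bytes, paths):
    """Return (partition_number, path) pairs, cycling through the paths."""
    if partition_bytes <= 0 or not paths:
        raise ValueError("need a positive partition size and at least one path")
    # The last partition may be only partly filled, so round up.
    count = math.ceil(total_bytes / partition_bytes)
    return [(n, paths[n % len(paths)]) for n in range(count)]

# Hypothetical sizes and paths: 1000 bytes in 300-byte partitions on 3 disks.
plan = plan_partitions(total_bytes=1000, partition_bytes=300,
                       paths=["/disk1", "/disk2", "/disk3"])
```

With these inputs the plan contains four partitions, and the fourth wraps around to the first path.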
- primary path
-
the location in which SPD Engine metadata files
are stored. The other SPD Engine component files (data files and index
files) are stored in separate storage paths in order to take advantage
of the performance boost of multiple CPUs.
- RAID
-
a type of storage system that comprises many disks
and which implements interleaved storage techniques that were developed
at the University of California at Berkeley. RAIDs can have several
levels. For example, a level-0 RAID combines two or more hard drives
into one logical disk drive. Various RAID levels provide various levels
of redundancy and storage capability. A RAID provides large amounts
of data storage inexpensively. Also, because the same data is stored
in different places, I/O operations can overlap, which can result
in improved performance. RAID is short for redundant array of independent disks.
- redundancy
-
a characteristic of computing systems in which
multiple interchangeable components are provided in order to minimize
the effects of failures, errors, or both. For example, if data is
stored redundantly (in a RAID, for example), then if one disk is lost,
the data is still available on another disk.
- redundant array of independent disks
-
See RAID.
- sasroot
-
a representation of the name for the directory
or folder in which SAS is installed at a site or a computer.
- SASROOT
-
a term that represents the name of the directory
or folder in which SAS is installed at your site or on your computer.
- scalability
-
the ability of a software application to function
well with little degradation in performance despite changes in the
volume of computations or operations that it performs and despite
changes in the computing environment. Scalable software is able to
take full advantage of increases in computing capability such as those
that are provided by the use of SMP hardware and threaded processing.
- Scalable Performance Data Engine
-
a SAS engine that is able to deliver data to applications
rapidly because it organizes the data into a streamlined file format.
Short form: SPD Engine.
- scalable software
-
software that responds to increased computing
capability on SMP hardware in the expected way. For example, if the
number of CPUs is increased, the time to solution for a CPU-bound
problem decreases by a proportionate amount. And if the throughput
of the I/O system is increased, the time to solution for an I/O-bound
problem decreases by a proportionate amount.
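The proportional behavior described above can be written as simple arithmetic. This Python sketch assumes ideal scaling, in which time to solution is inversely proportional to the number of CPUs; real workloads rarely reach this ideal. The function name `ideal_time_to_solution` and the sample figures are hypothetical.

```python
def ideal_time_to_solution(baseline_seconds, baseline_cpus, new_cpus):
    """Time to solution if it scales in inverse proportion to CPU count."""
    if baseline_cpus < 1 or new_cpus < 1:
        raise ValueError("CPU counts must be at least 1")
    return baseline_seconds * baseline_cpus / new_cpus

# A CPU-bound job that takes 120 seconds on 2 CPUs, moved to 8 CPUs.
t = ideal_time_to_solution(120.0, baseline_cpus=2, new_cpus=8)
```

Quadrupling the CPU count in this idealized case cuts the time to solution to one quarter of the baseline.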
- server scalability
-
the ability of a server to take advantage of SMP
hardware and threaded processing in order to process multiple client
requests simultaneously. That is, the number of transactions that can
be processed per unit of time increases in proportion to the increase
in computing capacity that SMP hardware provides.
- SMP
-
See symmetric multiprocessing.
- sort indicator
-
an attribute of a data file that indicates whether
a data set is sorted, how it was sorted, and whether the sort was
validated. Specifically, the sort indicator attribute indicates the
following information: 1) the BY variable(s) that were used in the
sort; 2) the character set that was used for the character variables;
3) the collating sequence of character variables that was used; 4)
whether the sort information has been validated. This attribute is
stored in the data file descriptor information. Any SAS procedure
that requires data to be sorted as a part of its process uses the
sort indicator.
- spawn
-
to start a process or a process thread such as
a light-weight process thread (LWPT).
- SPD Engine
-
See Scalable Performance Data Engine.
- SPD Engine data file
-
the data component of an SPD Engine data set.
In contrast to SAS data files, SPD Engine data files contain only
data; they do not contain metadata. The SPD Engine does not support
data views.
- SPD Engine data set
-
a data set created by the SPD Engine that has
up to four component files: one for data, one for metadata, and two
for any indexes. The minimum number of component files is two: data
and metadata. Data is separated from the metadata for SPD Engine file
organization.
- symmetric multiprocessing
-
a hardware and software architecture that can
improve the speed of I/O and processing. An SMP machine has multiple
CPUs and a thread-enabled operating system. An SMP machine is usually
configured with multiple controllers and with multiple disk drives
per controller. Short form: SMP.
- thread
-
a single path of execution of a process that runs
on a core on a CPU.
- thread-enabled operating system
-
an operating system that can coordinate symmetric
access by multiple CPUs to a shared main memory space. This coordinated
access enables threads from the same process to share data very efficiently.
- thread-enabled procedure
-
a SAS procedure that supports threaded I/O or
threaded processing.
- threaded I/O
-
I/O that is performed by multiple threads in order
to increase its speed. In order for threaded I/O to improve performance
significantly, the application that is performing the I/O must be
capable of processing the data rapidly as well.
- threaded processing
-
processing that is performed in multiple threads
in order to improve the speed of CPU-bound applications.
- threading
-
a high-performance technology for either data
processing or data I/O in which a task is divided into threads that
are executed concurrently on multiple cores on one or more CPUs.
- time to solution
-
the elapsed time that is required for completing
a task. Time-to-solution measurements are used to compare the performance
of software applications in different computing environments. In other
words, they can be used to measure scalability.
- WHERE expression
-
a SAS expression that defines the criteria for selecting observations.
Copyright © SAS Institute Inc. All rights reserved.