Understanding Integrity Constraints

Definition of Integrity Constraints

Integrity constraints are a set of data validation rules that you can specify in order to restrict the data values that can be stored for a variable in a SAS data file. Integrity constraints help you preserve the validity and consistency of your data. SAS enforces the integrity constraints when the values associated with a variable are added, updated, or deleted.
There are two categories of integrity constraints: general and referential.

General and Referential Integrity Constraints

General Integrity Constraints

General integrity constraints enable you to restrict the values of variables within a single file. There are four types of general constraints:
check
limits the data values of variables to a specific set, range, or list of values. Check constraints can also be used to ensure that the data values in one variable within an observation are contingent on the data values of another variable in the same observation.
not null
requires that a variable contain a data value. Null (missing) values are not allowed.
unique
requires that the specified variable or variables contain unique data values. A null data value is allowed but is limited to a single instance.
primary key
requires that the specified variable or variables contain unique data values and that null data values are not allowed. Only one primary key can exist in a data file.
Note: A primary key is a general integrity constraint if it does not have any foreign key constraints referencing it.

Referential Integrity Constraints

A referential integrity constraint is created when a primary key integrity constraint in one data file is referenced by a foreign key integrity constraint in another data file.
The foreign key constraint links the data values of one or more variables in the foreign key data file, to corresponding variables and values in the primary key data file. Data values in the foreign key data file must have a matching value in the primary key data file, or they must be null. When data is updated or deleted in the primary key data file, the modifications are controlled by a referential action that is defined as part of the foreign key constraint.
Separate referential actions can be defined for the Update and Delete operations. There are three types of referential actions:
restrict
prevents the data values of the primary key variables from being updated or deleted if there is a matching value in one of the foreign key data file's corresponding foreign key variables. The restrict type of action is the default action if one is not specified.
set null
enables the data values of the primary key variables to be updated or deleted, but matching data values in the foreign key data files are changed to null (missing) values.
cascade
enables the data values in the primary key variables to be updated, and also updates matching data values in the foreign key data files to the same value. The cascade type of action is supported only for Update operations.
The requirements for establishing a referential relationship are as follows:
  • The primary key and foreign key must reference the same number of variables, and the variables must be in the same order.
  • The variables must be of the same type (character or numeric) and length.
  • If the foreign key is being added to a data file that already contains data, the data values in the foreign key data file must either match existing values in the primary key data file, or the values must be null.
The foreign key data file can exist in the same SAS library as the referenced primary key data file (intra-libref), or in a different SAS library (inter-libref). However, if the library that contains the foreign key data file is temporary, the library that contains the primary key data file must be temporary as well. In addition, referential integrity constraints cannot be assigned to data files in concatenated libraries.
There is no limit to the number of foreign keys that can reference a primary key. However, additional foreign keys can adversely impact the performance of Update and Delete operations.
When a referential constraint exists, a primary key integrity constraint is not deleted until all of the foreign keys that reference it are deleted. There are no restrictions on deleting foreign keys.

Overlapping Primary Key and Foreign Key Constraints

Variables in a SAS data file can be part of both a primary key (general integrity constraint) and a foreign key (referential integrity constraint). However, there are restrictions when you define a primary key and a foreign key constraint that use the same variables:
  • The foreign key's update and delete referential actions must both be RESTRICT.
  • When the same variables are used in a primary key and foreign key definition, the variables must be defined in a different order.

Preservation of Integrity Constraints

These procedures preserve integrity constraints when their operation results in a copy of the original data file:
  • in Base SAS software, the APPEND, COPY, CPORT, CIMPORT, MIGRATE, and SORT procedures
  • in SAS/CONNECT software, the UPLOAD and DOWNLOAD procedures
  • PROC APPEND
    • for an existing BASE= data file, integrity constraints in the BASE= file are preserved, but integrity constraints in the DATA= file that is being appended to the BASE= file are not preserved.
    • for a non-existent BASE= data file, general integrity constraints in the DATA= file that is being appended to the new BASE= file are preserved. Referential constraints in the DATA= file are not preserved.
  • PROC SORT, and PROC UPLOAD, and PROC DOWNLOAD, when an OUT= data file is not specified
  • the SAS Explorer window
You can also use the CONSTRAINT option to control whether integrity constraints are preserved for the COPY, CPORT, CIMPORT, UPLOAD, and DOWNLOAD procedures.
General integrity constraints are preserved in an active state. The state in which referential constraints are preserved depends on whether the procedure causes the primary key and foreign key data files to be written to the same or different SAS libraries (intra-libref versus inter-libref integrity constraints). Intra-libref constraints are preserved in an active state. Inter-libref constraints are preserved in an inactive state. That is, the primary key portion of the integrity constraint is enforced as a general integrity constraint but the foreign key portion is inactive. You must use the DATASETS procedure statement IC REACTIVATE to reactivate the inactive foreign keys.
The following table summarizes the circumstances under which integrity constraints are preserved.
Circumstances That Cause Integrity Constraints to Be Preserved
Procedure
Condition
Constraints That Are Preserved
APPEND
DATA= data set does not exist
General constraints
Referential constraints are not affected
COPY
CONSTRAINT=yes
General constraints
Intra-libref constraints are referential in an active state
Inter-libref constraints are referential in an inactive state
CPORT/CIMPORT
CONSTRAINT=yes
General constraints
Intra-libref constraints are referential in an active state
Inter-libref constraints are referential in an inactive state
SORT
OUT= data set is not specified
General constraints
Referential constraints are not affected
UPLOAD/DOWNLOAD
CONSTRAINT=yes and OUT= data set is not specified
General constraints
Intra-libref constraints are referential in an active state
Inter-libref constraints are referential in an inactive state
SAS Explorer window
General constraints
CAUTION:
If your data files contain integrity constraints, do not use operating environment commands to copy, move, or delete the data files.

Indexes and Integrity Constraints

The unique, primary key, and foreign key integrity constraints store data values in an index file. If an index file already exists, it is used. Otherwise, one is created. Consider the following points when you create or delete an integrity constraint:
  • When a user-defined index exists, the index's attributes must be compatible with the integrity constraint in order for the integrity constraint to be created. For example, when you add a primary key integrity constraint, the existing index must have the UNIQUE attribute. When you add a foreign key integrity constraint, the index must not have the UNIQUE attribute.
  • The unique integrity constraint has the same effect as the UNIQUE index attribute. Therefore, when one is used, the other is not necessary.
  • The NOMISS index attribute and the not-null integrity constraint have different effects. The integrity constraint prevents missing values from being written to the SAS data file and cannot be added to an existing data file that contains missing values. The index attribute allows missing data values in the data file but excludes them from the index.
  • When any index is created, it is marked as being “owned” by the user, the integrity constraint, or both. A user cannot delete an index that is also owned by an integrity constraint and vice versa. If an index is owned by both, the index is deleted only after both the integrity constraint and the user have requested the index's deletion. A note in the log indicates when an index cannot be deleted.

Locking Integrity Constraints

Integrity constraints support both member-level and record-level locking. You can override the default locking level with the CNTLLEV= data set option. For more information, see the CNTLLEV= Data Set Option in SAS Data Set Options: Reference.

Specifying Integrity Constraints

You can create integrity constraints in the SQL procedure, the DATASETS procedure, or in SCL (SAS Component Language). The constraints can be specified when the data file is created or can be added to an existing data file. When you add integrity constraints to an existing file, SAS verifies that the existing data values conform to the constraints that are being added.
When you specify integrity constraints, you must specify a separate statement for each constraint. In addition, you must specify a separate statement for each variable to which you want to assign the not-null integrity constraint. When multiple variables are included in the specification for a primary key, foreign key, or a unique integrity constraint, a composite index is created and the integrity constraint enforces the combination of variable values. The relationship between SAS indexes and integrity constraints is described in Indexes and Integrity Constraints. For more information, see Understanding SAS Indexes.
When you add an integrity constraint in SCL, open the data set in utility mode. See Creating Integrity Constraints By Using SCL for an example. Integrity constraints must be deleted in utility open mode. For detailed syntax information, see SAS Component Language: Reference.
When generation data sets are used, you must create the integrity constraints in each data set generation that includes protected variables.
CAUTION:
CHECK constraints in SAS 9.2 are not compatible with earlier releases of SAS.
If you add a CHECK constraint to an existing SAS data set or create a SAS data set that includes a CHECK constraint, the data set cannot be accessed by a release prior to SAS 9.2.

Specifying Physical Location for Inter-Libref Referential Integrity Constraints When Sharing Disk Space

When you share disk space over a network and access referential integrity constraints in which the foreign key and primary key data files are in different SAS libraries, a standard should be established for the physical location of the shared files. A standard is required when you create the shared files so that network machines use the same physical name in order to access the files. If the physical names do not match, SAS cannot open the referenced foreign key or primary key SAS data file.
For example, a standard might be established that all shared files are placed on disk T: so that network machines use the same pathname in order to access the files.
Here is an example of a problem regarding files that were created without a standard. Suppose a primary key and a foreign key SAS data file were created on machine D4064 in different directories C:\Public\pkey_directory and C:\Public\fkey_directory. The pathnames are stored in the descriptor information of the SAS data files.
To access the primary key data file from a different machine such as F2760, the following LIBNAME statement would be executed:
libname pkds '\\D4064\Public\pkey_directory';
When the primary key data file is opened for update processing, SAS automatically tries to open the foreign key data file by using the foreign key data file's physical name that is stored in the primary key data file, which is C:\Public\fkey_directory. However, that directory does not exist on machine F2760. Therefore, opening the foreign key data file fails.

Listing Integrity Constraints

PROC CONTENTS and PROC DATASETS report integrity constraint information without special options. In addition, you can print information about integrity constraints and indexes to a data set by using the OUT2= option. In PROC SQL, the DESCRIBE TABLE and DESCRIBE TABLE CONSTRAINTS statements report integrity constraint characteristics as part of the data file definition or alone, respectively. SCL provides the ICTYPE, ICVALUE, and ICDESCRIBE functions for getting information about integrity constraints. Refer to Base SAS Procedures Guide and SAS Component Language: Reference for more information.

Rejected Observations

You can customize the error message that is associated with an integrity constraint when you create the constraint by using the MESSAGE= and MSGTYPE= options. The MESSAGE= option enables you to prepend a user-defined message to the SAS error message associated with an integrity constraint. The MSGTYPE= option enables you to suppress the SAS portion of the message. For more information, see the PROC DATASETS, PROC SQL, and SCL documentation.
Rejected observations can be collected in a special file by using an audit trail.

Integrity Constraints and CEDA Processing

When a SAS data file requires processing with CEDA, integrity constraints are not supported. For example, if you transfer a SAS data file with defined integrity constraints from one operating environment such as Windows to a different operating environment such as UNIX, CEDA translates the file for you, but the integrity constraints are not available. For information about CEDA processing, see Processing Data Using Cross-Environment Data Access (CEDA).
The MIGRATE procedure preserves integrity constraints when migrating data files. For more information, see the MIGRATE Procedure in Base SAS Procedures Guide. The CPORT and CIMPORT procedures preserve integrity constraints when transporting SAS data files from one operating environment to another operating environment. The CPORT procedure makes a copy of the data file in a transportable format, and the CIMPORT procedure reads the transport file and creates a new host-specific copy of the data file. For more information, see the CPORT Procedure in Base SAS Procedures Guide and CIMPORT Procedure in Base SAS Procedures Guide.

Examples

Creating Integrity Constraints with the DATASETS Procedure

The following sample code creates integrity constraints by means of the DATASETS procedure. The data file TV_SURVEY checks the percentage of viewing time spent on networks, PBS, and other channels, with the following integrity constraints:
  • The viewership percentage cannot exceed 100%.
  • Only adults can participate in the survey.
  • GENDER can be male or female.
data tv_survey(label='Validity checking');
   length idnum age 4 gender $1;
   input idnum gender age network pbs other;
datalines;
1 M 55 80 . 20
2 F 36 50 40 10
3 M 42 20 5 75
4 F 18 30 0 70
5 F 84 0 100 0
;

proc datasets nolist;
  modify tv_survey;
    ic create val_gender = check(where=(gender in ('M','F')))
       message = "Valid values for variable GENDER are 
       either 'M' or 'F'.";
    ic create val_age = check(where=(age >= 18 and age = 120))
       message = "An invalid AGE has been provided.";
    ic create val_new = check(where=(network = 100));
    ic create val_pbs = check(where=(pbs = 100));
    ic create val_ot = check(where=(other = 100));
    ic create val_max = check(where=((network+pbs+other)= 100));
quit;

Creating Integrity Constraints with the SQL Procedure

The following sample program creates integrity constraints by means of the SQL procedure. The data file PEOPLE lists employees and contains employment information. The data file SALARY contains salary and bonus information. The integrity constraints are as follows:
  • The names of employees receiving bonuses must be found in the PEOPLE data file.
  • The names identified in the primary key must be unique.
  • GENDER can be male or female.
  • Job status can be permanent, temporary, or terminated.
proc sql;
   create table people
    (
     name      char(14),
     gender    char(6),
     hired     num,
     jobtype   char(1) not null,
     status    char(10),

    constraint prim_key primary key(name),
    constraint gender check(gender in ('male' 'female')),
    constraint status check(status in ('permanent' 
                            'temporary' 'terminated')) 
     );

     create table salary
     (
      name     char(14),
      salary   num not null,
      bonus    num,

      constraint for_key foreign key(name) references people
         on delete restrict on update set null
     );
  quit;

Creating Integrity Constraints By Using SCL

To add integrity constraints to a data file by using SCL, you must create and build an SCL catalog entry. The following sample program creates and compiles catalog entry EXAMPLE.IC_CAT.ALLICS.SCL.
INIT:
  put "Test SCL integrity constraint functions start."; 
return;

MAIN:
   put "Opening WORK.ONE in utility mode.";
   dsid = open('work.one', 'V');/* Utility mode.*/
   if (dsid = 0) then
     do;
      _msg_=sysmsg();
      put _msg_=;
     end;
     else do;
      if (dsid > 0) then
          put "Successfully opened WORK.ONE in" 
              "UTILITY mode.";
     end;

   put "Create a check integrity constraint named teen.";
   rc = iccreate(dsid, 'teen', 'check', 
   '(age > 12) && (age < 20)');

   if (rc > 0) then
      do;
       put rc=;
       _msg_=sysmsg();
       put _msg_=;
     end;
     else do;
      put "Successfully created a check" 
          "integrity constraint.";
     end;

   put "Create a not-null integrity constraint named nn.";
   rc = iccreate(dsid, 'nn', 'not-null', 'age');

   if (rc > 0) then
      do;
       put rc=;
       _msg_=sysmsg();
       put _msg_=;
     end;
     else do;
      put "Successfully created a not-null" 
          "integrity constraint.";
     end;

   put "Create a unique integrity constraint named uq.";
   rc = iccreate(dsid, 'uq', 'unique', 'age');

   if (rc > 0) then
      do;
       put rc=;
       _msg_=sysmsg();
       put _msg_=;
     end;
     else do;
      put "Successfully created a unique" 
          "integrity constraint.";
     end;

  put "Create a primary key integrity constraint named pk.";
  rc = iccreate(dsid, 'pk', 'Primary', 'name');

   if (rc > 0) then
      do;
       put rc=;
       _msg_=sysmsg();
       put _msg_=;
     end;
     else do;
      put "Successfully created a primary key" 
          "integrity constraint.";
     end;
 
   put "Closing WORK.ONE.";
   rc = close(dsid);
   if (rc > 0) then
     do;
      put rc=;
      _msg_=sysmsg();
      put _msg_=;
     end;

   put "Opening WORK.TWO in utility mode.";
   dsid2 = open('work.two', 'V');  
      /*Utility mode */
   if (dsid2 = 0) then
     do;
    _msg_=sysmsg();
      put _msg_=;
     end;
     else do;
      if (dsid2 > 0) then
        put "Successfully opened WORK.TWO in" 
            "UTILITY mode.";
     end;

   put "Create a foreign key integrity constraint named fk.";
   rc = iccreate(dsid2, 'fk', 'foreign', 'name', 
   'work.one','null', 'restrict');

   if (rc > 0) then
      do;
       put rc=;
       _msg_=sysmsg();
       put _msg_=;
     end;
     else do;
      put "Successfully created a foreign key" 
          "integrity constraint.";
     end;

 put "Closing WORK.TWO.";
   rc = close(dsid2);
   if (rc > 0) then
     do;
      put rc=;
      _msg_=sysmsg();
      put _msg_=;
     end;
return;

TERM:
  put "End of test SCL integrity constraint" 
      "functions.";
return;
The previous code creates the SCL catalog entry. The following code creates two data files, ONE and TWO, and executes the SCL entry EXAMPLE.IC_CAT.ALLICS.SCL:
   /* Submit to create data files. */

data one two;
   input name $ age;
datalines;
Morris 13
Elaine 14
Tina 15
;

   /* after compiling, run the SCL program */

proc display catalog= example.ic_cat.allics.scl;
run;

Removing Integrity Constraints

The following sample program segments remove integrity constraints. In the code that deletes a primary key integrity constraint, note that the foreign key integrity constraint is deleted first.
This program segment deletes integrity constraints using PROC SQL.
proc sql;
   alter table salary
     DROP CONSTRAINT for_key;
   alter table people
       DROP CONSTRAINT gender
       DROP CONSTRAINT _nm0001_
       DROP CONSTRAINT status
       DROP CONSTRAINT prim_key
    ;
quit;
This program segment removes integrity constraints using PROC DATASETS.
proc datasets nolist;
   modify tv_survey;
      ic delete val_max;
      ic delete val_gender;
      ic delete val_age;
run;
quit;
This program segment removes integrity constraints using SCL.
TERM:
   put "Opening WORK.TWO in utility mode.";
   dsid2 = open( 'work.two' , 'V' );  /* Utility mode.     */
   if (dsid2 = 0) then
     do;
     _msg_=sysmsg();
     put _msg_=;
   end;
   else do;
     if (dsid2 > 0) then
        put "Successfully opened WORK.TWO in Utility mode.";
    end;

rc = icdelete(dsid2, 'fk');
if (rc > 0) then
  do;
   put rc=;
   _msg_=sysmsg();
  end;
else
   do;
   put "Successfully deleted a foreign key integrity constraint.";
   end;
rc = close(dsid2);
return;

Reactivating an Inactive Integrity Constraint

The following program segment reactivates a foreign key integrity constraint that has been inactivated as a result of a COPY, CPORT, CIMPORT, UPLOAD, or DOWNLOAD procedure.
proc datasets;
   modify SAS-data-set;
      ic reactivate fkname references
libref;
   run;
quit;

Defining Overlapping Primary Key and Foreign Key Constraints

The following code illustrates defining overlapping primary key and foreign key constraints:
data Singers1;
   input FirstName $ LastName $ Age;
   datalines;
Tom Jones 62
Kris Kristofferson 66
Willie Nelson 69
Barbra Streisand 60
Paul McCartney 60
Randy Travis 43
;
data Singers2;
   input FirstName $ LastName $ Style $;
   datalines;
Tom Jones Rock   
Kris Kristofferson Country 
Willie Nelson Country 
Barbra Streisand Contemporary 
Paul McCartney Rock 
Randy Travis Country 
;
proc datasets library=work nolist;
   modify Singers1;                                                             
      ic create primary key (FirstName LastName); 1 
   run;  
   modify Singers2;
      ic create foreign key (FirstName LastName) references Singers1 
         on delete restrict on update restrict;  2
   run;                                                                         
   modify Singers2;                                                                             
      ic create primary key (LastName FirstName);  3
   run;    
   modify Singers1;
      ic create foreign key (LastName FirstName) references Singers2 
         on delete restrict on update restrict;  4
   run;   
   
quit;
1 Defines a primary key constraint for data set Singers1, for variables FIRSTNAME and LASTNAME.
2 Defines a foreign key constraint for data set Singers2 for variables FIRSTNAME and LASTNAME that references the primary key defined in Step 1. Because the intention is to define a primary key using the same variables, the foreign key update and delete referential actions must both be RESTRICT.
3 Defines a primary key constraint for data set Singers2 for variables LASTNAME and FIRSTNAME. Because those exact same variables are already defined as a foreign key, the order must be different.
4 Defines a foreign key constraint for data set Singers1 for variables LASTNAME and FIRSTNAME that references the primary key defined in Step 3. Because those exact same variables are already defined as a primary key, the order must be different. Because a primary key is already defined using the same variables, the foreign key's update and delete referential actions must both be RESTRICT.