The eye and hair color of children from two different regions of Europe are recorded in the data set Color. Instead of recording one observation per child, the data are recorded as cell counts, where the variable Count contains the number of children exhibiting each of the 15 eye and hair color combinations. The data set does not include missing combinations.
The following DATA step statements create the SAS data set Color:
    data Color;
       input Region Eyes $ Hair $ Count @@;
       label Eyes  ='Eye Color'
             Hair  ='Hair Color'
             Region='Geographic Region';
       datalines;
    1 blue  fair   23  1 blue  red     7  1 blue  medium 24
    1 blue  dark   11  1 green fair   19  1 green red     7
    1 green medium 18  1 green dark   14  1 brown fair   34
    1 brown red     5  1 brown medium 41  1 brown dark   40
    1 brown black   3  2 blue  fair   46  2 blue  red    21
    2 blue  medium 44  2 blue  dark   40  2 blue  black   6
    2 green fair   50  2 green red    31  2 green medium 37
    2 green dark   23  2 brown fair   56  2 brown red    42
    2 brown medium 53  2 brown dark   54  2 brown black  13
    ;
The following PROC FREQ statements read the Color data set and create an output data set that contains the frequencies, percentages, and expected cell frequencies of the two-way table of Eyes by Hair. The TABLES statement requests three tables: a frequency table for Eyes, a frequency table for Hair, and a crosstabulation table for Eyes by Hair. The OUT= option creates the FreqCount data set, which contains the crosstabulation table frequencies. The OUTEXPECT option outputs the expected table cell frequencies to FreqCount, and the SPARSE option includes zero cell frequencies in the output data set. The WEIGHT statement specifies that the variable Count contains the observation weights. These statements create Output 40.1.1 through Output 40.1.3.
    proc freq data=Color;
       tables Eyes Hair Eyes*Hair / out=FreqCount outexpect sparse;
       weight Count;
       title 'Eye and Hair Color of European Children';
    run;

    proc print data=FreqCount noobs;
       title2 'Output Data Set from PROC FREQ';
    run;
Output 40.1.1 displays the two frequency tables produced by PROC FREQ: one showing the distribution of eye color, and one showing the distribution of hair color. By default, PROC FREQ lists the variable values in alphabetical order. The Eyes*Hair specification produces a crosstabulation table, shown in Output 40.1.2, with eye color defining the table rows and hair color defining the table columns. A zero cell frequency for green eyes and black hair indicates that this eye and hair color combination does not occur in the data.
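If alphabetical order is not what you want, one option is ORDER=DATA in the PROC FREQ statement, which orders values by their first appearance in the input data set. The following is a minimal sketch, not part of the original example. It reuses the Color data set and requests only the crosstabulation:

```sas
/* Sketch: list Eyes and Hair values in the order they first appear */
/* in Color instead of the default alphabetical order.              */
proc freq data=Color order=data;
   tables Eyes*Hair;
   weight Count;
   title 'Eye and Hair Color, Values in Data Order';
run;
```

With the data as entered above, the rows should appear as blue, green, brown and the columns as fair, red, medium, dark, black.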
The output data set FreqCount (Output 40.1.3) contains frequency counts and percentages for the last table requested in the TABLES statement, Eyes by Hair. Because the SPARSE option is specified, the data set includes the observation with a zero frequency. The variable Expected contains the expected frequencies, as requested by the OUTEXPECT option.
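As a worked check of the Expected and Percent columns, the following uses the usual expected count under independence of rows and columns: row total times column total, divided by the table total. The symbols $n_{ij}$, $n_{i\cdot}$, $n_{\cdot j}$, and $n$ are notation introduced here for illustration, and the totals (222 for blue eyes, 22 for black hair, 762 overall) are summed from the counts in Output 40.1.3.

```latex
\[
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n},
\qquad
\mathrm{Percent}_{ij} = 100 \times \frac{n_{ij}}{n}
\]
% Blue eyes, black hair: row total 222, column total 22, table total 762
\[
E_{\text{blue},\,\text{black}} = \frac{222 \times 22}{762} \approx 6.409,
\qquad
\mathrm{Percent}_{\text{blue},\,\text{black}} = 100 \times \frac{6}{762} \approx 0.7874
\]
```

The same formula gives the zero cell (green eyes, black hair) a nonzero expected count, $199 \times 22 / 762 \approx 5.745$, which matches the row that SPARSE adds to FreqCount.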
Output 40.1.2: Crosstabulation Table
Output 40.1.3: Output Data Set of Frequencies
Eye and Hair Color of European Children
Output Data Set from PROC FREQ

Eyes | Hair | COUNT | EXPECTED | PERCENT |
---|---|---|---|---|
blue | black | 6 | 6.409 | 0.7874 |
blue | dark | 51 | 53.024 | 6.6929 |
blue | fair | 69 | 66.425 | 9.0551 |
blue | medium | 68 | 63.220 | 8.9239 |
blue | red | 28 | 32.921 | 3.6745 |
brown | black | 16 | 9.845 | 2.0997 |
brown | dark | 94 | 81.446 | 12.3360 |
brown | fair | 90 | 102.031 | 11.8110 |
brown | medium | 94 | 97.109 | 12.3360 |
brown | red | 47 | 50.568 | 6.1680 |
green | black | 0 | 5.745 | 0.0000 |
green | dark | 37 | 47.530 | 4.8556 |
green | fair | 69 | 59.543 | 9.0551 |
green | medium | 55 | 56.671 | 7.2178 |
green | red | 38 | 29.510 | 4.9869 |