The SURVEYFREQ Procedure

Odds Ratio and Relative Risks

The OR option provides estimates of the odds ratio, the column 1 relative risk, and the column 2 relative risk for $2 \times 2$ tables, together with their confidence limits.

Odds Ratio

For a $2 \times 2$ table, the odds of a positive (column 1) response in row 1 is $N_{11} / N_{12}$ . Similarly, the odds of a positive response in row 2 is $N_{21} / N_{22}$ . The odds ratio is formed as the ratio of the row 1 odds to the row 2 odds. The estimate of the odds ratio is computed as

$\widehat{\mathit{OR}} = \frac{\widehat{N}_{11} ~ / ~ \widehat{N}_{12}}{\widehat{N}_{21} ~ / ~ \widehat{N}_{22}} = \frac{\widehat{N}_{11} ~ \widehat{N}_{22}}{\widehat{N}_{12} ~ \widehat{N}_{21}}$

The value of the odds ratio can be any nonnegative number. When the row and column variables are independent, the true value of the odds ratio equals 1. An odds ratio greater than 1 indicates that the odds of a positive response are higher in row 1 than in row 2. An odds ratio less than 1 indicates that the odds of a positive response are higher in row 2 than in row 1. The strength of association increases with the deviation from 1. For more information, see Stokes, Davis, and Koch (2000) and Agresti (2007).
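
As a minimal numeric sketch of the estimate above, the following Python function computes the odds ratio from four estimated cell totals. The function name `odds_ratio` and the example totals are hypothetical and serve only as an illustration; they are not part of PROC SURVEYFREQ.

```python
def odds_ratio(n11, n12, n21, n22):
    """Estimated odds ratio from the weighted cell totals of a 2 x 2 table."""
    if n12 == 0 or n21 == 0:
        raise ZeroDivisionError("odds ratio is undefined when N12 or N21 is zero")
    # (N11 / N12) / (N21 / N22), written as a single cross-product ratio
    return (n11 * n22) / (n12 * n21)


# Hypothetical weighted totals, for illustration only.
example_or = odds_ratio(300.0, 700.0, 150.0, 850.0)
print(example_or)
```

With these totals the row 1 odds are 300/700 and the row 2 odds are 150/850, so the estimate exceeds 1 and the odds of a positive response are higher in row 1.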

PROC SURVEYFREQ constructs confidence limits for the odds ratio by using the log transform. The $100(1 - \alpha )$ % confidence limits for the odds ratio are computed as

$\left( ~ \widehat{\mathit{OR}} \times \exp ( -t_{\mi{df}, \alpha /2} ~ \sqrt {v} ), ~ \widehat{\mathit{OR}} \times \exp ( t_{\mi{df}, \alpha /2} ~ \sqrt {v} ) ~ \right)$

where

$v = \widehat{\mr{Var}} (\ln \widehat{\mathit{OR}}) = \widehat{\mr{Var}} (\widehat{\mathit{OR}}) ~ / ~ \widehat{\mathit{OR}}^{~ 2}$

is the estimate of the variance of the log odds ratio and $t_{\mi{df}, \alpha /2}$ is the $100(1-\alpha /2)$ percentile of the t distribution with df degrees of freedom. (For more information, see the section Degrees of Freedom.) The value of the confidence coefficient $\alpha$ is determined by the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.
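
The log-transform limits can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure's implementation: the function name `log_transform_limits` is hypothetical, the variance passed in is the estimated variance of the ratio itself, and the t percentile is supplied by the caller rather than computed from df.

```python
import math


def log_transform_limits(estimate, variance, t_crit):
    """Confidence limits for a ratio estimate, built on the log scale.

    estimate : estimated ratio (for example, the odds ratio), must be > 0
    variance : estimated variance of the ratio itself, not of its log
    t_crit   : t percentile t_{df, alpha/2}, supplied by the caller
    """
    v = variance / estimate ** 2        # variance of ln(estimate)
    half_width = t_crit * math.sqrt(v)  # half-width on the log scale
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)


# Hypothetical inputs, for illustration only.
lower, upper = log_transform_limits(2.0, 0.25, 2.0)
print(lower, upper)
```

Because the interval is symmetric on the log scale, the estimate is the geometric mean of the two limits, and the lower limit is always positive.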

If you request BRR variance estimation (VARMETHOD=BRR), PROC SURVEYFREQ estimates the variance of the odds ratio as described in the section Balanced Repeated Replication (BRR). If you request jackknife variance estimation (VARMETHOD=JACKKNIFE), the procedure estimates the variance as described in the section The Jackknife Method.

If you do not specify the VARMETHOD= option or a REPWEIGHTS statement, the default variance estimation method is Taylor series (VARMETHOD=TAYLOR). By using Taylor series linearization, the variance estimate for the odds ratio can be expressed as

$\widehat{\mr{Var}}(\widehat{\mathit{OR}}) = \widehat{\mb{D}} ~ \widehat{\mb{V}}(\widehat{\mb{N}}) ~ \widehat{\mb{D}}'$

where $\widehat{\mb{V}}(\widehat{\mb{N}})$ is the covariance matrix of the estimates of the cell totals $\widehat{\mb{N}}$ ,

$\widehat{\mb{N}} = \left( ~ \widehat{N}_{11}, ~ ~ \widehat{N}_{12}, ~ ~ \widehat{N}_{21}, ~ ~ \widehat{N}_{22} ~ \right)$

and $\widehat{\mb{D}}$ is an array that contains the partial derivatives of the odds ratio with respect to the elements of $\widehat{\mb{N}}$ . The section Covariances of Frequency Estimates describes the computation of $\widehat{\mb{V}}(\widehat{\mb{N}})$ . The array $\widehat{\mb{D}}$ is computed as

$\widehat{\mb{D}} = \left( ~ \widehat{N}_{22} / ( \widehat{N}_{12} \widehat{N}_{21} ), ~ ~ ~ ~ - \widehat{N}_{11} \widehat{N}_{22} / ( \widehat{N}_{21} \widehat{N}_{12}^{~ 2} ), ~ ~ ~ - \widehat{N}_{11} \widehat{N}_{22} / ( \widehat{N}_{12} \widehat{N}_{21}^{~ 2} ), ~ ~ ~ \widehat{N}_{11} / ( \widehat{N}_{12} \widehat{N}_{21} ) ~ \right)$

For more information, see Wolter (1985, pp. 239–242).
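
The Taylor series expression for the odds ratio variance is a quadratic form in the derivative array. The Python sketch below evaluates it for hypothetical inputs. The helper names `or_gradient` and `quadratic_form` are illustrative, and the diagonal covariance matrix is an assumption made to keep the example short; a covariance matrix estimated from a real survey design generally has nonzero off-diagonal elements.

```python
def or_gradient(n11, n12, n21, n22):
    """Partial derivatives of the odds ratio with respect to (N11, N12, N21, N22)."""
    return [
        n22 / (n12 * n21),
        -n11 * n22 / (n21 * n12 ** 2),
        -n11 * n22 / (n12 * n21 ** 2),
        n11 / (n12 * n21),
    ]


def quadratic_form(d, v):
    """Return d V d' for a vector d and a square matrix v given as a list of rows."""
    size = len(d)
    return sum(d[i] * v[i][j] * d[j] for i in range(size) for j in range(size))


# Hypothetical cell totals and covariance matrix, for illustration only.
totals = (300.0, 700.0, 150.0, 850.0)
cov = [
    [400.0, 0.0, 0.0, 0.0],
    [0.0, 900.0, 0.0, 0.0],
    [0.0, 0.0, 225.0, 0.0],
    [0.0, 0.0, 0.0, 1000.0],
]

d = or_gradient(*totals)
var_or = quadratic_form(d, cov)

# Variance of the log odds ratio, as used in the confidence limits.
or_hat = totals[0] * totals[3] / (totals[1] * totals[2])
var_log_or = var_or / or_hat ** 2
print(var_or, var_log_or)
```

In this diagonal case the variance of the log odds ratio reduces to the sum of each cell variance divided by the squared cell total, which gives a quick way to check the derivative array.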

Relative Risks

For a $2 \times 2$ table, the column 1 relative risk is the ratio of the column 1 risks for row 1 to row 2. As described in the section Risks and Risk Difference, the column 1 risk for row 1 is the proportion of row 1 observations classified in column 1, and the column 1 risk for row 2 is the proportion of row 2 observations classified in column 1. The estimate of the column 1 relative risk is computed as

$\widehat{\mathit{RR}}_1 = \frac{\widehat{N}_{11} ~ / ~ \widehat{N}_{1 \cdot }}{\widehat{N}_{21} ~ / ~ \widehat{N}_{2 \cdot }}$

Similarly, the estimate of the column 2 relative risk is computed as

$\widehat{\mathit{RR}}_2 = \frac{\widehat{N}_{12} ~ / ~ \widehat{N}_{1 \cdot }}{\widehat{N}_{22} ~ / ~ \widehat{N}_{2 \cdot }}$

A column 1 relative risk greater than 1 indicates that the probability of a positive response is greater in row 1 than in row 2. Similarly, a column 1 relative risk less than 1 indicates that the probability of a positive response is less in row 1 than in row 2. The strength of association increases with the deviation from 1. For more information, see Stokes, Davis, and Koch (2000) and Agresti (2007).
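
The two relative risk estimates can be illustrated with a short Python sketch. The function name `relative_risks` and the totals are hypothetical, and the sketch assumes that each estimated row total is the sum of the two estimated cell totals in that row.

```python
def relative_risks(n11, n12, n21, n22):
    """Return (column 1 relative risk, column 2 relative risk) for a 2 x 2 table."""
    row1 = n11 + n12  # estimated row 1 total
    row2 = n21 + n22  # estimated row 2 total
    rr1 = (n11 / row1) / (n21 / row2)
    rr2 = (n12 / row1) / (n22 / row2)
    return rr1, rr2


# Hypothetical weighted totals, for illustration only.
rr1, rr2 = relative_risks(300.0, 700.0, 150.0, 850.0)
print(rr1, rr2)
```

Here the column 1 risks are 0.30 in row 1 and 0.15 in row 2, so the column 1 relative risk is about 2, while the column 2 relative risk falls below 1.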

PROC SURVEYFREQ constructs confidence limits for the relative risk by using the log transform, which is similar to the odds ratio computations described previously. The $100(1 - \alpha )$ % confidence limits for the column 1 relative risk are computed as

$\left( ~ \widehat{\mathit{RR}}_1 \times \exp ( -t_{\mi{df}, \alpha /2} ~ \sqrt {v} ), ~ \widehat{\mathit{RR}}_1 \times \exp ( t_{\mi{df}, \alpha /2} ~ \sqrt {v} ) ~ \right)$

where

$v = \widehat{\mr{Var}} (\ln \widehat{\mathit{RR}}_1) = \widehat{\mr{Var}} (\widehat{\mathit{RR}}_1) ~ / ~ \widehat{\mathit{RR}}_1^{~ 2}$

is the estimate of the variance of the log column 1 relative risk and $t_{\mi{df}, \alpha /2}$ is the $100(1-\alpha /2)$ percentile of the t distribution with df degrees of freedom. (For more information, see the section Degrees of Freedom.) The value of the confidence coefficient $\alpha$ is determined by the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.

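If you request BRR variance estimation (VARMETHOD=BRR), PROC SURVEYFREQ estimates the variance of the column 1 relative risk as described in the section Balanced Repeated Replication (BRR). If you request jackknife variance estimation (VARMETHOD=JACKKNIFE), the procedure estimates the variance as described in the section The Jackknife Method.

If you do not specify the VARMETHOD= option or a REPWEIGHTS statement, the default variance estimation method is Taylor series (VARMETHOD=TAYLOR). By using Taylor series linearization, the variance estimate for the column 1 relative risk can be expressed as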

$\widehat{\mr{Var}}(\widehat{\mathit{RR}}_1) = \widehat{\mb{D}} ~ \widehat{\mb{V}}(\widehat{\mb{X}}) ~ \widehat{\mb{D}}'$

where $\widehat{\mb{V}}(\widehat{\mb{X}})$ is the covariance matrix of $\widehat{\mb{X}}$ ,

$\widehat{\mb{X}} = \left( ~ \widehat{N}_{11}, ~ ~ \widehat{N}_{1 \cdot }, ~ ~ \widehat{N}_{21}, ~ ~ \widehat{N}_{2 \cdot } ~ \right)$

and $\widehat{\mb{D}}$ is an array that contains the partial derivatives of the column 1 relative risk with respect to the elements of $\widehat{\mb{X}}$ ,

$\widehat{\mb{D}} = \left( ~ \widehat{N}_{2 \cdot } / ( \widehat{N}_{21} \widehat{N}_{1 \cdot } ), ~ ~ ~ ~ - \widehat{N}_{11} \widehat{N}_{2 \cdot } / ( \widehat{N}_{21} \widehat{N}_{1 \cdot }^{~ 2} ), ~ ~ ~ - \widehat{N}_{11} \widehat{N}_{2 \cdot } / ( \widehat{N}_{1 \cdot } \widehat{N}_{21}^{~ 2} ), ~ ~ ~ \widehat{N}_{11} / ( \widehat{N}_{21} \widehat{N}_{1 \cdot } ) ~ \right)$

For more information, see Wolter (1985, pp. 239–242).
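
The same quadratic form can be sketched in Python for the column 1 relative risk. Everything here is illustrative: the helper names are hypothetical, the cell-level covariance matrix is an assumed diagonal matrix, and the covariance matrix of $\widehat{\mb{X}}$ is obtained by writing $\widehat{\mb{X}}$ as a linear transformation of the cell totals. That construction is one way to build the matrix for a worked example and is not a description of how the procedure computes it.

```python
def rr1_gradient(n11, n1dot, n21, n2dot):
    """Partial derivatives of RR1 with respect to (N11, N1., N21, N2.)."""
    return [
        n2dot / (n21 * n1dot),
        -n11 * n2dot / (n21 * n1dot ** 2),
        -n11 * n2dot / (n1dot * n21 ** 2),
        n11 / (n21 * n1dot),
    ]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical cell totals and cell-level covariance matrix, for illustration only.
n11, n12, n21, n22 = 300.0, 700.0, 150.0, 850.0
cov_n = [
    [400.0, 0.0, 0.0, 0.0],
    [0.0, 900.0, 0.0, 0.0],
    [0.0, 0.0, 225.0, 0.0],
    [0.0, 0.0, 0.0, 1000.0],
]

# X = A N maps (N11, N12, N21, N22) to (N11, N1., N21, N2.).
a = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
cov_x = matmul(matmul(a, cov_n), transpose(a))

x = (n11, n11 + n12, n21, n21 + n22)
d = rr1_gradient(*x)
var_rr1 = sum(d[i] * cov_x[i][j] * d[j] for i in range(4) for j in range(4))

# Variance of the log relative risk, as used in the confidence limits.
rr1_hat = (x[0] / x[1]) / (x[2] / x[3])
var_log_rr1 = var_rr1 / rr1_hat ** 2
print(var_rr1, var_log_rr1)
```

Note that $\widehat{N}_{11}$ and $\widehat{N}_{1 \cdot }$ are correlated even when the cell totals are not, so the covariance matrix of $\widehat{\mb{X}}$ has nonzero off-diagonal elements in this example.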

Confidence limits for the column 2 relative risk are computed similarly.