The POWER Procedure

Analyses in the LOGISTIC Statement

Likelihood Ratio Chi-Square Test for One Predictor (TEST=LRCHI)

The power computation is based on Shieh and O’Brien (1998); Shieh (2000); Self, Mauritsen, and Ohara (1992); and Hsieh (1989).

Define the following notation for a logistic regression analysis:

\begin{align*}  N & = \# \text {subjects}\quad (\text {NTOTAL}) \\ K & = \# \text {predictors (not counting intercept)} \\ \mb{x} & = (x_{1}, \ldots , x_{K})’ = \text {random variables for predictor vector} \\ \mb{x}_{-1} & = (x_{2}, \ldots , x_{K})’ \\ \bmu & = (\mu _{1}, \ldots , \mu _{K})’ = \mr{E}\mb{x} = \text {mean predictor vector} \\ \mb{x}_ i & = (x_{i1}, \ldots , x_{iK})’ = \text {predictor vector for subject } i \quad (i \in 1, \ldots , N) \\ Y & = \text {random variable for response (0 or 1)} \\ Y_ i & = \text {response for subject } i \quad (i \in 1, \ldots , N) \\ p_ i & = \mr{Prob} (Y_ i = 1 | \mb{x}_ i) \quad (i \in 1, \ldots , N) \\ \phi & = \mr{Prob} (Y_ i = 1 | \mb{x}_ i = \bmu ) \quad (\text {RESPONSEPROB}) \\ U_ j & = \text {unit change for }j\text {th predictor} \quad (\text {UNITS})\\ \mr{OR}_ j & = \mr{Odds} (Y_ i = 1 | x_{ij} = c) / \mr{Odds} (Y_ i = 1 | x_{ij} = c - U_ j) \quad (c \text { arbitrary}, i \in 1, \ldots , N, \\ &  \quad j \in 1, \ldots , K)\quad \text {(TESTODDSRATIO if }j = 1, \text {COVODDSRATIOS if }j > 1) \\ \Psi _0 & = \text {intercept in full model (INTERCEPT)} \\ \bPsi & = (\Psi _1, \ldots , \Psi _ K)’ = \text {regression coefficients in full model} \\ &  \quad (\Psi _1 = \text {TESTREGCOEFF, others = COVREGCOEFFS}) \\ \rho & = \mr{Corr}(\mb{x}_{-1}, x_1) \quad (\text {CORR}) \\ c_ j & = \# \text {distinct possible values of } x_{ij} \quad (j \in 1,\ldots , K) (\text {for any }i) \quad (\text {NBINS}) \\ x^\star _{gj}& = g\text {th possible value of } x_{ij} \quad (g \in 1, \ldots , c_ j) (j \in 1, \ldots , K) \\ &  \quad (\mbox{for any }i) \quad (\mbox{VARDIST}) \\ \end{align*}
\begin{align*}  \pi _{gj} & = \mr{Prob} \left( x_{ij} = x^\star _{gj} \right) \quad (g \in 1, \ldots , c_ j) (j \in 1, \ldots , K) \\ &  \quad (\text {for any }i) \quad (\text {VARDIST}) \\ C & = \prod _{j=1}^{K} c_ j = \# \text {possible values of }\mb{x}_ i \quad (\text {for any }i) \\ \mb{x}^\star _ m & = m\text {th possible value of }\mb{x}_ i\quad (m \in 1, \ldots , C) \\ \pi _ m & = \mr{Prob}\left(\mb{x}_ i = \mb{x}^\star _ m \right)\quad (m \in 1, \ldots , C) \end{align*}

The logistic regression model is

\[  \log \left( \frac{p_ i}{1-p_ i} \right) = \Psi _0 + \bPsi ’\mb{x}_ i  \]

The hypothesis test of the first predictor variable is

\begin{align*}  H_{0}\colon & \Psi _1 = 0 \\ H_{1}\colon & \Psi _1 \ne 0 \end{align*}

Assuming independence among all predictor variables, $\pi _ m$ is defined as follows:

\[  \pi _ m = \prod _{j=1}^{K} \pi _{h(m,j) j} \quad (m \in 1, \ldots , C)  \]

where $h(m,j)$ is calculated according to the following algorithm:

\begin{align*}  \lefteqn{z = m;} \\ \lefteqn{\mr{do} \quad j = K \quad \mr{to} \quad 1 \quad \mr{by} \; {-1};} \\ \lefteqn{\quad h(m,j) = \mr{mod}(z-1, c_ j) + 1;} \\ \lefteqn{\quad z = \mr{floor}((z-1) / c_ j) + 1;} \\ \lefteqn{\mr{end};} \\ \end{align*}

This algorithm causes the elements of the row vector $\{ h(m,1), \ldots , h(m,K) \} $ to vary from slowest on the left ($j = 1$) to fastest on the right ($j = K$) as $m$ increases, as shown in the following table of $h(m,j)$ values:

\[
\begin{array}{cc|ccccc}
 & & \multicolumn{5}{c}{j} \\
h(m,j) & & 1 & 2 & \cdots & K-1 & K \\
\hline
 & 1 & 1 & 1 & \cdots & 1 & 1 \\
 & 2 & 1 & 1 & \cdots & 1 & 2 \\
 & \vdots & & & \vdots & & \\
 & \vdots & 1 & 1 & \cdots & 1 & c_K \\
 & \vdots & 1 & 1 & \cdots & 2 & 1 \\
 & \vdots & 1 & 1 & \cdots & 2 & 2 \\
 & \vdots & & & \vdots & & \\
m & \vdots & 1 & 1 & \cdots & 2 & c_K \\
 & \vdots & & & \vdots & & \\
 & \vdots & c_1 & c_2 & \cdots & c_{K-1} & 1 \\
 & \vdots & c_1 & c_2 & \cdots & c_{K-1} & 2 \\
 & \vdots & & & \vdots & & \\
 & C & c_1 & c_2 & \cdots & c_{K-1} & c_K \\
\end{array}
\]

The $\mb{x}^\star _ m$ values are determined in a completely analogous manner.
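The indexing is easy to sketch in code. The following Python snippet (an illustration only, not part of PROC POWER; the function names are hypothetical) mirrors the do-loop above to recover the digits $h(m,1), \ldots , h(m,K)$ for a given $m$, and computes $\pi _ m$ under the independence assumption:

```python
import numpy as np

def h(m, c):
    """Mixed-radix digits h(m,1), ..., h(m,K) for 1-based index m with
    radices c = [c_1, ..., c_K]; mirrors the do-loop above, so the
    last digit varies fastest as m increases."""
    z = m
    digits = [0] * len(c)
    for j in range(len(c) - 1, -1, -1):   # j = K down to 1
        digits[j] = (z - 1) % c[j] + 1    # h(m,j) = mod(z-1, c_j) + 1
        z = (z - 1) // c[j] + 1           # z = floor((z-1)/c_j) + 1
    return digits

def pi_m(m, c, bin_probs):
    """pi_m = prod_j pi_{h(m,j) j}, assuming independent predictors;
    bin_probs[j][g-1] holds pi_{gj}."""
    return np.prod([bin_probs[j][g - 1] for j, g in enumerate(h(m, c))])
```

For example, with $K = 2$, $c_1 = 2$, and $c_2 = 3$, the call h(4, [2, 3]) returns [2, 1], matching the fourth row of the table above.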

The discretization is handled as follows (unless the distribution is ordinal, or binomial with a sample size parameter at least as large as the requested number of bins): for $x_ j$, generate $c_ j$ quantiles at evenly spaced probability values such that each quantile is at the midpoint of a bin with probability $\frac{1}{c_ j}$. In other words,

\begin{align*}  x^\star _{gj} & = \left( \frac{g - 0.5}{c_ j} \right) \text {th quantile of relevant distribution}\\ &  \quad (g \in 1, \ldots , c_ j) (j \in 1, \ldots , K) \\ \pi _{gj} & = \frac{1}{c_ j} \quad \text {(same for all }g \text {)} \end{align*}
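As a concrete illustration of this midpoint-quantile binning, the following Python sketch (the distribution choice and function name are assumptions for the example, not PROC POWER code) discretizes a standard normal predictor into $c_ j$ bins using SciPy:

```python
import numpy as np
from scipy import stats

def discretize(dist, c):
    """Return the c bin values x*_{gj} (quantiles at probabilities
    (g - 0.5)/c) and the equal bin probabilities pi_{gj} = 1/c."""
    g = np.arange(1, c + 1)
    values = dist.ppf((g - 0.5) / c)   # quantile at each bin midpoint
    probs = np.full(c, 1.0 / c)        # each bin carries mass 1/c
    return values, probs

# Example: a standard normal predictor discretized into c_j = 10 bins
values, probs = discretize(stats.norm(0, 1), c=10)
```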

The primary noncentrality for the power computation is

\[  \Delta ^\star = 2 \sum _{m=1}^ C \pi _ m \left[ b’(\theta _ m) \left(\theta _ m - \theta ^\star _ m \right) - \left( b(\theta _ m) - b(\theta ^\star _ m) \right) \right]  \]

where

\begin{align*}  b’(\theta ) & = \frac{\exp (\theta )}{1 + \exp (\theta )} \\ b(\theta ) & = \log \left( 1 + \exp (\theta ) \right) \\ \theta _ m & = \Psi _0 + \bPsi ’\mb{x}^\star _ m \\ \theta ^\star _ m & = \Psi ^\star _0 + \bPsi ^{\star \prime } \mb{x}^\star _ m \end{align*}

where

\begin{align*}  \Psi ^\star _0 & = \Psi _0 + \Psi _1 \mu _1 = \mbox{ intercept in reduced model, absorbing the tested predictor}\\ \bPsi ^\star & = (0, \Psi _2, \ldots , \Psi _ K)’ = \mbox{ coefficients in reduced model} \end{align*}
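A direct transcription of $\Delta ^\star $ into code may help fix the pieces. The following Python sketch (illustrative only; the function name is hypothetical) evaluates the sum above given the enumerated support $\mb{x}^\star _ m$, the probabilities $\pi _ m$, and the full-model coefficients:

```python
import numpy as np

def noncentrality(psi0, psi, xs, ps, mu1):
    """Delta* = 2 * sum_m pi_m [ b'(theta_m)(theta_m - theta*_m)
                                 - (b(theta_m) - b(theta*_m)) ].
    xs is the C-by-K array of x*_m, ps holds the probabilities pi_m,
    and the reduced model absorbs predictor 1 via Psi*_0 = Psi_0 + Psi_1 mu_1."""
    psi = np.asarray(psi, dtype=float)
    theta = psi0 + xs @ psi
    psi_star = psi.copy()
    psi_star[0] = 0.0                          # Psi* = (0, Psi_2, ..., Psi_K)
    theta_star = (psi0 + psi[0] * mu1) + xs @ psi_star
    def b(t):
        return np.logaddexp(0.0, t)            # b(theta) = log(1 + exp(theta))
    b_prime = 1.0 / (1.0 + np.exp(-theta))     # b'(theta) = exp(theta)/(1 + exp(theta))
    return 2.0 * np.sum(ps * (b_prime * (theta - theta_star)
                              - (b(theta) - b(theta_star))))
```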

The power is

\[  \mr{power} = P\left(\chi ^2(1, \Delta ^\star N (1-\rho ^2)) \ge \chi ^2_{1-\alpha }(1)\right)  \]

The factor $(1-\rho ^2)$ is the adjustment for correlation between the predictor that is being tested and other predictors, from Hsieh (1989).
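In code, the power formula amounts to a noncentral chi-square tail probability. A minimal sketch using SciPy (illustrative only; the function name is hypothetical):

```python
from scipy import stats

def lrchi_power(delta_star, n, rho, alpha):
    """power = P( chi^2(1, Delta* N (1 - rho^2)) >= chi^2_{1-alpha}(1) )"""
    crit = stats.chi2.ppf(1 - alpha, df=1)    # central critical value
    nc = delta_star * n * (1 - rho**2)        # noncentrality with the correlation adjustment
    return stats.ncx2.sf(crit, df=1, nc=nc)   # upper tail of the noncentral chi-square
```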

Alternative input parameterizations are handled by the following transformations:

\begin{align*}  \Psi _0 & = \log \left( \frac{\phi }{1-\phi } \right) - \bPsi ’\bmu \\ \Psi _ j & = \frac{\log (\mr{OR}_ j)}{U_ j} \quad (j \in 1, \ldots , K) \\ \end{align*}
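These transformations are mechanical to apply. A short Python sketch (illustrative; the function name is hypothetical) converts the alternative inputs, the response probability $\phi $ at the mean predictor vector and the odds ratios per $U_ j$ units, into the coefficient parameterization:

```python
import numpy as np

def coefficients(phi, odds_ratios, units, mu):
    """Recover (Psi_0, Psi_1, ..., Psi_K) from phi, OR_j, U_j, and mu,
    per the transformations above."""
    psi = np.log(odds_ratios) / np.asarray(units)          # Psi_j = log(OR_j) / U_j
    psi0 = np.log(phi / (1 - phi)) - psi @ np.asarray(mu)  # Psi_0 = logit(phi) - Psi' mu
    return psi0, psi
```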