You can obtain a highly accurate power estimate by simulating the power empirically. You need to use this approach for analyses that are not supported directly in SAS/STAT tools and for which you lack a power formula. But the simulation approach is also a viable alternative to existing power approximations: with a sufficiently large number of simulations, the empirical estimate is more accurate than a non-exact power approximation.
Although exact power computations for the two-sample t test are supported in several of the SAS/STAT tools, suppose for purposes of illustration that you want to simulate power for the continuing t test example. This section describes how you can use the DATA step and SAS/STAT software to do this.
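For reference, the exact computation mentioned above can be sketched with PROC POWER. This is a minimal sketch that assumes the first scenario of the t test example uses a mean difference of 5, a standard deviation of 12, a significance level of 0.05, and a total sample size of 100 split equally between the two groups (the same values that the simulation code in this section uses):

```sas
/* Exact power for the two-sample t test, for comparison with the simulation */
proc power;
   twosamplemeans test=diff
      meandiff = 5      /* assumed true difference between group means */
      stddev   = 12     /* assumed common standard deviation           */
      alpha    = 0.05   /* significance level                          */
      ntotal   = 100    /* total sample size, equal group sizes        */
      power    = .;     /* solve for power                             */
run;
```

The computed power from this step is the value against which the simulated estimate can be compared.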
The simulation involves generating a large number of data sets according to the distributions defined by the power analysis input parameters, computing the relevant p-value for each data set, and then estimating the power as the proportion of times that the p-value is significant.
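The estimate described above can be written compactly. The notation here is an assumption introduced only for illustration: N is the number of simulated data sets, p_i is the p-value computed from data set i, and alpha is the significance level.

```latex
\hat{\pi} \;=\; \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{\, p_i < \alpha \,\}
```

The indicator equals 1 when the test on data set i is significant and 0 otherwise, so the estimate is simply the fraction of significant results.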
The following statements compute a power estimate along with a 95% confidence interval for power for the first scenario in the two-sample t test example, with 10,000 simulations:
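   /* Power analysis input parameters */
   %let meandiff = 5;
   %let stddev = 12;
   %let alpha = 0.05;
   %let ntotal = 100;
   %let nsim = 10000;

   /* Generate nsim data sets, each with ntotal/2 observations per group */
   data simdata;
      call streaminit(123);
      do isim = 1 to &nsim;
         do i = 1 to floor(&ntotal/2);
            group = 1;
            y = rand('normal', 0 , &stddev);
            output;
            group = 2;
            y = rand('normal', &meandiff, &stddev);
            output;
         end;
      end;
   run;

   /* Run the t test on each simulated data set and save the p-values */
   ods listing close;
   proc ttest data=simdata;
      ods output ttests=tests;
      by isim;
      class group;
      var y;
   run;
   ods listing;

   /* Keep the pooled test and flag significant p-values */
   data tests;
      set tests;
      where method="Pooled";
      issig = probt < &alpha;
   run;

   /* Estimate power as the proportion of significant results */
   proc freq data=tests;
      ods select binomial;
      tables issig / binomial(level='1');
   run;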
First, the DATA step is used to randomly generate nsim = 10,000 data sets based on the meandiff, stddev, and ntotal parameters and the normal distribution, consistent with the assumptions underlying the two-sample t test. These data sets are contained in a large SAS data set called simdata, indexed by the variable isim.
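Before running the tests, it can be worth checking that the simulated data look as intended. The following step is a hedged sketch, not part of the original example: it summarizes y by group across all simulated data sets, so the group means should be close to 0 and to the meandiff value, and both standard deviations should be close to the stddev value.

```sas
/* Sanity check on the simulated data, pooled over all values of isim */
proc means data=simdata n mean std maxdec=3;
   class group;
   var y;
run;
```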
The CALL STREAMINIT(123) statement initializes the random number generator with a specific seed, which ensures repeatable results for purposes of this example. (Note: Skip this step when you are performing actual power simulations.)
The TTEST procedure is run using isim as a BY variable, with the ODS LISTING CLOSE statement to suppress output. The ODS OUTPUT statement saves the "TTests" table to a data set called tests. The p-values are contained in a column called probt.
The subsequent DATA step keeps only the pooled-variance test results and defines a variable called issig to flag the significant p-values, that is, those less than alpha.
Finally, the FREQ procedure computes the empirical power estimate as the estimate of P(issig = 1) and provides approximate and exact confidence intervals for this estimate.
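Because issig takes only the values 0 and 1, its mean is the same proportion that PROC FREQ reports. If only the point estimate is needed, a simpler step along the following lines would suffice. This is an illustrative alternative, not the method used in the example, and it does not produce confidence intervals.

```sas
/* The mean of the 0/1 flag is the proportion of significant tests */
proc means data=tests n mean;
   var issig;
run;
```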
Figure 18.7 shows the results. The estimated power is 0.5388 with 95% confidence interval (0.5290, 0.5486). Note that the exact power of 0.541 shown in the first row in Figure 18.1 is contained within this tight confidence interval.
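As a rough check on the reported interval, the usual normal-approximation (Wald) interval for a binomial proportion can be worked by hand. This sketch assumes that the approximate interval takes this form and uses the standard normal quantile 1.96.

```latex
\begin{aligned}
\mathrm{SE} &= \sqrt{\frac{0.5388\,(1 - 0.5388)}{10000}} \approx 0.00499 \\[4pt]
0.5388 \pm 1.96 \times 0.00499 &\approx (0.5290,\ 0.5486)
\end{aligned}
```

The half-width of about 0.01 is proportional to the reciprocal of the square root of the number of simulations, so quadrupling nsim would roughly halve it.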