The QUANTREG Procedure

Growth Charts for Body Mass Index

Body mass index (BMI) is defined as the ratio of weight (kg) to squared height (m$^2$) and is a widely used measure for categorizing individuals as overweight or underweight. The percentiles of BMI for specified ages are of particular interest. As age increases, these percentiles provide growth patterns of BMI not only for the majority of the population, but also for underweight or overweight extremes of the population. In addition, the percentiles of BMI for a specified age provide a reference for individuals at that age with respect to the population.
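As a quick illustration of the definition, the ratio can be written out and evaluated for one person. The weight of 70 kg and height of 1.75 m are hypothetical values chosen only for the arithmetic; they are not taken from the survey data.

```latex
\mathrm{BMI} = \frac{\text{weight (kg)}}{\bigl[\text{height (m)}\bigr]^{2}},
\qquad
\text{for example}\quad
\frac{70}{1.75^{2}} = \frac{70}{3.0625} \approx 22.9\ \mathrm{kg/m^{2}}.
```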

Smooth quantile curves have been widely used for reference charts in medical diagnosis to identify unusual subjects, whose measurements lie in the tails of the reference distribution. This example explains how to use the QUANTREG procedure to create growth charts for BMI.

A SAS data set named bmimen was created by merging and cleaning the 1999–2000 and 2001–2002 survey results for men published by the National Center for Health Statistics. This data set contains the variables Weight (kg), Height (m), BMI (kg/m$^2$), Age (year), and SeQN (respondent sequence number) for 8,250 men. More details can be found in Chen (2005).

The data set used in this example is a subset of the original data set of Chen (2005). It contains the two variables BMI and Age with 3,264 observations.

data bmimen;
   input BMI Age @@;
   SqrtAge = sqrt(Age);
   InveAge = 1/Age;
   LogBMI  = log(BMI);
   datalines;
18.6  2.0 17.1  2.0 19.0  2.0 16.8  2.0 19.0  2.1  15.5   2.1
16.7  2.1 16.1  2.1 18.0  2.1 17.8  2.1 18.3  2.1  16.9   2.1
15.9  2.1 20.6  2.1 16.7  2.1 15.4  2.1 15.9  2.1  17.7   2.1

   ... more lines ...   

29.0 80.0 24.1 80.0 26.6 80.0 24.2 80.0 22.7 80.0  28.4  80.0
26.3 80.0 25.6 80.0 24.8 80.0 28.6 80.0 25.7 80.0  25.8  80.0
22.5 80.0 25.1 80.0 27.0 80.0 27.9 80.0 28.5 80.0  21.7  80.0
33.5 80.0 26.1 80.0 28.4 80.0 22.7 80.0 28.0 80.0  42.7  80.0
;

The logarithm of BMI is used as the response. (Although this does not improve the quantile regression fit, it helps with statistical inference.) A preliminary median regression is fitted with a parametric model that involves six powers of Age.
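Written out, the six powers of Age are $-1$, $\tfrac12$, $1$, $\tfrac32$, $2$, and $3$, which correspond to the effects InveAge, SqrtAge, Age, SqrtAge*Age, Age*Age, and Age*Age*Age. The following sketch states the model for a general quantile level $\tau$. The coefficient labels $\beta_0,\dots,\beta_6$ are notation introduced here for illustration and do not appear in the procedure output.

```latex
Q_{\tau}\bigl(\log \mathrm{BMI} \mid \mathrm{Age}\bigr)
  = \beta_0(\tau)
  + \beta_1(\tau)\,\mathrm{Age}^{-1}
  + \beta_2(\tau)\,\mathrm{Age}^{1/2}
  + \beta_3(\tau)\,\mathrm{Age}
  + \beta_4(\tau)\,\mathrm{Age}^{3/2}
  + \beta_5(\tau)\,\mathrm{Age}^{2}
  + \beta_6(\tau)\,\mathrm{Age}^{3}
```

The preliminary median regression corresponds to $\tau = \tfrac12$.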

The following statements invoke the QUANTREG procedure:

proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=resampling;
   model logbmi = inveage sqrtage age sqrtage*age
                  age*age age*age*age
                  / diagnostics cutoff=4.5 quantile=.5 seed=1268;
   id age bmi;
   test_age_cubic: test age*age*age / wald lr rankscore(tau);
run;

The MODEL statement provides the model, and the option QUANTILE=0.5 requests median regression, which computes $\hat{\boldsymbol\beta}(\tfrac12)$ by using the interior point algorithm as requested with the ALGORITHM= option. See the section Interior Point Algorithm for details about this algorithm.
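For reference, the standard quantile regression estimate minimizes a sum of check-function losses. The sketch below uses generic notation that is assumed here rather than taken from this example: $y_i$ is the response (LogBMI), $\mathbf{x}_i$ is the vector of regressors for observation $i$, and $n$ is the number of observations.

```latex
\hat{\boldsymbol\beta}(\tau)
  = \arg\min_{\boldsymbol\beta}\;
    \sum_{i=1}^{n} \rho_{\tau}\bigl(y_i - \mathbf{x}_i'\boldsymbol\beta\bigr),
\qquad
\rho_{\tau}(r) = r\,\bigl(\tau - I(r < 0)\bigr)
```

At $\tau = \tfrac12$ the loss reduces to $\rho_{1/2}(r) = \tfrac12\,|r|$, so median regression minimizes the sum of absolute residuals.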

Figure 77.11 displays the estimated parameters, standard errors, 95% confidence intervals, t values, and p-values that are computed by the resampling method as requested by the CI= option. All of the parameters are considered significant since the p-values are smaller than 0.001.

Figure 77.11: Parameter Estimates with Median Regression: Men

The QUANTREG Procedure

Parameter Estimates

Parameter       DF   Estimate   Standard Error   95% Confidence Limits   t Value   Pr > |t|
Intercept        1     7.8909           0.8168      6.2895      9.4924      9.66     <.0001
InveAge          1    -1.8354           0.4350     -2.6884     -0.9824     -4.22     <.0001
SqrtAge          1    -5.1247           0.7135     -6.5237     -3.7257     -7.18     <.0001
Age              1     1.9759           0.2537      1.4785      2.4733      7.79     <.0001
SqrtAge*Age      1    -0.3347           0.0424     -0.4179     -0.2515     -7.89     <.0001
Age*Age          1     0.0227           0.0029      0.0170      0.0284      7.77     <.0001
Age*Age*Age      1    -0.0000           0.0000     -0.0001     -0.0000     -7.40     <.0001


The TEST statement requests Wald, likelihood ratio, and rank tests for the significance of the cubic term in Age. The test results, shown in Figure 77.12, indicate that this term is significant. Higher-order terms are not significant.

Figure 77.12: Test of Significance for Cubic Term

Test test_age_cubic Results

Test               Test Statistic   DF   Chi-Square   Pr > ChiSq
Wald                      54.7417    1        54.74       <.0001
Likelihood Ratio          56.9473    1        56.95       <.0001
Rank_Tau                  42.5731    1        42.57       <.0001


Median regression and, more generally, quantile regression are robust to extremes of the response variable. The DIAGNOSTICS option in the MODEL statement requests a diagnostic table of outliers, shown in Figure 77.13, which uses a cutoff value specified with the CUTOFF= option. The variables specified in the ID statement are included in the table.

With CUTOFF=4.5, 14 men are identified as outliers. Thirteen of them have large positive standardized residuals, which indicates that they are overweight for their age; the remaining one (observation 2732) has a large negative standardized residual, which indicates that he is underweight for his age. The cutoff value 4.5 is ad hoc; it corresponds to a probability less than 0.5E–5 if normality is assumed, but the standardized residuals for median regression usually do not meet this assumption.
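The probability attached to the cutoff can be checked with a short DATA step. This is only a sketch: the variable names are illustrative, and it assumes that the quoted probability refers to a single tail of the standard normal distribution.

```sas
data _null_;
   cutoff    = 4.5;
   /* SDF is the survival function: P(Z > cutoff) for a standard normal Z */
   one_tail  = sdf('NORMAL', cutoff);
   /* probability of exceeding the cutoff in either direction */
   two_tails = 2 * one_tail;
   put one_tail= e10. two_tails= e10.;
run;
```

The one-tail probability is about 3.4E-6, which is below 0.5E-5. The two-tail probability is twice that and slightly exceeds 0.5E-5.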

In order to construct the chart shown in Figure 77.2, the same model used for median regression is used for other quantiles. Note that the QUANTREG procedure can compute fitted values for multiple quantiles.

Figure 77.13: Diagnostics with Median Regression

Diagnostics

 Obs         Age         BMI   Standardized Residual   Outlier
1337    8.900000   36.500000                  5.3575   *
1376    9.200000   39.600000                  5.8723   *
1428    9.400000   36.900000                  5.3036   *
1505    9.900000   35.500000                  4.8862   *
1764   14.900000   46.800000                  5.6403   *
1838   16.200000   50.400000                  5.9138   *
1845   16.300000   42.600000                  4.6683   *
1870   16.700000   42.600000                  4.5930   *
1957   18.100000   49.900000                  5.5053   *
2002   18.700000   52.700000                  5.8106   *
2016   18.900000   48.400000                  5.1603   *
2264   32.000000   55.600000                  5.3085   *
2291   35.000000   60.900000                  5.9406   *
2732   66.000000   14.900000                 -4.7849   *


The following statements request fitted values for 10 quantile levels ranging from 0.03 to 0.97:

proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=none;
   model logbmi = inveage sqrtage age sqrtage*age
                    age*age age*age*age
                    / quantile=0.03,0.05,0.1,0.25,0.5,0.75,
                               0.85,0.90,0.95,0.97;
   output out=outp pred=p/columnwise;
run;

data outbmi;
   set outp;
   pbmi = exp(p);
run;

proc sgplot data=outbmi;
   title 'BMI Percentiles for Men: 2-80 Years Old';
   yaxis label='BMI (kg/m**2)' min=10 max=45 values=(10 15 20 25 30 35 40 45);
   xaxis label='Age (Years)' min=2 max=80 values=(2 10 20 30 40 50 60 70 80);

   scatter x=age y=bmi /markerattrs=(size=1);
   series  x=age y=pbmi/group=QUANTILE;
run;

The fitted values are stored in the OUTPUT data set outp. The COLUMNWISE option arranges these fitted values for all quantiles in the single variable p by groups of the quantiles. After the exponential transformation, the fitted BMI values together with the original BMI values are plotted against age to create the display shown in Figure 77.2.
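One way to confirm the stacked layout that COLUMNWISE produces is to count the rows of outp for each quantile level. The following sketch relies on the variable QUANTILE, which the PROC SGPLOT step already uses as its GROUP= variable. Each of the 10 levels would be expected to have one row per input observation.

```sas
/* Count the rows of the OUTPUT data set at each quantile level */
proc freq data=outp;
   tables quantile / nocum nopercent;
run;
```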

The fitted quantile curves reveal important information. During the quick growth period (ages 2 to 20), the dispersion of BMI increases dramatically; it becomes stable during middle age, and then it contracts after age 60. This pattern suggests that effective population weight control should start in childhood.
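The change in dispersion can also be examined numerically. The following sketch computes, for each age, the gap between the largest and smallest fitted BMI values in outbmi. It assumes that the fitted curves do not cross, so that the largest value at each age belongs to the 0.97 curve and the smallest to the 0.03 curve. The names spread and spread_97_03 are illustrative.

```sas
/* For each age, the gap between the highest and lowest fitted curves */
proc sql;
   create table spread as
   select age,
          max(pbmi) - min(pbmi) as spread_97_03
   from outbmi
   group by age
   order by age;
quit;

/* Plot the gap against age to see where it widens and narrows */
proc sgplot data=spread;
   series x=age y=spread_97_03;
   xaxis label='Age (Years)';
   yaxis label='Fitted 97th minus 3rd percentile of BMI';
run;
```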

Compared to the 97th percentile in the reference growth charts published by the CDC in 2000 (Kuczmarski, Ogden, and Guo, 2002), the 97th percentile for 10-year-old boys in Figure 77.2 is 6.4 BMI units higher (an increase of 27%). This can be interpreted as a warning of overweight or obesity. See Chen (2005) for a detailed analysis.
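A single fitted percentile can be read from the output data set instead of from the chart. The following sketch extracts the fitted 97th percentile at age 10 from outbmi. It assumes that the data contain observations with Age exactly equal to 10. The names p97_age10 and pbmi_p97 are illustrative.

```sas
/* All rows at the same age and quantile level share one fitted value, */
/* so the mean simply returns that value.                              */
proc means data=outbmi noprint;
   where age = 10 and round(quantile, 0.01) = 0.97;
   var pbmi;
   output out=p97_age10(drop=_type_ _freq_) mean=pbmi_p97;
run;

proc print data=p97_age10 noobs;
run;
```

The ROUND function guards against small floating-point differences in the stored quantile level.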