Body mass index (BMI) is defined as the ratio of weight (kg) to squared height (m) and is a widely used measure for categorizing individuals as overweight or underweight. The percentiles of BMI for specified ages are of particular interest. As age increases, these percentiles provide growth patterns of BMI not only for the majority of the population, but also for underweight or overweight extremes of the population. In addition, the percentiles of BMI for a specified age provide a reference for individuals at that age with respect to the population.
Smooth quantile curves have been widely used for reference charts in medical diagnosis to identify unusual subjects, whose measurements lie in the tails of the reference distribution. This example explains how to use the QUANTREG procedure to create growth charts for BMI.
A SAS data set named bmimen was created by merging and cleaning the 1999–2000 and 2001–2002 survey results for men that are published by the National Center for Health Statistics. This data set contains the variables Weight (kg), Height (m), BMI (kg/m²), Age (year), and SeQN (respondent sequence number) for 8,250 men (Chen, 2005).
The data set that is used in this example is a subset of the original data set of Chen (2005). It contains the two variables BMI and Age with 3,264 observations.
data bmimen;
   input BMI Age @@;
   SqrtAge = sqrt(Age);
   InveAge = 1/Age;
   LogBMI  = log(BMI);
   datalines;
18.6  2.0 17.1  2.0 19.0  2.0 16.8  2.0 19.0  2.1 15.5  2.1
16.7  2.1 16.1  2.1 18.0  2.1 17.8  2.1 18.3  2.1 16.9  2.1
15.9  2.1 20.6  2.1 16.7  2.1 15.4  2.1 15.9  2.1 17.7  2.1

   ... more lines ...

29.0 80.0 24.1 80.0 26.6 80.0 24.2 80.0 22.7 80.0 28.4 80.0
26.3 80.0 25.6 80.0 24.8 80.0 28.6 80.0 25.7 80.0 25.8 80.0
22.5 80.0 25.1 80.0 27.0 80.0 27.9 80.0 28.5 80.0 21.7 80.0
33.5 80.0 26.1 80.0 28.4 80.0 22.7 80.0 28.0 80.0 42.7 80.0
;
The logarithm of BMI is used as the response. (Although this does not improve the quantile regression fit, it helps with statistical inference.)
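A log response can be undone after fitting because quantiles, unlike means, commute with increasing transformations. The following is a brief sketch of that property; the notation $Q_\tau(\cdot \mid \text{Age})$ for the conditional quantile at level $\tau$ is introduced here and does not appear in the original text.

```latex
\[
  Q_\tau(\mathrm{BMI} \mid \mathrm{Age})
  \;=\;
  \exp\bigl\{\, Q_\tau(\log \mathrm{BMI} \mid \mathrm{Age}) \,\bigr\},
  \qquad 0 < \tau < 1 .
\]
```

Exponentiating a fitted quantile of LogBMI therefore gives the corresponding quantile of BMI on the original scale. The same would not hold for a fitted mean.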
A preliminary median regression is fitted with a parametric model that involves six powers of Age.
The following statements invoke the QUANTREG procedure:
proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=resampling;
   model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age
                  / diagnostics cutoff=4.5 quantile=.5 seed=1268;
   id age bmi;
   test_age_cubic: test age*age*age / wald lr rankscore(tau);
run;
The MODEL statement provides the model, and the option QUANTILE=0.5 requests median regression. The ALGORITHM= option requests that the interior point algorithm be used to compute the parameter estimates. For more information about this algorithm, see the section Interior Point Algorithm.
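Written out, the effects in the MODEL statement correspond to the following regression for the $i$th man. The coefficient names $\beta_0, \dots, \beta_6$, the error term $\varepsilon_i$, and the check function $\rho_\tau$ are standard quantile regression notation supplied here as an aid; they are not taken from the original text.

```latex
\[
  \log(\mathrm{BMI}_i)
  = \beta_0
  + \beta_1\,\mathrm{Age}_i^{-1}
  + \beta_2\,\mathrm{Age}_i^{1/2}
  + \beta_3\,\mathrm{Age}_i
  + \beta_4\,\mathrm{Age}_i^{3/2}
  + \beta_5\,\mathrm{Age}_i^{2}
  + \beta_6\,\mathrm{Age}_i^{3}
  + \varepsilon_i
\]
\[
  \hat{\beta}(\tau)
  = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - x_i'\beta\bigr),
  \qquad
  \rho_\tau(r) = r\,\bigl(\tau - I(r < 0)\bigr)
\]
```

Here $y_i = \log(\mathrm{BMI}_i)$ and $x_i$ collects the intercept and the six powers of Age. For $\tau = 0.5$ the check function reduces to $|r|/2$, so median regression minimizes the sum of absolute residuals.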
Figure 83.11 displays the estimated parameters, standard errors, 95% confidence intervals, t values, and p-values that are computed by the resampling method, which is requested by the CI= option. All of the parameters are considered significant because the p-values are smaller than 0.001.
Figure 83.11: Parameter Estimates with Median Regression: Men
Parameter Estimates

Parameter | DF | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | t Value | Pr > \|t\| |
---|---|---|---|---|---|---|---|
Intercept | 1 | 7.8909 | 0.8168 | 6.2895 | 9.4924 | 9.66 | <.0001 |
InveAge | 1 | -1.8354 | 0.4350 | -2.6884 | -0.9824 | -4.22 | <.0001 |
SqrtAge | 1 | -5.1247 | 0.7135 | -6.5237 | -3.7257 | -7.18 | <.0001 |
Age | 1 | 1.9759 | 0.2537 | 1.4785 | 2.4733 | 7.79 | <.0001 |
SqrtAge*Age | 1 | -0.3347 | 0.0424 | -0.4179 | -0.2515 | -7.89 | <.0001 |
Age*Age | 1 | 0.0227 | 0.0029 | 0.0170 | 0.0284 | 7.77 | <.0001 |
Age*Age*Age | 1 | -0.0000 | 0.0000 | -0.0001 | -0.0000 | -7.40 | <.0001 |
The TEST statement requests Wald, likelihood ratio, and rank score tests for the significance of the cubic term in Age. The test results, shown in Figure 83.12, indicate that this term is significant. Higher-order terms are not significant.
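One way to examine the statement about higher-order terms is to add the next power of Age and test it in the same manner. The following sketch is not part of the original analysis: the quartic effect and the label test_age_quartic are assumptions added for illustration, and the remaining options are copied from the statements above.

```sas
/* Sketch: the same median regression plus a hypothetical quartic term in Age */
proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=resampling;
   model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age
                  age*age*age*age
                  / quantile=.5 seed=1268;
   /* test_age_quartic is an illustrative label, not from the source */
   test_age_quartic: test age*age*age*age / wald lr;
run;
```

If the added term is not needed, its tests should return large p-values. The results depend on the data and are not reproduced here.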
Median regression and, more generally, quantile regression are robust to extremes of the response variable. The DIAGNOSTICS option in the MODEL statement requests a diagnostic table of outliers, shown in Figure 83.13, which uses a cutoff value that is specified in the CUTOFF= option. The variables that are specified in the ID statement are included in the table.
With CUTOFF=4.5, 14 men are identified as outliers. Thirteen of them have large positive standardized residuals, which indicates that they are overweight for their age; the remaining observation (a 66-year-old with a BMI of 14.9) has a large negative standardized residual, which indicates that he is underweight for his age. The cutoff value 4.5 is ad hoc. It corresponds to a one-sided tail probability of less than 0.5E–5 if normality is assumed, but the standardized residuals for median regression usually do not meet this assumption.
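The probability that is attached to the cutoff can be checked with a short DATA step. This is a minimal sketch that assumes a standard normal reference distribution; the data set name tailprob and its variable names are hypothetical.

```sas
/* Sketch: standard normal tail probabilities for the cutoff 4.5 */
/* tailprob, upper, and two_sided are hypothetical names         */
data tailprob;
   cutoff    = 4.5;
   upper     = 1 - cdf('NORMAL', cutoff);   /* P(Z > cutoff)   */
   two_sided = 2 * upper;                   /* P(|Z| > cutoff) */
   put cutoff= upper= e10. two_sided= e10.; /* write to the log */
run;
```

The upper-tail probability is about 3.4E-6, which is below 0.5E-5, whereas the two-sided probability is about twice that and slightly above it.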
In order to construct the chart shown in Figure 83.2, the same model that is used for median regression is used for other quantiles. The QUANTREG procedure can compute fitted values for multiple quantiles.
Figure 83.13: Diagnostics with Median Regression
Diagnostics

Obs | Age | BMI | Standardized Residual | Outlier |
---|---|---|---|---|
1337 | 8.900000 | 36.500000 | 5.3575 | * |
1376 | 9.200000 | 39.600000 | 5.8723 | * |
1428 | 9.400000 | 36.900000 | 5.3036 | * |
1505 | 9.900000 | 35.500000 | 4.8862 | * |
1764 | 14.900000 | 46.800000 | 5.6403 | * |
1838 | 16.200000 | 50.400000 | 5.9138 | * |
1845 | 16.300000 | 42.600000 | 4.6683 | * |
1870 | 16.700000 | 42.600000 | 4.5930 | * |
1957 | 18.100000 | 49.900000 | 5.5053 | * |
2002 | 18.700000 | 52.700000 | 5.8106 | * |
2016 | 18.900000 | 48.400000 | 5.1603 | * |
2264 | 32.000000 | 55.600000 | 5.3085 | * |
2291 | 35.000000 | 60.900000 | 5.9406 | * |
2732 | 66.000000 | 14.900000 | -4.7849 | * |
The following statements request fitted values for 10 quantile levels that range from 0.03 to 0.97:
proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=none;
   model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age
                  / quantile=0.03,0.05,0.1,0.25,0.5,0.75,
                             0.85,0.90,0.95,0.97;
   output out=outp pred=p/columnwise;
run;

data outbmi;
   set outp;
   pbmi = exp(p);
run;

proc sgplot data=outbmi;
   title 'BMI Percentiles for Men: 2-80 Years Old';
   yaxis label='BMI (kg/m**2)' min=10 max=45 values=(10 15 20 25 30 35 40 45);
   xaxis label='Age (Years)' min=2 max=80 values=(2 10 20 30 40 50 60 70 80);
   scatter x=age y=bmi /markerattrs=(size=1);
   series x=age y=pbmi/group=QUANTILE;
run;
The fitted values are stored in the OUTPUT data set outp. The COLUMNWISE option arranges these fitted values for all quantiles in the single variable p by groups of the quantiles. After the exponential transformation, both the fitted BMI values and the original BMI values are plotted against age to create the display shown in Figure 83.2.
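Because the COLUMNWISE layout stacks one block of rows per quantile level, the fitted percentiles at a single age can be read off with a simple query. The following sketch assumes that the data set outbmi from the preceding statements exists and that Age = 10 occurs in the data; the table name age10 is hypothetical.

```sas
/* Sketch: list the fitted BMI percentiles at one age.               */
/* Assumes OUTBMI was created by the preceding steps and that        */
/* Age = 10 occurs in the data; AGE10 is a hypothetical table name.  */
proc sql;
   create table age10 as
   select distinct quantile, age, pbmi
   from outbmi
   where age = 10
   order by quantile;
quit;

proc print data=age10 noobs;
run;
```

Men of the same age share the same fitted value at each quantile level, so SELECT DISTINCT should leave one row per requested quantile.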
The fitted quantile curves reveal important information. During the period of rapid growth (ages 2 to 20), the dispersion of BMI increases dramatically. It becomes stable during middle age, and then it contracts after age 60. This pattern suggests that effective population weight control should start in childhood.
Compared to the 97th percentile in reference growth charts that were published by the Centers for Disease Control and Prevention (CDC) in 2000 (Kuczmarski, Ogden, and Guo, 2002), the 97th percentile for 10-year-old boys in Figure 83.2 is 6.4 BMI units higher (an increase of 27%). This can be interpreted as a warning of overweight or obesity. See Chen (2005) for a detailed analysis.