Quantile regression generalizes the concept of a univariate quantile to a conditional quantile given one or more covariates. Recall that a student’s score on a test is at the $\tau$ quantile if his or her score is better than that of $100\tau\%$ of the students who took the test. The score is also said to be at the $100\tau$th percentile.
For a random variable $Y$ with cumulative distribution function

$$F(y) = \Pr(Y \le y)$$
the $\tau$ quantile of $Y$ is defined as the inverse function

$$Q(\tau) = \inf\{\, y : F(y) \ge \tau \,\}$$
where the quantile level $\tau$ ranges between 0 and 1. In particular, the median is $Q(1/2)$.
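For a finite sample, the definition $Q(\tau) = \inf\{y : F(y) \ge \tau\}$ can be applied directly by replacing $F$ with the empirical CDF $F_n$. The sketch below is illustrative only; the function names `empirical_cdf` and `empirical_quantile` are hypothetical helpers, not part of any SAS or Python library.

```python
def empirical_cdf(sample, y):
    """F_n(y): the fraction of observations less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def empirical_quantile(sample, tau):
    """Return inf{y : F_n(y) >= tau}, the generalized inverse of F_n.

    F_n is a step function that jumps only at observed values, so the
    infimum is attained at one of the sorted observations.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    ys = sorted(sample)
    # Walk up the sorted data and stop at the first value where F_n reaches tau.
    for y in ys:
        if empirical_cdf(ys, y) >= tau:
            return y
    return ys[-1]  # unreachable for 0 < tau < 1, since F_n(max) = 1


scores = [7, 1, 5, 3, 9]
print(empirical_quantile(scores, 0.5))  # sample version of the median Q(1/2)
```

Because $F_n$ jumps by $1/n$ at each sorted observation, the result is the $\lceil n\tau \rceil$-th smallest value; for the five scores above and $\tau = 0.5$ that is 5.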
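For a random sample $\{y_1, \ldots, y_n\}$ of $Y$, it is well known that the sample median minimizes the sum of absolute deviations:

$$\text{median} = \arg\min_{\xi \in \mathbb{R}} \sum_{i=1}^{n} |y_i - \xi|$$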
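Likewise, the general $\tau$ sample quantile $\xi(\tau)$, which is the analog of $Q(\tau)$, is formulated as the minimizer

$$\xi(\tau) = \arg\min_{\xi \in \mathbb{R}} \sum_{i=1}^{n} \rho_\tau(y_i - \xi)$$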
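where $\rho_\tau(z) = z\,(\tau - I(z < 0))$, $0 < \tau < 1$, and where $I(\cdot)$ denotes the indicator function. The loss function $\rho_\tau$ assigns a weight of $\tau$ to positive residuals $y_i - \xi$ and a weight of $1 - \tau$ to negative residuals.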
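The check loss $\rho_\tau(z) = z\,(\tau - I(z < 0))$ equals $\tau z$ for $z \ge 0$ and $(1 - \tau)|z|$ for $z < 0$. A minimal sketch of minimizing $\sum_i \rho_\tau(y_i - \xi)$ follows, assuming a small sample so that every observation can be tried as a candidate; the helper names are hypothetical. At $\tau = 0.5$ the loss is $|z|/2$, so the same search returns the sample median.

```python
def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0)).

    Positive residuals are weighted by tau, negative ones by 1 - tau.
    """
    indicator = 1.0 if z < 0 else 0.0
    return z * (tau - indicator)


def sample_quantile_by_minimization(sample, tau):
    """Return a minimizer of sum_i rho_tau(y_i - xi) over xi.

    The objective is convex and piecewise linear in xi with kinks only at
    the observations, so some data point is always a minimizer. Candidates
    are scanned in sorted order and min() keeps the first of equal values.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    ys = sorted(sample)
    return min(ys, key=lambda xi: sum(check_loss(y - xi, tau) for y in ys))


scores = [7, 1, 5, 3, 9]
for tau in (0.25, 0.5, 0.9):
    print(tau, sample_quantile_by_minimization(scores, tau))
```

With $n = 5$ and these levels, $n\tau$ is not an integer, so each minimizer is unique: 3, 5, and 9, the same values given by $\inf\{y : F_n(y) \ge \tau\}$. When $n\tau$ is an integer, every point between two adjacent order statistics minimizes the objective, which is why the sample quantile is not always unique.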
Using this loss function, the linear conditional quantile function extends the $\tau$ sample quantile $\xi(\tau)$ to the regression setting in the same way that the linear conditional mean function extends the sample mean. Recall that OLS regression estimates the linear conditional mean function $E(Y \mid X = x) = x'\beta$ by solving for

$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} (y_i - x_i'\beta)^2$$
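The estimated parameter $\hat{\beta}$ minimizes the sum of squared residuals in the same way that the sample mean $\hat{\mu}$ minimizes the sum of squares:

$$\hat{\mu} = \arg\min_{\mu \in \mathbb{R}} \sum_{i=1}^{n} (y_i - \mu)^2$$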
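To make the OLS side of the analogy concrete, here is a sketch for the simple case $x_i' = (1, x_i)$, where $\hat\beta$ has a closed form; dropping the slope leaves $\hat{\beta}_0 = \bar{y} = \hat{\mu}$. The function names and the small data set (four points on $y = 1 + 2x$ plus one outlier) are hypothetical illustrations.

```python
def ols_simple(xs, ys):
    """Closed-form OLS for y = b0 + b1 * x, i.e. design rows x_i' = (1, x_i).

    Minimizes sum_i (y_i - b0 - b1 * x_i) ** 2.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def sum_of_squares(ys, mu):
    """The objective sum_i (y_i - mu) ** 2 that the sample mean minimizes."""
    return sum((y - mu) ** 2 for y in ys)


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 100]  # the last observation is an outlier
print(ols_simple(xs, ys))
```

Because squared residuals grow quadratically, the single outlier pulls the least-squares slope to about 19.8 even though the other four points lie exactly on a line with slope 2.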
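Likewise, quantile regression estimates the linear conditional quantile function, $Q_Y(\tau \mid X = x) = x'\beta(\tau)$, by solving the following minimization problem for each $\tau \in (0, 1)$:

$$\hat{\beta}(\tau) = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta)$$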
The quantity $\hat{\beta}(\tau)$ is called the $\tau$ regression quantile. The case $\tau = 0.5$ (which minimizes the sum of absolute residuals) corresponds to median regression (which is also known as $L_1$ regression).
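The minimization defining $\hat\beta(\tau)$ is a linear program, and when the design matrix has full rank $p$ some optimal solution fits $p$ observations exactly. The sketch below relies on that property for the simple model $y = b_0 + b_1 x$ ($p = 2$): it tries every line through two observations and keeps the one with the smallest check loss. This brute-force search is an assumption made for illustration, costs $O(n^3)$, and is not intended to mirror the algorithms that the QUANTREG procedure uses; all names are hypothetical.

```python
from itertools import combinations


def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def objective(b0, b1, xs, ys, tau):
    """sum_i rho_tau(y_i - b0 - b1 * x_i) for the line y = b0 + b1 * x."""
    return sum(check_loss(y - (b0 + b1 * x), tau) for x, y in zip(xs, ys))


def regression_quantile_simple(xs, ys, tau):
    """tau regression quantile for y = b0 + b1 * x by exhaustive search.

    Some optimal line passes through two observations with distinct x
    values, so checking every such line finds a minimizer. Small data only.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line has no (b0, b1) representation
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        value = objective(b0, b1, xs, ys, tau)
        if best is None or value < best[0]:
            best = (value, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 100]  # the last observation is an outlier
print(regression_quantile_simple(xs, ys, 0.5))  # median (L1) regression
```

On this data the median-regression fit is the line $y = 1 + 2x$ through the first four points, so the call returns `(1.0, 2.0)`; the outlier contributes only a linear penalty and does not move the fit, whereas least squares on the same data gives a slope of 19.8.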
The following set of regression quantiles is referred to as the quantile process:

$$\{\, \hat{\beta}(\tau) : \tau \in (0, 1) \,\}$$
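In the intercept-only model (every $x_i = 1$), $\hat\beta(\tau)$ reduces to the sample quantile $\xi(\tau)$, so the quantile process can be traced over a grid of $\tau$ values without a linear-programming solver. The sketch below assumes that special case and uses hypothetical helper names; it shows that the process is a step function of $\tau$ that changes value only at levels $\tau = j/n$.

```python
def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def intercept_only_regression_quantile(ys, tau):
    """beta_hat(tau) when every x_i = 1: minimize sum_i rho_tau(y_i - b).

    A minimizer is always among the observations, so scan the sorted data.
    """
    data = sorted(ys)
    return min(data, key=lambda b: sum(check_loss(y - b, tau) for y in data))


def quantile_process(ys, taus):
    """Return (tau, beta_hat(tau)) pairs over a grid of quantile levels."""
    return [(tau, intercept_only_regression_quantile(ys, tau)) for tau in taus]


ys = [7, 1, 5, 3, 9]
# Grid chosen to avoid tau = j/n (here 0.2, 0.4, ...), where the minimizer is not unique.
taus = [0.1, 0.15, 0.3, 0.35, 0.5, 0.7, 0.9]
print(quantile_process(ys, taus))
```

Nearby levels such as 0.1 and 0.15 give the same estimate, because $\hat\beta(\tau)$ only moves to the next order statistic when $\tau$ crosses a multiple of $1/n$.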
The QUANTREG procedure computes the quantile function $Q_Y(\tau \mid X = x)$ and conducts statistical inference on the estimated parameters $\hat{\beta}(\tau)$.