TY - JOUR
T1 - Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter
AU - Pazira, Hassan
AU - Augugliaro, Luigi
AU - Wit, Ernst
PY - 2018/7
Y1 - 2018/7
N2 - A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations. Even in high-dimensional feature spaces GLMs can be extended to deal with such situations. Penalized inference approaches, such as the ℓ1 penalty or SCAD, or extensions of least angle regression, such as dgLARS, have been proposed to deal with GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, the implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this manuscript is to extend the differential geometric least angle regression method for high-dimensional GLMs to arbitrary exponential dispersion family distributions with arbitrary link functions. This entails, first, extending the predictor-corrector (PC) algorithm to arbitrary distributions and link functions, and second, proposing an efficient estimator of the dispersion parameter. Furthermore, improvements to the computational algorithm lead to an important speed-up of the PC algorithm. Simulations provide supportive evidence concerning the proposed efficient algorithms for estimating coefficients and dispersion parameter. The resulting method has been implemented in our R package (which will be merged with the original dglars package) and is shown to be an effective method for inference for arbitrary classes of GLMs.
AB - A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations. Even in high-dimensional feature spaces GLMs can be extended to deal with such situations. Penalized inference approaches, such as the ℓ1 penalty or SCAD, or extensions of least angle regression, such as dgLARS, have been proposed to deal with GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, the implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this manuscript is to extend the differential geometric least angle regression method for high-dimensional GLMs to arbitrary exponential dispersion family distributions with arbitrary link functions. This entails, first, extending the predictor-corrector (PC) algorithm to arbitrary distributions and link functions, and second, proposing an efficient estimator of the dispersion parameter. Furthermore, improvements to the computational algorithm lead to an important speed-up of the PC algorithm. Simulations provide supportive evidence concerning the proposed efficient algorithms for estimating coefficients and dispersion parameter. The resulting method has been implemented in our R package (which will be merged with the original dglars package) and is shown to be an effective method for inference for arbitrary classes of GLMs.
KW - High-dimensional inference
KW - Generalized linear models
KW - Least angle regression
KW - Predictor-corrector algorithm
KW - Dispersion parameter
KW - LEAST ANGLE REGRESSION
KW - LINEAR-MODELS
KW - VARIABLE SELECTION
KW - CROSS-VALIDATION
KW - DANTZIG SELECTOR
KW - SHRINKAGE
U2 - 10.1007/s11222-017-9761-7
DO - 10.1007/s11222-017-9761-7
M3 - Article
VL - 28
SP - 753
EP - 774
JO - Statistics and Computing
JF - Statistics and Computing
SN - 0960-3174
IS - 4
ER -