A. Reference
- Collinearity Diagnostics, Model Fit & Variable Contribution
- UCLA: Institute for Digital Research and Education
B. Purpose
- Examine whether predictors are highly collinear, which can cause problems in estimating the regression coefficients.
- As the degree of multicollinearity increases, the coefficient estimates become unstable and the standard errors for the coefficients can be wildly inflated.
C. SAS code
proc reg data = cars;
  model msrp = enginesize cylinders horsepower / tol vif collinoint;
run;
quit;
D. Notes
- proc reg cannot handle categorical variables directly, so you need to create the dummy variables yourself for each categorical variable.
- tol: tolerance, the proportion of variance in a predictor that cannot be accounted for by the other predictors. Regress the predictor on the remaining predictors and compute the R-square. Tolerance for that predictor is 1 minus this R-square.
- vif: variance inflation factor, the reciprocal of tolerance (1 / tolerance). It measures how much the variance of the estimated regression coefficient is “inflated” by correlation among the predictors in the model. A vif of 1 means no inflation at all. A vif above 4 warrants further investigation. A vif above 10 indicates serious multicollinearity and requires correction.
- collinoint: produces intercept-adjusted collinearity diagnostics. This table decomposes the correlation matrix of the predictors into linear combinations of the variables. The variance of each of these linear combinations is called an eigenvalue. Collinearity is indicated when 2 or more variables have large proportions of variance (.50 or more) associated with a large condition index. A condition index of 10 or more is an indication of instability.
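The note on categorical variables can be sketched as follows. This is a minimal illustration, not part of the original example: it assumes the cars data set contains a character variable named origin with the levels "Asia", "Europe" and "USA", and the names cars_dummy, origin_asia and origin_europe are made up for the sketch.

```sas
/* Assumption: cars has a character variable origin with levels */
/* "Asia", "Europe" and "USA". Adjust names and levels to your data. */
data cars_dummy;
  set cars;
  /* "USA" is the reference level, so it gets no indicator of its own. */
  /* A comparison evaluates to 1 when true and 0 otherwise. */
  origin_asia   = (origin = "Asia");
  origin_europe = (origin = "Europe");
run;

/* The indicators enter the model like any other numeric predictor. */
proc reg data = cars_dummy;
  model msrp = enginesize cylinders horsepower origin_asia origin_europe
        / tol vif collinoint;
run;
quit;
```

One level is left out on purpose: including an indicator for every level together with the intercept would make the predictors perfectly collinear. Note that observations with a missing origin would be coded 0 on both indicators in this sketch, which may or may not be what you want.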
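The definition of tolerance and vif can be checked by hand with an auxiliary regression. The sketch below does this for enginesize only, assuming the same cars data set as in section C. The data set names aux_fit and aux_tol are made up for the illustration.

```sas
/* Regress one predictor on the remaining predictors. */
/* The EDF option adds the model R-square (_RSQ_) to the OUTEST= data set. */
proc reg data = cars outest = aux_fit edf noprint;
  model enginesize = cylinders horsepower;
run;
quit;

/* Tolerance is 1 minus the auxiliary R-square; vif is its reciprocal. */
data aux_tol;
  set aux_fit;
  tolerance = 1 - _RSQ_;
  vif = 1 / tolerance;
  keep _DEPVAR_ _RSQ_ tolerance vif;
run;

proc print data = aux_tol noobs;
run;
```

If the same observations are used in both steps (that is, msrp has no missing values beyond those in the predictors), the printed tolerance and vif should agree with the values proc reg reports for enginesize under the tol and vif options.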
E. SAS Output
F. Interpretation
- Engine Size and Cylinders both have a VIF greater than 5, which is above the threshold of 4 that warrants further investigation.
- The highest condition index is 5.41, and 83.7% and 90.1% of the variance of Engine Size and Cylinders, respectively, are associated with it. Because 5.41 is less than 10, the condition index does not indicate serious multicollinearity.
- The eigenvalues sum to 3 because there are 3 predictors and the decomposition is based on their correlation matrix, in which each predictor has a variance of 1.
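As a cross-check on the eigenvalues and condition indices discussed above, the correlation matrix of the three predictors can be decomposed directly. This is a sketch under assumptions rather than the method used to produce the output in section E: it relies on proc princomp analyzing the correlation matrix by default, on its ODS table Eigenvalues, and on both procedures using the same observations. The data set names eig and cond_index are made up.

```sas
/* Capture the eigenvalues of the predictor correlation matrix. */
ods output Eigenvalues = eig;
proc princomp data = cars;
  var enginesize cylinders horsepower;
run;

/* Eigenvalues are listed largest first. Each condition index is */
/* the square root of (largest eigenvalue / this eigenvalue).    */
data cond_index;
  set eig;
  retain largest;
  if _n_ = 1 then largest = Eigenvalue;
  condition_index = sqrt(largest / Eigenvalue);
  keep Number Eigenvalue condition_index;
run;

proc print data = cond_index noobs;
run;
```

The Eigenvalue column should sum to 3, and the largest condition_index should correspond to the highest condition index in the collinoint table.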