EFIM10014
Quantitative Analysis in Management
Regression Analysis: Model Validation II
Sophie Lythreatis
QAM Week 10
Content:
• There are 4 assumptions of the regression model
• 1st assumption of the regression model
• 2nd assumption of the regression model
Assumptions of the Regression Model
• The mathematics underlying the regression procedure is based upon a number of assumptions
• If these are not valid, then even though the regression procedure will produce a regression line, it could be totally meaningless as a predictive tool
• We need to ensure the assumptions are valid
• The four main assumptions are:
• Constant error variance (homoscedasticity)
• Normality of residuals
• Independent residuals (no autocorrelation)
• Independence of explanatory/independent variables (no multicollinearity)
1. Constant error variance (homoscedasticity)
When we don’t have homoscedasticity, we have
HETEROSCEDASTICITY
What does this mean?
Homoscedasticity
• The error terms are assumed to be homoscedastic (the same variance at every X).
• This assumption means that the variance of the residuals is constant for all values of a given explanatory/independent variable.
• The case of unequal error variances is called heteroscedasticity (this is a problem!).
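In symbols, the assumption can be written as follows. This is a sketch for the simple linear model; the notation (β0, β1, ε, σ²) is an assumption and is not taken from the slides.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2 \quad \text{for all } i
```

Under heteroscedasticity the variance would instead change with x, so σ² would have to be replaced by a value σᵢ² that differs between observations.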
How do we check for heteroscedasticity?
• The easiest way to check for this is a scatterplot of the residuals against each explanatory/independent variable
• A residual plot is a graph that shows the residuals/errors on the vertical axis and the independent variable on the horizontal axis.
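The residual check described above can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names are hypothetical, and comparing the average residual size in the lower and upper halves of x is a crude numeric stand-in for looking at the plot, not a formal test.

```python
# Minimal sketch: fit a least-squares line, compute residuals, and compare
# how widely the residuals are scattered for small x versus large x.
# All data and function names here are illustrative assumptions.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y):
    """Residual = observed y minus the value predicted by the fitted line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_by_half(x, res):
    """Mean absolute residual for the lower and the upper half of x."""
    pairs = sorted(zip(x, res))
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[half:]]
    return sum(low) / len(low), sum(high) / len(high)

# Made-up data whose scatter around the line grows with x (a "fan" shape).
x = list(range(1, 21))
y = [2 + 3 * xi + (0.5 * xi if xi % 2 == 0 else -0.5 * xi) for xi in x]

res = residuals(x, y)
low_spread, high_spread = spread_by_half(x, res)
print(f"mean |residual|, lower half of x: {low_spread:.2f}")
print(f"mean |residual|, upper half of x: {high_spread:.2f}")
```

A clearly larger spread in one half than in the other is the numeric counterpart of a fan-shaped residual plot. With real data, plotting `res` against `x` remains the more informative check.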
[Figure: residual plots against x (residual on the vertical axis, x on the horizontal axis). One panel is labelled "Good"; the other is labelled "Nonconstant Variance".]
Residual Plots
[Figure: three residual plots, labelled Heteroscedasticity, Heteroscedasticity and No Heteroscedasticity]
Example
[Figure: residuals (vertical axis, -18 to 18) plotted against Promote (horizontal axis, 75.0 to 120.0)]
The residuals seem slightly more widely scattered in the middle, which suggests possible mild heteroscedasticity.
Extreme Heteroscedasticity
[Figure: residuals (vertical axis, -3000 to 4000) plotted against Salary (horizontal axis, 0 to 200000)]
Consequences and Cure for Heteroscedasticity
• This “fan shaped” pattern is the classic indication of heteroscedasticity
• Heteroscedasticity makes the standard error of the regression coefficient inaccurate. This means confidence intervals (C.I.) and hypothesis tests (H.T.) for this coefficient could be misleading
• There are two ways to deal with heteroscedasticity:
  • Use Weighted Least Squares as the regression technique
  • Use a logarithmic transformation of the response variable
• Curing heteroscedasticity is beyond the scope of this unit.
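Purely as an illustration of the second idea (the logarithmic transformation), the sketch below uses made-up data in which each y sits 20% above or below an exponential curve. The data, the curve and the function names are all assumptions. Because the errors are proportional to y, the residual spread grows with x in the original units but is roughly even after taking logs of the response.

```python
import math

# Minimal sketch with made-up data: compare the residual spread before and
# after a log transformation of the response variable.
# All names and numbers are illustrative assumptions.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Residual = observed y minus the value predicted by the fitted line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_by_half(x, res):
    """Mean absolute residual for the lower and the upper half of x."""
    pairs = sorted(zip(x, res))
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[half:]]
    return sum(low) / len(low), sum(high) / len(high)

# Multiplicative errors: each y is 20% above or below exp(1 + 0.1 * x).
x = list(range(1, 21))
y = [math.exp(1 + 0.1 * xi) * (1.2 if xi % 2 == 0 else 0.8) for xi in x]

# Regression on the original response.
raw_low, raw_high = spread_by_half(x, residuals(x, y))

# Regression on the log of the response.
log_y = [math.log(v) for v in y]
log_low, log_high = spread_by_half(x, residuals(x, log_y))

print(f"original y: spread ratio (upper / lower half) = {raw_high / raw_low:.2f}")
print(f"log(y):     spread ratio (upper / lower half) = {log_high / log_low:.2f}")
```

A ratio near 1 indicates a similar residual spread across the range of x. Note that in this example the straight-line fit to the original y is also affected by curvature, so the comparison is only indicative.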
2. Normality of residuals
The error terms/residuals are assumed to be:
1. Homoscedastic (constant error variance)
2. Normally distributed
Normality of Residuals
• The residuals should form a normal distribution
• There are many formal tests available for this (chi-squared, Shapiro-Wilk, Anderson-Darling, Lilliefors, etc.), as well as graphical checks such as the Q-Q plot.
• In practice this assumption is adequately satisfied for most data sets; it becomes a concern only when the residuals are severely non-normal.
• As a quick practical step it is not unusual just to plot a histogram of the residuals and qualitatively observe whether or not there is marked deviation from a normal distribution.
• If the residuals appear non-normally distributed, there are variable transformation techniques available, but these are beyond the scope of this unit.
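The quick histogram check described above can be sketched as follows. The residual values are made up for illustration and the function name is hypothetical; the text-based histogram simply stands in for the chart that a spreadsheet or statistics package would draw.

```python
# Minimal sketch: bin a set of residuals and print a text histogram so the
# shape can be judged by eye. The residual values below are made up.

def bin_counts(values, n_bins=7):
    """Count how many values fall into each of n_bins equal-width bins."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the last bin, so cap the index.
        idx = min(int((v - lowest) / width), n_bins - 1)
        counts[idx] += 1
    return lowest, width, counts

example_residuals = [
    -3.2, -2.4, -2.1, -1.6, -1.4, -1.1, -0.9, -0.8, -0.6, -0.4, -0.3, -0.1,
    0.0, 0.1, 0.3, 0.4, 0.6, 0.8, 0.9, 1.1, 1.4, 1.6, 2.1, 2.4, 3.2,
]

lowest, width, counts = bin_counts(example_residuals)
for i, count in enumerate(counts):
    left = lowest + i * width
    right = left + width
    print(f"{left:6.2f} to {right:6.2f} | {'#' * count}")
```

A roughly symmetric, single-peaked shape centred near zero is what we hope to see. Marked skew or several peaks would suggest a problem with the normality assumption.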
Normality of residuals
Histogram of Residuals
The residuals closely resemble a normal distribution indicating no significant
issue with this assumption.
In this video, we looked at 2 assumptions of the
regression model that need to be satisfied to
validate our model.
In the next video, we look at the remaining 2
assumptions of the regression model.