Econometrics 301 – HW3
Research Question: What are the factors that determine the abortion rate across the 50 states in the USA?
To study this, use the dataset (uploaded on LMS). The variables used in the analysis are as follows:
State: Name of the State (50 US States)
ABR: Abortion rate, number of abortions per thousand women aged 15-44 in 1992.
Religion: The percent of a state’s population that is Catholic, Southern Baptist, Evangelical, or Mormon.
Price: The average price charged in 1993 in non-hospital facilities for an abortion at 10 weeks with local anesthesia (weighted by the number of abortions performed in 1992).
Funds: A variable that takes the value 1 if state funds are available to pay for an abortion under most circumstances, and 0 otherwise.
Laws: A variable that takes the value 1 if a state enforces a law that restricts a minor’s access to abortion, and 0 otherwise.
Educ: The percentage of a state’s population aged 25 or older with a high school degree (or equivalent), 1990.
Income: Disposable income per capita, 1992.
Picket: The percentage of respondents who reported experiencing picketing with physical contact or blocking of patients.
Question:
a) Estimate the following models using the lm() command. Given the potential threat posed by heteroscedasticity, use heteroscedasticity-consistent (robust) standard errors.
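Model 1: $ABR_i = \beta_1 + \beta_2 Religion_i + e_i$
Model 2: $ABR_i = \beta_1 + \beta_2 Religion_i + \beta_3 Price_i + e_i$
Model 3: $ABR_i = \beta_1 + \beta_2 Religion_i + \beta_3 Price_i + \beta_4 Laws_i + \beta_5 Funds_i + e_i$
Model 4: $ABR_i = \beta_1 + \beta_2 Religion_i + \beta_3 Price_i + \beta_4 Laws_i + \beta_5 Funds_i + \beta_6 Educ_i + \beta_7 Income_i + \beta_8 Picket_i + e_i$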
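Robust standard errors come from the "sandwich" formula $(X'X)^{-1}\left(\sum_i w_i x_i x_i'\right)(X'X)^{-1}$, where the weights $w_i$ are built from the squared residuals. The sketch below is a language-neutral illustration of that computation, written in Python/NumPy rather than R. It runs on simulated data standing in for the LMS dataset, and the helper name `ols_with_robust_se` is hypothetical. In R, one commonly used route (assuming the `sandwich` and `lmtest` packages are installed) is `coeftest(model, vcov. = vcovHC(model, type = "HC1"))`. Note that `vcovHC` uses HC3 unless `type` is given.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS coefficients plus classical and heteroscedasticity-consistent SEs.

    X must already contain a leading column of ones for the intercept.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical (homoscedastic) variance: s^2 (X'X)^{-1}
    s2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(s2 * XtX_inv))

    # Leverage h_ii = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

    def sandwich_se(weights):
        # (X'X)^{-1} [sum_i w_i x_i x_i'] (X'X)^{-1}
        meat = X.T @ (X * weights[:, None])
        return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

    se_hc0 = sandwich_se(resid**2)                    # White's original estimator
    se_hc1 = se_hc0 * np.sqrt(n / (n - k))            # degrees-of-freedom scaling
    se_hc3 = sandwich_se(resid**2 / (1.0 - h) ** 2)   # leverage-adjusted
    return beta, {"classical": se_classical, "HC0": se_hc0,
                  "HC1": se_hc1, "HC3": se_hc3}


# Simulated stand-in for the LMS data: the error spread grows with the regressor
rng = np.random.default_rng(0)
n = 50
religion = rng.uniform(10, 60, n)
abr = 25.0 - 0.2 * religion + rng.normal(0.0, 1.0 + 0.1 * religion, n)
X = np.column_stack([np.ones(n), religion])

beta, se = ols_with_robust_se(X, abr)
for name, values in se.items():
    print(f"{name:>9}: {np.round(values, 4)}")
```

The variants differ only in how they weight each squared residual. With 50 observations, HC0 tends to understate the true sampling variability. HC1 and HC3 inflate it, and HC3 inflates it most for high-leverage observations.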
b) Calculate the fitted values and residuals for each model.
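Fitted values are $\hat{y} = Xb$ and residuals are $e = y - \hat{y}$. The following minimal sketch runs on simulated data, and `fitted_and_residuals` is a hypothetical helper. It also checks two OLS properties that are useful sanity checks for b). In R, `fitted(model)` and `residuals(model)` return the same two vectors.

```python
import numpy as np


def fitted_and_residuals(X, y):
    """Return OLS fitted values y_hat = X b and residuals e = y - y_hat."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted


# Simulated data (not the course dataset)
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

fitted, resid = fitted_and_residuals(X, y)

# With an intercept, OLS residuals sum to zero and are orthogonal to each regressor
print(np.isclose(resid.sum(), 0.0), np.allclose(X.T @ resid, 0.0))
```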
c) Plot the residuals of each model. Do you think heteroscedasticity is a problem for our regression models? We have only 50 observations in this dataset. What are the limitations of using heteroscedasticity-consistent robust standard errors?
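A residual plot is a visual check. One formal complement is the studentized (Koenker) Breusch–Pagan test: regress the squared OLS residuals on the regressors and compare $LM = nR^2$ with a $\chi^2_{k-1}$ distribution. The sketch below is a minimal illustration on simulated data, and the helper names are hypothetical. In R, `lmtest::bptest(model)` computes this studentized version by default. With only 50 observations, such a test has limited power, and both the test and the robust standard errors rely on large-sample approximations.

```python
import numpy as np


def r_squared(X, y):
    """Centered R^2 of an OLS regression of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def koenker_bp(X, y):
    """Studentized Breusch-Pagan LM statistic and its chi-square degrees of freedom.

    Regress the squared OLS residuals on the same regressors; LM = n * R^2.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    return n * r_squared(X, e2), k - 1


rng = np.random.default_rng(2)
n = 50
x = rng.uniform(10, 60, n)
X = np.column_stack([np.ones(n), x])
y_homo = 25.0 - 0.2 * x + rng.normal(0.0, 3.0, n)        # constant variance
y_hetero = 25.0 - 0.2 * x + rng.normal(0.0, 0.1 * x, n)  # spread grows with x

for label, y in [("homoscedastic", y_homo), ("heteroscedastic", y_hetero)]:
    lm_stat, df = koenker_bp(X, y)
    # 3.841 is the 5% critical value of a chi-square with 1 degree of freedom
    print(f"{label:>15}: LM = {lm_stat:.2f} on {df} df, reject at 5%: {lm_stat > 3.841}")
```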
d) Using the information obtained in b), calculate the SER (standard error of the regression) and R^2, and verify that summary(lm()) reports the same SER and R^2.
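Both statistics follow from the residuals: $SER = \sqrt{SSR/(n-k)}$ and $R^2 = 1 - SSR/TSS$, where $k$ counts all coefficients including the intercept. The sketch below runs on simulated data, and `ser_and_r2` is a hypothetical helper. In R, the values to compare against are `summary(model)$sigma` and `summary(model)$r.squared`.

```python
import numpy as np


def ser_and_r2(y, fitted, k):
    """Standard error of the regression and R^2 from fitted values.

    k is the number of estimated coefficients, including the intercept.
    """
    n = len(y)
    resid = y - fitted
    ssr = resid @ resid                   # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    ser = np.sqrt(ssr / (n - k))          # "Residual standard error" in summary(lm())
    r2 = 1.0 - ssr / tss                  # "Multiple R-squared" in summary(lm())
    return ser, r2


rng = np.random.default_rng(3)
n = 50
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

ser, r2 = ser_and_r2(y, fitted, k=X.shape[1])
print(f"SER = {ser:.4f}, R^2 = {r2:.4f}")
```

A quick cross-check: with an intercept, $R^2$ also equals the squared correlation between $y$ and $\hat{y}$.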
e) What are the F-test results for each model? What does the F-test tell us about the overall significance of our models?
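The overall F-test checks $H_0: \beta_2 = \dots = \beta_k = 0$ using $F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$, which has $(k-1,\, n-k)$ degrees of freedom. The F-statistic printed by summary(lm()) uses this classical formula, which assumes homoscedastic errors. It does not change when robust standard errors are used for the t-tests. The sketch below runs on simulated data with a hypothetical helper name.

```python
import numpy as np


def overall_f_test(X, y):
    """Classical F-statistic for H0: all slope coefficients equal zero.

    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
    return f_stat, (k - 1, n - k)


rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)
y = 3.0 + 0.4 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

f_stat, dfs = overall_f_test(X, y)
print(f"F = {f_stat:.3f} on {dfs} degrees of freedom")
```

With a single regressor, as in Model 1, this F-statistic equals the square of the classical t-statistic on the slope.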
f) What does adjusted R-squared mean? Are the R^2 and adjusted R^2 values similar to each other?
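Adjusted $R^2$ penalizes extra regressors: $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$. Whenever $k > 1$ it is below $R^2$, and it can even be negative. The sketch below uses an illustrative, made-up $R^2$ of 0.30 with $n = 50$ to show how the gap widens as $k$ grows. In R, the value reported by summary() is `summary(model)$adj.r.squared`.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2; k counts all coefficients including the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Illustrative (made-up) R^2 of 0.30 with n = 50 observations
print(round(adjusted_r2(0.30, n=50, k=2), 4))   # 0.2854: one regressor, as in Model 1
print(round(adjusted_r2(0.30, n=50, k=8), 4))   # 0.1833: seven regressors, as in Model 4
```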
g) How does the R^2 value change as new variables are added from Model 1 to Model 4?
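Adding a regressor can never lower $R^2$, because OLS can always set the new coefficient to zero. This holds even when the added variable is pure noise. The sketch below illustrates the point on simulated data by appending noise columns one at a time. Adjusted $R^2$ may fall instead. The helper names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def adjusted_r2(r2, n, k):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


rng = np.random.default_rng(5)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.6 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
r2_values = []
for step in range(4):
    r2 = r_squared(X, y)
    r2_values.append(r2)
    k = X.shape[1]
    print(f"k = {k}: R^2 = {r2:.4f}, adjusted R^2 = {adjusted_r2(r2, n, k):.4f}")
    # Append a column of pure noise that is unrelated to y by construction
    X = np.column_stack([X, rng.normal(size=n)])
```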
h) Do we have reason to be suspicious of omitted variable bias?
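The omitted-variable formula makes the concern concrete. If a variable that affects the outcome is left out and is correlated with an included regressor, the short-regression slope equals the long-regression slope plus (coefficient on the omitted variable) × (slope from regressing the omitted variable on the included one). This holds exactly in sample. The sketch below uses simulated data with generic variable names, not the assignment's variables. It illustrates how, for example, Model 1's Religion coefficient could absorb part of the effect of a correlated variable that Model 1 leaves out.

```python
import numpy as np


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(6)
n = 50
included = rng.normal(size=n)                       # a regressor kept in the model
omitted = 0.7 * included + rng.normal(size=n)       # correlated with the included one
y = 2.0 + 1.0 * included + 1.5 * omitted + rng.normal(size=n)

ones = np.ones(n)
b_long = ols(np.column_stack([ones, included, omitted]), y)   # both regressors
b_short = ols(np.column_stack([ones, included]), y)           # omitted variable left out
delta = ols(np.column_stack([ones, included]), omitted)       # auxiliary regression

# In-sample identity: short slope = long slope + (omitted coefficient) * delta slope
print(b_short[1], b_long[1] + b_long[2] * delta[1])
```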
Research Question: What are the factors that determine hourly wages?
To study this, use the dataset (uploaded on LMS). The variables used in the analysis are as follows:
Wage: Hourly wage in dollars (CPS, 1995)
Female: Gender, coded 1 for female, 0 for male
Nonwhite: Race, coded 1 for nonwhite workers, 0 for white workers
Union: Union status, coded 1 if a union job, 0 otherwise
Education: Education (in years)
Exper: Potential work experience (in years)
Question:
a) Estimate the following regression model:
Model 1: $Wage_i = \beta_1 + \beta_2 Female_i + \beta_3 Nonwhite_i + \beta_4 Union_i + \beta_5 Exper_i + e_i$
Interpret the coefficients. Are any of the coefficients statistically insignificant?
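In the level model, each dummy coefficient is the expected difference in hourly wage, in dollars, relative to the base group, holding the other regressors fixed. The simplest case makes this concrete. When a single dummy is the only regressor, the intercept equals the base group's mean and the slope equals the difference in group means. The sketch below shows this on simulated wages, not the CPS data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
female = rng.integers(0, 2, size=n)                         # 1 = female, 0 = male
wage = 12.0 - 2.0 * female + rng.normal(0.0, 3.0, size=n)   # simulated hourly wages

X = np.column_stack([np.ones(n), female])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

male_mean = wage[female == 0].mean()
female_mean = wage[female == 1].mean()
# Intercept = mean wage of the base group (male); slope = female-minus-male gap
print(beta, male_mean, female_mean - male_mean)
```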
b) Instead of using wage, use log(wage) to estimate the regression model:
Model 2: $\log(Wage_i) = \beta_1 + \beta_2 Female_i + \beta_3 Nonwhite_i + \beta_4 Union_i + \beta_5 Exper_i + e_i$
How has the regression outcome changed? How has the interpretation of the coefficients changed? Do you think it is a good idea to use the logarithm of wages instead of wages?
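In a log-wage model, a coefficient $b$ on a dummy is usually read as roughly a $100b$ percent wage difference. The exact implied difference is $100(e^{b} - 1)$ percent. The two readings diverge noticeably once $|b|$ is not small. The coefficient values below are illustrative, not estimates from the data.

```python
import math


def exact_percent_effect(b):
    """Exact % change in wage implied by a dummy coefficient b in a log-wage model."""
    return 100.0 * (math.exp(b) - 1.0)


def approx_percent_effect(b):
    """The usual 100*b approximation, accurate only when |b| is small."""
    return 100.0 * b


for b in (0.02, -0.25):   # illustrative coefficients, not estimates
    print(f"b = {b:+.2f}: approx {approx_percent_effect(b):+.2f}%, "
          f"exact {exact_percent_effect(b):+.2f}%")
```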
c) Now suppose that both years of schooling and years of experience increase log(wage) at a decreasing rate. How would you modify Model 2? Once you have written down the regression model, estimate its parameters. Do the coefficients meet your expectations?
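"Increasing at a decreasing rate" is commonly encoded with a quadratic term: $\beta x + \gamma x^2$ with $\beta > 0$ and $\gamma < 0$. The marginal effect is then $\beta + 2\gamma x$, and the profile peaks at $x^* = -\beta/(2\gamma)$. In an R formula, the squared term is typically written `I(Exper^2)`. The sketch below uses made-up coefficients and noise-free data, so OLS recovers them exactly. The helper names are hypothetical.

```python
import numpy as np


def marginal_effect(b_lin, b_sq, x):
    """d log(wage) / dx for log(wage) = ... + b_lin * x + b_sq * x**2."""
    return b_lin + 2.0 * b_sq * x


def turning_point(b_lin, b_sq):
    """Value of x at which the quadratic profile peaks (when b_sq < 0)."""
    return -b_lin / (2.0 * b_sq)


# Noise-free illustration: OLS on [1, x, x^2] recovers the quadratic exactly
exper = np.arange(0.0, 41.0)                       # 0..40 years (hypothetical)
log_wage = 1.5 + 0.04 * exper - 0.0007 * exper**2  # made-up coefficients
X = np.column_stack([np.ones_like(exper), exper, exper**2])
beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)

print(np.round(beta, 6))
print("marginal effect at 10 years:", marginal_effect(beta[1], beta[2], 10.0))
print("turning point (years):", turning_point(beta[1], beta[2]))
```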
d) Consider Model 2. Now estimate the regression with monthly wages, assuming that individuals work 8 hours a day and 22 days a month. Then take the logarithm of monthly wages and estimate the regression model again. Interpret the coefficients.
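Monthly wage is hourly wage times $8 \times 22 = 176$. In a levels regression, rescaling the dependent variable by a constant multiplies every coefficient by 176. In a log regression, $\log(176 \cdot Wage) = \log 176 + \log Wage$, so only the intercept shifts, by $\log 176$, and the slopes are unchanged. The sketch below checks both facts on simulated data with hypothetical regressors.

```python
import numpy as np

HOURS_PER_MONTH = 8 * 22   # 176 hours, the assumption stated in d)

rng = np.random.default_rng(8)
n = 100
female = rng.integers(0, 2, size=n)
exper = rng.uniform(0, 40, size=n)
hourly = np.exp(2.0 - 0.2 * female + 0.02 * exper + rng.normal(0.0, 0.3, size=n))
monthly = HOURS_PER_MONTH * hourly

X = np.column_stack([np.ones(n), female, exper])


def ols(y):
    """OLS coefficients of y on the fixed design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


b_hourly, b_monthly = ols(hourly), ols(monthly)
b_log_hourly, b_log_monthly = ols(np.log(hourly)), ols(np.log(monthly))

# Levels: every coefficient (intercept and slopes) is multiplied by 176
print(b_monthly / b_hourly)
# Logs: only the intercept shifts, by log(176); slopes are unchanged
print(b_log_monthly - b_log_hourly, np.log(HOURS_PER_MONTH))
```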
e) What happens if you mistakenly add “White”, coded 1 if the worker is white and 0 otherwise, to the regression model? Explain the dummy variable trap in detail.
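Because White = 1 − Nonwhite for every worker, the intercept column equals Nonwhite + White. The design matrix then has an exact linear dependency (perfect multicollinearity), so $X'X$ cannot be inverted and OLS has no unique solution. The sketch below shows the rank drop on simulated dummies. In R, lm() handles this by reporting NA for one aliased coefficient, and summary() notes that it was "not defined because of singularities". The usual fix is to drop one of the dummies (or the intercept).

```python
import numpy as np

rng = np.random.default_rng(9)
n = 50
nonwhite = rng.integers(0, 2, size=n)
white = 1 - nonwhite                      # the mistakenly added dummy

X_ok = np.column_stack([np.ones(n), nonwhite])
X_trap = np.column_stack([np.ones(n), nonwhite, white])

# Constant = Nonwhite + White for every row, so X_trap has an exact linear dependency
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])       # full column rank
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])   # rank 2 < 3 columns
```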