Problem Set 1
Due on Wednesday, February 8th at 11:59 EST
This assignment consists of six exercises with multiple parts, which cover material from the review
of probability and statistics (Chapters 2 and 3 of the textbook). Please upload your solutions as a
single PDF file on Brightspace (you can use a free app like Genius Scan or CamScanner to scan your
handwritten answers with your phone or type up your answers).
Please show all your work, and make sure to meet the deadline if you want your assignment to be
considered. Refer to the syllabus for the late assignment policy.
1. Joint Probability Distribution.
                    Value of Y
                  1      2      3
Value of X    6   0.04   0.05   0.15
              8   0.10   0.03   0.07
             10   0.25   0.02   0.08
             12   0.01   0.15   0.05
(a) Find the marginal probability distributions of X and Y.
(b) Compute E(X) and E(Y).
(c) Compute Var(X) and Var(Y).
(d) Compute Cov(X, Y) and Corr(X, Y).
2. Linear Functions of Random Variables.
Use the random variables X and Y from the previous problem and consider two new random variables
W and V where W = 2 + 4X and V = 10 − 2Y .
(a) Compute E(W) and E(V).
(b) Compute Var(W) and Var(V).
(c) Compute Cov(W, V) and Corr(W, V).
(d) Compute E(V | W = 26).
3. Normal Distribution.
Suppose that the weekly average expenditure of a student in your college is $150 with a variance of
$81.
(a) Suppose you draw a sample of 100 students. Compute the probability that the sample mean is
less than $147.
(b) Suppose now that you draw a sample of 36 students. Compute the probability that the sample
mean is less than $147. Why is this result different from that in part (a)?
(c) With a sample of 64 students, compute the probability that the sample mean falls between $149
and $151.
4. Estimator of the Mean.
Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y_1, ..., Y_n be
IID draws from this distribution. Let p̂ be the fraction of successes (i.e., the fraction of 1’s) in this
sample.
(a) Show that p̂ = Ȳ.
(b) Show that p̂ is an unbiased estimator of p.
(c) Show that Var(p̂) = p(1 − p)/n.
5. Hypothesis Testing.
In a survey of N = 400 voters, 230 responded that they will vote for the incumbent and 170 responded
that they will vote for the challenger. Let p denote the fraction of all likely voters who preferred the
incumbent at the time of the survey, and p̂ be the fraction of survey respondents who preferred the
incumbent.
(a) Use the survey result to estimate p.
(b) Calculate the standard error of your estimator.
(c) What is the p-value for the test H0 : p = 0.5 vs. H1 : p ≠ 0.5? Will you reject the null at the
5% significance level?
(d) Construct a 90% confidence interval for p.
6. Empirical Exercise: Summary Statistics.
To solve this problem you need to use the dataset HWK1 posted on Brightspace. The dataset
contains information on wages, gender, education level, and type of employment from 500 employees.
This exercise aims to explore the relationships between earnings, age, and gender.
To work on this exercise, you should use R and RStudio. Remember you can take a look at the
document “Installing R and RStudio” posted on Brightspace if needed.
To submit your answers for this part, you can take screenshots of your RStudio console to show your
graphs, and include these in your single PDF file to be uploaded on Brightspace.
(a) Load the dataset into R and call it data_employees. What command did you use? (Do not
use the “import dataset” command in RStudio.)
(b) How many observations are there in your sample? How many variables?
(c) Construct and report a histogram for Wage and LogWage. Make sure you choose appropriate bins to facilitate visualization. Would you conclude that these variables are normally distributed? Justify briefly.
(d) Report mean and standard deviation for LogWage and Age.
(e) Make and report a scatter plot of LogWage against Age for the entire sample. What do you
observe? Do earnings increase with age or not? Explain.
(f) Repeat parts (d) and (e), but this time split the sample into males and females. Would you say
that the relationship between LogWage and Age depends on gender? Explain.
===
Writing Answer Guide:
1. Joint Probability Distribution.
(a) To find the marginal probability distribution of X, sum the joint probabilities across each row of the table (over all values of Y). To find the marginal probability distribution of Y, sum down each column (over all values of X). The marginal probability distribution of X is:
6: 0.04 + 0.05 + 0.15 = 0.24
8: 0.10 + 0.03 + 0.07 = 0.20
10: 0.25 + 0.02 + 0.08 = 0.35
12: 0.01 + 0.15 + 0.05 = 0.21
The marginal probability distribution of Y is:
1: 0.04 + 0.10 + 0.25 + 0.01 = 0.40
2: 0.05 + 0.03 + 0.02 + 0.15 = 0.25
3: 0.15 + 0.07 + 0.08 + 0.05 = 0.35
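The row and column sums can be checked with a short script. This is a minimal sketch in plain Python, not part of the assignment; the dictionary `joint` and the helper `marginals` are illustrative names, with the probabilities copied from the table in Problem 1.

```python
# Joint distribution from Problem 1: keys are (x, y), values are Pr(X = x, Y = y).
joint = {
    (6, 1): 0.04, (6, 2): 0.05, (6, 3): 0.15,
    (8, 1): 0.10, (8, 2): 0.03, (8, 3): 0.07,
    (10, 1): 0.25, (10, 2): 0.02, (10, 3): 0.08,
    (12, 1): 0.01, (12, 2): 0.15, (12, 3): 0.05,
}

def marginals(joint):
    """Return (marginal of X, marginal of Y) as dicts mapping value -> probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # accumulate across the row for this x
        py[y] = py.get(y, 0.0) + p  # accumulate down the column for this y
    return px, py

px, py = marginals(joint)
for x in sorted(px):
    print(f"Pr(X = {x}) = {px[x]:.2f}")
for y in sorted(py):
    print(f"Pr(Y = {y}) = {py[y]:.2f}")
```

Each marginal distribution should sum to 1, which is a quick way to catch a row or column that was skipped.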
(b) The expected value of each variable is the weighted sum of its possible values, with the marginal probabilities as weights: E(X) = 6(0.24) + 8(0.20) + 10(0.35) + 12(0.21) = 9.06 and E(Y) = 1(0.40) + 2(0.25) + 3(0.35) = 1.95.
(c) The variance of each variable is the expected squared deviation from its mean: Var(X) = (6 − 9.06)^2(0.24) + (8 − 9.06)^2(0.20) + (10 − 9.06)^2(0.35) + (12 − 9.06)^2(0.21) = 4.5964 and Var(Y) = (1 − 1.95)^2(0.40) + (2 − 1.95)^2(0.25) + (3 − 1.95)^2(0.35) = 0.7475.
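The means and variances can be recomputed directly from the table in Problem 1. The sketch below is plain Python offered as a check, not a required method; `joint` and `mean_and_variance` are illustrative names.

```python
# Joint distribution from Problem 1: keys are (x, y), values are Pr(X = x, Y = y).
joint = {
    (6, 1): 0.04, (6, 2): 0.05, (6, 3): 0.15,
    (8, 1): 0.10, (8, 2): 0.03, (8, 3): 0.07,
    (10, 1): 0.25, (10, 2): 0.02, (10, 3): 0.08,
    (12, 1): 0.01, (12, 2): 0.15, (12, 3): 0.05,
}

def mean_and_variance(joint, axis):
    """Mean and variance of X (axis=0) or Y (axis=1) under the joint distribution."""
    # Summing value * joint probability over all cells gives the same result
    # as weighting each value by its marginal probability.
    mean = sum(key[axis] * p for key, p in joint.items())
    var = sum((key[axis] - mean) ** 2 * p for key, p in joint.items())
    return mean, var

ex, var_x = mean_and_variance(joint, 0)
ey, var_y = mean_and_variance(joint, 1)
print(f"E(X) = {ex:.2f}, Var(X) = {var_x:.4f}")
print(f"E(Y) = {ey:.2f}, Var(Y) = {var_y:.4f}")
```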
(d) The covariance is the expected value of the product of the deviations of X and Y from their means, summed over all twelve cells of the table: Cov(X, Y) = (6 − 9.06)(1 − 1.95)(0.04) + (6 − 9.06)(2 − 1.95)(0.05) + … + (12 − 9.06)(3 − 1.95)(0.05) = −0.347.
The correlation is the covariance divided by the product of the standard deviations of X and Y: Corr(X, Y) = −0.347 / (√4.5964 × √0.7475) ≈ −0.187.
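Because the covariance sum runs over twelve cells, it is easy to drop a term by hand. The following plain-Python sketch evaluates the full sum from the table in Problem 1; `joint` and `cov_and_corr` are illustrative names, not part of the assignment.

```python
import math

# Joint distribution from Problem 1: keys are (x, y), values are Pr(X = x, Y = y).
joint = {
    (6, 1): 0.04, (6, 2): 0.05, (6, 3): 0.15,
    (8, 1): 0.10, (8, 2): 0.03, (8, 3): 0.07,
    (10, 1): 0.25, (10, 2): 0.02, (10, 3): 0.08,
    (12, 1): 0.01, (12, 2): 0.15, (12, 3): 0.05,
}

def cov_and_corr(joint):
    """Return (Cov(X, Y), Corr(X, Y)) for a discrete joint distribution."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    # Covariance: probability-weighted product of deviations over every cell.
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
    sd_x = math.sqrt(sum((x - ex) ** 2 * p for (x, y), p in joint.items()))
    sd_y = math.sqrt(sum((y - ey) ** 2 * p for (x, y), p in joint.items()))
    return cov, cov / (sd_x * sd_y)

cov, corr = cov_and_corr(joint)
print(f"Cov(X, Y) = {cov:.4f}")
print(f"Corr(X, Y) = {corr:.4f}")
```

A correlation outside the interval [−1, 1] would signal an arithmetic mistake.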
2. Linear Functions of Random Variables.
(a) Because expectation is linear, E(a + bX) = a + bE(X). With W = 2 + 4X, V = 10 − 2Y, and the expected values from part 1(b), E(X) = 9.06 and E(Y) = 1.95, we get E(W) = 2 + 4(9.06) = 38.24 and E(V) = 10 − 2(1.95) = 6.1.
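One way to confirm the linearity rule is to compute E(W) and E(V) cell by cell from the table in Problem 1 and compare the result with a + bE(X). This is a plain-Python sketch; `joint` and `expectation` are illustrative names.

```python
# Joint distribution from Problem 1: keys are (x, y), values are Pr(X = x, Y = y).
joint = {
    (6, 1): 0.04, (6, 2): 0.05, (6, 3): 0.15,
    (8, 1): 0.10, (8, 2): 0.03, (8, 3): 0.07,
    (10, 1): 0.25, (10, 2): 0.02, (10, 3): 0.08,
    (12, 1): 0.01, (12, 2): 0.15, (12, 3): 0.05,
}

def expectation(joint, g):
    """E[g(X, Y)] computed cell by cell from the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex = expectation(joint, lambda x, y: x)
ey = expectation(joint, lambda x, y: y)

# Direct computation: apply the transformation inside the expectation.
ew_direct = expectation(joint, lambda x, y: 2 + 4 * x)
ev_direct = expectation(joint, lambda x, y: 10 - 2 * y)

# Shortcut: apply the linear rule to the means of X and Y.
ew_formula = 2 + 4 * ex
ev_formula = 10 - 2 * ey

print(f"E(W): direct = {ew_direct:.2f}, formula = {ew_formula:.2f}")
print(f"E(V): direct = {ev_direct:.2f}, formula = {ev_formula:.2f}")
```

The two routes agree up to floating-point rounding, so the shortcut is safe to use for any linear function.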
(b) Adding a constant does not change the variance, and multiplying by b scales it by b^2, so Var(a + bX) = b^2 Var(X). With the variances from part 1(c), Var(X) = 4.5964 and Var(Y) = 0.7475: Var(W) = 4^2(4.5964) = 73.5424 and Var(V) = (−2)^2(0.7475) =