1a) on campus: 24/33 = 72.73%

off campus: 37/67 = 55.22%

1b) Off-campus students are much more likely than on-campus students not to participate in extracurricular activities. Overall, on-campus students appear more likely to participate in one activity or in two or more activities.

1c) With a large p-value of 23%, greater than any reasonable alpha, we fail to reject Ho. We do not have convincing evidence of an association between residential status and level of participation in extracurricular activities.

[fascinating choice, writing team. The graph looks like there is an association and so do the summary stats. And then the inference procedure takes us the other way. I wonder how many students will think the 3 answers must align?]

2a) (3/9)(2/8)(1/7) = 1.2%
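
As a quick check of the arithmetic in 2a, the same probability can be computed two ways: as a product of conditional probabilities and as a ratio of combinations. This is only a sketch; the variable names are mine, and the counts (9 people, 3 of them women, 3 picked) are read off the fractions above.

```python
from fractions import Fraction
from math import comb

# Assumed from the fractions in 2a: 9 people, 3 of them women, 3 picked.
total, women, picks = 9, 3, 3

# Sequential form: (3/9)(2/8)(1/7)
p_sequential = Fraction(1)
for i in range(picks):
    p_sequential *= Fraction(women - i, total - i)

# Counting form: C(3,3) / C(9,3)
p_counting = Fraction(comb(women, picks), comb(total, picks))

print(p_sequential, p_counting, float(p_sequential))
```

Both forms reduce to 1/84, which rounds to the 1.2% given above.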

2b) As 3 women being picked would only happen 1.2% of the time by chance, it may be true that the manager did not use random selection.

2c) This is improper. In the simulation, the probability of picking a woman would stay at 1/3 for every selection. But because the actual selection is made without replacement, the probability of picking a woman (or a man) changes with every selection.
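
To see how much the with/without replacement distinction matters here, a small simulation can compare the two. This is a hedged sketch, not the simulation from the exam: it assumes 9 people with 3 women, and the function name `estimate_all_women` and the trial count are my own choices.

```python
import random

def estimate_all_women(trials, replace, seed=0):
    """Estimate P(all 3 picks are women) from 9 people, 3 of them women."""
    rng = random.Random(seed)
    people = ["W"] * 3 + ["M"] * 6
    hits = 0
    for _ in range(trials):
        if replace:
            # Each pick is independent, so P(woman) stays at 1/3.
            picks = [rng.choice(people) for _ in range(3)]
        else:
            # Picks come out of the pool, so P(woman) changes each time.
            picks = rng.sample(people, 3)
        hits += picks == ["W", "W", "W"]
    return hits / trials

without = estimate_all_women(200_000, replace=False)
with_repl = estimate_all_women(200_000, replace=True)
print(without, with_repl)
```

The without-replacement estimate should land near 1/84 (about 1.2%), while the with-replacement version lands near (1/3)^3 = 1/27 (about 3.7%), roughly three times too large.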

3a) Normal; mu = 120, sigma = 10.5, P(x >140) = P(z > 1.905) = 2.8%
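
The tail area in 3a can be checked without a table using the complementary error function. A minimal sketch with the values stated above (mu = 120, sigma = 10.5, cutoff 140); the variable names are mine.

```python
from math import erfc, sqrt

mu, sigma, cutoff = 120, 10.5, 140

# Standardize, then take the upper-tail area of the standard normal:
# P(Z > z) = 0.5 * erfc(z / sqrt(2))
z = (cutoff - mu) / sigma
p_above = 0.5 * erfc(z / sqrt(2))

print(round(z, 3), round(p_above, 4))
```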

3b) The average of 3 days will have a smaller standard deviation (10.5/sqrt(3)) and should be closer to the mean of 120. So the school would be less likely to lose funding.
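
To put numbers on the comparison in 3b, the sketch below computes the chance of exceeding 140 for a single day and for a 3-day average. It assumes the three days are independent and that the same 140 cutoff applies to the average; the helper `upper_tail` is my own, not anything from the exam.

```python
from math import erfc, sqrt

mu, sigma, cutoff, n = 120, 10.5, 140, 3

def upper_tail(x, mean, sd):
    """P(X > x) for a normal distribution with the given mean and sd."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2)))

# Standard deviation of the mean of n independent days.
sd_mean = sigma / sqrt(n)

p_single = upper_tail(cutoff, mu, sigma)   # one day above 140
p_mean = upper_tail(cutoff, mu, sd_mean)   # 3-day average above 140

print(round(sd_mean, 3), p_single, p_mean)
```

Under these assumptions the 3-day average exceeds 140 far less often than a single day does, which is the point of the answer above.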

3c) (2/5)^3 = 6.4%

[seems like a heavy probability year!]

4a) Because income is skewed right, the mean income will be greater than the median income. Thus reporting the mean will be a more impressive figure than reporting the median. The median would be the better choice because it would more accurately represent the true center of a skewed data set.

4b) Method 1 will suffer from a large voluntary response bias. Alumni with low income will not want to participate and thus the estimate will be too high.

Method 2 will be random and thus less biased. While it will still suffer from non-response bias, especially from those embarrassed by a low income, the estimate should be closer to the parameter. Still too high, but closer.

5) matched pairs t-test

Ho: mu = 0 (the true mean difference of woman - man = 0)

Ha: mu > 0 (the true mean difference of woman - man > 0)

men and women randomly selected; graph of differences reasonably symmetric

t = 3.118; df = 7

With a p-value of 0.008 < 0.05, I reject Ho. There is strong evidence that the true mean difference in purchase price (woman - man) is greater than 0.
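
The one-sided p-value for t = 3.118 with df = 7 can be checked with only the standard library by integrating the t density numerically. This is a sketch under my own choices (Simpson's rule, 2000 steps, the helper names `t_pdf` and `t_upper_tail`); a calculator or a stats package would normally do this step.

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t   # the density is symmetric about 0

p_value = t_upper_tail(3.118, 7)
print(round(p_value, 4))
```

The result falls between 0.005 and 0.01, consistent with the 0.008 reported above.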

6a) y-hat ≈ 4.92 (plug in length = 175). Residual = 5.88 - y-hat ≈ 0.955 FCR. Car A had a fuel consumption rate that was 0.955 higher than predicted for its length of 175.

6bi) The point circled should be (93, 0.955).

6bii) Car B has a FCR that is almost exactly the same as its predicted FCR, given its length.

6c) For Engine Size, the larger the engine size, the larger the residual in using length. Whereas for the Wheel Base, there appears to be no association between the length of the wheel base and the size of the residual using length.

6d) He should use Engine Size, because it will add extra value to his model. The plot shows that as Engine Size increases, the residual from the length model also increases, whereas Wheel Base adds no new information to the model.

[a #6 you can use BEFORE inference! Cool!]
