Here are the two examples that most helped me explain R2 to my students. They have been posted on the list many times over the years and I have lost track of the original author!

Height explains weight. Not totally, but roughly. Suppose R2 is 75% for a dataset of heights and weights. We know that other things affect weight in addition to height, including genetics, diet and exercise. So we say that 75% of the variation in weight can be explained by the variation in height, and that the remaining 25% of the variation is due to other factors.

Suppose you are buying a pizza that is $7 plus $1.50 for each topping. Clearly Price = 7 + 1.50(# of toppings). Clearly r and R2 are 1 and 100%. Does this mean that the number of toppings 100% determines my cost? No, clearly the $7 base price has a lot to do with the price! However, my variation in price is 100% explained by the variation in the number of toppings I choose.
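For anyone who wants to check the pizza example numerically, here is a minimal sketch. The list of orders (0 through 5 toppings) and the helper name `correlation` are made up for illustration; only the $7 base price and $1.50 per topping come from the example above.

```python
from math import sqrt

# Hypothetical orders: 0 through 5 toppings, priced at $7 plus $1.50 each.
toppings = [0, 1, 2, 3, 4, 5]
prices = [7 + 1.50 * t for t in toppings]


def correlation(xs, ys):
    """Pearson correlation r, computed straight from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(toppings, prices)
r_squared = r ** 2

# Dropping the $7 base price shifts every price by the same amount,
# so the correlation does not change at all.
r_no_base = correlation(toppings, [1.50 * t for t in toppings])

print(f"r = {r:.3f}, R2 = {r_squared:.0%}, r without base price = {r_no_base:.3f}")
```

The base price never enters r or R2 because it adds nothing to the variation in price, which is exactly the point of the example.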

Al Coons has an activity regarding this topic that is archived at this location.

Dan Teague gives a nice explanation of the math involved for R2 on this archived post.

R2 was discussed on the list on this date.

3. Do I have to teach log transformations?

*Yes!

*Why? (especially when my calculator can do it for me and has all these fancy commands! Can't I have my students use those buttons?)

*It's on the course description! And here's why:

*The idea of transforming data to achieve linearity is a powerful and important idea. It is this idea we are teaching. Re-expressing data and dealing with it in its transformed and linear state is crucial, as is understanding how to back-transform to make an appropriate prediction.
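As a rough illustration of transform, fit, and back-transform, here is a small sketch. The data are invented (a quantity that starts at 3 and doubles each step), and `least_squares` is a hypothetical helper, not a calculator command.

```python
from math import log10

# Made-up exponential data: y = 3 * 2^x for x = 0..5.
x = [0, 1, 2, 3, 4, 5]
y = [3 * 2 ** xi for xi in x]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Step 1: re-express y on a log scale so the relationship is linear.
log_y = [log10(v) for v in y]

# Step 2: fit a line to (x, log y).
intercept, slope = least_squares(x, log_y)

# Step 3: predict log y at x = 6, then back-transform to the original units.
predicted_log_y = intercept + slope * 6
predicted_y = 10 ** predicted_log_y

print(f"log10(y-hat) = {intercept:.4f} + {slope:.4f} x")
print(f"predicted y at x = 6: {predicted_y:.1f}")
```

The step students tend to forget is the last one: the line predicts log y, so the prediction has to be raised back as a power of 10 before it is reported.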

Dave Bock discusses transformations for linearity on this archived post.

4. What is the difference between confounding and lurking variables?

Paul Velleman gives a great post about confounding and lurking variables here.

The list had a great discussion about confounding and lurking variables on Nov. 13th and 12th. Click here and also go back one day to see the full discussion.

Josh Zucker discusses the issue of extraneous variables and gives a list of links.

5. What is the difference between independence and mutually exclusive?

If two events are independent, knowing the outcome of one does not change the probability of the other. For example, whether or not it rains has no effect on whether a coin lands heads or tails.

If two events are mutually exclusive, then if one happens the other cannot happen. For example, in picking one M&M from a bag, I can find the probability of drawing green or red. But if I draw green, I cannot draw red.

As we saw on the '02 MC #23, it is useful to notice that if two events are mutually exclusive, they affect each other quite powerfully: if one of them happens, the other CANNOT occur. Thus they are dependent.
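Here is a small sketch of the distinction using one roll of a fair die. The four events below are chosen purely for illustration; the check for independence is the usual one, P(A and B) = P(A) x P(B).

```python
from fractions import Fraction

# Sample space for one roll of a fair die (an illustrative choice).
outcomes = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event, treating every face as equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


def independent(a, b):
    """Independent means P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """Mutually exclusive means the events share no outcomes."""
    return len(a & b) == 0


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
low = {1, 2}
high = {5, 6}

# Independent but not mutually exclusive.
print(independent(even, at_most_four), mutually_exclusive(even, at_most_four))

# Mutually exclusive, and therefore dependent.
print(independent(low, high), mutually_exclusive(low, high))
```

For the second pair, P(low and high) = 0 while P(low) x P(high) = 1/9, so the events fail the independence check.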

Independence vs. mutually exclusive has been discussed on the list.

6. How much probability do I have to teach?

Floyd Bullard has submitted an awesome post about probability that can be read in the archives.

I would strongly encourage rookies to read this post BEFORE teaching probability!

7. Why doesn’t X + X = 2X?

Here's a great explanation from Dave Bock:

"For a short answer, try a thought experiment:

Let X represent the outcome when you roll a die. Then 2X represents rolling one die and doubling the result. The possible outcomes are {2, 4, 6, 8, 10, 12}; they are equiprobable.

On the other hand, X+X represents rolling two dice (or one twice). Now the possible outcomes are {2, 3, 4, 5, 6, 7, ..., 12}. Some are far less likely than others. Clearly this is a very different situation.

You can actually calculate both variances, but first just think about the distributions. It should be pretty obvious that X+X is unimodal and symmetric, peaking around 7 with very low tails while 2X is uniform across the same range. The two means are the same, but X+X has a smaller variance than 2X.

When confronting these situations, students must learn to ask themselves how many random values they are working with. One random value multiplied by a constant behaves much differently from summing several different random values.

I urge students to recognize that a random variable in Statistics is not the same animal as a variable in algebra. In algebra what we call a "variable" is really just an unspecified constant. With that understanding, no matter what number I use for X I'll always substitute that same value every time I see an X, so it must be true that X+X+X = 3X.

Then I put my Statistics hat on, declare them "random variables", and pick up a die. I substitute the results of the first roll for the first X, roll again for the value of the second X, etc. It's pretty clear now that this X+X+X = 3X equation that seems so obvious in algebra is false for random variables in Statistics. (One time the four values I randomly rolled actually worked! The kids thought that was hilarious. Their laughter at my bad luck clearly showed they understood the issue.)"
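To back up the thought experiment with exact numbers, here is a short sketch that lists every equally likely outcome for 2X (one die, doubled) and for X+X (two separate rolls) and computes the mean and variance of each. The helper `mean_and_variance` is a made-up name.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def mean_and_variance(values):
    """Mean and variance of a list of equally likely outcomes."""
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


# 2X: roll one die and double it (6 equally likely outcomes).
doubled = [2 * f for f in faces]

# X + X: roll two dice and add them (36 equally likely pairs).
summed = [a + b for a, b in product(faces, repeat=2)]

mean_2x, var_2x = mean_and_variance(doubled)
mean_sum, var_sum = mean_and_variance(summed)

print("2X   : mean", mean_2x, "variance", var_2x)
print("X + X: mean", mean_sum, "variance", var_sum)
```

Both means come out to 7, but the variance of 2X is 35/3 while the variance of X+X is 35/6, half as large.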

Read a list discussion thread about adding random variables.

Peter Flannigan-Hyde has written an article for AP Central about adding random variables.

8. I’m running out of time! What can I cut out?

Note that once you introduce inference, you can teach the last part of the year very quickly, especially inference for slope, which is on the AP test.

For inference for slope, focusing on interpreting the computer output can save time.

Not getting into all the nitty-gritty details about homogeneity and independence can save time.

Following the pacing guide that comes with the textbooks can help avoid this problem to begin with, but if you're reading this, it may be too late! :o)

Starting cumulative review while finishing inference can eliminate the need for lots of days of review.

Reviewing regression while teaching inference for slope is a natural and helpful step for preparing for the exam.

9. How much work do students need to show? And what about t*’s that are not on the table?

The short answer is that most list contributors recommend that students show formulas, first with just variables and then with the numbers plugged in. It shows that the student understands what is going on and it eliminates the concern that students would lose points if they accidentally plugged something into their calculator incorrectly. Charlie Peltier gave a helpful response that is archived here.

A note about the t* for t-intervals. If a student uses technology for certain procedures (e.g., 1-sample with n = 167 or any 2-sample interval), the t* will not be on the table. It is OK to leave the formula with all the numbers plugged in and the t* just stays as a variable. OR a student can use a conservative approach that uses a t* that is on the table, but then they need to calculate their interval by hand so their answer matches the df they used.

If students and/or teacher really want to find the t*, they can use the inverse t function. If students have an 83, they need a t-inverse program. This program is legal (because it just matches the 84) and can be found here.
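For teachers curious about what an inverse t function is doing, here is a rough sketch using only the Python standard library. It is not the calculator program mentioned above. The function names and the numerical method (Simpson's rule plus bisection) are my own choices for illustration. The df of 166 matches the n = 167 example; the df of 100 stands in for a conservative table row and is an assumption about what the table contains.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density with Simpson's rule."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_star(confidence, df):
    """Critical value t* for a two-sided interval, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


exact = t_star(0.95, 166)         # df = n - 1 for the n = 167 example
conservative = t_star(0.95, 100)  # assumed nearest smaller table row

print(f"t* with df = 166: {exact:.3f}")
print(f"t* with df = 100: {conservative:.3f}")
```

The conservative t* is slightly larger, which is why an interval built from the table df comes out a bit wider than the one technology reports.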

A few other points about this:

For hypothesis tests and confidence intervals, the AP rubrics have (thus far!) required name OR formula. So students can get full credit without the formula.

Numerous multiple choice problems on the '02 exam require formula understanding:

#8--1 sample t-interval

#11--Chi-Sq expected

#21--Confidence interval for slope

#32--Binomial and geometric formulas

#38--Binomial and 1-prop z formulas

TI-talk is discouraged. Statements like: normalcdf (1.2, 9999) are just not good communication. While showing a total by-hand formula is not required, good communication is. For example, on a binomial problem, students could write:

Binomial

n = 6

p = 0.87

P(x = 4) = <----- (from calculator)
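For teachers who want to check the calculator's answer for this setup, here is a minimal sketch of the binomial formula, using n = 6, p = 0.87 and x = 4 from the example above. The function name `binomial_pmf` is made up.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


prob_four = binomial_pmf(4, 6, 0.87)

# Sanity check: the probabilities for k = 0..6 should add up to 1.
total = sum(binomial_pmf(k, 6, 0.87) for k in range(7))

print(f"P(X = 4) = {prob_four:.4f}")
```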

It has been frequently recommended on this list that students show z-score calculations and don't use technology to shortcut that step!

10. What textbook should I use?

See the resource page here.

11. What’s a good review book?

See the resource page here.

12. Why do we pool for a 2-proportion z-test?

Charles Peltier has written an article for AP Central about pooling.

13. Betty and Sally! Arggh! :o) 2002 MC #39

At first this problem seems impossible! How could a two-tailed test reject what a one-tailed test failed to reject!?!? Answer: if the one-tailed test shaded the wrong way! Only z = -1.98 is a sufficient value to reject in a two-tailed test. And if z = -1.98 is shaded greater than (toward the upper tail), then the one-tailed test fails to reject! Pretty tricky!
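Here is a quick sketch of the p-values involved. The z = -1.98 comes from the discussion above. The significance level of 0.05 is my assumption, since the level is not stated here.

```python
from statistics import NormalDist

z = -1.98
alpha = 0.05  # assumed significance level, not stated in the discussion above

standard_normal = NormalDist()

# Two-tailed test: shade both tails beyond |z|.
two_tailed_p = 2 * standard_normal.cdf(-abs(z))

# One-tailed test shaded "greater than": the area to the right of z.
upper_tail_p = 1 - standard_normal.cdf(z)

# One-tailed test shaded "less than": the area to the left of z.
lower_tail_p = standard_normal.cdf(z)

print(f"two-tailed p = {two_tailed_p:.4f}, reject: {two_tailed_p < alpha}")
print(f"upper-tail p = {upper_tail_p:.4f}, reject: {upper_tail_p < alpha}")
print(f"lower-tail p = {lower_tail_p:.4f}, reject: {lower_tail_p < alpha}")
```

With the shading pointed the wrong way, the one-tailed p-value is close to 1, so that test fails to reject even though the two-tailed test rejects.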

14. How do I promote my class?

Use Algebra 2 as a prerequisite and keep the class accessible to lots of students, because it is!!

Here are some advertising flyers to help you out:

Go to: http://www.bbn-school.org/us/math/ap_stats and click on Course Description

Go to: http://www.district196.org/evhs/People/baileyrcweb/ and click on AP Stats and then Why Take...