There was a cryptic comment on this handout for my 2013 exam walkthrough about the problems with running multiple tests. I had no time to elaborate then, so I'll do so here.

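If you wanted to test whether Froot Loops colors are uniformly distributed, one method you could use is to run a 1-proportion z-test (p = 20%) for each of the five colors. This has numerous problems. The biggest problem is the accumulation of Type 1 errors. Every time you run a test, you have (usually) a 5% chance of committing a Type 1 error when the null hypothesis is true. But now you ran five tests. If those tests were independent, the chance of making at least one Type 1 error would be 1 - 0.95^5, or about 22.6%. Simply adding the five 5% chances gives 25%, which is an upper bound rather than the exact figure. (The five color tests are not truly independent, since the proportions must add to 100%, so the exact chance differs, but it is still far above the 5% you thought you were working with.)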
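The arithmetic above can be checked with a few lines of Python. This is only a sketch under the simplifying assumption that the tests are independent; the function name `familywise_error_rate` is my own label for illustration, not something from the handout.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one Type 1 error across num_tests tests,
    ASSUMING the tests are independent and every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


alpha = 0.05      # the usual significance level
num_tests = 5     # one test per Froot Loops color

exact_if_independent = familywise_error_rate(alpha, num_tests)
upper_bound = min(1.0, num_tests * alpha)  # just adding up the 5% chances

print(f"At least one Type 1 error (independent tests): {exact_if_independent:.4f}")
print(f"Upper bound from adding the error rates:       {upper_bound:.2f}")
```

With five tests the first figure comes out near 0.226 and the second is 0.25. Try raising `num_tests` to see how quickly the chance of a false alarm grows.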

We need Chi-square because it can evaluate all five categories simultaneously in a single test, and thus avoids this problem. Jessica Utts (the chief Reader in waiting) addressed this issue in her talk to the Readers. She discussed the problem of researchers running test after test after test until they find "significance". (This talk is posted on her page under Representative Presentations.) You can also see a humorous presentation of this idea by the brilliant xkcd.
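For comparison, here is a rough sketch of the single Chi-square goodness-of-fit test on all five colors at once. The counts are made up purely for illustration (they are not real Froot Loops data), and the color names and the helper name `chi_square_gof_5` are my own. The p-value uses the closed form for the Chi-square distribution with 4 degrees of freedom, so no extra libraries are needed.

```python
import math

# HYPOTHETICAL counts for 100 pieces of cereal, invented for this example.
observed = {"red": 25, "orange": 18, "yellow": 22, "green": 15, "purple": 20}


def chi_square_gof_5(counts):
    """Chi-square goodness-of-fit test against a uniform distribution.

    Only valid for exactly five categories (4 degrees of freedom), because
    the p-value uses the df = 4 closed form: P(X > x) = exp(-x/2) * (1 + x/2).
    """
    if len(counts) != 5:
        raise ValueError("this sketch only handles five categories")
    total = sum(counts)
    expected = total / 5  # uniform: 20% of the total in each color
    statistic = sum((obs - expected) ** 2 / expected for obs in counts)
    p_value = math.exp(-statistic / 2) * (1 + statistic / 2)
    return statistic, p_value


statistic, p_value = chi_square_gof_5(list(observed.values()))
print(f"chi-square = {statistic:.2f}, df = 4, p-value = {p_value:.4f}")
```

The point is that there is one test and one p-value, so there is only one chance to commit a Type 1 error at the 5% level.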