
Monday, March 28, 2016

Perceptions, reality, and statistics: A look at class sizes


“Nobody goes there anymore, it’s too crowded.”
This famous quote from Yogi Berra, a major league baseball catcher, manager, and coach, seems self-contradictory. Could the statement be true? Can a place be too crowded and still be empty? Not at the same time, of course, but maybe at different times? And what would the patrons say about attendance?

In a short and wonderful article [1] I ran across recently, David Hemenway addresses just this issue by looking at class sizes. For example, a school claims an average class size of 14: administrators divide the total number of students by the number of classes and, voilà, the ratio is 14. For students, the story may be quite different.

If the class sizes differ, some small and some large, a typical student may perceive a much larger class size, such as 78. That is, the expected class size seen by a student may be 78.
The average class size, from an administrator’s viewpoint, is \( \overline{X} = \frac{M}{N} \), where M is the total number of students and N is the number of classes.
From a student’s view, the expected class size is: \[ X^{*} = \sum_{i=1}^{N} \left( \frac{x_{i}}{M} \right) x_{i} \]
The fraction in the summation is the probability that a randomly chosen student is in class i, whose size is \( x_{i} \); multiplying by \( x_{i} \) weights that class size by its probability. Summing over all classes gives the expected size of a randomly chosen student’s class.
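To make the two viewpoints concrete, here is a minimal Python sketch of both formulas, assuming the class sizes are given as a list of positive integers. The function names are my own, not Hemenway’s, and the three-class example is hypothetical. Note that \( \sum (x_i/M)\, x_i = \sum x_i^2 / M \), which is the form the code uses.

```python
def average_class_size(sizes):
    """Administrator's view: total students M divided by number of classes N."""
    M = sum(sizes)
    N = len(sizes)
    return M / N


def expected_class_size(sizes):
    """Student's view: X* = sum over classes of (x_i / M) * x_i = sum(x_i^2) / M."""
    M = sum(sizes)
    return sum(x * x for x in sizes) / M


# Hypothetical example: three classes of sizes 10, 20, and 70 (100 students).
sizes = [10, 20, 70]
print(average_class_size(sizes))   # 100 / 3, about 33.3
print(expected_class_size(sizes))  # (100 + 400 + 4900) / 100 = 54.0
```

Even with only three classes, the student’s view (54) is well above the administrator’s view (about 33).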
Here’s a small table to illustrate this approach:



The average class size is 729/51 ≈ 14.3 (729 students in 51 classes: fifty classes of 10 and one class of 229).
Now look at the expected class size for a student. The probability of being in a class of size 10 is 500/729 ≈ 0.69, and the probability of being in the single large class of 229 is 229/729 ≈ 0.31. Even though most of the classes are small, the one large class is so big that a randomly selected student has a good chance of being in it.

In the chart, the rightmost column shows each term of \(X^{*} \), and the expected class size for a randomly selected student is their sum, 78.8.
What does it mean to have an expected size of 78.8? None of the classes has this size; the sizes are either 10 (most of the classes) or 229 (one class).
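The numbers in the table can be recomputed with a short Python sketch. I am assuming the small classes number fifty, which is what the stated totals imply: (729 - 229)/10 = 50 classes of 10, plus one class of 229, for N = 51 classes.

```python
# Assumed composition (inferred from the totals in the text):
# fifty classes of size 10 and one class of size 229.
classes = {10: 50, 229: 1}  # class size -> number of classes of that size

M = sum(size * count for size, count in classes.items())  # total students
N = sum(classes.values())                                   # number of classes

print(f"M = {M}, N = {N}, average = {M / N:.1f}")

expected = 0.0
for size, count in classes.items():
    students = size * count   # students sitting in classes of this size
    prob = students / M       # chance a random student is in such a class
    term = prob * size        # this size's contribution to X*
    expected += term
    print(f"size {size:>3}: students {students:>3}, probability {prob:.2f}, term {term:.1f}")

print(f"expected class size X* = {expected:.1f}")
```

Run as is, it reports an average of 14.3 and an expected class size of 78.8, matching the figures above.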

One way to think of the expected value is to imagine running an experiment many times. Select a student at random from all 729 students; this is the same as picking a class at random, with each class chosen with probability equal to its size divided by the total number of students. Then ask how many students are in this student’s class. Repeat the experiment with another student, record the size of her class, and continue, again and again.

The expected value is the long-run average of the class sizes recorded in these experiments. Following Hemenway, let \( x_i \) be the size of the ith class, M the number of students, and N the number of classes.
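The experiment can be simulated directly. This is a minimal Monte Carlo sketch, assuming the same fifty classes of 10 and one class of 229; the function name, trial count, and seed are arbitrary choices of mine. Picking uniformly from a list with one entry per student is equivalent to picking class i with probability \( x_i/M \).

```python
import random


def simulate_expected_class_size(sizes, trials=100_000, seed=0):
    """Estimate X* by repeatedly picking a student uniformly at random
    and recording the size of that student's class."""
    rng = random.Random(seed)
    # One entry per student, holding the size of that student's class.
    students = [x for x in sizes for _ in range(x)]
    total = 0
    for _ in range(trials):
        total += rng.choice(students)
    return total / trials


# Assumed example: fifty classes of 10 and one class of 229 (729 students).
sizes = [10] * 50 + [229]
print(simulate_expected_class_size(sizes))  # close to 78.8; varies with the seed
```

With 100,000 trials the estimate typically lands within a point or so of 78.8, while the administrator’s figure stays at 14.3.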
Is the expected class size always greater than the average class size? Will students generally experience larger classes than the average suggests?

The answer is yes, except when all the class sizes are equal. To see this, we look at the difference between the expected class size and the average class size.
Recall \( \overline{X} = \sum x_{i} / N \) and \( M = \sum x_{i} \)
Now, the difference between the expected class size and the average class size is:

\[  X^{*} - \overline{X}  =  \sum \left( \frac{x_{i}}{M} x_{i} \right) - \sum x_{i} / N
\]
\[    = \frac{1}{M} \sum x_{i}^{2} - \frac{1}{N} \sum x_{i}
\]
 
\[  =  \frac{N \sum x_{i}^{2} - M \sum x_{i} }{ {M \cdot N} }
 \]

\[  =  \frac{N \sum x_{i}^{2} - \sum x_{i} \sum x_{i} }{ {M \cdot N} }
 \]

\[  =  \frac{N \sum x_{i}^{2} - \left( \sum x_{i} \right)^{2}}{ {M \cdot N} }
 \]
 
\[  =  \frac{\left( N \sum x_{i}^{2} - \left( \sum x_{i} \right)^{2} \right) }{ N^{2} } \left( \frac{N}{M} \right)
 \]
\[ = \sigma^{2} \frac{N}{M}  \]
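\[  =  \sigma^{2} / \overline{X} \]
Here \( \sigma^{2} = \frac{1}{N} \sum x_{i}^{2} - \overline{X}^{2} \) is the variance of the class sizes, and the last step uses \( N/M = 1/\overline{X} \).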

Since \( \sigma^{2} \geq 0 \) and \( \overline{X} > 0 \), we have \( X^{*} \geq \overline{X} \). For \( X^{*} = \overline{X} \) we would need \( \sigma^{2} = 0 \), meaning the standard deviation of the class sizes is zero or, simply, all classes are the same size.
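The identity \( X^{*} - \overline{X} = \sigma^{2}/\overline{X} \) is easy to spot-check numerically. This Python sketch uses the population variance (dividing by N), as in the derivation; the helper name and the random test lists are my own illustration.

```python
import math
import random


def check_identity(sizes):
    """Return (X* - X_bar, sigma^2 / X_bar) for a list of class sizes."""
    N = len(sizes)
    M = sum(sizes)
    x_bar = M / N
    x_star = sum(x * x for x in sizes) / M
    # Population variance (divide by N), as in the derivation.
    sigma2 = sum(x * x for x in sizes) / N - x_bar ** 2
    return x_star - x_bar, sigma2 / x_bar


# The example from the table: both numbers come out to about 64.5.
print(check_identity([10] * 50 + [229]))

# Equal class sizes give zero variance, so X* equals X_bar.
print(check_identity([25, 25, 25, 25]))

# A few random lists of sizes, to spot-check the algebra.
rng = random.Random(1)
for _ in range(3):
    sizes = [rng.randint(5, 300) for _ in range(rng.randint(2, 20))]
    diff, ratio = check_identity(sizes)
    print(math.isclose(diff, ratio, rel_tol=1e-9, abs_tol=1e-9))
```

For the table’s classes, 14.3 + 64.5 recovers the expected size of 78.8.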
Let me quote Hemenway:
“I often buy dinner at a fast-food restaurant near my home. Although most customers order “to go,” the place is almost always crowded, and I consider it quite a success. One evening about 6:30 I went in and there was no one in line. The manager was serving me, so I asked, “Where is everyone?” “It often gets quiet like this,” he said, “even at dinnertime. The customers always seem to come in spurts. Wait fifteen minutes and it will be crowded again.” I was surprised that I had never before seen the restaurant so empty. But I probably shouldn’t have been. If I am a typical customer, I am much more likely to be there during one of the spurts, so my estimate of the popularity of the restaurant, \( X^{*} \), is likely to be much greater than its true popularity, \( \overline{X} \).”
When you attend class, go to a restaurant, or even go to the beach, you are likely to find a larger group of people than you would if you looked at all times, and not just at the times when you, a typical person, go there.
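The same formulas apply to Hemenway’s restaurant if we replace classes with time slots. This Python sketch uses hypothetical customer counts for eight 15-minute slots (illustrative numbers, not data from the article): the time average plays the role of \( \overline{X} \), and the crowd size seen by a typical customer plays the role of \( X^{*} \).

```python
# Hypothetical customer counts for eight 15-minute slots at a restaurant
# that gets busy in spurts (illustrative numbers, not data from the article).
counts = [2, 30, 1, 0, 25, 3, 28, 1]

slots = len(counts)
customers = sum(counts)

# Averaged over time: how crowded the place is in a typical slot (like X_bar).
time_average = customers / slots

# Averaged over customers: how crowded it looks to a typical customer,
# who is in slot i with probability counts[i] / customers (like X*).
customer_average = sum(c * c for c in counts) / customers

print(f"time average: {time_average:.2f}, customer's view: {customer_average:.2f}")
```

With these made-up counts the place averages about 11 customers per slot, yet a typical customer sees a crowd of about 26. The empty slot counts toward the time average, but no customer is ever there to see it.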
I guess Yogi Berra might say:
 “When you are there, that’s when it’s crowded.”
References 
[1] David Hemenway, “Why Your Classes Are Larger Than ‘Average’,” Mathematics Magazine, 55 (1982), pp. 162-164. Reprinted in The Harmony of the World, Gerald L. Alexanderson (editor) with Peter Ross, The Mathematical Association of America, 2007.

Footnote: This is my first extended experience with equations in MathJax using LaTeX. It’s been a little trying, but I hope to do better next time.