Our author says that, while it's true that "you'll find later in life that it's handy to know what a standard deviation is", it is "even handier to know what a distribution is."
So our objective in this unit is to understand three things, actually:
By "catalog of distributions" I mean some candidates for the model behind the data. We may discover that the data looks symmetric, for example, which would rule out a bunch of types of distributions. If the histogram of the data is bell-shaped, then perhaps it is normally distributed, which is what our author is interested in discussing here.
How can we explain these two statistics? Let's start with that first one. Is it true that the average American has one testicle and one ovary? Think about how the average -- or mean -- is defined.
Let's try a different measure of "central tendency":
The median American has no testicles and two ovaries.
Mean and median try to characterize the "middle" of a distribution. Do you know how they're defined?
Why do these values of mean and median make sense? What is the distribution of testicles and ovaries?
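Here is a small sketch of how those two statistics can both come out of the same data. The population below is invented for illustration (100 people, 49 of one kind and 51 of the other); it is not census data, just a way to see the mean and median disagree.

```python
from statistics import mean, median

# Hypothetical population of 100 people (illustrative numbers only):
# 49 have two testicles and no ovaries, 51 have two ovaries and no testicles.
testicles = [2] * 49 + [0] * 51
ovaries = [0] * 49 + [2] * 51

# The mean adds everything up and divides by the number of people.
mean_testicles = mean(testicles)
mean_ovaries = mean(ovaries)

# The median sorts the values and takes the middle one
# (or averages the two middle ones when the count is even).
median_testicles = median(testicles)
median_ovaries = median(ovaries)

print("mean:  ", mean_testicles, "testicles,", mean_ovaries, "ovaries")
print("median:", median_testicles, "testicles,", median_ovaries, "ovaries")
```

The means land near one each, even though nobody in this made-up population has one of each. The medians land on a value that real people actually have, because slightly more than half of the list has two ovaries.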
This simulation produces a variety of distributions. What are their properties?
For example, check these out:
This version of Pascal's triangle goes up to the "8 row" (with nine columns); since there are nine "bins" in my Hexstat Probability Generator, the numbers in this version of Pascal's triangle are actually indicative of the probabilities of the balls ending up in any of the bins. So it's 70 times more likely that a ball will end up in the middle bin than in the end bins.
If you want the actual probabilities of each bin, you divide
each value in the triangle by the total for the row (in this
case, 2^8 = 256).
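The division can be checked with a few lines of code. This is a minimal sketch, assuming each ball bounces left or right with equal probability at each of 8 levels, so that the bin counts are the entries of row 8 of Pascal's triangle; the variable names are my own.

```python
from math import comb

n = 8  # the "8 row" of Pascal's triangle, which has n + 1 = 9 entries

# Entry k of row n is "n choose k".
row = [comb(n, k) for k in range(n + 1)]

# The row total is 2**n, since each of the n bounces has two outcomes.
total = sum(row)

# Dividing each entry by the total gives the probability of each bin.
probabilities = [count / total for count in row]

for k, (count, p) in enumerate(zip(row, probabilities)):
    print(f"bin {k}: {count:3d} / {total} = {p:.4f}")

# Ratio of the middle bin to an end bin.
print("middle vs. end:", row[n // 2] // row[0])
```

The probabilities add up to 1, and the ratio on the last line is the "70 times more likely" mentioned above.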
How is our mathematician doing at modeling the data?
What has she missed?
A deviation is the difference between a data value and the mean value. So if the mean height is 5'10", and you're 5'8" tall, then your deviation is -2 inches. The "standard deviation" thus is a measure of the "typical deviation".
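To make "typical deviation" concrete, here is a sketch with five invented heights (in inches) whose mean is 70, that is, 5'10". The data are hypothetical, and I am using the population form of the standard deviation (divide by n); the sample form divides by n - 1 instead.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical heights in inches; the first person is 5'8".
heights = [68, 70, 72, 66, 74]

mu = mean(heights)  # 70 inches, i.e. 5'10"

# A deviation is a data value minus the mean.
deviations = [h - mu for h in heights]

# Standard deviation: square the deviations, average them, take the square root.
sd_by_hand = sqrt(sum(d * d for d in deviations) / len(heights))

# The standard library's population standard deviation does the same thing.
sd_library = pstdev(heights)

print("deviations:", deviations)
print("standard deviation:", round(sd_by_hand, 3))
```

The deviations themselves average out to zero, which is why they get squared before averaging.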
The larger the standard deviation is,
the more spread out the normal curve becomes. It's still bell-shaped --
it's just broader and flatter.
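One way to see "broader and flatter" is to evaluate the normal curve directly. The sketch below writes out the standard formula for the normal density; the function name `normal_pdf` and the particular values of the standard deviation are my own choices for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Compare curves centered at 0 with increasing standard deviations.
for sigma in (1, 2, 4):
    peak = normal_pdf(0, 0, sigma)   # height at the center of the bell
    tail = normal_pdf(3, 0, sigma)   # height 3 units away from the center
    print(f"sigma = {sigma}: peak = {peak:.4f}, height at x = 3: {tail:.4f}")
```

As the standard deviation grows, the peak drops (flatter) while the height out at x = 3 rises (broader). The total area under each curve stays at 1.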
Unfortunately people who know a little stats may get infatuated with the normal distribution, and apply it even where it's not appropriate.
Do you believe that this data is normally distributed?
It and the statistic about Lakeside School are in the same category: Bill Gates and Paul Allen, founders of Microsoft, graduated from Lakeside, which pushes the mean way up while leaving the median essentially unchanged.
In an unfavorable light, Bush's use of the mean could be construed as one of the three lies: "lies, damned lies, and statistics."
Better in both cases to use a median as the measure of central tendency:
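As a rough illustration of why, here is a sketch with made-up numbers. These are not real figures for Lakeside or anywhere else: I simply assume 98 graduates worth 1 (in millions of dollars) and two worth 50,000 each, to show how a couple of extreme values move the mean but not the median.

```python
from statistics import mean, median

# Made-up net worths in millions of dollars; NOT real data.
typical_graduates = [1] * 98
all_graduates = typical_graduates + [50_000, 50_000]  # add two hypothetical billionaires

mean_without = mean(typical_graduates)
mean_with = mean(all_graduates)

median_without = median(typical_graduates)
median_with = median(all_graduates)

print("mean without / with the two outliers:  ", mean_without, "/", mean_with)
print("median without / with the two outliers:", median_without, "/", median_with)
```

Two values out of a hundred are enough to multiply the mean a thousandfold here, while the median does not budge.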