- Statistics are closely related to probability, which is why they're
grouped together in our reading ("Students would be better served by learning
more about probability and statistics", p. 175), and which is why we're studying them right
after studying probability.
Our author says that, while it's true that "you'll find later in life
that it's handy to know what a standard deviation is", it is "even
handier to know what a distribution is."
So our objective in this unit is actually to understand three things:
- A distribution
- Measures of central tendency (mean, median)
- Measures of spread (standard deviation)
- Let's start with heights of NKU students:
- We collect data;
- construct a histogram, or graph of the data, grouping the data into
"bins"; then
- attempt to describe the distribution of the data using
statistics. Some of the tools we use are:
- A catalog of distributions
- Measures of central tendency (mean, median)
- Measures of spread (standard deviation)
By "catalog of distributions" I mean some candidates for the
model behind the data. We may discover that the data looks
symmetric, for example, which would rule out many types of
distributions (skewed ones, for instance). If the histogram of the data is
bell-shaped, then perhaps the data are normally distributed, which is what our
author is interested in discussing here.
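To make the "bins" step concrete, here is a minimal Python sketch that groups a list of heights into equal-width bins and prints a text histogram. The heights are made-up illustrative values, not NKU data, and the helper name `bin_counts`, the 2-inch bin width, and the starting point of 62 inches are our own hypothetical choices.

```python
from collections import Counter

def bin_counts(data, bin_width, start):
    """Group values into equal-width bins [start + k*w, start + (k+1)*w).

    Returns (low, high, count) triples, including any empty bins in between.
    """
    counts = Counter()
    for x in data:
        k = int((x - start) // bin_width)  # index of the bin that contains x
        counts[k] += 1
    # Counter returns 0 for bins that received no data
    return [(start + k * bin_width, start + (k + 1) * bin_width, counts[k])
            for k in range(min(counts), max(counts) + 1)]

# Hypothetical heights in inches (illustrative only, not real survey data)
heights = [62, 64, 65, 66, 66, 67, 68, 68, 68, 69, 70, 70, 71, 72, 74]

for low, high, n in bin_counts(heights, bin_width=2, start=62):
    print(f"{low}-{high} in: {'*' * n}")
```

Changing the bin width can change how the histogram looks, so it is worth trying a couple of widths before deciding the data "look" symmetric or bell-shaped.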
- Some of my favorite statistics:
- The average American has one testicle and one
ovary.
- The average income for a Lakeside School (Seattle, Washington)
graduate was $2.5 million (in 1997).
How can we explain these two statistics? Let's start with the
first one. Is it true that the average American has one
testicle and one ovary? Think about how the average (or mean)
is defined.
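The mean is the sum of the values divided by how many values there are. A minimal Python sketch, assuming a hypothetical population of 100 people made up of 51 women and 49 men, with two ovaries per woman and two testicles per man (simplifying illustrative numbers, not census figures):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical population: 51 women and 49 men (illustrative counts only)
women, men = 51, 49
testicles = [0] * women + [2] * men
ovaries = [2] * women + [0] * men

print(mean(testicles))  # 0.98, i.e. about one testicle per person
print(mean(ovaries))    # 1.02, i.e. about one ovary per person
```

Nobody in this population actually has one of each; the mean lands between the two groups because it blends them together.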
- The mean is only one measure of "central tendency", that is, of the
middle of a data set.
Let's try a different measure of "central tendency":
The median American has no testicles and two
ovaries.
Mean and median try to characterize the "middle" of a
distribution. Do you know how they're defined?
Why do these values of mean and median make sense? What is the
distribution of testicles and ovaries?
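The median is the middle value once the data are sorted (the average of the two middle values when the count is even). A sketch using Python's standard `statistics` module, again assuming a hypothetical population of 51 women and 49 men. The income list at the end is invented as well; it illustrates one plausible way to explain figures like the Lakeside one, where a single extreme value drags the mean far above the typical value while the median stays put.

```python
import statistics

# Hypothetical population: 51 women and 49 men (illustrative counts only)
women, men = 51, 49
testicles = [0] * women + [2] * men
ovaries = [2] * women + [0] * men

# Sorted, the 51 zeros come first, so both middle values (50th and 51st) are 0
print(statistics.median(testicles))  # 0.0
# Sorted, only 49 zeros come first, so both middle values are 2
print(statistics.median(ovaries))    # 2.0

# Invented incomes (in dollars) with one extreme outlier
incomes = [40_000, 50_000, 60_000, 70_000, 1_000_000_000]
print(statistics.mean(incomes))    # pulled far above every typical income
print(statistics.median(incomes))  # 60000
```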
- What does it mean for data to be normally distributed?
- A lot of data follow the normal curve, but some data don't!
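For reference, the normal curve is the graph of the density below, where μ is the mean (the center of the bell) and σ is the standard deviation. This is the standard textbook formula written in our own notation; it peaks at x = μ with height 1/(σ√(2π)), so a larger σ gives a lower, wider bell.

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
$$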
- The Empirical Rule (for normal data)
- The Greek letter σ represents the "standard deviation", which measures
the spread of the data. The larger σ is, the more spread out the normal
curve becomes. It's still bell-shaped; it's just broader and flatter.
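A short Python sketch of both ideas. First, the (population) standard deviation is the square root of the average squared distance from the mean; the helper `std_dev` and its small data set are our own illustration. Second, `statistics.NormalDist` (Python 3.8+) lets us compare two normal curves with the same mean but different σ; the mean of 0 and the σ values 1 and 2 are arbitrary choices.

```python
import math
from statistics import NormalDist

def std_dev(values):
    """Population standard deviation: root of the mean squared distance from the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

# Mean is 5; squared distances sum to 32; 32 / 8 = 4; sqrt(4) = 2
print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# Peak height at the mean is 1 / (sigma * sqrt(2*pi)): doubling sigma halves it (flatter)
print(round(narrow.pdf(0), 4))  # 0.3989
print(round(wide.pdf(0), 4))    # 0.1995

# Far from the mean the wider curve is higher: more of the data lies out there (broader)
print(narrow.pdf(3) < wide.pdf(3))  # True
```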
- Percentages in categories, using σ as a measuring stick.
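For normal data, the Empirical Rule says about 68% of values lie within 1σ of the mean, about 95% within 2σ, and about 99.7% within 3σ. A short sketch that recovers those percentages from the cumulative distribution function of the standard normal curve in Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, sigma 1

for k in (1, 2, 3):
    # Fraction of a normal population within k standard deviations of the mean
    within = z.cdf(k) - z.cdf(-k)
    print(f"within {k} sigma: {within:.1%}")
# within 1 sigma: 68.3%
# within 2 sigma: 95.4%
# within 3 sigma: 99.7%
```

Because the result is stated in units of σ, the same percentages apply to any normal curve, whatever its mean and standard deviation.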
- Practical example
We can now answer a few of these questions using the empirical rule.
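As one illustration (the question and numbers are hypothetical, not from our reading): suppose heights are roughly normal with mean 67 inches and standard deviation 3 inches, and we ask what share of people are taller than 73 inches. On each side of the mean, the empirical rule's categories hold about 34% (0 to 1σ), 13.5% (1σ to 2σ), 2.35% (2σ to 3σ), and 0.15% (beyond 3σ). The helper `percent_above` is our own name for adding up the categories.

```python
# Empirical-rule percentages on ONE side of the mean:
# 0-1 sigma, 1-2 sigma, 2-3 sigma, beyond 3 sigma (they add up to 50%)
CATEGORIES = [34.0, 13.5, 2.35, 0.15]

def percent_above(k):
    """Approximate percent of normal data more than k sigma above the mean.

    Only whole numbers of sigma (k = 0, 1, 2, 3) line up with the rule's categories.
    """
    return sum(CATEGORIES[k:])

# Hypothetical heights: mean 67 in, standard deviation 3 in (illustrative only)
mean_height, sd_height = 67, 3
cutoff = 73
z = (cutoff - mean_height) / sd_height  # number of sigmas above the mean
print(f"{cutoff} in is {z:.0f} sigma above the mean; "
      f"about {percent_above(int(z)):.1f}% are taller")
# 73 in is 2 sigma above the mean; about 2.5% are taller
```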
- On page 178, we see that people exaggerate their heights; this
shows up as a shift in the mean, that is, in the position of the middle of
the data. An interesting use of statistics, isn't it?
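A small sketch of why exaggeration shows up as a shift in the mean: if every respondent adds the same amount to their true height, the mean moves by exactly that amount while the spread (standard deviation) stays the same. The heights and the uniform 1-inch exaggeration are assumptions invented for illustration; real exaggeration need not be the same for everyone.

```python
from statistics import mean, pstdev

# Hypothetical true heights in inches (illustrative only)
actual = [63, 65, 66, 68, 70, 72]
# Assume, for illustration, that everyone overstates by exactly 1 inch
reported = [h + 1 for h in actual]

print(mean(reported) - mean(actual))      # the mean shifts by about 1 inch
print(pstdev(reported) - pstdev(actual))  # the spread is unchanged (about 0)
```

In a histogram, this looks like the whole bell sliding to the right without changing shape.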