Someone makes a claim: does it make sense? How can we tell?
A common rule:
If, assuming the truth of some conjecture, the probability of obtaining a sample at least as extreme as the
one observed is .05 or less (that is, a 1 in 20 chance or smaller), then the sample is considered
contradictory to that conjecture. Otherwise the sample is not considered contradictory to the conjecture.
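Here is a small sketch of the rule. Suppose, hypothetically, that the conjecture is "this coin is fair" and that 10 flips produce 9 heads. We read "at least as extreme" as "9 or more heads" and compute that probability exactly from the binomial distribution. The helper names `upper_tail_prob` and `contradicts` are made up for illustration.

```python
from math import comb

def upper_tail_prob(heads: int, flips: int) -> float:
    """P(X >= heads) for X ~ Binomial(flips, 0.5), i.e. assuming a fair coin."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

def contradicts(prob: float, cutoff: float = 0.05) -> bool:
    """Apply the .05 rule: a probability at or below the cutoff contradicts the conjecture."""
    return prob <= cutoff

p = upper_tail_prob(9, 10)   # (10 + 1) / 1024, about 0.0107
print(round(p, 4), contradicts(p))  # prints 0.0107 True
```

With 7 heads instead, the probability is 176/1024 (about 0.17), so that sample would not be considered contradictory.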
Strategy: Inference making procedure, p. 136
Example: #3, p. 137
Section 5.1: Sampling distributions and the normal curve (why is the
normal distribution so important?)
A statistic is a number that we calculate from a
sample of data (for example, the sample mean $\bar{x}$).
A parameter is a number that could be calculated
from a population if all of the data were accessible (for example, the population mean $\mu$).
Generally we pull many values from a population to form a
sample.
It turns out that if we take relatively large samples, sample
after sample, and compute the statistic over and over, the histogram of
statistic values will look bell-shaped.
Let's try that with coins: each of you will have a coin, and flip it 30 times. Our statistic will be the total number of heads.
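If we don't have a room full of coins, we can simulate the activity. The sketch below assumes 1000 hypothetical students, each flipping a fair coin 30 times. It records each student's number of heads and prints a rough text histogram. The function names, class size, and seed are arbitrary choices.

```python
import random
from collections import Counter

def heads_in_flips(flips: int, rng: random.Random) -> int:
    """Flip a fair coin `flips` times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(flips))

def simulate_class(students: int = 1000, flips: int = 30, seed: int = 1) -> list[int]:
    """Each simulated student flips a coin `flips` times; collect every head count."""
    rng = random.Random(seed)
    return [heads_in_flips(flips, rng) for _ in range(students)]

counts = simulate_class()
tally = Counter(counts)
# Crude text histogram: one '*' per 5 students with that head count
for heads in range(min(tally), max(tally) + 1):
    print(f"{heads:2d} {'*' * (tally[heads] // 5)}")
```

The stars should pile up around 15 heads and thin out symmetrically on both sides, which is the bell shape described above.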
Properties of the sampling distribution:
The mean of the sampling distribution of the sample mean $\bar{x}$, denoted $\mu_{\bar{x}}$, is equal to the mean of the population $\mu$:
$\mu_{\bar{x}} = \mu$.
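The standard deviation of the sampling distribution of the sample mean $\bar{x}$, denoted $\sigma_{\bar{x}}$, is
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$,
where $\sigma$ is the population standard deviation and n is the sample size. This quantity is called the
standard error of the mean. You will see it reported by StatCrunch, for example. In practice it is computed
from the sample, using the sample standard deviation s in place of the usually unknown $\sigma$.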
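To check both properties, the sketch below uses a hypothetical population of 10,000 values (normal, mean 70, SD 10; for example, exam scores). It draws 5,000 samples of size n = 25 and compares the mean and SD of the resulting sample means with $\mu$ and $\sigma/\sqrt{n}$. The samples are drawn without replacement, but the population is large enough that $\sigma/\sqrt{n}$ still applies almost exactly.

```python
import random
import statistics

# Hypothetical population: 10,000 values, e.g. exam scores
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 25
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(5_000)]

print(f"population mean mu    = {mu:.2f}")
print(f"mean of sample means  = {statistics.fmean(sample_means):.2f}")
print(f"sigma / sqrt(n)       = {sigma / n ** 0.5:.3f}")
print(f"SD of sample means    = {statistics.pstdev(sample_means):.3f}")
```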
Here's the key thing: the sampling distribution is approximately normal for a large sample size n ("large" generally taken as $n \ge 30$).
See Figure 5.5, p. 172
This result is encapsulated in the Central Limit Theorem
(p. 167) and illustrated by the graphs of Figure 5.3, p. 168.
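The theorem matters most when the population itself is not normal. The sketch below uses an assumed, strongly skewed exponential population with mean 1 and SD 1. For several sample sizes it counts how often the sample mean lands within one standard error ($1/\sqrt{n}$) of $\mu = 1$. For a normal curve that share is about 68%.

```python
import random
import statistics

def exp_sample_means(n: int, reps: int, rng: random.Random) -> list[float]:
    """Means of `reps` samples of size n from an exponential population (mean 1, SD 1)."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def share_within_one_se(means: list[float], n: int) -> float:
    """Fraction of sample means within one standard error (1/sqrt(n)) of mu = 1."""
    se = 1 / n ** 0.5
    return sum(abs(m - 1) < se for m in means) / len(means)

rng = random.Random(7)
for n in (1, 5, 30):
    means = exp_sample_means(n, 10_000, rng)
    print(n, round(share_within_one_se(means, n), 3))
```

For n = 1 the share is far from 68% (about 86%, since $1 - e^{-2} \approx 0.865$). By n = 30 it is close to the normal-curve value.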
Calculating probabilities for sample means involves computing
Z-scores, as we've done in the past, with the standard error in the denominator:
$z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$. See p. 171.
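As a worked sketch with made-up numbers ($\mu = 100$, $\sigma = 15$, n = 36), what is the chance that a sample mean exceeds 105? The standard error is $15/6 = 2.5$, so $z = 2$. The function name is hypothetical, and the normal CDF comes from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

def prob_sample_mean_above(xbar: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean > xbar), using the normal approximation with SE = sigma / sqrt(n)."""
    se = sigma / n ** 0.5
    z = (xbar - mu) / se
    return 1 - NormalDist().cdf(z)

# Hypothetical numbers: SE = 15 / sqrt(36) = 2.5, so xbar = 105 gives z = 2
print(round(prob_sample_mean_above(105, 100, 15, 36), 4))  # 0.0228
```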
Example problem: #4, p. 177
Section 5.2: The sampling distribution of the sample proportion
Definition: Consider a sample of qualitative data for which
one category, or attribute, is of interest. To describe such a sample, we will
use the proportion of the sample having the attribute of interest. This
statistic is denoted by the letter p.
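For instance, with a hypothetical sample of ten yes/no survey answers where "yes" is the attribute of interest, p is just the count of "yes" divided by the sample size. The function name is illustrative.

```python
# Hypothetical sample of qualitative data: answers to one yes/no question
answers = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "no"]

def sample_proportion(data: list[str], attribute: str) -> float:
    """Proportion p of observations in `data` having the attribute of interest."""
    return sum(1 for x in data if x == attribute) / len(data)

print(sample_proportion(answers, "yes"))  # 0.6
```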
Properties of the sampling distribution of the sample proportion:
The mean of the sampling distribution of p, denoted $\mu_p$, is equal to the mean of the population,
that is, the population proportion $\pi$: $\mu_p = \pi$.
The standard deviation of the sampling distribution of p, denoted $\sigma_p$, is
$\sigma_p = \sqrt{\pi(1-\pi)/n}$,
where $\pi$ is the population proportion and n is the sample size.
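A quick simulation can confirm both properties. The sketch assumes a hypothetical population proportion $\pi = 0.3$ and sample size n = 50, with each draw independent (as with sampling from a very large population). It compares the mean and SD of 20,000 simulated values of p with $\pi$ and $\sqrt{\pi(1-\pi)/n} \approx 0.0648$.

```python
import random
import statistics

pi, n, reps = 0.3, 50, 20_000   # hypothetical population proportion and sample size
rng = random.Random(3)

# Each sample: n independent draws, each having the attribute with probability pi
props = [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

print(f"mean of p = {statistics.fmean(props):.4f}   (pi = {pi})")
print(f"SD of p   = {statistics.pstdev(props):.4f}   (formula: {(pi * (1 - pi) / n) ** 0.5:.4f})")
```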
Here's the key thing: the sampling distribution of
p is approximately normal for a large sample size n ("large"
generally meaning at least n = 30, but in this case it also
depends on the value of the population proportion $\pi$).
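Our choice of n to ensure normality is made so that the interval reaching
three standard errors on either side of the estimated population proportion $\pi$
lies completely inside the interval [0,1]:
$\pi - 3\sqrt{\pi(1-\pi)/n} \ge 0$ and $\pi + 3\sqrt{\pi(1-\pi)/n} \le 1$.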
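The check can be automated. Squaring both sides of each inequality (everything is non-negative when $0 < \pi < 1$) gives the equivalent conditions $n\pi \ge 9(1-\pi)$ and $n(1-\pi) \ge 9\pi$. The sketch below uses that form so that square-root round-off cannot flip the answer right at the boundary. Both function names are hypothetical.

```python
def three_se_inside_unit_interval(pi: float, n: int) -> bool:
    """True if pi - 3*sqrt(pi*(1-pi)/n) >= 0 and pi + 3*sqrt(pi*(1-pi)/n) <= 1.

    Uses the squared, equivalent form: n*pi >= 9*(1 - pi) and n*(1 - pi) >= 9*pi.
    """
    return n * pi >= 9 * (1 - pi) and n * (1 - pi) >= 9 * pi

def smallest_n(pi: float) -> int:
    """Smallest sample size n that satisfies the three-standard-error rule."""
    n = 1
    while not three_se_inside_unit_interval(pi, n):
        n += 1
    return n

for pi in (0.5, 0.25, 0.1):
    print(pi, smallest_n(pi))
```

The required n grows quickly as $\pi$ moves away from 0.5: $\pi = 0.5$ needs only n = 9, while $\pi = 0.25$ already needs n = 27.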
See p. 186 for the strategy
See Figures 5.7 and 5.8, pp. 182-183.
Calculating probabilities for p involves computing
Z-scores, as usual: $z = (p - \pi)/\sqrt{\pi(1-\pi)/n}$, where $\pi$ is the population proportion.
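As a worked sketch with hypothetical numbers ($\pi = 0.5$, n = 100), what is the chance of a sample proportion of 0.6 or more? The standard error is $\sqrt{0.25/100} = 0.05$, so $z = 2$. This uses the plain normal approximation with no continuity correction, and the function name is illustrative.

```python
from statistics import NormalDist

def prob_sample_proportion_at_least(p: float, pi: float, n: int) -> float:
    """P(sample proportion >= p) via the normal approximation, SE = sqrt(pi*(1-pi)/n)."""
    se = (pi * (1 - pi) / n) ** 0.5
    z = (p - pi) / se
    return 1 - NormalDist().cdf(z)

# Hypothetical: SE = 0.05, so p = 0.6 gives z = 2
print(round(prob_sample_proportion_at_least(0.6, 0.5, 100), 4))  # 0.0228
```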