When examining proportions, we categorize each response in one of two ways: ``SUCCESS'' or ``FAILURE,'' where SUCCESS means ``has the characteristic.'' Note that a SUCCESS is not always a good thing. For example, if we are investigating the recurrence of cancer after treatment, a SUCCESS is bad. How would we estimate a proportion based on a sample?
  1. Calculate the number of SUCCESSES in the sample - call this X.
  2. Calculate the sample size - call this n.
  3. Compute the sample proportion $\hat{p} = X/n$.
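The three steps above can be sketched in Python as follows. This is only an illustration. The function name `sample_proportion` and the small data set are hypothetical, and responses are assumed to be coded 1 for SUCCESS and 0 for FAILURE.

```python
def sample_proportion(responses):
    """Return (X, n, p_hat) for responses coded 1 (SUCCESS) or 0 (FAILURE)."""
    n = len(responses)                  # step 2: the sample size
    if n == 0:
        raise ValueError("the sample must contain at least one response")
    x = sum(1 for r in responses if r)  # step 1: count the SUCCESSES
    return x, n, x / n                  # step 3: p_hat = X / n


# Hypothetical data: 1 = cancer recurred (a SUCCESS under this coding), 0 = no recurrence
responses = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
x, n, p_hat = sample_proportion(responses)
print(x, n, p_hat)  # 3 10 0.3
```

The cancer-recurrence coding shows that "SUCCESS" is just a label for "has the characteristic," whether or not that outcome is desirable.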

We now need to know a few properties of the distribution of the sample proportion $\hat{p}$:

  1. One thing we need to know is the mean. As you might guess, we can run a simulation to investigate the mean of the sample proportions over many samples. We find that $\mu_{\hat{p}} = p$, where $p$ is the population proportion of interest.
  2. Another quantity of interest is the standard deviation of all possible values of $\hat{p}$ (called the standard error of the proportion). A simulation again shows that

    $$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

    We can then assume that the random variable

    $$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

    is approximately normally distributed with mean 0 and standard deviation 1, at least provided $np > 5$ and $n(1-p) > 5$. These limits are imposed to ensure that the central limit theorem applies, because the distribution of $\hat{p}$ can be quite asymmetric, especially when $p$ is near 0 or 1.
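A small simulation of the kind described above can be sketched in Python. It draws many samples of size $n$ from a population with a chosen $p$. It then compares the mean and standard deviation of the simulated sample proportions with $p$ and $\sqrt{p(1-p)/n}$, and checks how often the standardized $Z$ falls within plus or minus 1.96 (about 95% of the time for a standard normal). The values $p = 0.3$, $n = 200$, and 5000 repetitions are arbitrary assumptions, chosen so that $np = 60$ and $n(1-p) = 140$ both exceed 5. The helper name `simulate_p_hats` is hypothetical.

```python
import math
import random
import statistics


def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` samples of size n from a population whose SUCCESS probability
    is p, and return the list of sample proportions p_hat = X / n."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # X successes in n trials
        p_hats.append(x / n)
    return p_hats


p, n, reps = 0.3, 200, 5000  # arbitrary illustrative choices
p_hats = simulate_p_hats(p, n, reps)

mean_p_hat = statistics.mean(p_hats)          # should be close to p
sd_p_hat = statistics.stdev(p_hats)           # should be close to the formula below
se_formula = math.sqrt(p * (1 - p) / n)       # standard error of the proportion

# Standardize each p_hat and count how often |Z| < 1.96
z_values = [(ph - p) / se_formula for ph in p_hats]
share_within = sum(1 for z in z_values if abs(z) < 1.96) / reps

print(f"mean of p_hat: {mean_p_hat:.4f}   vs p = {p}")
print(f"sd of p_hat:   {sd_p_hat:.4f}   vs sqrt(p(1-p)/n) = {se_formula:.4f}")
print(f"share of |Z| < 1.96: {share_within:.3f}")
```

Because $\hat{p}$ can only take the values $0, 1/n, 2/n, \ldots$, the share within 1.96 will be near 0.95 rather than exactly equal to it. Rerunning with a small $n$ or with $p$ close to 0 or 1 shows the approximation getting worse.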

We prefer large sample sizes when using this approximation: at least 100, and typically several hundred or several thousand. We can then use the same techniques we applied in chapter 8 to find probabilities based on a normal Z-score. (That normal sure is handy!)
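Once the conditions hold, a probability about $\hat{p}$ reduces to a normal Z-score calculation. The sketch below answers a hypothetical question (with $p = 0.3$ and $n = 400$, how likely is $\hat{p} > 0.35$?). In place of a printed Z table, it uses Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The function name `prob_p_hat_above` is hypothetical.

```python
import math
from statistics import NormalDist


def prob_p_hat_above(threshold, p, n):
    """Approximate P(p_hat > threshold) using the normal approximation.

    Returns (z, probability). Raises ValueError unless np > 5 and n(1-p) > 5.
    """
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation not advised: need np > 5 and n(1-p) > 5")
    se = math.sqrt(p * (1 - p) / n)    # standard error of the proportion
    z = (threshold - p) / se            # Z-score of the threshold
    return z, 1 - NormalDist().cdf(z)   # upper-tail area under the standard normal


# Hypothetical question: if p = 0.3 and n = 400, how likely is p_hat > 0.35?
z, prob = prob_p_hat_above(0.35, p=0.3, n=400)
print(f"z = {z:.2f}, P(p_hat > 0.35) = {prob:.4f}")
```

Here the standard error is $\sqrt{0.3 \cdot 0.7 / 400} \approx 0.0229$, so a sample proportion of 0.35 lies about 2.18 standard errors above $p$. That makes such a result fairly unlikely.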

LONG ANDREW E
Mon Feb 10 23:46:14 EST 2003