Chapter 9
Probability and Integration





9.3 Continuous Probability

9.3.3 Mean, Variance, and Standard Deviation

If you have a list of data values — say, `v_1, v_2, ..., v_n` — that you want to model with a normal distribution, you need to find both the average of the data and a measure of the average spread of the data away from the average. In statistical parlance the average is called the "mean."


Definition   The mean of the data `v_1, v_2, ..., v_n` is
$$\frac{1}{n} \sum_{k=1}^{n} v_k.$$
We denote the mean of a set of data by `m`.
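
As a minimal sketch of this definition, the mean can be computed directly from the formula. The function name `mean` and the sample values in `data` are illustrative assumptions, not part of the text.

```python
def mean(values):
    """Return (1/n) * (v_1 + ... + v_n) for a non-empty list of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean needs at least one data value")
    return sum(values) / n


# Hypothetical data values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(data)
print(m)  # 5.0
```

Here the eight values sum to 40, so `m` is 40/8 = 5.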

The measure of average spread away from the mean requires more explanation. The directed distance of a data value `v_k` from the mean `m` is `v_k - m`. This is a signed distance: it is positive if `v_k` is larger than the mean and negative if `v_k` is smaller than the mean. If we averaged these signed distances, values a large positive distance from the mean would be balanced by values a large negative distance from it. In fact, the signed distances always average to exactly zero, so this measure would report no spread at all even when the values are widely spread out. A common solution to this problem is to average the squares of these distances instead.
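
The cancellation described above can be checked numerically. The following is a small sketch using hypothetical data (the values in `data` and all variable names are assumptions made for illustration): the values lie far from the mean, yet the signed distances average to zero, while the squared distances do not.

```python
# Hypothetical values, widely spread around their mean of 5.
data = [1, 9, 1, 9]
n = len(data)
m = sum(data) / n

# Signed distances v_k - m: positive above the mean, negative below it.
signed = [v - m for v in data]

# Averaging the signed distances lets positives and negatives cancel.
avg_signed = sum(signed) / n

# Averaging the squared distances avoids the cancellation.
avg_squared = sum(d ** 2 for d in signed) / n

print(signed)       # [-4.0, 4.0, -4.0, 4.0]
print(avg_signed)   # 0.0
print(avg_squared)  # 16.0
```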


Definition   The variance of the data `v_1, v_2, ..., v_n` is
$$\frac{1}{n} \sum_{k=1}^{n} (v_k - m)^2,$$
where `m` is the mean.
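
A short sketch of the variance computation, following the definition above term by term. The function name `variance` and the sample data are assumptions for illustration. Note that this divides by `n`, as in the definition here, not by `n - 1` as some statistics texts and software do for sample variance.

```python
def variance(values):
    """Return (1/n) * sum of (v_k - m)^2, where m is the mean of the values."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance needs at least one data value")
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


# Hypothetical data values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # 4.0
```

For this data the mean is 5, the squared distances are 9, 1, 1, 1, 0, 0, 4, 16, and their sum 32 divided by 8 gives 4.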

Since variance is an "average square distance," an appropriate "average distance" is the square root of the variance.


Definition   The standard deviation of the data `v_1, v_2, ..., v_n` is
$$\sqrt{\frac{1}{n} \sum_{k=1}^{n} (v_k - m)^2},$$
where `m` is the mean. We denote the standard deviation of a data set by `sd`.
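
As a sketch, the standard deviation is the square root of the average squared distance from the mean. The function name `standard_deviation` and the sample data are illustrative assumptions. Python's standard `statistics.pstdev` also divides by `n`, so it is used below as a cross-check.

```python
import math
import statistics


def standard_deviation(values):
    """Return sqrt((1/n) * sum of (v_k - m)^2), where m is the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation needs at least one data value")
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return math.sqrt(var)


# Hypothetical data values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(data)
print(sd)  # 2.0

# Cross-check against the standard library's divide-by-n standard deviation.
print(math.isclose(sd, statistics.pstdev(data)))  # True
```

Unlike the variance, `sd` is in the same units as the data, which is why it serves as the "average distance" from the mean.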

