Chapter 9
Probability and Integration
9.3 Continuous Probability
9.3.3 Mean, Variance, and Standard Deviation
If you have a list of data values, say `v_1, v_2, ..., v_n`, that you want to model with a normal distribution, you need to find two things: the average of the data, and a measure of how far the data typically spread away from that average. In statistical parlance the average is called the "mean."
Definition The mean of the data `v_1, v_2, ..., v_n` is
$$m = \frac{v_1 + v_2 + \cdots + v_n}{n} = \frac{1}{n}\sum_{k=1}^{n} v_k.$$
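As a quick check of the definition, the mean can be computed directly in Python. This is a minimal sketch: the function name `mean` and the sample data are illustrative choices, not taken from the text.

```python
def mean(values):
    """Return the mean m = (v_1 + v_2 + ... + v_n) / n of a nonempty list."""
    if not values:
        raise ValueError("the mean requires at least one data value")
    return sum(values) / len(values)

# Illustrative data (hypothetical, not from the text): the values sum to 40.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))  # 5.0
```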
The measure of average spread away from the mean requires more explanation. The directed distance of a data value `v_k` from the mean `m` is `v_k - m`. This is a signed distance: it is positive if `v_k` is larger than the mean and negative if `v_k` is smaller than the mean. If we average these signed distances, then values that lie a large positive distance from the mean are balanced by values that lie a large negative distance from it. In fact, the balance is exact: since `(v_1 - m) + (v_2 - m) + ... + (v_n - m) = (v_1 + v_2 + ... + v_n) - n*m = 0`, the average signed distance is always zero, no matter how widely spread out the values are. A common solution to this problem is to average the squares of these distances instead. Every square is nonnegative, so large deviations in either direction add to the total rather than canceling.
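The cancellation described above is easy to see numerically. The sketch below compares two lists with the same mean but very different spreads; the helper name `signed_and_squared_averages` and both data lists are hypothetical, chosen only for illustration. The average signed distance comes out zero for both lists, while the average squared distance separates them.

```python
def signed_and_squared_averages(values):
    """Return (average signed distance, average squared distance) from the mean."""
    n = len(values)
    m = sum(values) / n
    # The signed deviations always sum to zero (up to floating-point rounding).
    avg_signed = sum(v - m for v in values) / n
    # Squared deviations are nonnegative, so they cannot cancel.
    avg_squared = sum((v - m) ** 2 for v in values) / n
    return avg_signed, avg_squared

tight = [4, 5, 6]       # hypothetical data: mean 5, values close together
wide = [-95, 5, 105]    # hypothetical data: mean 5, values far apart

for name, data in [("tight", tight), ("wide", wide)]:
    avg_signed, avg_squared = signed_and_squared_averages(data)
    print(name, avg_signed, avg_squared)
# The signed average prints as 0.0 for both lists; the squared averages
# are 2/3 for "tight" and 20000/3 for "wide".
```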
Definition The variance of the data `v_1, v_2, ..., v_n` is
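$$\sigma^2 = \frac{(v_1 - m)^2 + (v_2 - m)^2 + \cdots + (v_n - m)^2}{n} = \frac{1}{n}\sum_{k=1}^{n} (v_k - m)^2,$$
where `m` is the mean of the data.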
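Since the variance is an "average squared distance," an appropriate "average distance" is the square root of the variance. Taking the square root also returns the measure to the units of the data: if the `v_k` are measured in centimeters, the variance is in square centimeters, but its square root is in centimeters.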
Definition The standard deviation of the data `v_1, v_2, ..., v_n` is
$$\sigma = \sqrt{\frac{1}{n}\sum_{k=1}^{n} (v_k - m)^2}.$$
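The variance and standard deviation definitions translate almost line for line into Python. In this sketch the function names `variance` and `standard_deviation` and the sample data are illustrative assumptions. The definitions above divide by `n`; Python's standard `statistics` module matches this with `pvariance` and `pstdev`, whereas its `variance` and `stdev` divide by `n - 1`. The last two lines therefore cross-check against `pvariance` and `pstdev`.

```python
import math
import statistics

def variance(values):
    """Average squared distance from the mean, dividing by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance requires at least one data value")
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))

# Illustrative data (hypothetical): mean 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0

# pvariance/pstdev also divide by n, so they should agree with the sketch.
assert math.isclose(variance(data), statistics.pvariance(data))
assert math.isclose(standard_deviation(data), statistics.pstdev(data))
```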