Sorry: I don't have the quiz over the Chi-square distribution graded
yet....
(last!) Quiz today over sections 11.1/11.2
Sections 11.1/11.2: Describing Relationships: Scatterplots and the Correlation Coefficient
In contrast to chapter 12, where we attempted to determine if two
qualitative variables were dependent, in chapter 11 we look at the relationship
between two quantitative variables, and attempt to decide if it is
linear.
We use a "scatterplot" to represent the two variables: each pair of
values is plotted as a single point (x, y).
Interpreting a scatterplot:
positive linear relationship (as x increases, y increases)
negative linear relationship (as x increases, y decreases)
no linear relationship apparent
Draw conclusions only for values of the independent variable (X)
that lie between the smallest and largest X values in the data.
Write your interpretation in terms of the variables of
interest, not "X" and "Y".
We use the correlation coefficient (denoted "r") to
characterize the strength of the linear relationship:
See the image above!
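r is always between -1 and 1: values near -1 or 1 indicate a strong
linear relationship, and values near 0 indicate a weak one.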
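To make r concrete, here is a minimal sketch that computes it from paired data using the standard definition (the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the `hours`/`score` data are made up for illustration; they are not from the textbook.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (standard definition)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-deviations and squared deviations about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(f"r = {r:.4f}")
```

For this made-up data r comes out to roughly 0.77, a fairly strong positive linear relationship. Software such as StatCrunch reports the same quantity directly.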
correlation coefficient test: A test for linear dependence of two
variables (p. 446)
Decision rule: Accept Ha if p-value < α (the chosen significance level).
Test Statistic: $t = r \sqrt{\dfrac{n-2}{1-r^2}}$,
where n is the number of pairs, and df=n-2.
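As a rough illustration of the test, the sketch below computes the usual t statistic for a correlation test, t = r * sqrt((n - 2) / (1 - r^2)), and a two-sided p-value with df = n - 2. The p-value is obtained by numerically integrating the t density with Simpson's rule, which is a stand-in for what the software does, not the textbook's method. The values of `r`, `n`, and `alpha` are hypothetical, and the two-sided alternative is an assumption.

```python
import math


def correlation_t(r, n):
    """t statistic for the correlation test, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value: 1 minus twice the area under the density on [0, |t|].

    The area is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# Hypothetical sample: r = 0.7746 from n = 5 pairs, tested at alpha = 0.05.
r, n, alpha = 0.7746, 5, 0.05
t_stat = correlation_t(r, n)
p_value = two_sided_p_value(t_stat, df=n - 2)
accept_ha = p_value < alpha
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}, accept Ha: {accept_ha}")
```

With these made-up numbers t is about 2.12 and the p-value is about 0.12, so Ha is not accepted at the 0.05 level: even a fairly large r is not convincing evidence when there are only five pairs.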
Conditions for using the test for correlation (p. 447)
At each value of x, the distribution of the
y values in the population is normal, with
standard deviation independent of x (see Figure
11.5, p. 447)
The sampled values of y are independent of one
another.
Figure 11.4, p. 447 gives a good picture to guide us.
The best point estimate of a response value for a given
predictor value is given by the line, but, as in the past, we often prefer a
confidence interval to a point estimate.
The software gives two kinds of intervals: a confidence
interval for the mean response at a given x, and an interval for an
individual response value y(x). StatCrunch calls the first the "CI"
(for Confidence Interval), and the second the Prediction Interval,
denoted "PI".
If the objective is to estimate the mean value of
y at a particular value of x, use the confidence
interval.
If the objective is to estimate a value of y
at a particular value of x, use the prediction interval.
Use these intervals only if a test of hypothesis indicates
that the two variables are correlated.
Use the linear regression results only within the
range of the x values used in the
regression: outside of those values, the behavior of
y may change dramatically (there's no assurance
that it continues to show a linear relationship).
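The difference between the two intervals can be sketched with the standard simple-linear-regression formulas: both are centered on the line's estimate, but the prediction interval has an extra "1 +" under the square root, so it is always wider. This is an illustration under stated assumptions, not StatCrunch's actual code: the function names and data are made up, and the critical value `t_crit` is passed in by hand (3.182 is the table value for 95% confidence with df = 3). The function also refuses to extrapolate outside the observed x values.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = b0 + b1*x, plus the pieces needed for intervals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard error uses df = n - 2.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return b0, b1, s, x_bar, sxx


def intervals(xs, ys, x0, t_crit):
    """Return (estimate, CI for the mean response, PI for a single response) at x0."""
    if not (min(xs) <= x0 <= max(xs)):
        raise ValueError("x0 is outside the range of the observed x values")
    n = len(xs)
    b0, b1, s, x_bar, sxx = fit_line(xs, ys)
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    margin_mean = t_crit * s * math.sqrt(core)      # confidence interval (CI)
    margin_pred = t_crit * s * math.sqrt(1 + core)  # prediction interval (PI)
    ci = (y_hat - margin_mean, y_hat + margin_mean)
    pi = (y_hat - margin_pred, y_hat + margin_pred)
    return y_hat, ci, pi


# Hypothetical data: hours studied (x) and quiz score (y); n = 5, so df = 3.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
y_hat, ci, pi = intervals(hours, score, x0=3, t_crit=3.182)
print(f"estimate = {y_hat:.2f}")
print(f"95% CI for mean response: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for one response:  ({pi[0]:.2f}, {pi[1]:.2f})")
```

Calling `intervals` with an x0 beyond the data (say, 10 hours here) raises an error rather than returning an interval, which mirrors the warning about using the regression only within the range of the observed x values.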