Another approach to graphing a set of variables is to look at a matrix of all possible pairwise scatterplots of the variables. The scatterplot-matrix function will produce such a plot. The data

    (def hardness
         (list 45 55 61 66 71 71 81 86 53 60 64 68 79 81 56 68 75 83 88 59
               71 80 82 89 51 59 65 74 81 86))
    (def tensile-strength
         (list 162 233 232 231 231 237 224 219 203 189 210 210 196 180 200
               173 188 161 119 161 151 165 151 128 161 146 148 144 134 127))
    (def abrasion-loss
         (list 372 206 175 154 136 112 55 45 221 166 164 113 82 32 228 196
               128 97 64 249 219 186 155 114 341 340 284 267 215 148))

were produced in a study of the abrasion loss in rubber tires, and the expression

    (scatterplot-matrix
     (list hardness tensile-strength abrasion-loss)
     :variable-labels (list "Hardness" "Tensile Strength" "Abrasion Loss"))

produces the scatterplot matrix in Figure 9.
Figure 9: Scatterplot matrix of abrasion loss data.
The plot of abrasion-loss against tensile-strength gives you an idea of the joint variation in these two variables. But hardness varies from point to point as well. To get an understanding of the relationship among all three variables, it would be nice to be able to fix hardness at various levels and look at the way the plot of abrasion-loss against tensile-strength changes as you change these levels. You can do this kind of exploration in the scatterplot matrix by using the two highlighting techniques, selecting and brushing.
In the plot in Figure 10, the points within the middle of the hardness range have been highlighted using a long, thin brush (you can change the size of your brush using the Resize Brush command on the Scatmat menu). In the plot of abrasion-loss against tensile-strength you can see that the highlighted points seem to follow a curve. If you want to fit a model to these data, this suggests choosing a model that accounts for the curvature.

Figure 10: Scatterplot matrix with middle hardness values highlighted.
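The same idea of fixing hardness at a middle level can be roughly sketched without brushing by extracting the cases in a hardness band and plotting them on their own. The cutoffs 60 and 80 are illustrative assumptions, not values taken from the figure, and the name middle is hypothetical. The sketch relies on the vectorized comparison functions together with which and select, and assumes the set function intersection is available in your version of the system.

```lisp
;; Indices of the cases whose hardness lies strictly between the cutoffs.
;; The cutoffs 60 and 80 are assumed for illustration only.
(def middle (intersection (which (< 60 hardness))
                          (which (< hardness 80))))

;; Plot abrasion loss against tensile strength for those cases only.
(plot-points (select tensile-strength middle)
             (select abrasion-loss middle)
             :variable-labels (list "Tensile Strength" "Abrasion Loss"))
```

Changing the two cutoffs and re-evaluating the expressions plays the role of moving the brush to a different hardness level.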
A scatterplot matrix is also useful for examining the relationship between a quantitative variable and several categorical variables. In the data

    (def yield
         (list 7.9 9.2 10.5 11.2 12.8 13.3 12.1 12.6 14.0 9.1 10.8 12.5
               8.1 8.6 10.1 11.5 12.7 13.7 13.7 14.4 15.5 11.3 12.5 14.5
               15.3 16.1 17.5 16.6 18.5 19.2 18.0 20.8 21 17.2 18.4 18.9))
    (def density
         (list 1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3 3 4 4 4
               1 1 1 2 2 2 3 3 3 4 4 4))
    (def variety
         (list 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
               3 3 3 3 3 3 3 3 3 3 3 3))

(Devore and Peck [11, page 595, Example 14]) the yield of tomato plants is recorded for an experiment run at four different planting densities and using three different varieties. In the plot in Figure 11, a long, thin brush has been used to highlight the points in the third variety. If there is no interaction between the varieties and the density, then the shape of the highlighted points should move approximately in parallel as the brush is moved from one variety to another.

Figure 11: Scatterplot matrix for tomato yield data with points from the third variety highlighted.
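A rough numerical companion to this visual check is to compute the mean yield in each variety-by-density cell and draw one profile per variety; with no interaction the three profiles should be approximately parallel. This is only a sketch: the helper cell-means and the variables levels and p are hypothetical names introduced here, and the code assumes that intersection and the :add-lines and :adjust-to-data plot messages behave as in standard XLISP-STAT.

```lisp
;; The four planting densities used in the experiment.
(def levels (list 1 2 3 4))

;; Hypothetical helper: mean yield at each density for variety V.
(defun cell-means (v)
  (mapcar #'(lambda (d)
              (mean (select yield
                            (intersection (which (= variety v))
                                          (which (= density d))))))
          levels))

;; One profile of cell means per variety, drawn in a single plot.
(def p (plot-lines levels (cell-means 1)))
(send p :add-lines levels (cell-means 2))
(send p :add-lines levels (cell-means 3))
(send p :adjust-to-data)
```

The final message rescales the axes so that the profiles added after the plot was created are fully visible.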
Like spin-plot, the function scatterplot-matrix also accepts the optional keyword argument scale.
Anthony Rossini