Grimson's Spatial Test Example

Grimson's method

Description: Grimson's method can be used to detect clustering of labeled objects in space, time, and space-time. The test statistic, A, is the number of pairs of labeled objects that are adjacent to one another. The objects can be locations of cases and controls, or areas, and the adjacency criteria are chosen to reflect the kind of clustering under investigation. The objects possess borders, and two objects are adjacent when they share a common border. Grimson's test is sensitive to a high number of adjacencies among labeled objects.
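As a minimal sketch of how A might be counted by hand or in code, the snippet below tallies the unordered pairs of labeled objects that share a border. The function name `count_adjacent_labeled_pairs` and the five-area toy map are hypothetical illustrations, not part of the program described here.

```python
from itertools import combinations

def count_adjacent_labeled_pairs(adjacency, labeled):
    """Count unordered pairs of labeled objects that share a border."""
    labeled = sorted(set(labeled))
    count = 0
    for a, b in combinations(labeled, 2):
        # A pair contributes to A only if the two objects are adjacent.
        if b in adjacency.get(a, ()):
            count += 1
    return count

# Hypothetical toy map: five areas in a row, each bordering its neighbors.
toy_adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
toy_labeled = {"a", "b", "d", "e"}  # hypothetical "high-risk" areas

A = count_adjacent_labeled_pairs(toy_adjacency, toy_labeled)
print("A =", A)
```

In this toy map only the pairs (a, b) and (d, e) are both labeled and adjacent, so the count is 2.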

Notation:

- x : the total number of objects (both labeled and not labeled).

- n : the number of labeled objects (e.g. cases or high-rate areas).

- y : the mean number of borders per object.

- Var(y) : the variance of the number of borders per object.

- A : the number of pairs of labeled objects that are adjacent.

Null hypothesis: The objects have been labeled at random. See Chapter 6 for the expectation and variance of A under this null hypothesis.

Data screen: Grimson used the SIDS rates in the 100 North Carolina counties to determine whether counties with high rates were spatially clustered. A clustering of high-risk counties might suggest a common cause of epidemiologic interest. Recall the four questions that must be answered to use the method.

- What are the data (the objects)?

- How are the objects labeled?

- What kind of clustering do we want to detect?

- How is adjacency defined?

How are the values entered on the data screen determined? x is the total number of objects, which is 100 since North Carolina has 100 counties. The number of high-risk objects, n, is 14, based on a ranking of the counties by SIDS rate. These 14 counties were defined as high risk because they had the highest SIDS rates and because the 15th county had a substantially lower rate. This decision was somewhat arbitrary, but it provides a useful definition of high risk.

The remaining values were determined by examining a county map of North Carolina. The quantity y is the mean number of borders per county, obtained by recording the number of borders for each county, summing them, and dividing by 100 to give y = 4.66. The variance of these 100 values is Var(y) = 2.347. The number of pairs of adjacent high-risk objects, A, was 12, since 12 pairs of high-risk counties were contiguous.
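The calculation of y and Var(y) from a list of border counts can be sketched as follows. The border counts here are invented for illustration (they are not the North Carolina data), and the use of the population variance (dividing by the number of objects) is an assumption, since the text does not say which variance estimator was used.

```python
from statistics import fmean, pvariance

# Hypothetical border counts, one per area (NOT the North Carolina counties).
border_counts = [3, 5, 4, 6, 2]

# y: mean number of borders per object.
y = fmean(border_counts)

# Var(y): variance of the border counts.
# Assumption: population variance (divide by the number of objects).
var_y = pvariance(border_counts)

print(f"y = {y:.3f}, Var(y) = {var_y:.3f}")
```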

Data screen: These values have been entered for you into a Grimson data file. To load them, first access the Grimson data entry screen: select `Space' from the horizontal methods menu, then `Grimson' and `Data'. Move your cursor to the `Grimson input file' field and type `Sids.dat'. Then move to the `Load' field and press `Y'. The data entry screen should look like this (above).

Run screen: Is A=12 large relative to the value of A expected when 14 counties are randomly labeled as high risk? To answer this question, exit the data entry screen, select `Run' on the action menu, and press `Enter'. E(A) and Var(A) are the expectation and variance of A under the null hypothesis. RC and VC are the regularity and variability components, which sum to the variance: Var(A) = RC + VC. The upper window shows the total number of objects (100), the mean number of borders per object (4.660), the number of high-risk objects (14), the variance in the number of borders per object (2.347), and the value of the test statistic (12).
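The text defers the formulas for E(A) and Var(A) to Chapter 6. As a rough sketch, the snippet below uses the standard join-count moments for labeling n of x objects at random without replacement, which is an assumption about what the program computes rather than a transcription of it. The function name `grimson_moments` is hypothetical, and Var(y) is treated as a population variance. With the example's inputs it gives values close to 4.28 for E(A) and 3.61 for Var(A).

```python
def grimson_moments(x, n, y, var_y):
    """Mean and variance of A under random labeling (assumed join-count moments)."""
    joins = x * y / 2.0  # total number of adjacent pairs on the map

    # Sum over objects of b_i * (b_i - 1), where b_i is the border count of
    # object i, recovered from the mean and (population) variance of the b_i.
    k = x * (var_y + y * y - y)

    # Probabilities that 2, 3, or 4 specific objects are all labeled.
    p2 = n * (n - 1) / (x * (x - 1))
    p3 = p2 * (n - 2) / (x - 2)
    p4 = p3 * (n - 3) / (x - 3)

    mean_a = joins * p2
    # E(A^2) splits into: a pair with itself, pairs of adjacencies sharing
    # one object, and pairs of adjacencies with no object in common.
    second_moment = joins * p2 + k * p3 + (joins * (joins - 1) - k) * p4
    var_a = second_moment - mean_a ** 2
    return mean_a, var_a

# Inputs from the SIDS example in the text.
e_a, var_a = grimson_moments(x=100, n=14, y=4.66, var_y=2.347)
print(f"E(A) = {e_a:.4f}, Var(A) = {var_a:.4f}")
```

Note that the expectation reduces to y * n * (n - 1) / (2 * (x - 1)), so it depends only on x, n, and y; Var(y) enters through the variance alone.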

A=12 appears large relative to its expected value of E(A)=4.2834. The z-score is the difference between the observed and expected values of A in standard deviation units, z = (A - E(A)) / sqrt(Var(A)). The significance of A is evaluated using either the Poisson or the normal distribution. The first assumes A is sampled from a Poisson distribution with mean E(A). The second assumes the z-score is sampled from a normal distribution with mean 0 and variance 1.0. Both approaches yield a one-tailed test giving the probability, under the null hypothesis, of obtaining a test statistic as large as or larger than the one observed.
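The two one-tailed tests can be sketched with the standard library alone. The helper names below are hypothetical, and the inputs E(A) = 4.2834 and Var(A) = 3.607 are taken as given (E(A) here is the value y * n * (n - 1) / (2 * (x - 1)) implied by the example's inputs). This is an illustration of the idea, not the program's own code.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def poisson_upper_tail(a, mean):
    """P(X >= a) for X ~ Poisson(mean), as 1 minus the sum of P(X = 0..a-1)."""
    term = math.exp(-mean)  # P(X = 0)
    lower = 0.0
    for k in range(a):
        lower += term
        term *= mean / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - lower

a_obs = 12
e_a = 4.2834    # assumed expectation of A under the null hypothesis
var_a = 3.607   # assumed variance of A under the null hypothesis

z = (a_obs - e_a) / math.sqrt(var_a)
p_normal = normal_upper_tail(z)
p_poisson = poisson_upper_tail(a_obs, e_a)

print(f"z = {z:.3f}")
print(f"normal  P = {p_normal:.5f}")
print(f"Poisson P = {p_poisson:.5f}")
```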

A = 12 is highly significant under both the normal (P=0.00002) and Poisson (P=0.00161) approaches. Whether to use the Poisson or the normal approach depends on the proportion of the variance of A, Var(A), contributed by the variability component, VC. This proportion is `VC / Var(A)', and here it is 0.12965. Grimson (1991) offers the following guidance: use the Poisson approach when VC / Var(A) is small, and the normal approach when VC / Var(A) is large. Our rule of thumb is to use the Poisson approach when VC / Var(A) < 0.20, and the normal approach otherwise. Because the mean and variance of a Poisson distribution are equal, E(A) and Var(A) should also be approximately equal for the Poisson approach to be appropriate. Here the proportion of the variance in A contributed by the variability component is less than 0.20, and E(A) and Var(A) are about 4.3 and 3.607, respectively (the z-score is 4.062). We therefore use the Poisson P-value of 0.00161.
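The rule of thumb above can be written as a small helper. The function name `choose_approach` is hypothetical, and the 0.20 cutoff is the rule of thumb from the text, not a value fixed by Grimson (1991).

```python
def choose_approach(vc_ratio, threshold=0.20):
    """Pick the significance test from the ratio VC / Var(A).

    Below the threshold the Poisson approach is used, otherwise the normal.
    """
    if not 0.0 <= vc_ratio <= 1.0:
        raise ValueError("VC / Var(A) must lie between 0 and 1")
    return "Poisson" if vc_ratio < threshold else "normal"

# Ratio reported for the SIDS example.
approach = choose_approach(0.12965)
print(approach)
```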

The graph in the left window plots the significance of A against the value of A. The Poisson significance of A is shown by the dashed line; the solid line is the significance under the normal approach. We have little confidence in the results when a small change in A causes a large change in significance. How sensitive are the P-values to a change in A? The solid vertical line shows the observed value of A (12). The significance curves start to increase rapidly at A=8, and A=12 is an extreme result probably attributable to a real spatial clustering of high-risk counties. In reality the locations of these clusters correspond with agricultural areas specializing in peanuts and cotton. One might hypothesize that SIDS is associated with exposure to pesticides, or with a cofactor of rural areas, such as socioeconomic status. These hypotheses cannot be evaluated without more detailed investigation.
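To see how the P-values respond to a change in A, one can tabulate both tail probabilities over a range of A values. This is a sketch of the idea behind the plot, not a reproduction of it: the helper names are hypothetical, and E(A) = 4.2834 and Var(A) = 3.607 are assumed inputs.

```python
import math

E_A = 4.2834    # assumed expectation of A under the null hypothesis
VAR_A = 3.607   # assumed variance of A under the null hypothesis

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def poisson_upper_tail(a, mean):
    """P(X >= a) for X ~ Poisson(mean)."""
    term = math.exp(-mean)
    lower = 0.0
    for k in range(a):
        lower += term
        term *= mean / (k + 1)
    return 1.0 - lower

def significance_curve(a_values, mean=E_A, var=VAR_A):
    """Return (A, Poisson P, normal P) for each candidate value of A."""
    rows = []
    for a in a_values:
        z = (a - mean) / math.sqrt(var)
        rows.append((a, poisson_upper_tail(a, mean), normal_upper_tail(z)))
    return rows

curve = significance_curve(range(0, 17))
for a, p_pois, p_norm in curve:
    print(f"A={a:2d}  Poisson P={p_pois:.5f}  normal P={p_norm:.5f}")
```

Reading down the table shows how quickly each P-value falls as A moves away from its expectation, which is the sensitivity the plot is meant to convey.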

Notes: Results depend entirely on the values of x, n, y, Var(y), and A entered on the input screen, so be certain they are correct. y and Var(y) may be estimated from a sub-sample of the objects; this is useful when the number of objects is very large or when the number of borders for some of the objects is unknown.

References:

Grimson, R. C. 1989. Assessing patterns of epidemiologic events in space-time. In Proceedings of the 1989 Public Health Conference on Records and Statistics. National Center for Health Statistics.

Grimson, R. C. 1991. A versatile test for clustering and a proximity analysis of neurons. Methods of Information in Medicine 30:299-303.

