Grimson's Temporal Method Example

Grimson's method

 Description: Grimson's method may be used to detect clustering of labeled objects in space, time, and space-time. The test statistic, A, is the number of pairs of labeled objects that are adjacent to one another. The objects can be case counts within intervals or rates. Adjacency criteria are chosen to reflect the kind of clustering under investigation. The objects possess borders, and an adjacency exists when two objects share a common border. Grimson's test is sensitive to an excess of adjacencies among labeled cells.

 Notation:

- x : the total number of objects (both labeled and not labeled).

- n : the number of labeled objects (e.g., cases or high-rate areas).

- y : the average number of borders per object.

- Var(y) : the variance of y.

- A : the number of pairs of labeled objects that are adjacent.

 Null hypothesis: The objects have been labeled at random. Under this hypothesis the number of adjacencies among the labeled cells is expected to be:

E(A) = \frac{x y}{2} \cdot \frac{(n)_2}{(x)_2} = \frac{y \, n(n-1)}{2(x-1)}

 The variance of A has two components, the regularity component (RC) and the variability component (VC). The regularity component is:

RC = E(A) + x y (y-1) \left[ \frac{(n)_3}{(x)_3} - \frac{(n)_4}{(x)_4} \right] + \frac{x y}{2} \left( \frac{x y}{2} - 1 \right) \frac{(n)_4}{(x)_4} - E(A)^2

The variability component is:

VC = x \, \mathrm{Var}(y) \left[ \frac{(n)_3}{(x)_3} - \frac{(n)_4}{(x)_4} \right]

 The notation (a)_k indicates a falling factorial: (a)_k = a(a-1)...(a-k+1). For example, (4)_3 = 4 x 3 x 2 = 24. The variance of A is

Var(A)=RC+VC
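
 To make the computation concrete, here is a minimal Python sketch of these moment formulas as reconstructed above; the function names are ours and the snippet is illustrative, not part of Stat.

    from math import prod

    def falling(a, k):
        # Falling factorial (a)_k = a(a-1)...(a-k+1); falling(4, 3) == 24.
        return prod(a - i for i in range(k))

    def grimson_moments(x, n, y, var_y):
        # Moments of A under random labeling of n of x objects, where y and
        # var_y are the mean and variance of the number of borders per object.
        p2 = falling(n, 2) / falling(x, 2)  # both ends of one border labeled
        p3 = falling(n, 3) / falling(x, 3)  # two borders sharing an object
        p4 = falling(n, 4) / falling(x, 4)  # two borders with no object in common
        b = x * y / 2                       # total number of borders
        e_a = b * p2                        # E(A)
        vc = x * var_y * (p3 - p4)          # variability component
        rc = e_a + x * y * (y - 1) * (p3 - p4) + b * (b - 1) * p4 - e_a ** 2
        return e_a, rc, vc

    # The bone fracture example below: x=17, n=10, y=32/17, Var(y)=510/4913.
    e_a, rc, vc = grimson_moments(17, 10, 32 / 17, 510 / 4913)
    print(round(e_a, 4), round(rc + vc, 4))  # 5.2941 1.09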

 Data screen: A nurse in an infirmary on Long Island thought there was an excess of bone fractures during 1989 relative to the preceding years. Counts of bone fractures per quarter were recorded from 1985 to 1989, yielding a single time series of 17 time intervals.

0 2 2 1 2 0 2 0 2 0 2 0 0 2 2 2 5

 Time intervals with 2 or more fractures were classified as high-risk. Do these high-risk intervals cluster in time? While these data are frequencies, notice that rates can be analyzed in the same fashion. Four questions must be answered to use Grimson's method.

- What are the data (the objects)?

- How are the objects labeled?

- What kind of clustering do we want to detect?

- How is adjacency defined?

 The data are frequencies of bone fractures by quarter; quarters with at least two fractures are labeled high-risk; we wish to detect time clustering; and time intervals must be consecutive to be adjacent.

 The inputs for Grimson's method are calculated from the data. We need to determine x, the number of time intervals; n, the number of high-risk time intervals; y, the average number of adjacencies per time interval; Var(y), the variance in the number of adjacencies; and A, the number of adjacent pairs of high-risk intervals.

quarter   fractures   adjacencies   risk
   1          0            1          0
   2          2            2          1
   3          2            2          1
   4          1            2          0
   5          2            2          1
   6          0            2          0
   7          2            2          1
   8          0            2          0
   9          2            2          1
  10          0            2          0
  11          2            2          1
  12          0            2          0
  13          0            2          0
  14          2            2          1
  15          2            2          1
  16          2            2          1
  17          5            1          1

mean of adjacencies (y)    1.8824
var(y)                     0.1038

 Table 1. Bone fracture data. Quarter : column 1; Number of fractures : column 2; Number of adjacencies per quarter : column 3; Risk label : column 4.

 The number of quarters is x=17. The number of high-risk quarters is n=10, obtained by summing the `risk' column in Table 1. The average number of adjacencies per quarter is y=1.8824, the mean of the `adjacencies' column. The variance in the number of adjacencies per quarter is Var(y)=0.1038, the variance of the `adjacencies' column. The number of pairs of adjacent high-risk quarters is A=4, since 4 pairs of high-risk quarters are contiguous (quarters 2-3, 14-15, 15-16 and 16-17).
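
 The same inputs can also be derived directly from the fracture series. Here is a short Python sketch (the variable names are ours; it assumes the quarters form a linear chain in which interior quarters have two borders and the first and last quarters have one):

    from statistics import mean, pvariance

    fractures = [0, 2, 2, 1, 2, 0, 2, 0, 2, 0, 2, 0, 0, 2, 2, 2, 5]
    high_risk = [f >= 2 for f in fractures]   # label: two or more fractures

    x = len(fractures)                        # 17 time intervals
    n = sum(high_risk)                        # 10 high-risk intervals

    borders = [2] * x                         # interior quarters: 2 borders
    borders[0] = borders[-1] = 1              # end quarters: 1 border

    y = mean(borders)                         # 1.8824
    var_y = pvariance(borders)                # 0.1038 (population variance)

    # A: pairs of consecutive quarters that are both high-risk.
    A = sum(a and b for a, b in zip(high_risk, high_risk[1:]))
    print(x, n, round(y, 4), round(var_y, 4), A)   # 17 10 1.8824 0.1038 4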

 Now enter these data into Stat. Select `Time' on Stat's horizontal methods menu, then `Grimson', and then `Data'. Edit the Grimson input data screen so that it contains the values just calculated: x=17, n=10, y=1.8824, Var(y)=0.1038 and A=4.

 Now press F10 to exit the data entry screen and select `Run' from the Action menu. The calculations will take just a moment and the run screen will be displayed.

 Run screen: Is A=4 large relative to the value expected by randomly labeling 10 of the 17 quarters as high risk? The upper window shows the total number of cells (17), the mean number of adjacencies per cell (1.882), the number of high-risk cells (10), the variance in the number of adjacencies (0.104) and the value of the test statistic (4). E(A) and Var(A) are the expectation and variance of A under the null hypothesis. RC and VC are the regularity and variability components used to calculate Var(A) as RC+VC.

 The z-score is the difference between the observed and expected value of A in standard deviation units. The significance of A is evaluated using the Poisson or the normal distribution. The first assumes A is sampled from a Poisson distribution with a mean given by E(A). The second assumes the z-score is sampled from a normal distribution with mean of 0 and variance 1.0. Both approaches yield a one-tailed test describing the probability, under the null hypothesis, of obtaining a test statistic as large or larger than the one already observed.
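
 For illustration, both one-tailed P-values can be computed with the Python standard library alone; this is a sketch with helper names of our own, not Stat's internals.

    from math import erf, exp, sqrt

    def poisson_sf(a, mu):
        # P(X >= a) for X ~ Poisson(mu), computed as 1 - P(X <= a - 1).
        term, cdf = exp(-mu), 0.0
        for k in range(a):
            cdf += term
            term *= mu / (k + 1)
        return 1.0 - cdf

    def normal_sf(z):
        # P(Z >= z) for a standard normal Z.
        return 0.5 * (1.0 - erf(z / sqrt(2.0)))

    A, e_a, var_a = 4, 5.2941, 1.0900
    z = (A - e_a) / sqrt(var_a)        # about -1.24 standard deviations
    print(normal_sf(z))                # about 0.892
    print(poisson_sf(A, e_a))          # about 0.774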

 Time clustering would cause an excess of adjacencies among high-risk time intervals. A=4 is smaller than its expected value of E(A)=5.2942, so there is no evidence of time clustering: the result is not significant under either the normal (P=0.89245) or the Poisson (P=0.77388) approach. Whether to use the Poisson or the normal approach depends on the proportion of the variance Var(A) contributed by the variability component VC. This proportion, VC / Var(A), is 0.14285 here. Grimson (1991) offers the following guidance: use the Poisson approach when VC / Var(A) is small, and the normal approach when VC / Var(A) is large. Our rule of thumb is to use the Poisson approach when VC / Var(A) < 0.20, and the normal approach otherwise. Because the mean and variance of a Poisson distribution are equal, E(A) and Var(A) should also be approximately equal for the Poisson approach to be appropriate. Here the proportion of the variance in A explained by the variability component is less than 0.20, but E(A)=5.2942 and Var(A)=1.090 are far from equal, so we use the normal P-value of 0.89245.
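
 The choice between the two approaches can be written as a small rule. In this sketch the closeness test for E(A) and Var(A) uses an arbitrary tolerance of our own, since the text gives no numeric cutoff for "approximately equal".

    def choose_approach(e_a, var_a, vc, rel_tol=0.25):
        # Rule of thumb from the text: Poisson when VC / Var(A) < 0.20 and
        # E(A) is close to Var(A); otherwise normal.  rel_tol is an assumed
        # closeness criterion, not part of the source.
        poisson_ok = vc / var_a < 0.20 and abs(var_a - e_a) <= rel_tol * e_a
        return "poisson" if poisson_ok else "normal"

    print(choose_approach(e_a=5.2941, var_a=1.0900, vc=0.1557))  # normal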

 The graph in the left window plots the significance of A against the value of A. The dashed line shows the Poisson significance; the solid curve shows the significance under the normal approach. The solid vertical line marks the observed number of adjacent high-risk quarters (A=4). The points where the Poisson and normal curves cross the vertical line give the P-values under the two assumptions. The test was not significant, and we conclude there is no evidence of temporal clustering.

 Notes: Results depend entirely on the values of x, n, y, Var(y) and A entered on the input screen, so be certain they are correct. y and Var(y) may be estimated from a subsample of the objects; this is useful when the number of objects is very large or when the number of borders for some of the objects is unknown. When used with rates, high risk may be defined in several ways. One way is to declare time intervals high-risk if their rates are significantly large relative to a reference population. Another is to label intervals high-risk only if their rates are in the upper 5% of the rates from the x time intervals. Finally, one can use external events, such as an exposure, as the risk criterion; for example, high-risk time intervals would be those in which an exposure occurred.
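
 For instance, when the full border list is too large to enumerate, y and Var(y) could be estimated from a random subsample along these lines (a sketch; the sample size and seed are arbitrary):

    import random
    from statistics import mean, pvariance

    def estimate_border_moments(borders, sample_size, seed=0):
        # Estimate y and Var(y) from a random subsample of the objects'
        # border counts, as suggested above for very large problems.
        sample = random.Random(seed).sample(borders, sample_size)
        return mean(sample), pvariance(sample)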

 References:

 Grimson, R. C. 1989. Assessing patterns of epidemiologic events in space-time. In Proceedings of the 1989 Public Health Conference on Records and Statistics. National Center for Health Statistics.

 Grimson, R. C. 1991. A versatile test for clustering and a proximity analysis of neurons. Methods of Information in Medicine 30:299-303.

