Empty Cell Test Example

Empty cells

Description: The empty cells test is based on E, the number of empty `cells' (time intervals) in a sequence of consecutive time intervals. It is sensitive to temporal clustering of cases in which one or more of the time periods have several cases while other time periods have none. When cases cluster within some of the cells, E will be larger than its expectation; E is smaller than its expectation when roughly equal numbers of cases tend to occur in all of the cells.

Notation:

- N: number of cases in a time series
- t: number of time cells
- E: test statistic, the number of empty cells

 Null hypothesis: The cases are allocated at random among the time cells. Grimson (1993) gives equations for the expectation and variance under this null hypothesis as

$E(E) = (t)_1\, t^{-N} (t-1)^N = t\,(1 - 1/t)^N$

$E((E)_2) = (t)_2\, t^{-N} (t-2)^N$

$\mathrm{Var}(E) = E(E)\,[1 - E(E)] + E((E)_2)$

The notation $(a)_k$ indicates a falling factorial such that $(a)_k = a(a-1)\cdots(a-k+1)$. For example, $(4)_3 = 4 \times 3 \times 2 = 24$.
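As a check on these formulas, the Python sketch below computes the null expectation and variance of E from N and t. It relies on the standard occupancy result that the k-th factorial moment of E is $(t)_k\, t^{-N} (t-k)^N$, so E(E) is the k = 1 case. The function names `falling_factorial` and `empty_cell_moments` are illustrative only and are not part of Stat.

```python
def falling_factorial(a: int, k: int) -> int:
    """Return (a)_k = a(a-1)...(a-k+1), with (a)_0 = 1."""
    result = 1
    for i in range(k):
        result *= a - i
    return result


def empty_cell_moments(n_cases: int, n_cells: int) -> tuple[float, float]:
    """Null expectation and variance of E, the number of empty cells."""
    t, n = n_cells, n_cases
    # k-th factorial moment of E: (t)_k * t**(-N) * (t - k)**N
    mean = falling_factorial(t, 1) * ((t - 1) / t) ** n
    second = falling_factorial(t, 2) * ((t - 2) / t) ** n  # E((E)_2)
    # Var(E) = E(E) * (1 - E(E)) + E((E)_2)
    variance = mean * (1 - mean) + second
    return mean, variance


print(falling_factorial(4, 3))  # 24
mean, var = empty_cell_moments(24, 17)
print(f"E(E) = {mean:.3f}, Var(E) = {var:.3f}")  # E(E) = 3.968, Var(E) = 1.713
```

With the bones.tim values (N = 24, t = 17) this reproduces the E(E) of 3.968 and Var(E) of 1.713 that the run screen reports.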

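Significance: Under this null hypothesis we wish to determine the probability, P, of obtaining a number of empty cells greater than or equal to the observed value, E*. The significance of E* is evaluated using the exact P-value:

$P(E \ge E^*) = \sum_{j=E^*}^{t} (-1)^{j-E^*} \binom{j-1}{E^*-1} \binom{t}{j} \left(1 - \frac{j}{t}\right)^{N}, \qquad E^* \ge 1$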

The notation $\binom{a}{b}$ indicates a binomial coefficient, $a!/[b!\,(a-b)!]$. Because we want to test for clusters, $P(E \ge E^*)$ is evaluated as a one-tailed test.
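The exact P-value can be cross-checked numerically. The Python sketch below builds the exact null distribution of E by dropping the N cases one at a time into t equally likely cells and tracking how many cells are occupied, using exact rational arithmetic (`fractions.Fraction`) to avoid rounding error. This recursion is a swapped-in technique for illustration, not necessarily how Stat computes P, and the function names are hypothetical.

```python
from fractions import Fraction


def empty_cell_distribution(n_cases: int, n_cells: int) -> list[Fraction]:
    """Exact null distribution of E; entry e is P(E = e)."""
    # occupied[k] = probability that exactly k cells hold at least one case
    occupied = [Fraction(0)] * (n_cells + 1)
    occupied[0] = Fraction(1)
    for _ in range(n_cases):
        nxt = [Fraction(0)] * (n_cells + 1)
        for k, p in enumerate(occupied):
            if p == 0:
                continue
            # The new case lands in one of the k already-occupied cells ...
            nxt[k] += p * Fraction(k, n_cells)
            # ... or in one of the n_cells - k cells that are still empty.
            if k < n_cells:
                nxt[k + 1] += p * Fraction(n_cells - k, n_cells)
        occupied = nxt
    # E = n_cells - (number of occupied cells)
    return [occupied[n_cells - e] for e in range(n_cells + 1)]


def empty_cells_p_value(observed_empty: int, n_cases: int, n_cells: int) -> float:
    """One-tailed exact P-value, P(E >= observed_empty)."""
    dist = empty_cell_distribution(n_cases, n_cells)
    return float(sum(dist[observed_empty:]))


print(round(empty_cells_p_value(6, 24, 17), 4))  # 0.1177
```

For the bones data (E* = 6, N = 24, t = 17) this gives the same P = 0.1177 that Stat reports.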

Data screen: Invoke Stat's Empty cell method in the `Time' method menu. Select `Data' to display the data entry screen. Enter the name of the time series file `bones.tim' and press F10 to exit.

To use the test, the number of cases must be small enough that the expected number of empty cells is greater than 1. Use Appendix A at the back of this Chapter to determine whether your data can be analyzed using the empty cells approach. Bones.tim has 17 time cells and 24 cases. Looking up `17' in the `t' column shows that the maximum number of cases, N, is 46. Because 24 is smaller than 46, you can proceed with the empty cells test. If more than 46 cases had occurred, you could not use the empty cells test and should instead consider an alternative such as Dat's 0-1 matrix method, the scan method or Larsen's method.
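Appendix A is not reproduced here, but assuming its entries follow the stated rule E(E) > 1, they can be recomputed directly. This Python sketch (the function name `max_cases` is hypothetical) finds the largest N for which $t\,(1 - 1/t)^N$ still exceeds 1.

```python
def max_cases(n_cells: int) -> int:
    """Largest N for which E(E) = t * (1 - 1/t)**N is still greater than 1."""
    if n_cells < 2:
        raise ValueError("need at least two time cells")
    n = 0
    # E(E) shrinks as N grows, so step N up until one more case
    # would push the expected number of empty cells to 1 or below.
    while n_cells * ((n_cells - 1) / n_cells) ** (n + 1) > 1:
        n += 1
    return n


print(max_cases(17))  # 46
```

For t = 17 it returns 46, agreeing with the Appendix A entry used for bones.tim.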

Run screen: Select `Run' from the empty cell method's action menu and press `Enter'. The window on the right of the run screen shows E, the number of empty cells (6), its expectation E(E) (3.968) and its variance Var(E) (1.713). P is the probability, under the null hypothesis, of obtaining a number of empty cells greater than or equal to E. The graph plots E against its expectation E(E). Points to the left of the diagonal line show a tendency towards clustering; points to the right show a tendency towards uniformity. The plot suggests clustering for the bones data, but the number of empty cells is not significant (P=0.1177), so one concludes there is insufficient evidence that the data are clustered.

Simultaneous time series: When several time series are tested simultaneously, their results must be combined. When at least 20% of the time series have an expected number of empty cells of 5 or more, the results can be combined as a continuity-corrected chi-square with one degree of freedom; otherwise, the P-values are combined using the Bonferroni approach.

Here the `i' subscript indicates a statistic for the ith time series.
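The choice between the two combination methods can be sketched in Python as follows. The Bonferroni step assumes the common form in which the smallest of the k per-series P-values is multiplied by k and capped at 1; the text does not spell out Stat's exact formula, and the continuity-corrected chi-square is not implemented because its formula is not given here. Both function names are hypothetical.

```python
def chi_square_rule_met(expected_empty: list[float]) -> bool:
    """True when at least 20% of the series have E(E) of 5 or more."""
    if not expected_empty:
        return False
    large = sum(1 for e in expected_empty if e >= 5)
    # Integer form of large / k >= 0.2, which avoids floating-point comparison.
    return 5 * large >= len(expected_empty)


def bonferroni_p(p_values: list[float]) -> float:
    """ASSUMED Bonferroni combination: min(1, k * smallest P) over k series."""
    if not p_values:
        raise ValueError("no time series to combine")
    return min(1.0, len(p_values) * min(p_values))
```

In this sketch the chi-square route would be taken only when `chi_square_rule_met` returns True; otherwise `bonferroni_p` supplies the combined P-value.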

Data screen: To illustrate, invoke Stat's Empty cell method in the `Time' method menu. Select `Data' to display the data entry screen. Enter the name of the time series file `measles.tim' and press F10 to exit. The Ederer-Myers-Mantel test that we applied earlier in this Chapter is sensitive to an excess of cases in a single cell. In contrast, the empty cells test is sensitive to an excess number of time intervals without any cases at all. These two statistics focus on different numerical aspects of the data.

Run screen: Select `Run' from the empty cell method's action menu and press `Enter'. The window on the right of the run screen shows E, the number of empty cells, its expectation E(E) and its variance Var(E). P is the probability, under the null hypothesis, of obtaining a number of empty cells greater than or equal to E. The graph plots E against its expectation E(E). All of the measles results plot to the left of the diagonal line, showing a tendency towards clustering. Why did only 4 of the 18 counties have results? Many counties have the message `E(E) < 1.0, too many cases'. This means the expected number of empty cells was too small to use the empty cells method, so the county was excluded from the analysis. Other counties had the message `too few cases' and were also excluded from the analysis. Use the `PgDn' key to scroll the results window. It should look like this:

When at least 20% of the time series have an expected number of empty cells of 5 or more, the results can be combined as a continuity-corrected chi-square with one degree of freedom. None of the counties had an expectation of 5 or more, so Stat combined results across time series using the Bonferroni approach. The Bonferroni P-value is 0.0486, and we conclude that the measles data demonstrate time clustering when the time series from the counties in rows 7 (Kalamazoo), 10 (Eaton) and 11 (Ingham) are considered simultaneously.


Website maintained by Andy Long. Comments appreciated.