Dat's Method Example

Dat's method

Description: Dat's 0-1 matrix test (Dat, 1982) is used to detect clustering in time. The test statistic, A, is the number of time intervals (cells) containing more than the number of cases expected in the absence of clustering. A large test statistic indicates cluster avoidance: cases are spread evenly, so many of the time intervals have slightly more than the expected number of cases. The test statistic is small when cases cluster in a few time intervals. Stat! provides two tests for temporal clustering under Dat's method: within a single time series (using the z-score) and across several time series simultaneously (using the chi-square). It answers two questions: `Is there an unusual pattern over time in one time series?' and `Is there an unusual pattern over time in several time series?'

 Notation:

- t : Number of time intervals

- n : Total number of cases observed over the t intervals

- n/t : Number of cases expected in an interval in the absence of time clustering

- A : The test statistic: the number of time intervals with at least [n/t-0.5] cases

- [x] : The least integer greater than x. For example, [1.3]=2.
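The notation above can be sketched in a few lines of Python. This is only an illustration, not Stat! code: the function name `count_a` and the sample counts are made up, and the bracket [x] is assumed to be the usual ceiling function (`math.ceil`), which differs from "the least integer greater than x" only when x is itself an integer.

```python
import math

def count_a(counts):
    """Return (threshold, A) for one time series of case counts."""
    t = len(counts)   # number of time intervals
    n = sum(counts)   # total number of cases
    # Threshold [n/t - 0.5], read here as the ceiling (an assumption).
    threshold = math.ceil(n / t - 0.5)
    # A = number of intervals with at least `threshold` cases.
    a = sum(1 for c in counts if c >= threshold)
    return threshold, a

counts = [3, 0, 5, 1, 1, 2]   # made-up counts: t = 6, n = 12
threshold, a = count_a(counts)
print("threshold:", threshold, "A:", a)
```

Here n/t = 2, so the threshold is [1.5] = 2, and three intervals (those with 3, 5 and 2 cases) reach it.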

 Null hypothesis: The n cases are distributed at random across the t time intervals. Under this null hypothesis the expectation and variance are

where

 Significance: A N(0,1) z-score is calculated as

z = (A - E(A)) / sqrt(Var(A))

The approximate distribution of z is normal with a mean of 0 and unit variance. P-values are evaluated by comparing z to the percentiles of the normal distribution.
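The significance step can be illustrated with a short Python sketch. It makes several assumptions that are not taken from Stat!: E(A) and Var(A) are estimated by Monte Carlo simulation of the null hypothesis rather than from Dat's closed-form expressions; z is taken to be the usual standardized form (A - E(A)) / sqrt(Var(A)); the bracket [x] is read as the ceiling; and all function names and the example values are hypothetical. Both a lower-tail and a two-sided P-value are shown, because which one applies depends on the question being asked.

```python
import math
import random

def count_a(counts, threshold):
    """Number of intervals with at least `threshold` cases."""
    return sum(1 for c in counts if c >= threshold)

def simulate_null_moments(n, t, n_sims=20000, seed=1):
    """Monte Carlo estimate of E(A) and Var(A) when n cases fall
    at random (equal probability) into t time intervals."""
    rng = random.Random(seed)
    threshold = math.ceil(n / t - 0.5)   # ceiling reading of [n/t - 0.5]
    values = []
    for _ in range(n_sims):
        counts = [0] * t
        for _ in range(n):
            counts[rng.randrange(t)] += 1   # drop one case at random
        values.append(count_a(counts, threshold))
    mean = sum(values) / n_sims
    var = sum((v - mean) ** 2 for v in values) / (n_sims - 1)
    return mean, var

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(a, mean, var):
    """Standardized statistic (assumed form of the z-score)."""
    return (a - mean) / math.sqrt(var)

# Hypothetical example: n = 12 cases, t = 6 intervals, observed A = 2.
mean_a, var_a = simulate_null_moments(n=12, t=6)
z = z_score(2, mean_a, var_a)
p_lower = normal_cdf(z)                        # small when A is unusually low
p_two_sided = 2 * min(p_lower, 1 - p_lower)
print(round(mean_a, 2), round(var_a, 2), round(z, 2))
```

A negative z with a small lower-tail P-value points toward time clustering; a positive z points toward cluster avoidance.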

When analyzing s time series simultaneously, an overall P-value is obtained from a chi-square statistic with one degree of freedom:

This allows one to test for simultaneous clustering in several time series. See Chapter 7 for an example.

 Data screen: Use Dat's method only when the number of time intervals is greater than 4 and less than 11. Stat! will tell you if the data do not meet this criterion. If 11 or more time intervals are present, you may be able to aggregate the data by combining time intervals. The file `bones.tim' has 17 time intervals, too many for Dat's method. The data have been aggregated in `bonesdat.tim' by deleting the first time interval and combining each pair of consecutive intervals, which leaves 8 intervals. Go to the Dat input screen and enter the name of the file containing the time series you wish to analyze (`bonesdat.tim').
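The aggregation described above (drop the first interval, then combine every 2 intervals) can be sketched as follows. This is an illustration only: the function names are hypothetical, the counts are made up rather than taken from `bones.tim', and the series is assumed to be already loaded into a Python list, since the layout of the `.tim' files is not described here.

```python
def aggregate_intervals(counts, drop_first=1, width=2):
    """Drop the first `drop_first` intervals, then sum each run of
    `width` consecutive intervals into one aggregated interval."""
    kept = counts[drop_first:]
    if len(kept) % width != 0:
        raise ValueError("remaining intervals do not divide evenly")
    return [sum(kept[i:i + width]) for i in range(0, len(kept), width)]

def interval_count_ok(t):
    """Dat's method needs more than 4 and fewer than 11 intervals."""
    return 4 < t < 11

# 17 made-up counts standing in for a series that is too long.
raw = [2, 1, 0, 3, 1, 4, 2, 0, 1, 5, 2, 1, 0, 2, 3, 1, 1]
agg = aggregate_intervals(raw)
print(len(raw), interval_count_ok(len(raw)))
print(len(agg), interval_count_ok(len(agg)))
```

Note that dropping an interval discards its cases, so the total n of the aggregated series can be smaller than that of the original.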

 

 Run screen: Exit the data entry screen by pressing `F10'. Select `Run' on the action menu and press `Enter'. The calculations will take a moment, and then the Dat run screen will be displayed. The plot of A against its expectation is shown on the left. Under the null hypothesis A = E(A), and points fall on a line with a slope of 45 degrees. Under cluster avoidance A > E(A), and the time series plot above the 45-degree line. When time clustering exists A < E(A), and the time series plot below the line. The window on the right displays, for each time series in the data file, A, its expectation, its variance, the z-score and the corresponding P-value. The symbol for the bone fracture data plots slightly below the 45-degree line, suggesting temporal clustering, but the P-value (0.354) is not significant. You conclude there is no evidence of temporal clustering under Dat's method.

Now let's consider the time series `measles.tim' describing measles cases in 18 Michigan counties. Go to the Dat data entry screen and enter measles.tim.

Run screen: Then press `F10' to exit the screen, select `Run' on the action menu and press `Enter'. The calculations will take a moment, and then the Dat run screen will be displayed. The plot of A against its expectation is shown on the left. When clustering is absent A = E(A), and points fall on a line with a slope of 45 degrees. Under cluster avoidance A > E(A), and the points are above this line. When time clustering exists A < E(A), and the points are below the line. The time series in `measles.tim' plot well below the line, suggesting temporal clustering.

 The right-hand window shows which counties demonstrate clustering in time. Column 1, `row', is the row in the time series file; each row corresponds to a county. Columns 2-6 are A, its expectation, its variance, the z-score and its significance. The message `too few cases' appears whenever the total number of cases is less than or equal to twice the number of time intervals.

 Time clustering is indicated by significant, negative z-scores, as occurs for row 12 (Livingston County). Is the overall pattern of negative z-scores among the 3 counties described in rows 1, 12 and 13 significant? The chi-square (7.594) in the upper window is significant (P=0.0059), indicating significant time clustering when the three counties are combined. Examination of the z-scores shows that time clustering is particularly strong in Livingston County. You conclude there is evidence of time clustering of measles cases under Dat's method.
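The reported P-value can be checked by hand, since the upper tail of a chi-square distribution with one degree of freedom has a simple closed form in terms of the complementary error function. The sketch below uses only the Python standard library; the function name is hypothetical, and it covers only the step from the chi-square value to its P-value, not how Stat! forms the chi-square from the individual z-scores.

```python
import math

def chi2_upper_tail_1df(x):
    """P(X > x) for X ~ chi-square with 1 degree of freedom.

    If Z is standard normal then Z**2 is chi-square(1), so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

p = chi2_upper_tail_1df(7.594)   # chi-square value from the run screen
print(round(p, 4))
```

The result agrees with the P=0.0059 shown in the upper window.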

 

 Notes: Use Dat's test with counts, not rates. You cannot use the method when the expected number of cases per interval (n/t) is 2 or smaller; that is, the total number of cases in the time series must be greater than twice the number of time intervals. Otherwise, consider using the empty cells test instead. The test is more sensitive than the Ederer-Myers-Mantel test in detecting multiple clusters within a space sub-unit. Within a time series, the method assumes that population size does not change through time.
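The two screening rules (more than 4 and fewer than 11 time intervals, and more than twice as many cases as intervals) can be combined into one pre-check. This is a minimal sketch with a hypothetical function name and made-up counts; the wording of the first message is invented, and only `too few cases' is a message that Stat! itself is described as showing.

```python
def screen_series(counts):
    """Return a list of reasons why Dat's method should not be used
    on this series of counts (an empty list means it passes)."""
    t = len(counts)   # number of time intervals
    n = sum(counts)   # total number of cases
    problems = []
    if not 4 < t < 11:
        problems.append("number of time intervals must be 5 to 10")
    if n <= 2 * t:
        problems.append("too few cases")
    return problems

print(screen_series([3, 0, 5, 1, 1, 2]))   # n = 12, t = 6
print(screen_series([4, 1, 6, 2, 3, 5]))   # n = 21, t = 6
```

The first series fails because n equals exactly twice t; the second passes both checks.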

 References:

Dat, M. V. 1982. Tests for Time-Space Clustering of Disease. Ph.D. dissertation, Dept. of Biostatistics, SPH, University of North Carolina, Chapel Hill, NC.