Moran's I method

A test for spatial autocorrelation in disease rates. Positive spatial autocorrelation means that nearby areas have similar rates, indicating spatial clustering.

Let's think of the areas as counties (although they could be any geographic unit).

Data Requirements

A disease rate for each area, together with a connection weight for each pair of areas

Number of Monte Carlo runs to perform
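As a rough illustration of these inputs, the sketch below stores the rates in a vector and the connection weights in a square matrix. The four counties, their rates, the binary border-sharing weights, and the value 999 are all made-up assumptions for the example, not values from the manual.

```python
import numpy as np

# Hypothetical rates (cases per 10,000) for four counties lying in a row: A-B-C-D.
rates = np.array([12.0, 11.0, 4.0, 3.0])

# Hypothetical binary weights: w[i, j] = 1 when counties i and j share a border.
# Other weighting schemes (e.g. inverse distance) would fill the same matrix.
shared_borders = [(0, 1), (1, 2), (2, 3)]
n = len(rates)
weights = np.zeros((n, n))
for i, j in shared_borders:
    weights[i, j] = 1.0
    weights[j, i] = 1.0  # adjacency is symmetric

# Number of Monte Carlo runs (an arbitrary illustrative choice).
n_simulations = 999

# Sum of all weights, needed later by the test statistic.
s0 = weights.sum()
```

A county is not treated as its own neighbour here, so the diagonal of the matrix stays at zero.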

Analysis

H0: disease rates are spatially independent and assigned at random among the regions

Ha: disease rates are not spatially independent

Test Statistic: Moran's I is a weighted product-moment correlation coefficient, where the weights reflect geographic proximity. Values of I larger than 0 indicate positive spatial autocorrelation; values smaller than 0 indicate negative spatial autocorrelation.

$$I = \frac{N}{S_0} \cdot \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} z_i z_j}{\sum_{i=1}^{N} z_i^2}$$

where $N$ is the number of regions; $w_{ij}$ is a weight denoting the strength of the connection between areas $i$ and $j$; $z_i = x_i - \bar{x}$ is the rate in region $i$ centered about the mean rate ($x_i$ is the rate in region $i$); and $S_0$ is the sum of the weights, $S_0 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}$.

The weights reflect how "connected" we think two areas are, and usually reflect geographic proximity. Moran's I is used to determine whether "connected" areas are more similar to one another than would be expected under spatial randomness. It asks: do the rates in connected areas covary? Nearby areas tend to have similar rates when their populations and exposures are alike. When rates in connected (nearby) areas are similar, Moran's I will be large and positive. When rates in connected areas are dissimilar, Moran's I will be negative.
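The statistic described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the software's own implementation; the function name morans_i and the two four-county examples are assumptions made for the sketch.

```python
import numpy as np

def morans_i(rates, weights):
    """Moran's I for a vector of rates and a square matrix of connection weights."""
    x = np.asarray(rates, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()            # rates centered about the mean rate
    s0 = w.sum()                # sum of all weights
    cross_products = z @ w @ z  # sum over i and j of w_ij * z_i * z_j
    return (n / s0) * cross_products / np.sum(z ** 2)

# Hypothetical example: four counties in a row, neighbours share a border.
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

clustered = morans_i([12.0, 11.0, 4.0, 3.0], weights)    # similar rates side by side
alternating = morans_i([12.0, 3.0, 11.0, 4.0], weights)  # high and low rates interleaved
```

With the same four rates, the arrangement that places similar values next to each other gives a positive I, and the interleaved arrangement gives a negative I.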

The expectation of I under the null hypothesis is:

$$E(I) = -\frac{1}{N - 1}$$

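The expectation becomes close to 0 as N increases.

The variance of I is determined under two null hypotheses or assumptions: normality (denoted N) or randomization (denoted R). Under assumption N the rates are sampled from a population whose distribution is normal. Under assumption R the distribution of the rates is unknown; the observed rates are treated as fixed, and every assignment of them to the regions is considered equally likely. Assumption N is useful when we have good reason to believe the observations follow a normal distribution. Assumption R is less restrictive and, since we often don't know the theoretical distribution of disease rates, is usually the appropriate choice for them.

The variance under assumption N is:

$$\mathrm{Var}_N(I) = \frac{N^2 S_1 - N S_2 + 3 S_0^2}{S_0^2 (N^2 - 1)} - \frac{1}{(N - 1)^2}$$

Under assumption R the variance is:

$$\mathrm{Var}_R(I) = \frac{N\left[(N^2 - 3N + 3) S_1 - N S_2 + 3 S_0^2\right] - b_2\left[(N^2 - N) S_1 - 2N S_2 + 6 S_0^2\right]}{(N - 1)^{(3)} S_0^2} - \frac{1}{(N - 1)^2}$$

where a falling factorial is written $s^{(b)} = s(s-1)\cdots(s-b+1)$, so that $(N-1)^{(3)} = (N-1)(N-2)(N-3)$,

and where

$$S_1 = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (w_{ij} + w_{ji})^2, \qquad S_2 = \sum_{i=1}^{N} \Big( \sum_{j=1}^{N} w_{ij} + \sum_{j=1}^{N} w_{ji} \Big)^2, \qquad b_2 = \frac{N \sum_{i=1}^{N} z_i^4}{\big( \sum_{i=1}^{N} z_i^2 \big)^2}$$

Here $b_2$ is the sample kurtosis of the rates, and $S_0$ is the sum of all the weights.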

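The significance of Moran's I is evaluated under assumptions R and N, and also by Monte Carlo simulation. For assumptions N and R, the z-score is the difference between the observed and expected values of I, expressed in standard deviation units:

$$z = \frac{I - E(I)}{\sqrt{\mathrm{Var}(I)}}$$

where $\mathrm{Var}(I)$ is the variance under the assumption being used.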
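The z-scores under the two assumptions can be sketched as follows. This assumes the standard closed-form moments of Moran's I, with expectation -1/(N-1); the function name morans_i_moments and the local names s1, s2 and b2 are choices made for this example, and the four-county data are hypothetical.

```python
import numpy as np

def morans_i_moments(rates, weights):
    """Observed I, its null expectation, and variances and z-scores under N and R."""
    x = np.asarray(rates, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("the randomization variance needs at least 4 regions")
    z = x - x.mean()
    s0 = w.sum()
    s1 = 0.5 * np.sum((w + w.T) ** 2)
    s2 = np.sum((w.sum(axis=1) + w.sum(axis=0)) ** 2)
    b2 = n * np.sum(z ** 4) / np.sum(z ** 2) ** 2  # sample kurtosis of the rates

    observed = (n / s0) * (z @ w @ z) / np.sum(z ** 2)
    expected = -1.0 / (n - 1)

    # Variance under normality (assumption N).
    var_n = (n**2 * s1 - n * s2 + 3 * s0**2) / (s0**2 * (n**2 - 1)) - expected**2

    # Variance under randomization (assumption R).
    term_a = n * ((n**2 - 3 * n + 3) * s1 - n * s2 + 3 * s0**2)
    term_b = b2 * ((n**2 - n) * s1 - 2 * n * s2 + 6 * s0**2)
    var_r = (term_a - term_b) / ((n - 1) * (n - 2) * (n - 3) * s0**2) - expected**2

    return {
        "observed": observed,
        "expected": expected,
        "var_normality": var_n,
        "var_randomization": var_r,
        "z_normality": (observed - expected) / np.sqrt(var_n),
        "z_randomization": (observed - expected) / np.sqrt(var_r),
    }

# Hypothetical example: four counties in a row, neighbours share a border.
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
moments = morans_i_moments([12.0, 11.0, 4.0, 3.0], weights)
```

Each z-score can then be referred to the standard normal distribution, although with as few regions as in this toy example that approximation is likely to be poor.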

Output

P-values

Plot of Moran's I under simulation
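One way to produce outputs of this kind is a permutation-based Monte Carlo run, sketched below. The manual does not spell out the simulation procedure, so this assumes the rates are shuffled among the regions on each run, and it uses the common (count + 1) / (runs + 1) convention for a one-sided p-value. The function names, the seed, and the eight-county example are hypothetical.

```python
import numpy as np

def morans_i(rates, weights):
    """Moran's I for a vector of rates and a square matrix of connection weights."""
    x = np.asarray(rates, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / np.sum(z ** 2)

def monte_carlo_morans_i(rates, weights, n_runs=999, seed=0):
    """Observed I, simulated I values, and an upper-tail Monte Carlo p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(rates, dtype=float)
    observed = morans_i(x, weights)
    # Each run reassigns the observed rates to the regions at random.
    simulated = np.array(
        [morans_i(rng.permutation(x), weights) for _ in range(n_runs)]
    )
    # Proportion of runs at least as large as the observed value,
    # counting the observed arrangement itself as one of the runs.
    p_upper = (np.sum(simulated >= observed) + 1) / (n_runs + 1)
    return observed, simulated, p_upper

# Hypothetical example: eight counties in a row with steadily increasing rates.
n = 8
weights = np.zeros((n, n))
for i in range(n - 1):
    weights[i, i + 1] = weights[i + 1, i] = 1.0
rates = np.arange(1.0, n + 1.0)

observed, simulated, p_upper = monte_carlo_morans_i(rates, weights)

# Binned counts of the simulated values; these are what a histogram would draw.
counts, bin_edges = np.histogram(simulated, bins=20)
```

Plotting the binned counts with the observed value marked on the same axis shows where the observed I falls relative to the simulated null distribution.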

Reference

Jacquez, G.M. 1994, User manual for Stat!: Statistical software for the clustering of health events, BioMedware, Ann Arbor, MI.