Spatial and Temporal Analysis
of Epidemiological Data


Flavio Fonseca Nobre and Marilia Sá Carvalho

Introduction

Public health practice needs timely information on the course of disease and other health events to implement appropriate actions. Most epidemiological data have a location and time reference. Knowledge of the new information offered by spatial and temporal analysis will increase the potential for public health action. Geographic information systems (GIS) are an innovative technology ideal for generating this type of information.

Spatial analysis and the use of geographic information systems for health have been reviewed by several authors (Mayer 1983; Gesler 1986; Twigg 1990; Marshall 1991; Scholden and de Lepper 1991; Walter 1993). Of major interest has been detecting clusters of rare diseases, such as leukaemia near nuclear installations, methods for mapping and estimating patterns of disease, and health care location/allocation problems. Temporal analysis has focused primarily on the detection of clusters and abnormal case occurrence of notifiable diseases (Helfenstein 1986; Zeng et al. 1988; Stroup et al. 1989; Watier and Richardson 1991; Nobre and Stroup 1994). Work on spatial-temporal analysis has been more limited. Emphasis has been on the presentation of time series maps (Sanson et al. 1991; Carrat and Valleron 1992), the use of contour plots to analyze the time of the introduction of disease into households (Splaine et al. 1974; Angulo et al. 1979), and the use of specified contour levels extracted from the spatial distribution of a disease for different time frames (Sayers et al. 1977).

This paper is a survey of the concepts and methodologies of spatial and temporal analysis that could be beneficial for health-related studies, particularly to public health professionals and policymakers at the state, national, and local levels. Emphasis will be on integrative tools that can help in the construction of a geographic information system for epidemiology (GIS-EPI).

Spatial Analytic Techniques

Spatial variation in health related data is well known, and its study is a fundamental aspect of epidemiology. Representation and identification of spatial patterns play an important role in the formulation of public health policies. The spatial analytic techniques reviewed here are limited to those involving a graphic, exploratory analysis of data.

Point Patterns
As the name implies, point patterns, also known as dot maps, attempt to display the distribution of health events as data locations. The ability to overlay data locations with other relevant spatial information, such as a city map or the distribution of health facilities, is a general tool of considerable power. This is the simplest method of spatial analysis. It is useful for delimiting areas of case occurrences, identifying contaminated environmental sources, visually inspecting spatial clusters, and analyzing the distribution of health care resources.

A classical example of point pattern analysis in epidemiology is the identification of the source of cholera spread in London by John Snow. In 1854, using dots representing cholera deaths in the Soho area of London, Snow identified the source of contamination as a water pump. It is not surprising that, even now, most texts on health surveillance recommend the use of pins to locate cases of notifiable diseases on a map. Several statistical techniques have been suggested to study point patterns (Gesler 1986). These methods, however, are not yet integrated into seamless software that allows easy access and use by health professionals at different levels.

An attractive alternative to point patterns is the use of dynamic graphics, which could be more easily implemented within GIS software (Haslett 1991). Using this approach, the dot map can be associated with a histogram of case occurrences. Selecting the upper tail of the histogram automatically highlights the corresponding cases on the map, allowing the eventual characterization of regions with a high incidence of a disease. Alternatively, cases selected in one area of the map can be classified in the histogram. The temporal case occurrence for each data location could also be displayed.
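
The link between histogram and map can be illustrated with a minimal sketch. The record layout (`Location`), the bin width, and the function names are assumptions made for illustration only; they do not come from Haslett (1991) or from any particular GIS package.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical record: one mapped data location and its case count."""
    name: str
    x: float
    y: float
    cases: int


def histogram_bins(locations, bin_width):
    """Group locations into histogram bins keyed by each bin's lower edge."""
    bins = {}
    for loc in locations:
        lower = (loc.cases // bin_width) * bin_width
        bins.setdefault(lower, []).append(loc)
    return bins


def brush_upper_tail(locations, bin_width, min_lower_edge):
    """Return the locations in bins at or above min_lower_edge.

    These are the points a linked display would highlight on the map
    when the user selects the upper tail of the histogram.
    """
    bins = histogram_bins(locations, bin_width)
    selected = []
    for lower in sorted(bins):
        if lower >= min_lower_edge:
            selected.extend(bins[lower])
    return selected


locations = [
    Location("A", 1.0, 2.0, 3),
    Location("B", 2.5, 0.5, 12),
    Location("C", 4.0, 3.0, 27),
    Location("D", 0.5, 4.5, 8),
]
highlighted = brush_upper_tail(locations, bin_width=10, min_lower_edge=10)
```

The reverse operation, selecting a map region and classifying its cases in the histogram, would filter on the coordinates first and then call `histogram_bins` on the result.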

Line Patterns
Vectors or lines are graphic resources that aid in the analysis of disease diffusion and patient-to-health care facilities flow. In their simplest form, lines indicate the presence of flow or contagion between two subregions which may or may not be contiguous. Arrows with widths proportional to the volume of flow between areas are important tools to evaluate the health care needs of different locations. Francis and Schneider (1984) have designed an interactive graphic computer program, called FLOWMAP, producing a variety of maps of origin-destination data. These maps have been used to assess referral patterns of cancer patients in the northwestern part of the state of Washington, USA.
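
As a rough illustration of arrows with widths proportional to flow, the sketch below tallies origin-destination pairs and scales each width against the largest flow. It is not a description of how FLOWMAP works; the function name, the `max_width` parameter, and the sample referrals are hypothetical.

```python
from collections import Counter


def flow_widths(referrals, max_width=10.0):
    """Map each (origin, destination) pair to a line width.

    The width is proportional to the number of referrals on that pair,
    with the busiest pair drawn at max_width.
    """
    counts = Counter(referrals)
    if not counts:
        return {}
    largest = max(counts.values())
    return {pair: max_width * n / largest for pair, n in counts.items()}


# Hypothetical patient referrals from areas A and B to facilities H1 and H2.
referrals = [("A", "H1")] * 4 + [("B", "H1")] * 2 + [("B", "H2")]
widths = flow_widths(referrals)
```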

Use of line pattern analysis is quite common in epidemiology to describe the diffusion of several epidemics, such as the international spread of AIDS. It seems that the presence of a module such as FLOWMAP in a GIS package would enhance the applicability of line pattern analysis in public health.

Area Patterns
The first stage of data analysis is to describe the available data sets through tables or one-dimensional graphics, such as the histogram. For spatial analysis, the obvious option is to present data on maps, with the variable of interest divided into classes or categories and plotted using colours or hachures within each geographic unit. The result is known as a choropleth map. A literature search on spatial analysis revealed that among 76 papers, 54% used the choropleth map as an analytical tool.

The most common maps use pre-specified classes of health events, or the mean and standard deviation of their distribution. The maps are usually represented using administrative boundaries such as counties, municipalities, health districts, and so on, where data are usually collected. Major variables used for area pattern analyses are incidence rates, mortality rates, and standardized mortality ratio (SMR). The latter is most common in health atlases (Walter et al. 1991). At times, area pattern analysis uses statistical significance rather than raw data. Area pattern analysis also uses empirical Bayes estimates of the relative risk (Marshall 1991).
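
For reference, the SMR of an area is the ratio of observed to expected events, where the expected count comes from applying reference rates to the area's own population (indirect standardization). The sketch below assumes age-specific reference rates; the age bands and figures are invented for illustration.

```python
def expected_cases(population_by_age, reference_rates):
    """Expected events if the area experienced the reference rates."""
    return sum(
        population_by_age[age] * reference_rates[age]
        for age in population_by_age
    )


def smr(observed, population_by_age, reference_rates):
    """Standardized mortality ratio: observed / expected."""
    return observed / expected_cases(population_by_age, reference_rates)


# Hypothetical reference rates (deaths per person per year) and one area.
reference_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.02}
area_population = {"0-39": 5000, "40-64": 2000, "65+": 500}

area_expected = expected_cases(area_population, reference_rates)
area_smr = smr(30, area_population, reference_rates)
```

Here the expected count is 25 deaths, so 30 observed deaths give an SMR of about 1.2. Atlases often multiply the ratio by 100.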

When two or more health-related variables are available in each area unit, multivariate analysis can synthesize the information. Recently, measures of coherency between two variables were used as the variable of interest to explain the spatial pattern of disease occurrence (Cliff et al. 1992).

Several criticisms have been raised against the direct use of SMR and p-values or statistical significance. The major criticism of the SMR-based approach concerns the influence of population size. Unreliable estimates from areas with low populations mask the true spatial pattern, presenting extreme relative risk estimates which dominate the map. Use of p-values for the significance of the variable under study is considered uninformative. When computing p-values, areas with only a modest increase of risk are likely to display extreme values solely because of their large populations. This problem has led researchers to favour the development of empirical Bayes estimates for area pattern analysis.
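
The idea behind empirical Bayes smoothing can be sketched with a simple moment-based shrinkage: each raw SMR is pulled toward the overall ratio, and areas with few expected cases are pulled hardest. The formula below is one common method-of-moments form, assumed here for illustration; it is not claimed to reproduce the estimator of any cited paper, and the function name and data are hypothetical.

```python
def shrunken_smrs(observed, expected):
    """Shrink raw SMRs toward the overall ratio; small areas shrink most."""
    total_e = sum(expected)
    m = sum(observed) / total_e  # overall observed/expected ratio
    raw = [o / e for o, e in zip(observed, expected)]
    # Weighted variance of the raw ratios around the overall ratio.
    s2 = sum(e * (r - m) ** 2 for e, r in zip(expected, raw)) / total_e
    mean_e = total_e / len(expected)
    # Moment estimate of the between-area variance, floored at zero.
    a = max(0.0, s2 - m / mean_e)
    estimates = []
    for e, r in zip(expected, raw):
        # Weight near 0 for small expected counts, near 1 for large ones.
        weight = a / (a + m / e) if a > 0 else 0.0
        estimates.append(m + weight * (r - m))
    return estimates


# Hypothetical areas: the first has 3 deaths against 1 expected (raw SMR 3.0).
observed = [3, 50, 105]
expected = [1.0, 50.0, 100.0]
smoothed = shrunken_smrs(observed, expected)
```

In this example the raw SMR of 3.0 in the smallest area is shrunk to a value close to the overall ratio, while the estimate for the largest area moves much less.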

An interesting alternative, which may be less demanding, is the use of stem-and-leaf plots to classify data before area pattern analysis. This approach is more intuitive and easier to use by health professionals, and presents another method of incorporating dynamic graphics into a GIS for use by health professionals. Through the display of distribution data using stem-and-leaf plots, it is possible to select appropriate schemes of classification before the production of choropleth maps.
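
A minimal text-mode stem-and-leaf display is easy to produce, as the following sketch suggests. It assumes non-negative values and uses the tens digit as the stem; the function name and the sample rates are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for non-negative values.

    Each value is truncated to an integer; the last digit is the leaf
    and the remaining digits form the stem.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        stems[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


# Hypothetical incidence rates per 100,000 for nine areas.
rates = [12, 15, 21, 23, 23, 38, 41, 47, 95]
display = stem_and_leaf(rates)
```

Printing `display` line by line shows most areas between 10 and 49 and a single isolated value in the 90s, which suggests placing that area in a class of its own before mapping.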

Another criticism of choropleth maps lies in their limitation in using administrative boundaries for studies of health events. Large areas may dominate smaller ones, where estimates are usually more reliable. To overcome this limitation, two main suggestions have been proposed: use of symbolic representation and map transformation. The former uses circles or frames proportional to the magnitude of the variable (Dunn 1987); the latter involves modification of the boundaries of the regions according to size of the population (Selvin et al. 1988). However, transformed maps are sometimes difficult to interpret, and may conceal clusters of health events for areas with larger populations when they are augmented.

Surface and Contour Patterns
Data of epidemiological or public health interest often occur as spatial information during each of several time epochs. The analytical techniques described previously require the pooling of information in administrative areas with well defined geographic boundaries (e.g., counties, municipalities, and health districts), and the presentation of the spatial process with maps constrained to them. These maps are often unable to capture health problems at the locality or subcounty level. In addition, epidemiological variables do not necessarily recognize political boundaries. To overcome the limitation of administrative regions for mapping, surface and contour pattern analysis offers an alternative way of representing the distribution of the health event. The advantage of this spatial analytical technique is that the variable under study is treated as a continuous process throughout the region.

Surface and contour analysis assumes that a health event is a continuous process observed at a set of geographic points, known as sampling points. Using the x and y coordinates of these sampling points, with an associated z value corresponding to the health event, the estimated spatial relative risk is depicted as a three-dimensional map or surface. The contour map, known as an isoline or isopleth map, is the projection of the surface onto a plane, and corresponds to constant z values of the defined surface.
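
To make the construction concrete, the sketch below estimates a surface on a regular grid from scattered (x, y, z) sampling points. Inverse distance weighting is used only because it is short; it is an assumption of this example, not a method named in the text, and the function names and sample values are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate of z at (x, y).

    samples is a list of (x, y, z) sampling points. A grid node that
    coincides with a sampling point takes that point's z value.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz
        w = 1.0 / d ** power
        numerator += w * sz
        denominator += w
    return numerator / denominator


def surface_grid(samples, xs, ys, power=2.0):
    """Evaluate the surface at every grid node; rows follow ys."""
    return [[idw(x, y, samples, power) for x in xs] for y in ys]


# Hypothetical relative risks observed at four sampling points.
samples = [(0, 0, 1.0), (10, 0, 3.0), (0, 10, 3.0), (10, 10, 5.0)]
grid = surface_grid(samples, xs=[0, 5, 10], ys=[0, 5, 10])
```

A contour map is then obtained by tracing lines of constant z through the grid, which plotting libraries typically do from exactly this kind of array.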

Although these techniques may overcome the limitations of political boundaries and help in the representation of spatial processes collected as point data, they are not often used by health researchers. This may be due to the very fact that they lose the geopolitical information known to the researcher. One possibility, which is already available in some GIS packages, is the capacity of overlaying the geographic map of the region with one of these analytical techniques.

Temporal Analytic Techniques

Surveillance is usually understood as the continuous systematic collection and analysis of a series of quantitative measurements. The detection and interpretation of changes in the pattern of the constructed time series are very important. The surveillance of diseases, and of other health events, presents an important challenge to public health surveillance systems, since late detection of increases in case occurrence may result in missed opportunities for intervention. Temporal analytic techniques reported here centre on procedures for timely detection of the onset of a potential epidemic period. Other temporal analytic techniques, such as correlation analysis, will not be discussed.

Quality Control Charts
Industrial quality control has developed a series of methods for monitoring. Among them, three major methods have appeared in the public health surveillance literature: the Shewhart test, the simple cumulative sum test, and the V-mask. These methods are based on a comparison of incoming values from the time series with constant values, usually defined empirically from historical data. The advantages of these methods are that they can provide graphic information, and as such can be incorporated into an information system, helping public health professionals in the surveillance process.

The Shewhart test uses only the last observation, comparing it with a predefined target or expected value. An abnormal case occurrence is declared whenever the absolute difference between the observed and the expected values exceeds a specified threshold. The simple cumulative sum test uses the accumulated deviations between expected and observed values. Here, identification of abnormal events occurs when the absolute cumulative sum exceeds a fixed limiting value. Finally, the V-mask, which is also known as the CUSUM method, is based on the construction of a graph of the cumulated sums of the deviations between observed and expected values. For this method, a V-shaped mask moved over the resulting cumulative time series permits detection when an earlier observation crosses one leg of the mask.
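
The first two tests can be sketched in a few lines. The expected value, threshold, and limit are taken as given constants, as in the description above; the function names and the weekly counts are hypothetical, and the V-mask is not implemented here.

```python
def shewhart_alarms(series, expected, threshold):
    """Indices where a single observation deviates too far from expected."""
    return [
        t for t, x in enumerate(series)
        if abs(x - expected) > threshold
    ]


def cumulative_sum_alarms(series, expected, limit):
    """Indices where the accumulated deviation exceeds the limit."""
    total = 0.0
    alarms = []
    for t, x in enumerate(series):
        total += x - expected
        if abs(total) > limit:
            alarms.append(t)
    return alarms


# Hypothetical weekly counts with a gradual rise followed by a jump.
weekly_cases = [10, 12, 9, 11, 13, 14, 15, 22]
shewhart = shewhart_alarms(weekly_cases, expected=10, threshold=8)
cumulative = cumulative_sum_alarms(weekly_cases, expected=10, limit=8)
```

With these values the Shewhart test signals only at the final jump (index 7), whereas the cumulative sum test signals from index 5 onward, because it accumulates the small but persistent excesses that precede the jump.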

A recent evaluation of the three methods suggests that in addition to the constant values used as an alarm threshold, other parameters of the methods should be provided to facilitate an understanding of the alarms (Frisén 1992). Also, application of quality control charts can be made on suitably transformed data.

Statistical Monitoring
A common measure used by epidemiologists to identify increases in case occurrence of diseases is the ratio of case numbers at a particular time to past case occurrence, summarized by the mean or median. Based on this concept, a monitoring technique has been developed and is currently in use at CDC (Centers for Disease Control and Prevention, USA) (Stroup et al. 1989). Expected values for the current month are computed as the average of data from the corresponding, previous, and subsequent months for the last 5 years. The basic assumption of the method is that the resulting 15 values are independent random variables with the same distribution function. The test statistic is the ratio of the observed value to the constructed average. Confidence intervals are computed and a comparison of actual and expected values is undertaken. A bar plot displays simultaneously the ratios of several notifiable diseases. Deviations of this ratio from unity indicate a departure from past patterns, and values which exceed the 95% limits are shaded differently.
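
The construction of the 15 baseline values and the ratio can be sketched as follows. The data are assumed to be a chronological list of monthly counts, and the limits are approximated as 1 plus or minus twice the baseline standard deviation divided by the baseline mean; both the data layout and this form of the limits are assumptions of the sketch, and the function names are hypothetical.

```python
from statistics import mean, stdev


def historical_baseline(series, t, years=5):
    """Counts for the same, previous, and following month in each past year.

    series is a chronological list of monthly counts and t is the index
    of the current month.
    """
    if t - 12 * years - 1 < 0:
        raise ValueError("not enough history for the requested baseline")
    return [
        series[t - 12 * k + offset]
        for k in range(1, years + 1)
        for offset in (-1, 0, 1)
    ]


def ratio_and_limits(series, t, years=5):
    """Return (observed/expected ratio, lower limit, upper limit)."""
    baseline = historical_baseline(series, t, years)
    expected = mean(baseline)
    half_width = 2 * stdev(baseline) / expected
    return series[t] / expected, 1 - half_width, 1 + half_width


# Hypothetical seasonal series: six years of history, then the current month.
monthly = [10 + (i % 12) for i in range(72)] + [28]
ratio, lower, upper = ratio_and_limits(monthly, t=72)
```

In this invented series the baseline mean is 14 cases, so 28 cases in the current month give a ratio of 2.0, which lies above the upper limit.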

Although the method is quite simple to compute, it requires the availability of 5 years of past data. Also, correlation among the data values reduces the precision of the estimated confidence interval. This latter problem can be mitigated by using special techniques to estimate the variance (Kafadar and Stroup 1992).

The requirement of a 5-year period precludes the use of this technique in places where a surveillance system is starting, or for new health conditions. One advantage for public health officials is the simplicity of the method and the straightforward form of presentation for final analysis.

Time Series Analysis
To account for the evolving nature of surveillance data, time series analysis is an alternative for monitoring case occurrence of health events. The common analytical framework uses time series models to forecast expected numbers of cases, followed by comparison with the actual observation. Detection of changes from historical patterns through forecast error uses the difference between the actual and estimated values at each point in time. In contrast to other monitoring schemes, time series methods use the correlation structure of the data at different time intervals in making estimates.

Most attention has been focused on the use of the Box-Jenkins modeling strategy to construct autoregressive integrated moving average (ARIMA) models for specific health variables (Helfenstein 1986; Stroup et al. 1988; Watier and Richardson 1991). The modeling strategy requires a long series of values that is stationary. Since most health variables of interest are not stationary, analysts have to resort to preliminary transformations, such as differencing the time series or applying a variance-stabilizing transformation, to achieve stationarity. After choosing the transformation, the steps of model identification, parameter estimation, and diagnostic checking are performed. Key tools for modeling are the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
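
Two of the preliminary steps, differencing and the sample ACF, can be written directly. This sketch covers only those steps and does not fit an ARIMA model; the function names and test series are hypothetical, and the ACF is undefined for a constant series.

```python
def difference(series, lag=1):
    """Differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag.

    Raises ZeroDivisionError for a constant series, whose variance is zero.
    """
    n = len(series)
    mu = sum(series) / n
    c0 = sum((x - mu) ** 2 for x in series)
    return [
        sum((series[i] - mu) * (series[i + k] - mu) for i in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# A linear trend becomes constant after one difference.
trend = [2 * i + 1 for i in range(20)]
detrended = difference(trend)

# An alternating series has a strongly negative lag-1 autocorrelation.
alternating = [1, -1] * 10
alternating_acf = acf(alternating, max_lag=2)
```

A seasonal difference is obtained in the same way with `lag=12` for monthly data.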

Monitoring is usually achieved by computing the forecast error and using one of the quality control charts described previously. This approach is quite complex, demanding a thorough statistical knowledge for its development. It also requires the availability of long historical data. Several health time series cannot be modeled in this way, because stationarity is not attained even after applying transformations.

Another time series approach for forecasting uses the exponential smoothing technique. Here, available data are approximated by a polynomial model and the coefficients are updated as each new observation becomes available. The computed forecast error from this modeling approach behaves as a derivative of the time series, and forms the basis for detecting abnormal pattern occurrence (Nobre and Stroup 1994). This method is less demanding than other time series approaches, requiring less historical data for the initial set up. This method has also been applied in scenarios with several periods with no case occurrence, which usually do not satisfy the assumptions of other modeling techniques. The major constraint of this analytical technique is the empirical process used to estimate the parameters.
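
The simplest case, a polynomial of degree zero (a single level updated at each step), is enough to show how the forecast error arises. This is a generic simple exponential smoothing sketch, not the specific procedure of Nobre and Stroup (1994); the smoothing constant `alpha`, the function name, and the data are assumptions.

```python
def smoothing_forecast_errors(series, alpha=0.3):
    """One-step-ahead forecast errors from simple exponential smoothing.

    The forecast for each period is the level estimated from all earlier
    observations; the level is then updated with the new observation.
    """
    level = series[0]
    errors = []
    for x in series[1:]:
        errors.append(x - level)          # error of the forecast made earlier
        level = level + alpha * (x - level)
    return errors


# Hypothetical counts: stable, then an abrupt increase in the last period.
counts = [10, 10, 10, 10, 20]
errors = smoothing_forecast_errors(counts, alpha=0.5)
```

The errors stay at zero while the series is stable and jump to 10 when the count doubles, so a threshold on the forecast error (or one of the control charts described earlier) can flag the change.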

Time series analysis has already been shown to be quite useful in different contexts for monitoring tasks. Its implementation into an integrated system for use in public health will lead to a better assessment of its impact and utility. It will also open the opportunity for additional studies, such as the influence of climatic and other environmental time series on the occurrence of health events.

Temporal Cluster Analysis
Detection of temporal clusters, understood as a change in the frequency of disease occurrence, is important to stimulate research into the causes, and to encourage the development of preventive strategies. Detection of increases in the rate of occurrence of a disease uses either the time interval between successive events, or the number of events in specified time intervals.

One method for identifying clusters is the scan statistic test (Naus 1965). This method consists of counting the number of cases in each possible time interval of fixed length. The largest number of cases in any such interval is tested under the null hypothesis that this value is likely to occur in the absence of an epidemic. Application of this method involves the assumption of a constant population at risk and a constant detection rate of cases. A modification of the method has been suggested to avoid the restrictive assumptions involved in the scan statistic (Weinstock 1981). Studies of temporal clusters based on the time interval between events have also been described in the literature (Sitter et al. 1990; Gouvea and Nobre 1991). These methods assume that the random time intervals between successive cases form an independent and identically distributed sequence of exponential random variables.
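
The scan statistic itself is simple to compute, as sketched below. For the significance assessment this example substitutes a Monte Carlo simulation in which event times are drawn uniformly over the study period; that simulation is an assumption of the sketch and is not the probability calculation given by Naus (1965). Function names, the seed, and the data are hypothetical.

```python
import random


def scan_statistic(event_times, window):
    """Largest number of events in any closed interval of length window."""
    times = sorted(event_times)
    best = 0
    start = 0
    for end in range(len(times)):
        # Advance the left edge until the interval fits within the window.
        while times[end] - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def scan_p_value(event_times, window, period, n_sim=199, seed=0):
    """Monte Carlo p-value under uniformly distributed event times."""
    rng = random.Random(seed)
    observed = scan_statistic(event_times, window)
    n = len(event_times)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.uniform(0.0, period) for _ in range(n)]
        if scan_statistic(simulated, window) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical onset days within a 100-day study period.
onsets = [1, 3, 4, 5, 20, 40, 41]
largest_count = scan_statistic(onsets, window=5)

# Ten cases within a single day are very unlikely under the uniform null.
clustered = [50.0 + 0.1 * i for i in range(10)]
p_clustered = scan_p_value(clustered, window=1.0, period=100.0)
```

The uniform null corresponds to the assumptions stated above, namely a constant population at risk and a constant detection rate.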

Although these methods are useful for detecting temporal clusters, they are not easily accepted by public health officials, who usually have little knowledge of statistical methods. Thus, incorporation of this type of temporal analytical technique into an integrated information system such as a GIS-EPI requires careful planning and evaluation.

Spatio-Temporal Analytic Techniques

Space-time interaction among health events or between health events and environmental variables is an important component of epidemiological studies and public health surveillance. The bulk of the development in spatio-temporal patterns of health problems has been based on modeling and simulation because of the paucity of available data sets (Marshall 1991). As with time series analysis, the basis of spatio-temporal analytical techniques is the assumption that observed spatial patterns arise from an underlying process. Modeling this underlying spatial process allows for the study of the disease diffusion process, and the estimation of linear spatial transfer functions which best transform a map at time t into that at time t + 1.

One example of a spatio-temporal analysis of the spread of contagious disease is the use of the date of the onset of variola minor in the dwellings of a small town in Brazil (Splaine et al. 1974; Angulo et al. 1979). Different contour maps were produced using the date of introduction of the disease into the dwelling as the dependent variable. The independent variables were the x and y coordinates of the dwelling location. This type of analysis can be conducted if data at a fine spatial scale are available, which may be feasible with an adequately designed GIS-EPI system.

Study of the spread of a disease over large areas has been the subject of a more recent study which used kriging to estimate the underlying spatial process of an influenza-like epidemic in France (Carrat and Valleron 1992). A sequence of contour maps was produced, characterizing the spread of disease in the country. This technique is a simple description of the spatio-temporal evolution of the disease, but it may prove to be quite powerful for establishing control measures.

Another type of spatio-temporal analytic technique involves estimating the probability density of case occurrence of a health event. Use of this method has allowed the description of the spatio-temporal spread of a rabies epizootic in central Europe (Sayers et al. 1977). In this work, a contour map of relevant probability levels depicted the wavefront spread of the disease, and line pattern analysis described the limited set of trajectories of the foci's movements across the region. Use of the estimated probability density also allowed the construction of a linear transfer function to estimate maps of the distribution of case occurrence from previous time epochs. This technique requires a strong knowledge of statistical analysis. While producing some interesting results, it requires additional evaluation before application by public health professionals.

Conclusions

This paper has outlined the concepts and functions of spatial and temporal analytical techniques and suggested their relevance for health care. Although these methods offer much potential, their use for health is crucially dependent upon the availability of suitable data. Also, it is fundamental that a GIS-EPI is developed to make the data and the associated analytic results more accessible and understandable to public health officials.

GIS may play an important role in the use and analysis of public health data. However, turning the promise into reality entails a multidisciplinary effort to explore the possibilities offered by spatial and temporal analytical techniques to improve our knowledge of public and environmental health. The issue of reliability and validity of the data is also of importance. Special attention should focus on ensuring that geographic information is available in digital form, and that different data sets can be linked accordingly.

Acknowledgements

The International Development Research Centre (IDRC) provides financial support for the Biomedical Engineering Program, PEB/COPPE/UFRJ through the research grant IDRC 92-0228. In addition, F.F.N. and M.S.C. received financial support from the Brazilian Research Council (CNPq).

References


Flavio Fonseca Nobre is with the Programa de Engenharia Biomédica, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil; Marilia Sá Carvalho is with the Escola Nacional de Saúde Pública, FIOCRUZ, Brazil.

This file was created 23 February 1996

Copyright International Development Research Centre. Source: nobre.html