GeoMed

Sample data source: Case Studies in Biometry, 1994, N. Lange, L. Ryan, L. Billard, D. Brillinger, L. Conquest, and J. Greenhouse (eds.), John Wiley & Sons, Inc., New York (http://lib.stat.cmu.edu/datasets/csb/.index.html)

Chapter 1: Spatial Pattern Analyses to Detect Rare Disease Clusters, L.A. Waller, B.W. Turnbull, L.C. Clark, and P. Nasca

This data set consists of 592 leukemia cases (1978-1982) diagnosed in an 8-county region in upstate New York and reported to the state Cancer Registry. The 8 counties were divided into 790 subregions for which 1980 Census counts were available to estimate the population at risk. There are 5 columns of data for each of the 790 rows. The first column is a label identifying the subregion. The second and third columns are the x- and y-coordinates of the centroid of the subregion. The fourth and fifth columns were swapped to meet the input data file format required by GeoMed. The fourth column represents the (fractional) number of cases, and the fifth column represents the population-at-risk size for each subregion.
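The five-column layout described above can be read with a few lines of Python, which is handy for checking totals before loading the file into GeoMed. This is only a sketch: it assumes the columns are whitespace-delimited (the actual delimiter of leukemia.txt is not stated here), and the names Subregion and parse_subregions are made up for illustration.

```python
from typing import NamedTuple


class Subregion(NamedTuple):
    label: str
    x: float           # centroid x-coordinate
    y: float           # centroid y-coordinate
    cases: float       # fractional case counts are allowed
    population: float  # population at risk


def parse_subregions(text: str) -> list[Subregion]:
    """Parse rows of: label x y cases population (whitespace-delimited, assumed)."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        label, x, y, cases, pop = fields
        rows.append(Subregion(label, float(x), float(y), float(cases), float(pop)))
    return rows


# Made-up rows for illustration only; these are not values from leukemia.txt.
sample = """\
A001 1.50 -2.25 0.5 1200
A002 3.00 4.75 2.0 3400
"""
regions = parse_subregions(sample)
total_cases = sum(r.cases for r in regions)
total_population = sum(r.population for r in regions)
```

Applied to the real file, the same function should return 790 rows whose case counts sum to roughly 592, which gives a quick check that the columns are in the order GeoMed expects.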

Try the GeoMed statistical advisor (double click on GeoMed.chm in the directory containing the GeoMed application).  It will provide guidance for selecting a method appropriate for your investigation and data from those methods available in the GeoMed software.

These data were collected retrospectively in order to investigate the spatial pattern of leukemia occurrence in the five-year period 1978-1982.  This is in contrast to data collected and monitored in an ongoing fashion with the goal of detecting clustering soon after it occurs (surveillance).  These data are aggregated over the entire study period, so we are interested in investigating spatial clustering.  Keep in mind that this data set consists of case and population-at-risk counts aggregated by subregion.

Work your way through the advisor to a global, spatial method.

UMich students: due to technical difficulties, you may get a missing page error at the point at which you arrive at the method of choice and select it. Never fear: you can get the information from the Method list in the contents box.

You can read up a bit on the method (look in the index to find general information) and then fire up the GeoMed software (GeoMed.exe - at the dialog box titled "New", simply hit okay).

Run the global spatial method by selecting it from the Statistical Analysis menu option.  You will be prompted to open a data file for analysis; open the file titled leukemia.txt.  You will be presented with a data entry dialog box.  Try a cluster cutoff size = 6.  Run 999 Monte Carlo randomization runs.

Run the analysis by clicking the OK button.

Check the value of the test statistic for global clustering (R) against the values generated from Monte Carlo simulation runs. Select the menu option Graphical display > MC Frequency Distribution.
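Comparing the observed statistic with the Monte Carlo values amounts to computing a rank-based p-value. The sketch below shows the usual calculation, assuming that larger values of the statistic indicate more clustering (an upper-tail test). The function name monte_carlo_p_value is hypothetical and is not part of GeoMed, and the simulated values are placeholders rather than GeoMed output.

```python
def monte_carlo_p_value(observed, simulated):
    """Upper-tail Monte Carlo p-value.

    Counts the simulated statistics at least as large as the observed one,
    then adds 1 to numerator and denominator so that the observed value is
    ranked among all len(simulated) + 1 values.
    """
    n_as_extreme = sum(1 for s in simulated if s >= observed)
    return (n_as_extreme + 1) / (len(simulated) + 1)


# Placeholder values standing in for 999 simulated statistics.
simulated_stats = list(range(999))
p_typical = monte_carlo_p_value(990, simulated_stats)
p_smallest = monte_carlo_p_value(5000, simulated_stats)
```

With 999 runs the denominator is 1000, so the smallest attainable p-value is 0.001. That is one reason 999 is a convenient number of runs.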

Check the map of the subregion coordinates and position of local clusters.  Select the menu option Graphical display > Map.

Try using the Layers list.  To view only the circular cluster buffers, unselect the transparency check mark for the "cluster locations" layer.  To view only the subregion coordinates, move the "Region coordinates" layer to the top and unselect its transparency check mark.

Try using the Tools.  Click the magnifying glass, then left-click the map to zoom in and right-click to zoom out.  Move the scroll bars to pan around.

Run another analysis with a new cluster cutoff size = 8.

Return to the statistical advisor and find an appropriate (new) local spatial method for the same data set, and run the method in GeoMed.  Set the population radius = 5,000.

Check the Monte Carlo frequency distribution against the three highest test statistic values.  Check the map.  To view only the location of the cluster with the largest test statistic, move it to the top of the list, followed by the Region coordinates, and unselect the transparency check mark for the Region coordinates.

Run another analysis with a population radius = 10,000.  The locations of the clusters with the 2nd and 3rd highest test statistics are hard to see.  With the Layers list in its originally presented order (1st largest, followed by 2nd and then 3rd largest, and lastly the Region coordinates), unselect the last two check marks.

You are now interested in examining whether or not disease cases tend to cluster around particular locations.  There is a hazardous waste site located at the x,y coordinate (-0.14, -67.19).

Return to the statistical advisor and find an appropriate focused spatial method for the same data set.  Simply hypothesize that risk increases with closer proximity to the focus; we don't assume much else regarding the disease pattern surrounding the putative exposure source.  Run the method in GeoMed.

To see the focus location on the map, with the focus as the top layer, unselect its check mark.  Run 2 more analyses with focus coordinates (-4.47, -67.65) and (11.98, -71.61).

Multiple Comparisons:

We've run multiple analyses for each of the Besag & Newell, Turnbull, and Score methods.

Select the menu option Statistical analysis > Multiple Comparisons…

Run adjustments for Besag & Newell, Turnbull (all statistics), and Score.
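To see why an adjustment is needed after repeated runs, it can help to work one by hand. The sketch below uses the Bonferroni correction purely as a familiar example; this handout does not say which adjustments GeoMed applies, so treat the choice of Bonferroni as an assumption. The p-values are invented for illustration and are not GeoMed output.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values from three runs of one method (not GeoMed output).
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
```

A p-value that looks significant on its own (0.04 here) no longer falls below 0.05 once it is multiplied by the number of analyses run.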

 

Return to the statistical advisor and find an appropriate focused spatial method for the same data set, but assume you have a priori knowledge of how risk varies with distance from the focus.

Use the original focus coordinate = (-0.14, -67.19), with relative risk model parameters Beta = 10 and Phi = 2.  Hit the Visualize button (and move the dialog window out of the way as best you can).  Run the analysis and close all function scatter plots in order to see the results.

 

Data set source: provided by Peter Diggle from his web site - the incinerator data

These data were collected retrospectively to investigate the possible association between the spatial distribution of cases of cancer of the larynx and the location of the now-disused industrial incinerator at (354.5, 413.6).  The spatial locations of 'control' subjects (lung cancer cases) were also collected and included in the data set.

Return to the statistical advisor to find an appropriate method.  Run the method with the file diggle.txt.  Try relative risk model parameters Alpha = 20 and Beta = 0.5.  A plot of the fitted raised incidence model (with the data plotted as cases (1) and controls (0)) is presented as part of the output.  Have a look at the map of the data: cases (red), controls (blue), and the focus (black x).
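To get a feel for what Alpha and Beta do, the sketch below evaluates a raised incidence function of the form 1 + Alpha * exp(-Beta * d^2), where d is the distance from the focus. This is the form commonly associated with Diggle's raised incidence model. Whether GeoMed uses exactly this parameterization is an assumption here, so check the method's help page before relying on it. The function names are hypothetical.

```python
import math

FOCUS = (354.5, 413.6)  # incinerator location given in the text


def distance_to_focus(x, y, focus=FOCUS):
    """Euclidean distance from the point (x, y) to the focus."""
    return math.hypot(x - focus[0], y - focus[1])


def raised_incidence(distance, alpha, beta):
    """Relative risk multiplier 1 + alpha * exp(-beta * d^2) (assumed form)."""
    return 1.0 + alpha * math.exp(-beta * distance ** 2)


# Starting values suggested in the exercise: Alpha = 20, Beta = 0.5.
at_focus = raised_incidence(0.0, alpha=20, beta=0.5)
one_unit_away = raised_incidence(distance_to_focus(355.5, 413.6), alpha=20, beta=0.5)
far_away = raised_incidence(10.0, alpha=20, beta=0.5)
```

Under this form, Alpha sets how much the risk is elevated at the focus itself, and Beta sets how quickly that elevation decays back toward the background level with distance.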

 

Data set source: SpaceStat Data: Africa (http://www.spacestat.com/data.htm)

Data for 42 African countries (1966-1978) were provided by the Conflict and Peace Data Bank. An ASCII text file with a single column, the total conflict index, will be used in the analysis. An ArcView shape file will be used to determine the boundaries of the African countries. The associated .dbf file has a name column used to uniquely identify all countries to be analyzed. These countries are listed in the same order as the total conflict indices in the text file.

Return to the statistical advisor to find an appropriate local spatial method for these retrospective data.  The unit of analysis is not exactly a disease rate, but it is similar enough to pretend that is what we want to analyze.  Run the local spatial method in GeoMed.

You will first be prompted to open the ASCII text data file for analysis; open the file titled afcon.txt.  You will next be prompted to open a shape file with boundary information; open the file titled afcon.shp.

Run 999 Monte Carlo randomization runs.  The Map provides the region boundaries; eventually it will provide more information.

 

Data Set source: SaTScan Data: New Mexico (made available with the downloadable SaTScan software at http://www.dcpc.nci.nih.gov/BB/SaTScan.html)

Cases of brain cancer in 43 New Mexico counties from 1973 to 1991 are listed. Census population counts are provided for the years 1973, 1982, and 1991. All covariate information has been removed.

The following files are used during analysis:

  1. Coordinates file with 3 columns: county-label centroid-x-coordinate centroid-y-coordinate
  2. Case file with 3 columns: county-label year case-count
  3. Population file with 3 columns: county-label census-year population-count
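The three files listed above share the county label as a key, so they can be cross-checked before running the analysis. The sketch below assumes whitespace-delimited columns and labels without embedded spaces; neither is confirmed here. The helper read_table is hypothetical, and the rows are invented for illustration rather than taken from the New Mexico files.

```python
from collections import defaultdict


def read_table(text, types):
    """Split whitespace-delimited rows and convert each column with types[i]."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(types):
            raise ValueError(f"line {line_no}: expected {len(types)} columns")
        rows.append(tuple(convert(field) for convert, field in zip(types, fields)))
    return rows


# Made-up rows for illustration only; not values from the New Mexico files.
coord_text = "CountyA 10.0 20.0\nCountyB 30.0 40.0\n"
case_text = "CountyA 1973 2\nCountyA 1974 1\nCountyB 1973 4\n"
pop_text = "CountyA 1973 1000\nCountyB 1973 2500\n"

# Coordinates file: county-label centroid-x centroid-y
coords = {label: (x, y) for label, x, y in read_table(coord_text, (str, float, float))}

# Case file: county-label year case-count, summed over years per county
cases_by_county = defaultdict(int)
for label, year, count in read_table(case_text, (str, int, int)):
    if label not in coords:
        raise KeyError(f"case row refers to unknown county {label!r}")
    cases_by_county[label] += count

# Population file: county-label census-year population-count
population = {(label, year): n for label, year, n in read_table(pop_text, (str, int, int))}
```

A label that appears in the case file but not in the coordinates file raises an error here, which is the kind of mismatch worth catching before starting a long Monte Carlo run.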

Return to the statistical advisor.  Here we have data with both spatial and temporal references, so let's take full advantage of this information.  Find an appropriate spatio-temporal method and then run it.

You will first be prompted to open the data file with study region centroid values; open the file titled nmCoord.txt.  You will next be prompted to open a data file with case data; open the file titled nmCase.txt.  Finally, you will be prompted to open a data file with population data; open the file titled nmPop.txt.

Run 999 Monte Carlo randomization runs. This could take a while: if your patience is running out, hit the cancel button and the processing will stop, and the results will be based on as many Monte Carlo runs as have completed by that point.