Synthetic cancer variables and the construction and testing of synthetic risk maps

Jacquez, G. M. and L. I. Kheifets. 1993. Synthetic cancer variables and the construction and testing of synthetic risk maps. Statistics in Medicine 12:1931-1942.

Abstract:

Cancer cluster investigations are usually univariate in nature; they focus on a particular cancer, such as leukaemia, and attempt to determine whether excess risk is associated with a suspected cancer-causing agent. Although several causes of death (such as leukaemia, lymphoma, Hodgkin's) may be considered, the approach is univariate because the causes of death are analysed sequentially and independently of one another. This approach is consistent with a one-cause one-effect model. Rarely, however, is the action of a carcinogen manifested at only one body site, and correlations among causes of death are the norm rather than the exception. A multiple effects model is therefore appropriate, and the multivariate nature of cancer should be exploited when exploring geographic pattern in cancer risks. This paper describes such an approach. We construct maps based on a principal components analysis of cancer mortality rates from different geographic areas. The resulting principal components are called synthetic cancer variables (SCVs), and maps of the SCV scores are synthetic risk maps (SRMs). These maps quantify geographic variation in cancer risk at several body sites simultaneously, and may be analysed for (1) spatial structure and (2) geographic association with potential risk factors. As an example, we use synthetic risk maps to determine whether high-risk counties in Illinois cluster near nuclear facilities. Much work remains to be done, but synthetic cancer risk maps appear to be a useful tool for quantifying geographic pattern and multivariate structure in cancer mortality.

This paper applies a carefully described method to cancer data from Illinois in an attempt to answer the question: "Did nuclear power plants cause increased incidence of cancer in Illinois?" The data are cancer mortality rates for the period 1970-1979, obtained from an EPA report.

Rather than proceeding cancer by cancer, the authors recognized the strong correlations among types of cancer and used principal components analysis (PCA) to create synthetic cancer variables (SCVs), which are linear combinations of the original cancer rates, and studied those instead. They distilled the 30 original cancers into 3 SCVs.
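As a rough illustration of how SCV scores might be computed, the sketch below runs a PCA on a table of area-by-cause mortality rates. It is not the authors' code. The rates are simulated, the function name `synthetic_cancer_variables` is hypothetical, and standardizing each cause before the PCA (that is, working from the correlation matrix) is an assumption, since this summary does not say how the paper scaled its rates.

```python
import numpy as np

def synthetic_cancer_variables(rates, n_components=3):
    """Return (scores, loadings, explained) from a PCA of mortality rates.

    rates: array of shape (n_areas, n_causes), one row per geographic area.
    Assumption: each cause is standardized first, so the PCA is based on
    the correlation matrix rather than the covariance matrix.
    """
    rates = np.asarray(rates, dtype=float)
    z = (rates - rates.mean(axis=0)) / rates.std(axis=0, ddof=1)
    # SVD of the standardized data; rows of vt are the principal axes,
    # ordered from largest to smallest singular value.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T           # (n_causes, n_components)
    scores = z @ loadings                    # (n_areas, n_components)
    explained = (s ** 2) / np.sum(s ** 2)    # share of total variance
    return scores, loadings, explained[:n_components]

# Simulated data only: 102 areas (Illinois has 102 counties) and 30 causes
# that share one common factor, so the causes are strongly correlated.
rng = np.random.default_rng(0)
n_areas, n_causes = 102, 30
shared = rng.normal(size=(n_areas, 1))
rates = 50 + 5 * shared + rng.normal(scale=3, size=(n_areas, n_causes))

scores, loadings, explained = synthetic_cancer_variables(rates)
print(scores.shape, explained.round(2))
```

Each column of `scores` holds one value per area, which is what would be drawn on a synthetic risk map. The columns are mutually uncorrelated by construction, so each map carries information the others do not.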

The method produced maps showing what appeared to be clustering of the first (and hence strongest) SCV around the power plants. Rather than simply concluding that nuclear power plants caused the increase, the authors obtained cancer data from a period before the plants began operating and analysed those as well. They discovered the same pattern in the first SCV, which suggests that the pattern predates the plants and was not produced by their operation.
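The logic of that check can be sketched with a simple label-permutation test, run once on scores from the operating period and once on scores from the earlier period. This is a generic illustration, not the test used in the paper, which this summary does not describe. The scores, the `near` indicator, and the function name `permutation_test` are all hypothetical, and the simulated pattern is deliberately built to be present in both periods.

```python
import numpy as np

def permutation_test(scores, near, n_perm=9999, seed=0):
    """One-sided permutation test: are scores higher in 'near' areas?

    scores: one SCV score per area.
    near:   boolean flag per area, True if close to a facility.
    Returns (observed difference in means, permutation p-value).
    Note: shuffling labels ignores spatial autocorrelation, so this is
    only a rough check.
    """
    scores = np.asarray(scores, dtype=float)
    near = np.asarray(near, dtype=bool)
    rng = np.random.default_rng(seed)
    observed = scores[near].mean() - scores[~near].mean()
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(near)
        diff = scores[shuffled].mean() - scores[~shuffled].mean()
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so p is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Simulated scores only: the same elevated pattern exists in both periods.
rng = np.random.default_rng(1)
near = np.zeros(102, dtype=bool)
near[:15] = True                       # hypothetical areas near facilities
base = rng.normal(size=102)
base[near] += 1.5                      # elevation that predates the plants
before = base + rng.normal(scale=0.3, size=102)
after = base + rng.normal(scale=0.3, size=102)

diff_before, p_before = permutation_test(before, near)
diff_after, p_after = permutation_test(after, near)
print(f"before: diff={diff_before:.2f}, p={p_before:.4f}")
print(f"after:  diff={diff_after:.2f}, p={p_after:.4f}")
```

In this simulation both periods show a significant association. A test on the operating period alone would look like evidence against the plants, while the earlier period shows that the pattern was already there.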

This study illustrates the danger of misapplying visualization techniques (resulting in what Jacquez has called the "Gee Whiz" effect, wherein a map pattern leads to a hypothesis on which action is taken). It also illustrates Jacquez's constant focus on Karl Popper's theories of scientific inference: a hypothesis or theory is only proposed in order to be tested, tested, tested; as long as it holds up under testing, the theory may be retained, but if it fails under testing then it must be considered discredited.


Website maintained by Andy Long. Comments appreciated.