Synthetic cancer variables and the construction and testing of synthetic risk maps
Jacquez, G. M. and L. I. Kheifets. 1993. Synthetic cancer
variables and the construction and testing of synthetic risk maps.
Statistics in Medicine 12:1931-1942.
Abstract:
Cancer cluster investigations are usually univariate in nature; they
focus on a particular cancer, such as leukaemia, and attempt to determine
whether excess risk is associated with a suspected cancer-causing
agent. Although several causes of death (such as leukaemia, lymphoma,
Hodgkin's) may be considered, the approach is univariate because the causes
of death are analysed sequentially and independently of one another. This
approach is consistent with a one-cause one-effect model. Rarely, however,
is the action of a carcinogen manifested at only one body site, and
correlations among causes of death are the norm rather than the exception. A
multiple effects model is therefore appropriate, and the multivariate nature
of cancer should be exploited when exploring geographic pattern in cancer
risks. This paper describes such an approach. We construct maps based on a
principal components analysis of cancer mortality rates from different
geographic areas. The resulting principal components are called synthetic
cancer variables (SCVs), and maps of the SCV scores are synthetic risk maps
(SRMs). These maps quantify geographic variation in cancer risk at several
body sites simultaneously, and may be analysed for (1) spatial structure and
(2) geographic association with potential risk factors. As an example, we
use synthetic risk maps to determine whether high-risk counties in Illinois
cluster near nuclear facilities. Much work remains to be done, but synthetic
cancer risk maps appear to be a useful tool for quantifying geographic
pattern and multivariate structure in cancer mortality.
This paper applies a carefully described method to Illinois cancer data in
an attempt to answer the question: "Did nuclear power plants cause increased
incidence of cancer in Illinois over the period 1970-1979?" Note that the
data are cancer mortality rates, taken from an EPA report covering those
years, rather than incidence.
Rather than proceed cancer by cancer, the authors recognized the strong
correlations among types of cancer and used principal components analysis
(PCA) to create synthetic cancer variables (linear combinations of the
individual cancer rates), which they studied instead. They distilled the 30
original cancers into 3 SCVs.
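
The PCA step can be sketched in a few lines of Python with NumPy. This is
only an illustration under stated assumptions, not the authors' code: the
function name synthetic_cancer_variables, the choice to standardise each
cancer (so that PCA runs on the correlation matrix), and the simulated rate
table are all hypothetical; the real analysis used the EPA mortality data.

```python
import numpy as np

def synthetic_cancer_variables(rates, n_components=3):
    """Return SCV scores, loadings and explained-variance fractions.

    rates: array of shape (n_areas, n_cancers), one row per geographic area.
    """
    rates = np.asarray(rates, dtype=float)
    # Assumption: standardise each cancer so no single site dominates.
    z = (rates - rates.mean(axis=0)) / rates.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]        # one column per SCV
    scores = z @ loadings               # one SCV score per area: the values to map
    explained = eigvals[order] / eigvals.sum()
    return scores, loadings, explained

# Simulated stand-in data: 102 areas by 30 cancers sharing one common factor.
rng = np.random.default_rng(0)
n_areas, n_cancers = 102, 30
shared = rng.normal(size=(n_areas, 1))
rates = 50.0 + 5.0 * shared + rng.normal(size=(n_areas, n_cancers))

scores, loadings, explained = synthetic_cancer_variables(rates)
print(scores.shape, np.round(explained, 3))
```

Each column of scores is one synthetic cancer variable; plotting a column by
area gives the corresponding synthetic risk map.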
The method produced maps showing what appeared to be clustering of the
first (and hence strongest) SCV around the power plants. Rather than simply
conclude that nuclear power plants had indeed caused the increase, the
authors solicited cancer data from a period before the plants began
operating and analysed those as well: they discovered the same pattern for
the first SCV. Since the pattern predates the plants, their operation cannot
be what produced it.
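
The before-and-after logic can be illustrated with a simple permutation test
run on each period in turn. The sketch below is not the test used in the
paper: the function permutation_test, the "near facility" flags and the
score values are all hypothetical, and shuffling labels ignores spatial
autocorrelation, which a real analysis would need to address.

```python
import random
from statistics import mean

def permutation_test(scores, near_facility, n_perm=1999, seed=0):
    """One-sided test: is the mean score higher in areas near a facility?"""
    rng = random.Random(seed)

    def mean_difference(flags):
        near = [s for s, f in zip(scores, flags) if f]
        far = [s for s, f in zip(scores, flags) if not f]
        return mean(near) - mean(far)

    observed = mean_difference(near_facility)
    flags = list(near_facility)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(flags)              # break any link between place and score
        if mean_difference(flags) >= observed:
            at_least_as_large += 1
    # Add one to both counts so the p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical first-SCV scores for 12 areas; the first 4 lie near a facility.
near = [True] * 4 + [False] * 8
scv1_before = [1.2, 0.9, 1.5, 1.1, -0.3, -0.8, 0.1, -0.5, -0.9, -0.2, 0.0, -0.6]
scv1_after = [1.0, 1.1, 1.4, 0.8, -0.4, -0.7, 0.2, -0.6, -0.8, -0.1, 0.1, -0.5]

diff_before, p_before = permutation_test(scv1_before, near)
diff_after, p_after = permutation_test(scv1_after, near)
print(diff_before, p_before, diff_after, p_after)
```

If the association is just as strong before the plants began operating as
afterwards, as in these made-up numbers, the map pattern alone gives no
support to the hypothesis that the plants caused it.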
This study illustrates the danger of misapplying visualization techniques,
which results in what Jacquez has called the "Gee Whiz" effect, wherein a
map pattern leads to a hypothesis on which action is taken. It also
illustrates Jacquez's constant focus on Karl Popper's theories of scientific
inference: a hypothesis or theory is only proposed in order to be tested,
tested, tested; as long as it holds up under testing, the theory may be
retained, but if it fails under testing then it must be considered
discredited.
Website maintained by Andy Long.
Comments appreciated.