Part 1 Strong Inference
The Centers for Disease Control as well as state and local health agencies use information on clusters of health events to respond to cluster allegations brought forward by a concerned public; to identify impacted local populations (where are communities with excess childhood leukemia?); to guide interventions (where are clusters of late-stage breast cancer diagnosis, which might call for a screening facility?); and to evaluate programs (are clusters of excess colorectal cancer disappearing in response to my intervention?).
Clusters of health events are also used to increase our understanding of disease aetiology, and thus play a role in spatial epidemiology and medical geography. We almost always wish to increase our understanding of the causes underlying clusters of health events. Why are they there, and what has caused them?
To address this question we require a sound framework for analyzing clusters of health events, and an understanding of the causes that plausibly might explain disease clusters. That is the motivation for this two-part blog. Part 1 proposes an analytical framework, called Strong Inference, to guide cluster investigation. Part 2 then enumerates the possible sources of spatial autocorrelation in health events, that is, the factors that could give rise to health event clustering. This blog contains excerpts from an essay I wrote for the Springer Handbook of Regional Science.
Scientific inference from patterns of health events
Health event clusters may loosely be defined as statistically significant excesses of health events in space, in time, or in space-time. There is also space-time interaction, as when nearby health events occur at about the same time. Cluster existence, location and timing can inform decisions regarding different questions, such as:
- Is an observed pattern of health events statistically unusual? (Is apparent clustering real?)
- Where are populations with elevated disease rates? (Where are local excesses found?)
- Are areas with elevated health events found in proximity to geographic features thought to be associated with disease causality? (Is there focused clustering about pollutant sources?)
- Is the observed spatial pattern of health events consistent with certain hypothesized disease processes, and not consistent with others? (What is the underlying cause?)
- Are there reasonable new hypotheses that might explain the observed disease patterns? (What is the best explanation for the cluster?)
Several of these questions can be addressed using an inferential process in which plausible generating processes for an observed pattern are considered and then excluded. This can be done in a haphazard fashion, but it usually is best to systematically enumerate the set of plausible hypotheses that might give rise to an observed pattern of health events, and then to exclude members of this set by conducting a series of experiments that may include statistical tests and models for evaluating space-time disease patterns. This inferential framework seeks to map health event patterns to the spatial processes that might give rise to them, and is called Strong inference.
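As one illustration of the kind of statistical test that can serve as an "experiment" here, the sketch below implements a Knox-style test for space-time interaction: it counts pairs of events that are close in both space and time, then compares that count with counts obtained after randomly shuffling the event times. This is a minimal sketch, not the method used in the essay. The function names, the distance and time cutoffs, and the ten event locations and times are all hypothetical and chosen only to make the example run.

```python
import math
import random


def knox_statistic(points, times, space_cut, time_cut):
    """Count pairs of events that are close in both space and time."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            near_space = math.dist(points[i], points[j]) <= space_cut
            near_time = abs(times[i] - times[j]) <= time_cut
            if near_space and near_time:
                count += 1
    return count


def knox_test(points, times, space_cut, time_cut, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle times while locations stay fixed."""
    rng = random.Random(seed)
    observed = knox_statistic(points, times, space_cut, time_cut)
    shuffled = list(times)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if knox_statistic(points, shuffled, space_cut, time_cut) >= observed:
            at_least += 1
    # The observed arrangement counts as one of the possible outcomes.
    p_value = (at_least + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical events: two tight groups, each also tight in time.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
times = [1, 2, 3, 2, 1, 50, 51, 52, 51, 50]

observed, p_value = knox_test(points, times, space_cut=2.0, time_cut=5.0)
print("close pairs:", observed, "p-value:", p_value)
```

A small p-value says only that the null hypothesis of no space-time interaction is hard to sustain for these data; it does not by itself say what caused the pattern, which is where the enumeration of explanatory hypotheses comes in.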
Strong inference for health events
In 1964, Platt coined the term “Strong inference” (Platt 1964) to describe a useful construct for systematically evaluating explanatory hypotheses that plausibly might explain observed patterns in a data set. It involves first, enumeration of the explanatory hypotheses that might give rise to the pattern; second, formulation of falsifiable predictions that can be used to systematically test each of these hypotheses; third, undertaking the tests of predictions; and fourth, winnowing out the hypotheses whose corresponding predictions are found to be false. The remaining hypotheses then must include, or together explain, the observed data patterns. The initial set of explanatory hypotheses may be expanded as the experiments are conducted. What is key is that the predictions framed for each hypothesis be falsifiable (for example, testable using a statistic for spatial clustering), and that the set of explanatory hypotheses be properly framed.
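The four steps can be sketched as a small bookkeeping loop. This is only an illustration of the logic, under stated assumptions: the three hypotheses, their predictions, the 0.05 significance level, and the p-values in the evidence dictionary are all hypothetical, standing in for results that would come from real clustering tests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

ALPHA = 0.05  # assumed significance level for every test


@dataclass
class Hypothesis:
    name: str
    prediction: str  # the falsifiable prediction, in words
    holds: Callable[[Dict[str, float]], bool]  # True if evidence agrees


def winnow(hypotheses: List[Hypothesis],
           evidence: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split hypotheses into those whose prediction held and those falsified."""
    surviving, rejected = [], []
    for h in hypotheses:
        if h.holds(evidence):
            surviving.append(h.name)
        else:
            rejected.append(h.name)
    return surviving, rejected


# Steps 1 and 2: enumerate hypotheses and attach falsifiable predictions.
hypotheses = [
    Hypothesis("chance alone",
               "no significant global clustering",
               lambda e: e["global_p"] > ALPHA),
    Hypothesis("point-source exposure",
               "significant focused clustering about the source",
               lambda e: e["focused_p"] <= ALPHA),
    Hypothesis("clustered covariate",
               "clustering vanishes after covariate adjustment",
               lambda e: e["adjusted_p"] > ALPHA),
]

# Step 3: test results (hypothetical p-values, not real data).
evidence = {"global_p": 0.01, "focused_p": 0.40, "adjusted_p": 0.60}

# Step 4: winnow out hypotheses whose predictions were found to be false.
surviving, rejected = winnow(hypotheses, evidence)
print("surviving:", surviving)
print("rejected:", rejected)
```

A surviving hypothesis has not been proven; it simply has not yet been falsified, and new hypotheses can be appended to the list as the experiments proceed.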
Sources of spatial autocorrelation in health events
This raises a very important question. What are the sources of spatial autocorrelation in health events? These may need to be included in the set of explanatory hypotheses for an observed pattern, and include spatial autocorrelation in underlying risk factors, covariates, reporting, diagnosis, health care policies, physician behaviors, and interpolation autocorrelation, among others. This is by no means an exhaustive list, but it includes factors that likely should be considered in many spatial analyses of health events. The sources of spatial autocorrelation in health events are the topic of Part 2 in this series.
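For readers who want a concrete handle on "spatial autocorrelation", one widely used summary is Moran's I, which is positive when neighboring areas have similar values and negative when they have dissimilar values. The sketch below is a minimal version, assuming a row of six areas in which each area neighbors only the areas beside it; the rates and the helper names are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values on n areas with an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def chain_weights(n):
    """Binary weights for n areas in a row: adjacent areas are neighbors."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]    # similar rates sit side by side
alternating = [1, 5, 1, 5, 1, 5]  # neighbors always differ

print("clustered:", round(morans_i(clustered, w), 3))      # positive, about 0.6
print("alternating:", round(morans_i(alternating, w), 3))  # negative, about -1.0
```

The statistic describes the pattern only. A positive value could arise from any of the sources listed above, so it is a starting point for framing hypotheses rather than an explanation.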