**Part 3: Neutral models**

This is the third in a series on spatial autocorrelation and clusters of health events. The first part presented a framework for analyzing disease clusters that builds on the principles of strong inference. Strong inference involves enumerating all of the possible explanations of a disease cluster. Some may be causal (such as an environmental hazard or exposure linked aetiologically to the health outcome); some may not (such as geographic variation in the proportion of cases reported). Each of them may result in spatial autocorrelation in health events, the topic of the second part in this series. These sources of spatial autocorrelation may produce clusters, and they may also mask clusters that are attributable to another cause (such as an environmental exposure). How may one account for these multiple causes of a cluster so that causes of interest to the investigator can be identified? This can be thought of as controlling for “nuisance” sources of spatial autocorrelation in the analysis. As an example, how might we account for background variation, that is, the spatial autocorrelation found in disease rates when a cluster process is absent? This background variation is what we want to use as the “null hypothesis” for a statistical test for clustering, since random spatial patterns are rarely, if ever, found in the real world.

There are three approaches.

1. **Standardize the disease rate.** For example, rates often are age-adjusted, which removes spatial autocorrelation due to geographic variation in the age of the underlying population. This may be somewhat involved when multiple causes of “nuisance” spatial autocorrelation are present, and is not possible when some of the sources of background variation have not been identified.
2. **Formal mathematical models** (e.g. regression, Bayesian, etc.) may be constructed with the nuisance sources of spatial structure on the right-hand side of the equation (as predictors). One then analyzes the regression residuals (e.g. using the local Moran test in SpaceStat) to identify clusters of excess risk. Again, this requires detailed knowledge of and data on the predictors.
3. **Neutral models** may be used to account for background variation in disease rates, as described in this blog. Neutral models do not require that all of the sources of background variability be identified, they can be incorporated into inferential tests for clustering, and they support the cluster analysis framework of strong inference.
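
To make the first approach concrete, here is a minimal sketch of indirect age standardization. The age bands, reference rates, populations, and case counts are all hypothetical values chosen for illustration, and the function names are not taken from any particular package.

```python
# Hypothetical reference rates (cases per person per year) by age band.
reference_rates = {"0-39": 0.0002, "40-64": 0.0020, "65+": 0.0100}

# Hypothetical regions: both have 10,000 residents but different age structures.
regions = {
    "A": {"population": {"0-39": 5000, "40-64": 3000, "65+": 2000}, "observed": 30},
    "B": {"population": {"0-39": 7000, "40-64": 2500, "65+": 500}, "observed": 15},
}


def expected_cases(population, rates):
    """Cases expected if the region experienced the reference rates."""
    return sum(population[band] * rates[band] for band in rates)


def crude_rate(region):
    """Observed cases divided by total population, ignoring age."""
    return region["observed"] / sum(region["population"].values())


def smr(region, rates):
    """Standardized morbidity ratio: observed / expected cases."""
    return region["observed"] / expected_cases(region["population"], rates)


for name, region in regions.items():
    print(name, round(crude_rate(region), 4), round(smr(region, reference_rates), 3))
```

In this made-up example region A has twice the crude rate of region B, yet B has the higher standardized ratio once its younger age structure is taken into account. The geographic pattern in crude rates was therefore partly a pattern in age.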

**Role of neutral models**

The previous blog provided an overview of some of the sources of spatial autocorrelation in health events. When exploring disease patterns and clusters, many of these sources of geographic variation may not be of direct interest. For example, we often wish to account for spatial heterogeneity in population density when searching for the signature of causative exposures underlying clusters of disease. Here the idea is to search for clusters of health events *above and beyond* those attributable to geographic variation in population density. This concept can apply to any source of geographic pattern that may not be of direct interest, for example, those members of the set of possible explanations described earlier under “Strong inference” and enumerated in “Sources of spatial autocorrelation in health events”.

When testing for clustering of health events, one then incorporates geographic variability in covariates and other factors not considered to be of interest into the null hypothesis. Mechanistically, this usually is accomplished using approximate randomization that builds the observed patterns of variation in those factors not of direct interest into the null spatial model (Waller and Jacquez 1995). Models that accomplish this have been referred to as “neutral” rather than “null” models, to capture the idea that they account for more than just “complete spatial randomness”. Neutral models thus correspond to plausible system states that can be used as a reasonable null hypothesis (e.g. “background variation”) in disease cluster tests. The problem then is to identify spatial patterns not incorporated into the neutral model, enabling, for example, the detection of clusters *above and beyond* background or regional variation in the risk of developing disease.
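
The randomization idea can be sketched with a small Monte Carlo test. The example below is an illustration only, not the procedure of Waller and Jacquez (1995): it assumes a simple chi-square style heterogeneity statistic, and the populations and case counts are hypothetical. The same observed counts are tested against two nulls, one that treats every region as equally likely to receive a case and one that allocates cases in proportion to population.

```python
import random


def heterogeneity_statistic(counts, expected):
    """Sum of (observed - expected)^2 / expected over regions."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))


def simulate_counts(n_cases, weights, rng):
    """Allocate n_cases to regions with probability proportional to weights."""
    counts = [0] * len(weights)
    for region in rng.choices(range(len(weights)), weights=weights, k=n_cases):
        counts[region] += 1
    return counts


def monte_carlo_p(observed, weights, n_sim=999, seed=0):
    """Monte Carlo p-value of the observed counts under the null given by weights."""
    rng = random.Random(seed)
    n_cases = sum(observed)
    total = sum(weights)
    expected = [n_cases * w / total for w in weights]
    observed_stat = heterogeneity_statistic(observed, expected)
    exceed = 0
    for _ in range(n_sim):
        simulated = simulate_counts(n_cases, weights, rng)
        if heterogeneity_statistic(simulated, expected) >= observed_stat:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical data: five regions, one of them densely populated.
population = [1000, 1000, 8000, 1000, 1000]
observed = [5, 6, 41, 4, 4]

p_uniform = monte_carlo_p(observed, [1] * len(observed))  # ignores population
p_population = monte_carlo_p(observed, population)        # population in the null
print(p_uniform, p_population)
```

With these made-up counts, the concentration of cases in the third region should look highly unusual under the uniform null and unremarkable once population size is part of the null. That is the sense in which a neutral model looks for pattern above and beyond a known source of variation.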

A typology of neutral models that account for factors often encountered in analyses of health events defines neutral model types I through VI (Goovaerts and Jacquez 2004). These neutral models are realistic in that they account for the spatial autocorrelation, non-uniform risk, and spatially heterogeneous population sizes that may be present in the absence of the cluster process. Model I is Complete Spatial Randomness (CSR), which is still widely used in health analysis even though it usually does not correspond to any plausible state of the system being studied. Model II reproduces the spatial autocorrelation that may be present in the observed data. Model III incorporates non-uniform variability in the underlying risk that may be attributable to risk factors and covariates that are not of direct interest. Models IV through VI account for the impact of population size and variability on the stability of observed rates, and are used to address the small numbers problem.
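
The small numbers problem that motivates Models IV through VI can be illustrated with a short simulation. This is only a sketch of the problem, not an implementation of any of the models in Goovaerts and Jacquez (2004): it assumes a constant, hypothetical risk of 0.01 everywhere and independent binomial case counts, and it compares how much observed rates fluctuate in a small and a large population.

```python
import math
import random


def rate_standard_error(risk, population):
    """Binomial standard error of an observed rate under constant risk."""
    return math.sqrt(risk * (1.0 - risk) / population)


def simulate_rates(risk, population, n_sim, rng):
    """Simulate n_sim observed rates, drawing each person as a Bernoulli trial."""
    rates = []
    for _ in range(n_sim):
        cases = sum(rng.random() < risk for _ in range(population))
        rates.append(cases / population)
    return rates


rng = random.Random(42)
risk = 0.01  # hypothetical constant underlying risk
spread = {}
for population in (100, 10_000):
    rates = simulate_rates(risk, population, n_sim=200, rng=rng)
    spread[population] = max(rates) - min(rates)
    print(population, spread[population], rate_standard_error(risk, population))
```

Even though the underlying risk is identical, rates from the small population range far more widely, so an extreme rate in a sparsely populated area is weak evidence of a cluster unless population size is built into the null model.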

Neutral models thus play a critical role in scientific inference in disease pattern analysis since they allow one to systematically incorporate different sources of geographic variation, including spatial autocorrelation, into the hypotheses being evaluated. In the framework of strong inference, one conducts a series of statistical analyses that systematically evaluate each of the hypotheses in the set of alternative explanations for the observed spatial patterns of health events. Each hypothesis in turn is incorporated into the neutral model of a given spatial cluster test. If the test is significant, that hypothesis is rejected; if it is not significant, that hypothesis is retained in the set of plausible explanations for the observed spatial pattern.

As noted earlier, an alternative mechanism when knowledge of the system is sufficient is to construct more formal, detailed models using regression, geostatistical, and other modeling approaches. The variability captured by the model is then attributable to the predictor variables, and clustering may then be applied to the regression residuals to quantify spatial pattern not captured by the model itself. SpaceStat provides both cluster analysis and modeling techniques that are applied automatically to time-dynamic data, enabling more accurate inferences for health events.
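
The residual-based approach can be sketched as follows. This is a simplified illustration under stated assumptions, not SpaceStat's implementation: it uses a single hypothetical predictor (percent elderly), ordinary least squares, eight made-up regions arranged along a transect with adjacent regions as neighbors, and row-standardized weights. It computes the local Moran statistic on the residuals but omits the significance test, which is usually done by conditional randomization.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def local_moran(values, neighbors):
    """Local Moran I_i = z_i * (mean of z over the neighbors of i)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    return [
        z[i] * sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]


# Hypothetical data: rates rise with percent elderly, plus an excess in regions 3 and 4.
pct_elderly = [10, 12, 14, 16, 18, 20, 22, 24]
rate = [5.1, 5.9, 7.0, 11.0, 12.1, 9.9, 11.1, 11.9]

# Neighbors along the transect: each region touches the one before and after it.
n = len(rate)
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

intercept, slope = fit_line(pct_elderly, rate)
residuals = [y - (intercept + slope * x) for x, y in zip(pct_elderly, rate)]
local_i = local_moran(residuals, neighbors)

# Candidate clusters of excess risk: positive residual with similar neighbors.
high_high = [i for i in range(n) if residuals[i] > 0 and local_i[i] > 0]
print(high_high)
```

A positive local Moran value only says that a region resembles its neighbors, so the sign of the residual is checked as well to separate excess risk from a run of lower-than-predicted rates.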

**References**

Goovaerts, P. and G. M. Jacquez (2004). “Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York.” Int J Health Geogr **3**(1): 14.

Waller, L. A. and G. M. Jacquez (1995). “Disease models implicit in statistical tests of disease clustering.” Epidemiology **6**(6): 584-590.