Recently I gave a talk at the GIScience 2012 conference held in Columbus, Ohio. I participated in a panel on space-time research in geographic information science that was organized by May Yuan and Mei-Po Kwan. Panelists were asked to speak on a variety of topics, including future research directions. Originally, I was going to talk about the specification and modeling of temporal lags, as these are a key piece of dynamic geographic systems. But after thinking about trends in “big data” in biology, the environmental sciences, and our understanding of human behaviors, I decided instead to talk about something I call “Genetic GIS” (Figure 1).
Figure 1. Genetic GIS. Tying together “omics”, environment, behavior and health.
Genetic GIS recognizes that the major determinants of human health include behavior, which mediates health-related activities including eating, drinking, breathing, and dermal exposure routes, as well as exercise and risk-related behaviors; environment, where an individual’s location over their daily activity space and life course determines their ambient environment; and genetics, loosely defined here to include the “-omics” and other aspects of an individual’s biological makeup that contribute to disease susceptibility and expression. This paradigm has been around for a while. What has changed is the advent of place-based genetic and “-omics” data, which make it possible to use a GIS approach, focused on individuals as well as population-level measures, to better understand how genetics, behavior and environment jointly determine human health outcomes. We’ll consider advances in genetic data later in this blog.
For now let’s conduct simple thought experiments (what Einstein called “Gedanken” experiments) to illustrate the approach. We start with Charles Darwin, who used observations on place-based phenotypic variation in birds, snails, plants and other organisms as the basis for his formulation of the evolution of the species. The phenotype is the expression of the genotype – how organisms actually appear. The first instance of a tree illustration for representing the descent and origin of different species was recorded in his diary from the voyage of the ship Beagle, in which he hypothesized adaptive radiation (descent of a suite of different species from a common ancestor) in finches living on different islands (Figure 2). The shape of a finch beak from different islands reflected adaptation to the different food sources available, and the reproductive success of different behaviors adapted to the different food sources. Hence his annotation “I think” was based on observations of the finch phenome on the different islands, and how it related to local environments (food sources) and adaptive behaviors. This is an example of the genetic GIS framework, but without the focus on human health.
Figure 2. Observations on phenotypic variation on different islands led to Darwin’s demonstration of adaptive radiation in finches.
In 1854, John Snow stopped the spread of cholera in the vicinity of the Broad Street pump in London by removing the pump handle. Dr. Snow used his observations on use of the pump (drinking water behavior), proximity to the pump (environment) and health outcomes (cholera deaths) to surmise the action of some agent in water from the Broad Street pump that increased cholera risk. At that time the germ-based theory of disease was nascent at best, and this may be viewed as an example of genetic GIS that relied primarily on the environment-behavior dimensions.
The global eradication of smallpox, certified in 1980, is a major public health success that used “shoe leather epidemiology” to identify pockets of infection, as accomplished by a small army of public health workers who vaccinated a large portion of the world’s population. Readers born before the 1970s likely have the tell-tale smallpox vaccination scar, most often on the upper arm. Here vaccination (behavior) targeted at local populations at risk from smallpox (environment) resulted in the eradication of the disease. An imminent public health success is the elimination of guinea worm disease, a highly painful and debilitating parasitic infection (Figure 3, top) with 3.5 million cases in 1986. The disease is spread through the consumption of water where the early life stage of the parasite resides. Once ingested, the parasite migrates to different sites in the body of the host, later to emerge as an adult that may be a foot or more in length. Emergence is a slow and painful process, and the patient is prone to infection and complications. The Carter Center developed sip pipes (Figure 3, bottom) that filter out the parasite, distributed these to people in endemic areas (environment), and taught them how to use the devices (behavior). There were 1,058 cases in 2011, and guinea worm is expected to be eradicated by 2015. Again, these examples of infectious and parasitic diseases rely on the environment-behavior dimension, and not on genetics.
Figure 3. Guinea worm exiting the body through the foot (top); young herder using a sip pipe to filter out the guinea worm parasite (bottom).
With the advent of antibiotics and public health infrastructures that safeguard water and food supplies, the largest source of mortality in industrialized nations has shifted from childhood and infectious diseases to chronic diseases such as cancer. In the U.S., pancreatic cancer is projected to become the 2nd leading cause of cancer mortality by 2020 (Figure 4), yet it remains poorly understood. Cures for the most common forms of pancreatic cancer do not exist, and it typically is not diagnosed until it is well advanced. As a result, the majority of patients die within a year of diagnosis. Unlike the examples for cholera, smallpox and guinea worm, advances for chronic disease require a better understanding of the interplay between genetics, behavior and environment, and this is where Genetic GIS comes into its own.
Figure 4. Pancreatic cancer is projected to become the 2nd leading cause of cancer death in the United States by 2020.
But until very recently genetic data has been largely unavailable, and has not been geographically referenced, a necessary component since place in genetic GIS ties together genetics, environment and behavior. Later we’ll touch on the new era of place-based “-omics” data. Let’s now consider the state of the scientific domains that will be supported by genetic GIS, starting first with environmental epidemiology.
Figure 5. The field of ecogeographic epidemiology uses geographically referenced data on disease, environment and genetics to better understand the multifactorial basis of human health outcomes. From Sloan et al (2009).
In 2009, Chantel Sloan and colleagues defined the emerging field of ecogeographic genetic epidemiology (Sloan, Duell et al. 2009), illustrated in Figure 5. Here human health outcomes are seen as the result of interactions between genetics and environment, including features of the landscape. Genetic GIS makes explicit the behavioral dimension, especially those behaviors that mediate human exposures (including oral, respiratory, dermal and radiation) and modify disease risks.
Figure 6. The GIS paradigm of the map layers can be applied to the integration and visualization of genome data. From Dolan et al. 2006.
Dolan et al. observed that the layer-cake paradigm of GIS can be applied to bioinformatic data defining the “-omics” of individuals. As illustrated in Figure 6, the individual, rather than place, is used as the referent for linking data on genetics, the chromosome space, measures of gene expression including the regulome (DNA regions that control gene expression), and highly conserved regions that are often associated with critical biological functions such as homeostasis. Consider genetic variation at the population level, and how this may contribute to observed disease outcomes. Zhang et al. postulated that population differences in gene expression could contribute to some of the observed differences in susceptibility to common diseases and response to drug treatments, and proposed an approach for evaluating the contributions of genetic variation and nongenetic factors to population differences in gene expression (Zhang, Duan et al. 2008). Recognizing the advent of large volumes of place-based genetic data, Davies et al called for the establishment of an international network of genomic observatories (Davies, Meyer et al. 2012). They focused on landscape-level genetic variability in plants and animals and the digital characterization of whole ecosystems, including time-series of “omics” data. But large volumes of place-based human genetic data are now becoming available as well, and studies such as these provide a framework for how the genetic and “-omic” dimensions can be brought into a genetic GIS.
What analytical frameworks exist for relating the geographic variability in the genetic, environmental and behavioral dimensions to human health outcomes? Meliker and others have pioneered new approaches that use space-time GIS to reconstruct life-time exposures to time-varying place-based environmental factors such as arsenic concentrations in drinking water, air-borne toxics and infectious agents. This uses the locations and activity spaces of individuals over their life course to link time-dynamic information on the environment with the human behaviors (e.g. occupation, food and water consumption behaviors) that mediate exposures (Meliker, Slotnick et al. 2010; Meliker, Goovaerts et al. 2011). This example illustrates how exposure reconstruction and the sensitivity analysis of exposure metrics may be accomplished by exploiting the behavior-environment dimensions of genetic GIS. Including information on human mobility histories and place-based exposures within the context of case-control studies supports scientific inference structures that are based on proven epidemiological study designs, as demonstrated by newer approaches such as Q-statistics and generalized additive models (Jacquez and Meliker 2009; Vieira, Webster et al. 2009). Studies such as these illustrate how the environmental and behavioral dimensions of genetic GIS may be used within the framework of proven study designs to evaluate and identify significant risk factors for specific diseases.
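To make the idea of exposure reconstruction concrete, here is a minimal sketch in Python. It is not the method used in the cited studies; it only illustrates how a residential history (environment), a reported water intake (behavior) and a time-varying concentration surface might be joined on place and year. The `Residence` class, the `cumulative_exposure` function, the place names and all numbers are hypothetical, and treating missing place-years as zero exposure is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Residence:
    """One spell in a hypothetical residential history."""
    place: str             # identifier of the water supply or location
    start_year: int        # first year at this place (inclusive)
    end_year: int          # last year at this place (inclusive)
    litres_per_day: float  # reported drinking-water intake (behavior)


def cumulative_exposure(history, concentration):
    """Total micrograms ingested over a residential history.

    `concentration` maps (place, year) to micrograms per litre.
    Place-years missing from the map are counted as zero, which is an
    assumption of this sketch; a real analysis would impute or flag them.
    """
    total = 0.0
    for spell in history:
        for year in range(spell.start_year, spell.end_year + 1):
            c = concentration.get((spell.place, year), 0.0)
            total += c * spell.litres_per_day * 365
    return total


# Made-up example: two years at place A, then one year at place B.
history = [
    Residence("A", 2000, 2001, 2.0),
    Residence("B", 2002, 2002, 1.0),
]
concentration = {("A", 2000): 10.0, ("A", 2001): 5.0, ("B", 2002): 20.0}

print(cumulative_exposure(history, concentration))  # 18250.0 micrograms
```

Keeping the concentration surface separate from the mobility history means either can be swapped out, which is what a sensitivity analysis of exposure metrics would need.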
How might the genetic dimension contribute to our understanding of human health? When individual-level genetic data is available it could be used within the context of epidemiological studies (such as the case-control methods cited above) as additional individual-level attributes to assess relative contributions to disease risk of genetic, environmental and behavioral factors, and their interactions (e.g. gene-environment).
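As a rough illustration of what assessing a gene-environment interaction can look like in a case-control setting, the sketch below computes the exposure odds ratio separately for carriers and non-carriers of a hypothetical risk allele, and then takes the ratio of the two as a simple multiplicative interaction measure. The counts are invented for illustration, the function name is my own, and a real study would fit a regression model with confidence intervals rather than compare two raw tables.

```python
def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Odds ratio for exposure from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases
    )


# Hypothetical counts, stratified by genotype.
or_carriers = odds_ratio(40, 10, 20, 20)      # carriers of the risk allele
or_non_carriers = odds_ratio(30, 30, 25, 50)  # non-carriers

# A ratio above 1 suggests the exposure effect is stronger in carriers
# (interaction on the multiplicative scale).
interaction_ratio = or_carriers / or_non_carriers

print(or_carriers, or_non_carriers, interaction_ratio)  # 4.0 2.0 2.0
```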
Figure 7. Spatial correlogram of DNA diversity illustrating the signature of a genetic cline consistent with spatially structured selection or migration. From Bertorelle and Barbujani (1995).
At the population level, geographic variation in genetics has been directly linked to micro-evolutionary processes such as isolation by distance, migration, and selection. The null expectations for genetic variance are defined by Hardy-Weinberg equilibrium, and spatial correlograms have been used as a diagnostic since the 1980’s (Sokal, Jacquez et al. 1989). Spatial autocorrelation statistics for summarizing spatial patterns of DNA diversity have been applied to allele frequencies, restriction fragment length polymorphism, gene sequence and other data. These statistics are a useful tool for exploring the genetic structure of a population to suggest hypotheses regarding evolutionary processes that shaped the observed patterns (Bertorelle and Barbujani 1995).
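For readers who have not met a spatial correlogram before, the following sketch computes Moran's I for a set of distance classes using binary weights (a pair of sites counts if its separation falls inside the class). It is a generic textbook formulation, not the specific estimator of the papers cited above, and the transect of allele frequencies is made up. The function names and the choice of half-open distance classes are assumptions of this example.

```python
import math


def morans_i(values, coords, lo, hi):
    """Moran's I over site pairs whose distance d satisfies lo < d <= hi.

    Returns None when the class holds no pairs or the values do not vary.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sum_sq = sum(d * d for d in dev)
    cross = 0.0   # sum of deviation cross-products over weighted pairs
    weights = 0   # number of ordered pairs in this distance class
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if lo < d <= hi:
                cross += dev[i] * dev[j]
                weights += 1
    if weights == 0 or sum_sq == 0:
        return None
    return (n / weights) * cross / sum_sq


def correlogram(values, coords, breaks):
    """Moran's I for each consecutive pair of distance-class breaks."""
    return [
        morans_i(values, coords, lo, hi)
        for lo, hi in zip(breaks[:-1], breaks[1:])
    ]


# Made-up cline: allele frequency rises steadily along a transect.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
freqs = [0.1, 0.2, 0.3, 0.4, 0.5]

for (lo, hi), i_d in zip([(0, 1), (1, 2), (2, 3), (3, 4)],
                         correlogram(freqs, coords, [0, 1, 2, 3, 4])):
    print(f"distance ({lo}, {hi}]: I = {i_d:.3f}")
```

On this toy transect the statistic falls from positive at the shortest distance class to negative at the longest, which is the monotone decline usually read as the signature of a cline. With so few pairs in the longest class the value can fall outside the range of -1 to 1, a reminder that sparse distance classes should be interpreted cautiously.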
A body of work remains to be accomplished that will develop the theory and inference structure of genetic GIS. How can spatial and spatio-temporal structure in genetics, environment, behavior and health outcomes inform our understanding of human diseases? Sloan et al. suggest that patterns of local indicators of spatial and temporal clustering can be used to implicate specific disease processes (Sloan, Jacquez et al. 2012). Although this is a more complex problem than relating signatures on spatial correlograms of genetic data to micro-evolutionary processes, the inferential framework of Sloan et al. suggests how space-time variation in environment, behavior, genetics and health outcomes may be used to evaluate alternative hypotheses of disease causation.
So what is the promise of genetic GIS? Three benefits seem apparent. First, it provides a comprehensive model of human health and its determinants including genetic, environmental and behavioral dimensions. Second, it readily bridges individual- to population-level scales by using place to tie data on individuals to their local and regional settings. Finally, it enables studies of human disease that would otherwise be impossible or extremely difficult. Readily supporting individual-level exposure reconstruction over the life course is but one example.
What might a research agenda for genetic GIS look like? First, while the underpinnings of genetic GIS are mostly in place (as touched on by the preceding paragraphs), the theory, specific methods and science of genetic GIS remain to be developed. The ontology linking the dimensions of genetics, environment and behavior needs to be completed, as do the inferential frameworks for relating patterns in these dimensions to specific diseases. Second, technologies for genetic GIS need to be put in place. Most of the components have already been developed (e.g. space-time GIS); a critical task will be establishing appropriate metadata and specific analytical techniques. For example, there are a large number of techniques for calculating genetic distances; these need to be integrated into a GIS framework to support the calculation of genetic correlograms and the assessment of space-time dependencies of genetics with geographic, environmental and behavioral distances. Third, transformative applications that illustrate the use and unique benefits of the genetic GIS approach need to be developed and disseminated.
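One way to picture the integration of genetic distances with geographic distances is a Mantel-style test, sketched below in Python. This is only an illustration under stated assumptions: genetic distance is taken as the Euclidean distance between allele-frequency vectors (one of many possible measures), the populations and frequencies are invented, and the function names are my own. The permutation test shuffles population labels on the genetic side and counts how often the shuffled correlation is at least as large as the observed one.

```python
import math
import random


def pairwise(items, dist):
    """Distances for all pairs (i, j) with i < j, keyed by (i, j)."""
    n = len(items)
    return {
        (i, j): dist(items[i], items[j])
        for i in range(n) for j in range(i + 1, n)
    }


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(freqs, coords, permutations=999, seed=0):
    """Correlation of genetic and geographic distances with a
    one-sided permutation p-value."""
    n = len(freqs)
    gen = pairwise(freqs, math.dist)   # assumed genetic distance measure
    geo = pairwise(coords, math.dist)
    keys = sorted(gen)
    geo_vec = [geo[k] for k in keys]
    observed = pearson([gen[k] for k in keys], geo_vec)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [gen[tuple(sorted((perm[i], perm[j])))] for i, j in keys]
        # Small tolerance so ties are not lost to floating-point rounding.
        if pearson(shuffled, geo_vec) >= observed - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)


# Made-up populations along a transect, two loci each.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
freqs = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.4, 0.6), (0.5, 0.5)]

r, p = mantel(freqs, coords)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

The same structure would accept an environmental or behavioral distance matrix in place of the geographic one, which is the kind of cross-dimension comparison the research agenda above calls for.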
Bertorelle, G. and G. Barbujani (1995). “Analysis of DNA diversity by spatial autocorrelation.” Genetics 140(2): 811-819.
Davies, N., C. Meyer, et al. (2012). “A call for an international network of genomic observatories (GOs).” GigaScience 1(1): 5.
Jacquez, G. M. and J. R. Meliker (2009). Case-Control Clustering for Mobile Populations. The SAGE Handbook of Spatial Analysis. S. Fotheringham and P. Rogerson, Sage Publications.
Meliker, J., P. Goovaerts, et al. (2011). “Incorporating individual-level distributions of exposure error in epidemiologic analyses: An example using arsenic in drinking water and bladder cancer.” Annals of Epidemiology (in press).
Meliker, J., M. Slotnick, et al. (2010). “Lifetime Exposure to Arsenic in Drinking Water and Bladder Cancer: A Population-Based Case-Control Study in Michigan.” Cancer Causes and Control 21: 745-757.
Sloan, C. D., E. J. Duell, et al. (2009). “Ecogeographic genetic epidemiology.” Genetic Epidemiology 33(4): 281-289.
Sloan, C. D., G. M. Jacquez, et al. (2012). “Performance of cancer cluster Q-statistics for case-control residential histories.” Spatial and Spatio-temporal Epidemiology (in press).
Sokal, R. R., G. M. Jacquez, et al. (1989). “Spatial autocorrelation analysis of migration and selection.” Genetics 121(4): 845-855.
Vieira, V., T. Webster, et al. (2009). “Spatial analysis of bladder, kidney, and pancreatic cancer on upper Cape Cod: an application of generalized additive models to case-control data.” Environmental Health 8(3).
Zhang, W., S. Duan, et al. (2008). “Evaluation of Genetic Variation Contributing to Differences in Gene Expression between Populations.” American Journal of Human Genetics 82(3): 631-640.