
BioMeanings Update January 2015

1.12.15

Two proposal tips

We write a fair number of research proposals, most of them going to the National Institutes of Health and the Centers for Disease Control and Prevention. We also submit proposals to the National Science Foundation and NASA. For researchers who are new to proposal writing, here are a couple of tips.

One of the first things to do after identifying the next “Big Idea” is to assess its potential viability and impact. Two items top the list. The first is the priority of the disease targeted; it is much easier to obtain funding for a disease with a large human and economic impact. One resource that may be of use is the global burden of disease and of years of life lost (YLL). This article, published recently in The Lancet, provides estimates from around the globe for both chronic and infectious disease burdens.

Second, if you are looking for funding in the U.S., it is important to consider the funding organization’s budget and plan for the near term. Remember that the reviewers may not even be aware of the funding organization’s research priorities. For NIH, the reviewers are external, and it is good proposal practice to remind them of the relevant NIH research priorities. A good portion of BioMedware’s research portfolio has been funded by the National Cancer Institute. Our 2015 proposals will likely incorporate information from NCI’s 2016 budget proposal and plan, which may be found here.

In summary, the message is to (1) seek to address important diseases, where importance is defined by human and economic costs; and (2) go after those proposal ideas that are highly relevant to the funding organization’s mission and budget plan.

Save the Date: April 13-14, 2015

GIS for Community Impact: From Technology to Translation

Organized by The Cancer Prevention Institute of California and Zero Breast Cancer
Location: The California Endowment, 1111 Broadway, Oakland, CA
Geoffrey Jacquez, PhD, President of BioMedware and keynote speaker, will conduct the preconference workshop on April 13th, titled “Space Time Analysis Using SpaceStat”.

In the news: SpaceStat for spatial analysis

Recently published articles demonstrate how SpaceStat’s rigorous statistical analysis methods are being used by researchers.


Geographical Disparities of Lung Cancer Mortality Centered on Central Appalachia

Timothy S. Hare, Chad Wells, Nicole Johnson; Morehead State University, Morehead, KY, USA
International Journal of Applied Geospatial Research, 5(4), 35-53, October-December 2014

“…We use ESRI’s ArcGIS for data processing and visualization and BioMedware’s SpaceStat for EDA, ESDA, OLS regression, spatial regression, and GWR. SpaceStat is a set of software tools for conducting a variety of spatial statistical analysis techniques (BioMedware, 2012). It supports dynamic and interactive analysis of linked tables, charts, and maps. SpaceStat calculates probability values (p-values) for observed test statistics by comparing them to their null distributions, which estimates the likelihood of the observed values in comparison with the null distribution. The spatial weights matrix used is queen’s case contiguity.”
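The quoted study uses a queen’s-case contiguity spatial weights matrix. As a purely illustrative sketch (not the authors’ code, and simplified to a regular grid of cells rather than the study’s real polygon boundaries), queen contiguity treats two areas as neighbors when they share an edge or a vertex, like a chess queen’s one-step moves:

```python
import numpy as np

def queen_weights(nrows, ncols):
    """Binary queen-contiguity weights for a regular grid of cells.

    Cells are indexed row-major: cell id = r * ncols + c. Two cells
    are queen-contiguous when they share an edge or a vertex.
    """
    n = nrows * ncols
    W = np.zeros((n, n), dtype=int)
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i, rr * ncols + cc] = 1
    return W

W = queen_weights(3, 3)
print(W[4].sum())   # center cell of a 3x3 grid touches all 8 others
print(W[0].sum())   # a corner cell has only 3 neighbors
```

For real study areas the weights are built from shared polygon boundaries; tools such as SpaceStat (or the open-source libpysal library) construct them directly from shapefiles.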

Download a 14-day Free Trial of SpaceStat

The RUSLE erosion index as a proxy indicator for debris flow susceptibility

Alessandro Zini, Sergio Grauso, Vladimiro Verrubbi, Luca Falconi, Gabriele Leoni, Claudio Puglisi
Landslides, DOI 10.1007/s10346-014-0515-8
Received: 25 September 2013; Accepted: 14 August 2014

Background. Debris flows are dangerous occurrences in many parts of the world. Several disasters due to this type of fast-moving landslide are documented; natural-hazard assessment of debris flows is therefore crucial for the safety of life and property. In this paper we investigated the effectiveness of the soil erosion index as a debris-flow susceptibility indicator. The relation between the erosion index, assessed by means of the Revised Universal Soil Loss Equation (RUSLE) model, and the inventory of debris flows that have occurred in an area of Sicily was investigated.

Method. To this aim, a geographically weighted logistic regression analysis (GWR) was carried out using SpaceStat. While traditional logistic regression obtains a single average parameter estimate over the whole territory under observation, GWR attempts to obtain point-to-point estimates. Moreover, the GWR technique may provide a better fit to the experimental data.
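To make the point-to-point idea concrete, here is a minimal sketch of one common way geographically weighted logistic regression can be computed: observations are down-weighted by a Gaussian kernel of their distance to a focal point, and a locally weighted logistic fit is run at that point via iteratively reweighted least squares (IRLS). Everything here — the function, the simulated data, the bandwidth — is a hypothetical illustration; SpaceStat’s actual implementation and bandwidth selection will differ.

```python
import numpy as np

def gw_logistic(X, y, coords, focal, bandwidth, n_iter=25):
    """Locally weighted logistic fit at one focal point (IRLS).

    Returns [intercept, slope(s)] estimated with observations weighted
    by a Gaussian kernel of their distance to `focal`.
    """
    d = np.linalg.norm(coords - focal, axis=1)
    g = np.exp(-0.5 * (d / bandwidth) ** 2)          # geographic kernel weights
    Xd = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))       # current fitted probabilities
        w = g * p * (1.0 - p)                        # IRLS working weights x kernel
        z = Xd @ beta + (y - p) / np.clip(p * (1.0 - p), 1e-9, None)
        beta = np.linalg.solve(Xd.T @ (w[:, None] * Xd), Xd.T @ (w * z))
    return beta

# Toy data in which the covariate's effect strengthens from west to east
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(500, 2))
x = rng.normal(size=500)
true_slope = 0.5 + 0.3 * coords[:, 0]
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-x * true_slope))).astype(float)

west = gw_logistic(x[:, None], y, coords, focal=np.array([1.0, 5.0]), bandwidth=3.0)
east = gw_logistic(x[:, None], y, coords, focal=np.array([9.0, 5.0]), bandwidth=3.0)
```

Fitting the same model at a western and an eastern focal point recovers different local slopes from the same data set, which is exactly the “non-fixed regression coefficients” behavior the study exploits.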

Results. The results of the statistical analysis demonstrate a significant connection between the observed debris flows and the RUSLE erosion index. Second, the complexity of the GWR model, based on non-fixed regression coefficients, shows that the same value of the RUSLE index can have different effects in different locations. This result reflects the behavior of complex land systems in the real world, where different forces (rain, running waters, internal water pressures, internal strains, gravity, etc.) act to determine different point-to-point degrees of susceptibility to debris flows. On the basis of the GWR parameters, a scenario analysis of the area was conducted.

Download a 14-day Free Trial of SpaceStat

Follow Geoffrey Jacquez, PhD, on Twitter @GeoffJacquez.

Best wishes for your research efforts in 2015 and thank you for your interest in BioMedware.

The quantified self and crowd sourcing of the genome+, exposome and behavome: Perspective and call for action

1.28.14

by Geoffrey M. Jacquez1,2 and Robert Rommel2

1.  Department of Geography, State University of New York at Buffalo, Buffalo, NY

2. BioMedware, Ann Arbor MI

Introduction: Perhaps one of the greatest challenges and limitations in environmental health and epidemiology is that of measurement of individual health outcomes, their causes, and correlates.  Data on risk factors and exposures are often measured with imperfect instruments such as surveys that are inherently inaccurate and subject to recall and other bias.  At present, biomarkers may provide reasonable estimates of exposures, but can be difficult to obtain and are available for only a small number of compounds.  Ideally, epidemiologists and exposure assessment scientists would have timely information regarding measurements of individual-level physiology and ambient environment (e.g. at the human boundary layer), the specific times and locations where these measurements were collected, and in what settings.

We are at the beginning of a revolution in measurement that has the potential to transform environmental health and epidemiology (Swan 2013).  We argue this transformation will require new ways of thinking about data sharing, consent, privacy and confidentiality, and the formation of an organization to foster the use of personal, crowd-sourced data for the common good.  We begin with a working definition of the genome+, exposome and behavome, and examples of how technology is revolutionizing their measurement.  Next, we consider technology trends in the quantifiable self, and where these will lead in the near future.  We predict these trends will culminate in a new era for epidemiology and environmental health provided mechanisms are established to foster sharing of individual information from these new data streams.  We close with a call for action to form an organization charged with governance, data security, and mechanisms for data sharing, with the ultimate mission of advancing epidemiology, the environmental health sciences, and human health.

Genome+, exposome and behavome:  Consider a conceptual model of three important determinants of health (Figure 1).  Both illness and well-being are treated as core outcomes, whose expression is influenced by the genome+, exposome and behavome.  An individual’s genome+ is comprised of their genome (genetic composition), regulome (which controls gene expression), proteome (their complement of amino acids and proteins) and metabolome (the basis of metabolism and homeostasis).  Together, these constitute a good portion of an individual’s biological makeup.  The exposome is defined as the totality of exposures over a person’s life course (Wild 2005). We define the behavome as the totality of an individual’s health-related behaviors over their life course.  Wild’s definition of the exposome included behavioral determinants of exposure; we treat the behavome separately to clarify the role of human behavior in mediating the exposome (through health behaviors such as smoking, exercise, diet and so on), as well as interactions between the exposome and the genome+ (for example, many behaviors are now recognized to have a genetic component, such as a predilection to alcohol and substance abuse).  These determinants of human health act through place, defined as the geographic, environmental, social and societal milieus experienced over a person’s life course.

Technology trends in measurement of the genome+, exposome and behavome: Measurement of the genome+, exposome and behavome is an enormous challenge.  Nonetheless, the last few years have seen major advances in measurement.  We now provide a few examples of such advances, before considering implications of these technology trends for environmental health and epidemiology.

Measurement of the genome+:  Continued improvements in sequencing technology are dramatically reducing the cost of sequencing individual genomes. In 2000 the Human Genome Project was completed, having sequenced the first whole human genome at a cost in excess of USD$2 billion (Davies 2010). In 2012 the 1000 Genomes Project released its phase 1 sequencing data (Pybus et al. 2014).  This project is the first to sequence the genomes of over 1,000 individuals, sampled to document human genetic variation across 25 populations from around the globe (The 1000 Genomes Project Consortium 2012). When it began, the cost of fully sequencing an entire genome was high, and as a result only a portion of each genome was sequenced.  But the cost of whole genome sequencing continues to drop, and the USD$1,000 whole-genome sequence is now available (Hayden 2014). In medical practice and research, whole genome sequencing is posing ethical challenges regarding the amount of information to disclose to the individual, especially given incomplete knowledge of the genetic basis of disease (Yu et al. 2013).  Nonetheless, we expect whole genome sequencing for individuals to soon be a commodity available for USD$100 or less.  That sequence data will support viable business models is being proven out by companies such as 23andMe (https://www.23andme.com/; see https://www.isogg.org/wiki/List_of_personal_genomics_companies for a list of personal genomics companies), which offer partial sequencing using saliva samples to explore ancestral origins and disease risks.  These dramatic reductions in the cost of measurement are also occurring for the exome, epigenome, and other constituents of the genome+ (Zentner and Henikoff 2012, Weinhold 2012, Meissner 2012, Mefford 2012).
It is clear that measurements of the genome+ will soon be widely and inexpensively available, and will be incorporated into individual electronic health records, notwithstanding the informatics and ethical challenges posed by their integration (Kho et al. 2013, Flintoft 2014, Tarczy-Hornoch et al. 2013, Hazin et al. 2013).

Near real-time physiological measurement is expanding rapidly in clinical medicine as well as in the burgeoning “wearables” marketplace.  In 2012 Qualcomm established its Tricorder XPrize (https://www.qualcommtricorderxprize.org/), with the goal of creating a wireless handheld device that monitors and diagnoses a patient’s health conditions using personal health metrics.  The top three entries are to be announced in May 2014. Wearable and implantable wireless sensors for healthcare monitoring include cancer detection, glucose monitoring, seizure warning, cardiac rate and rhythm, and heart attack detection, among others (Darwish and Hassanien 2011).  Google is now testing a glucose monitor incorporated into a contact lens (Landen 2014), and “smart” t-shirts for monitoring pulse, respiration and stress levels will soon be on the market (https://www.omsignal.com/).  Activity sensors such as those from Jawbone and Fitbit are currently available, priced at around USD$100, and provide a record of daily activity (e.g. steps taken, distance traveled, time asleep).  Data from wearable sensors are already being used to monitor physical activity levels in pediatric patients (Yan et al. 2014).  At the 2014 Consumer Electronics Show wearables were prominent and recognized as an emerging market segment (Rowinski 2014).  While still small, the digital health market segment received $1.9B in venture funding in 2013 and posted 39% growth; it has more than doubled since 2011 (RockHealth 2014).  Even so, the marketplace is nascent and fragmented: it is unclear what will succeed and what will not, and we have only an imperfect understanding of what people will adopt (Figure 2).  It seems reasonable, however, to assume that wearables as a technology will rapidly evolve and that their adoption and use will expand quickly.  This poses a great potential opportunity for measurement in the environmental and health sciences.

Measurement of the exposome:  The challenge in quantifying an individual’s exposome is measurement of the ambient environment at the human boundary layer – the epidermis, mouth, mucosa, and nasal passages – where contaminants and pathogens enter the body (Balshaw and Kwok 2012).  This requires sensors integrated into clothing (e.g. smart shirts, pants and shoes), jewelry and bracelets, or worn on the lapel (see Windmiller and Wang 2013 for a review).  Recognizing their importance for quantification of the exposome, the National Institutes of Health has funded several initiatives to develop such “environmental tricorders” (see for example https://grants.nih.gov/grants/guide/rfa-files/RFA-ES-09-005.html). Environmental tricorders are already on the market, although the environmental factors they monitor are somewhat limited, including volatile organic compounds, dust, light, sound, ionizing radiation, carbon dioxide and others.  Examples include products from Valarm and Sensorcon, among others.  The Knight Foundation recently funded prototyping of the “Global Sensor Web”, whose objective is to create an online platform for aggregating geo-tagged data sets from public and personal data sources (https://www.knightfoundation.org/grants/201347663/), although these are not necessarily data from wearable sensors.  As noted below, there currently is a gap between sensors of sufficient quality, accuracy and precision to be of immediate use in environmental epidemiology and the low-cost sensors being adopted in the consumer marketplace.

Measurement of the behavome:  We think of the behavome as separate from Wild’s exposome, as health behaviors are key mediators of exposures.  Behavioral recognition methods for assessing what an individual is doing have for decades been an important topic of health research.  With the advent of sensors in residences, in health care facilities, and worn by patients, multisensor data fusion for activity recognition has become an important topic.  These technologies are already being deployed and assessed in nursing home and assisted living facilities, and recent research has demonstrated that these methods can identify risky behaviors with good accuracy and low deployment costs (Palumbo et al. 2013).  The “internet of things”, including smart homes, smart cars and smart workplaces, is in the early phase of what many predict to be explosive growth (Ashton 2009).  In 2008 the number of devices on the Internet exceeded the number of people, and by 2020 it is predicted to exceed 50 billion (Swan 2012). Information on when, where and how we use appliances, electronic devices, machinery and environmental controls in home and workplace settings, and while commuting, has yet to be used to quantify the behavome.  Near real-time data on ambient temperatures and on how often and when we use the refrigerator, for example, may have enormous value for quantifying personal energy budgets.  A variety of approaches for assessing health behaviors have been suggested using technologies such as inertial sensors, the Global Positioning System, smart homes, Radio Frequency IDentification and others; most promising is the sensor fusion approach that combines data from several sensors simultaneously (Lowe and ÓLaighin 2014).  To our knowledge, technologies such as Google Glass have yet to be used to capture video images chronicling dietary intake and other health-related activities.
Other potential applications include quantification of personal energy budgets, individual walkability (e.g. Mayne et al. 2013), and documentation of other personalized environmental metrics.  Once health-related behaviors are known, gamification (Whitson 2013) and other approaches can be used to encourage salubrious behaviors (Schoech et al. 2013).

Where will these measurement trends lead?  At present there are two domains for measuring the quantified self: a high-end approach focused on measurement accuracy and precision, and a commodity approach focused on capturing the consumer market (refer to Figure 3).  We see the possibility of a future convergence and the emergence of low-cost sensors of sufficient quality to support a common good – high-quality environmental health research made possible by information volunteered by enlightened citizen scientists.  But achieving this goal will likely require establishing appropriate mechanisms for data sharing, oversight and governance, and creating a user community of sufficient market mass to influence the development of COTS sensors of sufficient quality to support research in environmental health and epidemiology.

What benefits might be realized?  There is a growing recognition that new ways of measuring ourselves require new ways of understanding what “normal” means (McFedries 2013).  We know amazingly little about the ambient environments individuals experience through the course of their daily lives.  Similarly, we know very little about the local environments experienced by the members of local communities across the US and around the world.  How much temporal and spatial variability is there in air quality at the human boundary layer for individuals in diverse communities?  What are the exposure profiles of children as they move about their daily lives in our neighborhoods and schools?  A national baseline environmental assessment, incorporating data on the quantified self collected by citizen scientists, may begin to address such questions.  Important issues will need to be addressed regarding data quality, data sharing, sampling design and governance, but these do not appear to be insurmountable  (Goldberg et al. 2013).

A call for action.  It seems reasonable to the authors to assume that the technology trends in Figure 3 will indeed result in more accurate measurement of individual-level physiology, exposures and genetics at decreased cost.  In fact, the commoditization of the quantifiable self is rapidly taking place, as demonstrated by products such as Google Glass and Fitbit, and the advent of environmental tricorders from Sensorcon, Valarm and others.  The data collected by these technologies are a highly valued business asset, and as such are not likely to be shared with research scientists for advancing epidemiology and public health. This balkanization of data for measuring the quantifiable self may be ameliorated by appealing directly to individuals to act both in their own self-interest and for the common good.  We must provide an alternative for the highly vested, motivated citizen scientist, so they may choose to share their personal measurements for the good of all.  This will require the formation of an organization charged with governance, data sharing, data security, and oversight of research and data use, with the overall mission of advancing human health.  Rather than a balkanization of measurements of the quantifiable self into isolated information silos under the control of corporations, we envision an enlightened sharing of personal measurements that will lead to safer neighborhoods, workplaces and communities.  We believe our professional organizations – the AAAS, Sigma Xi, ISEE, and the AAG – are best positioned to take a leadership role in addressing this need.  We encourage readers to bring this to the attention of their professional organizations.

Figure 1. Schematic of relationships between the genome+, behavome, exposome and human health.


Figure 2. Schematic of a survey by Forrester Research, Inc., assessing adoption preferences for wearable devices.
Figure 3. Technology trends in the quantified self.

Research sensors, such as those funded by grants from NIH and Qualcomm’s Xprize challenge, strive for accuracy and high quality, but are expensive. The emergence of the wearables marketplace is resulting in sensors as commodities, COTS (commercial off the shelf) sensors that are relatively inexpensive but not necessarily of sufficient quality (e.g. accuracy and precision of measurement) to support environmental health research. We see a trend towards higher quality, low cost sensors (Future bubble) that may be suited for baseline environmental and population-level exposure assessment.
References
Ashton, K. 2009. That ‘Internet of Things’ Thing. In RFID Journal.
Balshaw, D. M. & R. K. Kwok (2012) Innovative Methods for Improving Measures of the Personal Environment. American Journal of Preventive Medicine, 42, 558–559.
Darwish, A. & A. E. Hassanien (2011) Wearable and Implantable Wireless Sensor Network Solutions for Healthcare Monitoring. Sensors, 11, 5561-5595.
Davies, K. 2010. The $1,000 Genome: The Revolution in DNA Sequencing and the New Era of Personalized Medicine. New York: Simon and Schuster, Inc.
Flintoft, L. (2014) Phenome-wide association studies go large. Nature Reviews Genetics, 15.
Goldberg, D. W., G. M. Jacquez, W. Kuhn, M. G. Cockburn, D. Janies, E. Pultar, T. A. Hammond, C. Knoblock & M. Raubal. 2013. Envisioning a Future for a Spatial-Health CyberGIS Marketplace. In Second International ACM SIGSPATIAL Workshop on HealthGIS (HealthGIS’13). Orlando, FL, USA: ACM SIGSPATIAL.
Hayden, E. C. (2014) Is the $1,000 genome for real? Nature News.
Hazin, R., K. B. Brothers, B. A. Malin, B. A. Koenig, S. C. Sanderson, M. A. Rothstein, M. S. Williams, E. W. Clayton & I. J. Kullo (2013) Ethical, legal, and social implications of incorporating genomic information into electronic health records. Genetics in Medicine, 15, 810–816.
Kho, A. N., L. V. Rasmussen, J. J. Connolly, P. L. Peissig, J. Starren, H. Hakonarson & M. G. Hayes (2013) Practical challenges in integrating genomic data into the electronic health record. Genetics in Medicine, 15, 772–778.
Landen, R. (2014) Google lens for monitoring glucose has hurdles to clear before hitting market. Modern Healthcare. https://www.modernhealthcare.com/article/20140117/NEWS/301179919
Lowe, S. A. & G. ÓLaighin (2014) Monitoring human health behaviour in one’s living environment: A technological review. Medical engineering & physics.
Mayne, D., G. Morgan, A. Willmore, N. Rose, B. Jalaludin, H. Bambrick & A. Bauman (2013) An objective index of walkability for research and planning in the Sydney Metropolitan Region of New South Wales, Australia: an ecological study. International Journal of Health Geographics, 12, 61.
McFedries, P. (2013) Tracking the quantified self [Technically speaking]. Spectrum, IEEE, 50, 24-24.
Mefford, H. C. (2012) Diagnostic Exome Sequencing — Are We There Yet? New England Journal of Medicine, 367, 1951-1953.
Meissner, A. (2012) What can epigenomics do for you? Genome Biology, 13, 420.
Palumbo, F., P. Barsocchi, C. Gallicchio, S. Chessa & A. Micheli. 2013. Multisensor Data Fusion for Activity Recognition Based on Reservoir Computing. In Evaluating AAL Systems Through Competitive Benchmarking, eds. J. Botía, J. Álvarez-García, K. Fujinami, P. Barsocchi & T. Riedel, 24-35. Springer Berlin Heidelberg.
Pybus, M., G. M. Dall’Olio, P. Luisi, M. Uzkudun, A. Carreño-Torres, P. Pavlidis, H. Laayouni, J. Bertranpetit & J. Engelken (2014) 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Research, 42, D903-D909.
RockHealth. 2014. Digital Health Funding: A Year in Review 2013.
Rowinski, D. (2014) CES 2014: Connected Home And Wearables To Take Center Stage. readwrite. https://readwrite.com/2014/01/03/ces-2014-preview-wearable-technology-4k-tv-connected-home-smartphones-tablets
Schoech, D., J. F. Boyas, B. M. Black & N. Elias-Lambert (2013) Gamification for Behavior Change: Lessons from Developing a Social, Multiuser, Web-Tablet Based Prevention Game for Youths. Journal of Technology in Human Services, 31, 197-217.
Swan, M. (2012) Sensor Mania! The internet of things, wearable computing, objective metrics, and the quantified self 2.0. Journal of Sensor and Actuator Networks, 1, 217-253.
Swan, M. (2013) The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery. Big Data, 1, 85-99.
Tarczy-Hornoch, P., L. Amendola, S. J. Aronson, L. Garraway, S. Gray, R. W. Grundmeier, L. A. Hindorff, G. Jarvik, D. Karavite, M. Lebo, S. E. Plon, E. V. Allen, K. E. Weck, P. S. White & Y. Yang (2013) A survey of informatics approaches to whole-exome and whole-genome clinical reporting in the electronic health record. Genetics in Medicine, 15, 824–832.
The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.
Weinhold, B. (2012) More Chemicals Show Epigenetic Effects across Generations. Environ Health Perspect, 120.
Whitson, J. R. (2013) Gaming the Quantified Self. Surveillance & Society, 11, 163-176.
Wild, C. (2005) Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer epidemiology, biomarkers & prevention, 14, 1847–50.
Windmiller, J. R. & J. Wang (2013) Wearable Electrochemical Sensors and Biosensors: A Review. Electroanalysis, 25, 29-46.
Yan, K., B. Tracie, M. Marie-Ève, H. Mélanie, B. Jean-Luc, T. Benoit, M. St-Onge & L. Marie (2014) Innovation through Wearable Sensors to Collect Real-Life Data among Pediatric Patients with Cardiometabolic Risk Factors. International Journal of Pediatrics, 2014, 9.
Yu, J.-H., S. M. Jamal, H. K. Tabor & M. J. Bamshad (2013) Self-guided management of exome and whole-genome sequencing results: changing the results return model. Genetics in Medicine, 15, 684–690.
Zentner, G. & S. Henikoff (2012) Surveying the epigenomic landscape, one base at a time. Genome Biology, 13, 250.

The small numbers problem – Part 2

3.24.11

Using persistence in spatial time series as a diagnostic for extreme rates in small areas.

In my last blog on the small numbers problem (https://www.biomedware.com/blog/2010/small_numbers_problem1/) we found that rates calculated with small denominators (e.g. small at-risk populations) have high variance, and we thus have little confidence in the rate estimate.  I presented a simple visual diagnostic process in SpaceStat for determining whether one should be concerned about the small numbers problem.  Now let’s suppose you observe a high rate in an area with a small population.  What might cause the high rate?  We now know the high rate might be attributable to the high variance that comes with a small population.  But we cannot preclude the possibility that the underlying risk in that area is actually high.  This raises an important question: how might we distinguish between high rate estimates due to small numbers and high rate estimates attributable to high underlying risk?
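The “high variance” point is easy to demonstrate with a toy simulation (a hypothetical illustration, not SpaceStat output): hold the underlying risk fixed and watch how the spread of observed rates depends on the size of the at-risk population.

```python
import numpy as np

rng = np.random.default_rng(42)
true_rate = 50 / 100_000        # fixed underlying risk: 50 per 100,000
trials = 10_000                 # repeated hypothetical observation periods

sds = {}
for pop in (1_000, 1_000_000):
    # Disease counts modeled as Poisson; observed rate = count / population
    counts = rng.poisson(true_rate * pop, size=trials)
    sds[pop] = (counts / pop * 100_000).std()
    print(f"population {pop:>9,}: sd of observed rate = {sds[pop]:.1f} per 100,000")
```

With 1,000 people at risk, the standard deviation of the observed rate (roughly 70 per 100,000) exceeds the true rate itself, so a doubled or tripled rate is unremarkable; with 1,000,000 at risk it falls to roughly 2 per 100,000.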

To make this more concrete consider a simple example.  Suppose a Superfund site has been emitting known carcinogens for childhood leukemia into the groundwater, and that a small community adjacent to the site relies on groundwater as its drinking water source.  The rate of childhood leukemia in that town is quite high – 2-3 times the state average – but the town is small, and because of this the variance in the town’s childhood leukemia rate is high, and the rate is not statistically different from the state average.

There are several additional issues to consider here, including exposure routes and mechanisms, whether biomarkers of exposure exist and are measurable, whether the suspected exposure is biologically plausible, and whether the suspected compound (if there is one) is actually present in household water supplies.  But as a first step we often are asked to address this question: is there actually an excess of disease in the town?  And this brings us back to the small numbers problem – how can we detect true, elevated risk when the population impacted is small?

Rate stabilization approaches (e.g. empirical Bayes) at first blush appear to be one possibility, but these involve something that has been called “shrinkage towards the mean”.  Here one assigns a weight to an area’s observed rate based on the size of the population at risk; when the population is small the weight is small, and when the population is large the corresponding weight is large.  One then stabilizes the rate by “borrowing” the rate estimate from surrounding areas (a local smoothing) or from a larger reporting area (e.g. the state average, a global smoothing).  Going back to our small town, one thus would give less weight to the observed leukemia rate in that town (because the population is small), and derive a new rate estimate composed primarily of the state average (shrinkage towards the mean).  But wait a moment.  What if the true rate in the small town is actually high?  This smoothing procedure would obscure that signal, and we might come to the incorrect conclusion that childhood leukemia isn’t elevated.  What can be done?
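The shrinkage mechanics can be sketched in a few lines. This is a hypothetical illustration of global empirical Bayes smoothing using a method-of-moments estimator (in the spirit of Marshall’s 1991 approach); the numbers and the helper function are made up for illustration, and SpaceStat’s own rate-stabilization tools may differ in detail.

```python
import numpy as np

def eb_smooth(cases, pop):
    """Global empirical Bayes rate smoothing (shrinkage to the overall mean).

    Each area's rate becomes a weighted average of its observed rate and
    the global mean rate; the weight w grows toward 1 as the at-risk
    population grows, so small areas are shrunk hardest.
    """
    cases, pop = np.asarray(cases, float), np.asarray(pop, float)
    rates = cases / pop
    m = cases.sum() / pop.sum()                      # global mean rate
    # Between-area variance by method of moments, floored at zero
    a = max(np.average((rates - m) ** 2, weights=pop) - m / pop.mean(), 0.0)
    w = a / (a + m / pop)                            # small pop -> small weight
    return w * rates + (1 - w) * m

# A small town with a strikingly high observed rate, plus three large areas
pop = np.array([500, 50_000, 80_000, 120_000])
cases = np.array([3, 25, 40, 60])                    # town: 600 per 100,000
smoothed = eb_smooth(cases, pop)
```

Note how the small town’s observed rate of 600 per 100,000 is pulled most of the way back toward the overall mean of about 51, while the large areas barely move — exactly the masking of a possibly real signal described above.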

This brings us to the treatment of spatial time series as experimental systems.  The underlying concept is very simple.  When we observe a disease rate in a small area through time, we can use repeated observations – the time series in that small area – to garner additional information on whether the underlying rate is elevated or not.  When the rate is truly elevated we expect the observed rate to be high and to remain high through time, even though the variance about that rate may be large due to the small numbers problem.  This illustrates the concept of persistence in spatial time series, and how we can use persistence to garner additional information regarding whether or not an observed rate is truly unusual.
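
A back-of-envelope calculation under a simple Poisson model (with hypothetical numbers, not a SpaceStat computation) shows why repeated periods help: pooling person-time across periods shrinks the standard error of the rate estimate, which is what makes persistence informative.

```python
import math

# SE of a Poisson rate estimated from PT person-years: sqrt(rate / PT).
rate = 15.0 / 100000   # hypothetical true elevated rate (3x a state rate of 5)
pop = 2000             # small-town population at risk
pt_one = pop * 5       # person-years in one 5-year reporting period
pt_six = pt_one * 6    # person-years pooled over six periods (e.g. 1970-1995)

se_one = math.sqrt(rate / pt_one) * 100000   # per 100,000
se_six = math.sqrt(rate / pt_six) * 100000

print(round(se_one, 1))  # 12.2 -- one period cannot separate 15 from 5
print(round(se_six, 1))  # 5.0 -- six periods begin to resolve the excess
```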

Notice that information on persistence in small areas can be lost when the data are first smoothed, since repeated, elevated rates in areas with small populations will each “shrink to the mean”.  I therefore prefer to inspect the spatial time series initially using the raw data themselves, and then make decisions about rate stabilization later.

To try this out, download the SpaceStat project for this blog (“SmallNumbersBlog2”).  You can download and install a trial version of SpaceStat at www.biomedware.com/?module=Page&sID=spacestat.

National Cancer Institute lung cancer mortality in white males in State Economic Areas (SEA’s) from 1970 to 1995

When you load the project you should see something like this (above). The data come from the National Cancer Institute and we’ll be working with lung cancer mortality in white males in State Economic Areas (SEA’s) from 1970 to 1995.  The rates are age-adjusted and reported per 100,000 population.  The data are recorded in 5 year periods.  I like to use the following protocol to inspect spatial time series for persistence that may indicate the existence of an underlying rate that is truly unusual.

  1.  Animate the maps and scatterplot by clicking the play button on the animation toolbar.  This gives you some sense of whether the distribution of rates as a group is changing through time.  You will notice on the scatterplot that the points tend to move to the right, reflecting population growth from 1970 through 1995.  You also might notice that the map turns redder as time progresses, suggesting that lung cancer mortality in white males is increasing.
  2.  Because the data are recorded in five-year periods they form a spatial time series, which we can inspect using the time plot tool.  I chose the variable “WLUNG”.

WLUNG

Inspection of the time plot confirms our observation that lung cancer mortality overall is increasing through time.  Each line on the time plot is the lung cancer mortality rate in a given SEA.  You can click the lines in the time plot to identify the SEA on the map to which they correspond. 

  3.  Do you think the two SEA’s with very low rates in 1970 had low rates that persisted through time, or did those rates bounce around a lot, perhaps due to the small numbers problem?  To answer this question I brush select the two lowest rates on the scatterplot at 1970 (below).

 White Population vs WLUNG

Not only do those two SEA’s have the lowest rates in 1970, but those rates are persistently low over the next 25 years, all the way to 1995.  

  4.  What might explain this persistence of low rates?  To address this question I inspect the map and right-click on the selected areas to see where they are and who might live there (below).  The SEA’s under consideration are Logan and Provo, Utah, which have large Mormon populations that largely abstain from smoking.  Based on this, do you think the low mortality rates in these two SEA’s might most reasonably be attributed to behavioral factors or to the small numbers problem?

I hope this blog has illustrated how we can treat spatial time series as repeated experiments to evaluate whether extreme rates (whether high or low) persist through time in local populations.  The health analyst now has at their disposal highly sophisticated tools for rate stabilization that typically involve local smoothing of rates and/or “shrinkage to the mean”, including popular techniques such as kernel smoothing and Bayesian rate stabilization.  Be aware, however, that these methods can “smooth out” truly high (or low) rates.  Evaluating persistence in spatial time series avoids this problem.

The small numbers problem: What you see is not necessarily what you get.

11.4.10

The ability to quickly create maps of health outcomes such as cancer incidence and mortality in counties, census areas, and even ZIP codes is now available through websites and data portals (see, for example, AtlasPlus, State Cancer Profiles, Cardiovascular Disease, and many others).

Often these maps appear with measures of the “uncertainty” in the underlying rate estimates, which, for example, might be the colon cancer incidence in Washtenaw County, Michigan.  Sometimes all we see are the maps themselves, and the eye is immediately and quite naturally drawn to areas with extremely high rates.  If you live there you are concerned; if you are a public health professional you might decide to evaluate the high rate, or put in place a health intervention to expand screening and access to care in that local area.  But is the rate (e.g. colon cancer incidence in a given county) truly high, or might it be explained by the “small numbers problem”?  What is the small numbers problem, and why should you care about it?

Rates are calculated from a numerator, such as the number of incident colon cancer cases, and a denominator, which here would be the population at risk (e.g. adults) for colon cancer.  The rate is calculated by dividing the numerator by the denominator, and here is where the small numbers problem arises.  It turns out that the variance in the rate depends critically on the size of the denominator: when the denominator is small, variance in the rate is high; when the denominator is large, variance in the rate is small.  Hence an apparently large rate might be due entirely or in part to the small numbers problem, and the true, underlying risk might be entirely unremarkable.
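
A quick simulation makes the denominator effect concrete.  The populations and the common rate below are made up for illustration, and numpy is assumed to be available:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 50 / 100000  # one common true rate: 50 per 100,000

def observed_rates(pop, n=10000):
    # Observed counts ~ Poisson(pop * true_rate); convert back to rates.
    return rng.poisson(pop * true_rate, size=n) / pop * 100000

small = observed_rates(2_000)    # small county: only 1 case expected
large = observed_rates(200_000)  # large county: 100 cases expected

print(small.std())  # ~50 per 100,000 -- rates of 0, 50, 100 are all common
print(large.std())  # ~5 per 100,000 -- the observed rate is pinned near 50
```

Both counties share the same true risk, yet the small county routinely posts observed rates double or triple the truth.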

So how can we quickly evaluate whether we need to worry about the small numbers problem?  I like to use a simple protocol in SpaceStat™.

Step 1. Create a map of the rate.

Step 2. Create a scatterplot of the population at risk (on the x-axis) and the rate (on the y-axis).

Step 3. Inspect the scatterplot for the “greater than” signature (i.e. “>”), in which variance in the rate is larger at small population sizes.

Step 4.  Brush select on the scatterplot to see where the areas with high rates and low population sizes appear on the map.  These are the places with apparent high rates that are unstable due to the small numbers problem.  You don’t have much confidence that these rates really are high!
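
The four steps can be approximated outside SpaceStat in a few lines of code.  The area names, case counts, and cutoffs below are hypothetical, and the explicit thresholds stand in for an interactive brush select:

```python
import math

# Hypothetical areas: name, incident cases, population at risk
areas = [
    {"name": "A", "cases": 3,   "pop": 1_500},
    {"name": "B", "cases": 120, "pop": 250_000},
    {"name": "C", "cases": 9,   "pop": 4_000},
    {"name": "D", "cases": 40,  "pop": 90_000},
]

for a in areas:
    a["rate"] = a["cases"] / a["pop"] * 100000   # rate per 100,000 (Step 1)
    a["sqrt_pop"] = math.sqrt(a["pop"])          # compresses the long right tail

# Steps 3-4: flag high-rate, small-population areas (the unstable rates).
rate_cut, pop_cut = 100.0, 10_000                # hypothetical cutoffs
flagged = [a["name"] for a in areas
           if a["rate"] > rate_cut and a["pop"] < pop_cut]
print(flagged)  # ['A', 'C'] -- the apparent hot spots you should trust least
```

Areas A and C post the highest rates on the map, but each rests on a handful of cases; B and D, with far larger denominators, have rates we can actually believe.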

Let’s see how this works.  As an example, I used SpaceStat™ to explore the small numbers problem for lung cancer mortality in white males in State Economic Areas (SEA’s) in the 1970s.  I transformed the white male population size using the square root, since the population distribution has a long right tail, as is typical of study areas that include rural and urban places.  I then plotted lung cancer mortality as a function of the square root of population size (Figure 1, left).  Do you see the “greater than” signature (i.e. “>”) indicating that variance in lung cancer mortality is much higher at smaller population sizes?  I then brush selected on the scatterplot to identify areas with a high lung cancer mortality rate and low population size; these are circumscribed with heavy gold borders on the map.  Do you have much confidence that the areas outlined in gold have a true excess of lung cancer mortality, or do you think it might be explained by the small numbers problem?  Clearly, the variance in the rate is high in these areas, and the high rate we observe might be due entirely to small numbers.

Figure 1 Lung Cancer

A future post will describe what to do in SpaceStat™ to take the small numbers problem into account in order to accurately find areas of high risk.


© 2018 BioMedware | P.O. Box 1577, Ann Arbor, MI 48106 | Phone & Fax: (734) 913-1098