SpaceStat

Using SpaceStat to identify Blue Zones (areas with a high proportion of the very old)

6.4.17

Recently my colleague Huang Yi, PhD, Nantong University, and I published a paper that used SpaceStat to find Blue Zones in China.

The data came from the Chinese and local governments.  After loading and validating the data we used spatial time series methods to identify statistically significant Blue Zones.  We explored several indices that have been proposed for blue zone identification and report the following results (refer to abstract, below).  You can use the methods described in this paper to identify Blue Zones, and how their spatial distribution changes through time.

Abstract:

Influenced by a special local environment, the proportion of centenarians is particularly high in some places, known as “blue zones”. Blue zones are mysterious regions that continue to attract research. This paper explores the spatial distribution of the longevity population in a typical Chinese longevity region. Longevity evaluation indexes are used to analyze the longevity phenomenon in 88 towns between 2011 and 2015. Our research findings show that longevity is more important than birth rate and migration in shaping the degree of deep aging in the research region.

Fluctuations in the proportion of centenarians are much higher than for nonagenarians, both in relation to towns and to years. This is because there are so few centenarians that data collected over a short time period cannot accurately represent the overall degree of longevity in a small region; data and statistics must be collected over a longer time period to achieve this. GIS analysis revealed a stable longevity zone located in the center of the research region. This area seems to help people live more easily to 90–99 years old; however, its ability to help nonagenarians live to 100 is a weaker effect.

Identification of a Blue Zone in a Typical Chinese Longevity Region. Available from: https://www.researchgate.net/publication/317179284_Identification_of_a_Blue_Zone_in_a_Typical_Chinese_Longevity_Region [accessed May 28, 2017].

Workshops! GIS for Community Impact: From Technology to Translation – Oakland, CA

3.13.15

Save the dates April 13 and 14: Oakland, California

April 13:  Pre-Workshop Short-course: Space Time Analysis for Health and the Environment

This 1-day class will provide instruction on the space-time analysis of data relating health events to potential sources of environmental exposures. Examples of data will include the time and place of residence of individuals with a given diagnosis, space-time paths of cases and controls over the life course, and cancer rates in local areas through time. We will focus on chronic diseases, such as cancer.

Instructor:  Geoffrey Jacquez, PhD

Suitable for:  Students, environmental and health faculty, researchers and community activists (a basic understanding of data and statistics would be useful).

April 14: Workshop on how to use geographic data to look at breast and other cancer risk factors in our physical and social environments and to inform prevention efforts.

(Supported by the California Breast Cancer Research Program award 21MB-0001 with space donated by The California Endowment.)

For additional information and registration details: www.zerobreastcancer.org/events

GIS Workshop Flyer

Announcing the Release of SpaceStat 4: software for the visualization, analysis, modeling and interactive exploration of spatiotemporal data

5.3.14
Download a Free 14-day evaluation of SpaceStat

SpaceStat 4.0 represents a major reworking of the underlying architecture of the application. Multithreading has been introduced, improving the performance of many methods. A LeSage-Pace estimator for spatial-error and spatial-lag analyses has been added to the spatial regression method.

Based on customer feedback, we have designed feature enhancements in SpaceStat 4.0 that improve the appearance, functionality and performance of maps and graphs. You will also find that the extensive help documentation has been updated, revised and expanded.

Additionally, we’ve responded to your requests for a SpaceStat virtual class by adding a series of tutorials to our website. Each tutorial comes with a SpaceStat project designed to get you started working with a specific concept, and provides a landing page with a description, time estimate and associated project links.

New LeSage-Pace Estimator

A LeSage-Pace estimator for spatial-error and spatial-lag analyses has been added to the spatial regression method.

“I am involved in developing and applying multiple regression models for the mass valuation of residential real estate properties. Modelers such as me are always seeking to find improved model accuracy. The spatial regression models in SpaceStat are of particular interest. The addition of the LeSage-Pace output makes it easier to compare to other methods. Incidentally the Spatial Error Model has been the best performer among all models I have tested lately. It is featured in a book I have written on spatio-temporal methods in mass appraisal to be published June 2014. Also thanks to BioMedware for making this change to the product.”

Richard A. Borst, PhD
Tyler Technologies, Inc.

Recently Published Research using SpaceStat…

Spatial Relationship Quantification between Environmental, Socioeconomic and Health Data at Different Geographic Levels

Int. J. Environ. Res. Public Health 2014, 11(4), 3765-3786; doi:10.3390/ijerph110403765
Authors: Mahdi-Salim Saib, Julien Caudeville, Florence Carre, Olivier Ganry, Alain Trugeon  and Andre Cicolella

“We used Spacestat to evaluate relationships between spatial data collected at different geographic scales. Spacestat is easy-to-use and provides powerful tools that make possible spatial data processing, exploratory analysis, and the quantification of spatial relationships in environmental health research. Spacestat is extraordinarily useful for stakeholders seeking to prioritize prevention actions in the context of environmental inequalities reduction.”

Julien Caudeville
French National Institute for Industrial Environment and Risks (INERIS)
Parc Technologique Alata, BP 2, 60550 Verneuil-en-Halatte, France

Space-time clusters of breast cancer using residential histories: A Danish case control study

BMC Cancer.2014, 14:255.  DOI: 10.1186/1471-2407-14-255
Authors: Nordsborg Baastrup Rikke, Meliker R Jaymie, Ersbøll Kjær Annette, Jacquez M Geoffrey, Poulsen Harbo Aslak, Raaschou-Nielsen  Ole

Background

A large proportion of breast cancer cases are thought related to environmental factors. Identification of specific geographical areas with high risk (clusters) may give clues to potential environmental risk factors. The aim of this study was to investigate whether clusters of breast cancer existed in space and time in Denmark, using 33 years of residential histories.

Methods

We conducted a population-based case–control study of 3138 female cases from the Danish Cancer Registry, diagnosed with breast cancer in 2003 and two independent control groups of 3138 women each, randomly selected from the Civil Registration System. Residential addresses of cases and controls from 1971 to 2003 were collected from the Civil Registration System and geo-coded. Q-statistics were used to identify space-time clusters of breast cancer. All analyses were carried out with both control groups, and for 66% of the study population we also conducted analyses adjusted for individual reproductive factors and area-level socioeconomic indicators.

Results

In the crude analyses a cluster in the northern suburbs of Copenhagen was consistently found throughout the study period (1971–2003) with both control groups. When analyses were adjusted for individual reproductive factors and area-level socioeconomic indicators, the cluster area became smaller and less evident.

Conclusions

The breast cancer cluster area that persisted after adjustment might be explained by factors that were not accounted for such as alcohol consumption and use of hormone replacement therapy. However, we cannot exclude environmental pollutants as a contributing cause, but no pollutants specific to this area seem obvious.

Announcing Our Membership in Esri’s Business Partner Network and the Release of SpaceStat 3 with Geodatabase File Format Integration

9.5.11

I am very excited by our release of SpaceStat 3, which links BioMedware to Esri technology using the geodatabase file format and is an important step towards addressing unmet needs in data access and geohealth analysis techniques.  This is very timely, as BioMedware recently joined Esri’s Business Partner Network.

With this latest release, SpaceStat readily imports the wealth of geospatial health data provided by Esri and Esri’s business partners. SpaceStat’s advanced visualization, space-time analysis, and modeling techniques are easily integrated into workflows that use Esri technologies.  For example, you can use ArcGIS to acquire, edit, and manipulate your data, and then use SpaceStat to analyze time-dynamic data; to target health interventions, to assess health disparities, and to undertake predictive modeling. (See my last blog for a list of 20 health analysis activities commonly accomplished in SpaceStat.)

Importing geodatabase (gdb) files is straightforward.  After starting SpaceStat, select “File->Import->Esri geodatabase” and you will be taken to the import dialog.  In the example shown, we are importing CentralBusinessDistrict.gdb, using CBDOutline as the feature class.

Importing Geodatabase Files

A new geography will be created in SpaceStat, called CentralBusinessDistrict (CBDOutline).  The data may be time-stamped, which is a useful mechanism for representing time-dynamic geographies, such as business districts, where buildings are demolished, new ones built, and streets are rerouted. 

CentralBusinessDistrict Geography

Here we are looking at the Pittsburgh business district, and have displayed the different neighborhoods that comprise it. The circular feature is Mellon Arena, home of the NHL’s Pittsburgh Penguins. The Mellon Arena features the largest retractable, stainless steel dome roof in the world–170,000 total square feet and 2,950 tons of Pittsburgh steel. It certainly stands out on the map!

Interested in giving SpaceStat 3 a spin?  Go here for a free 14-day evaluation copy.

Esri, and esri.com are trademarks, registered trademarks, or service marks of Esri in the United States, the European Community, or certain other jurisdictions.

BioMedware Joins Esri Partner Network

9.1.11

Partnership Provides an Easy Way for Esri Users to Add SpaceStat’s Advanced Space-Time Analysis Into Their Workflows

Ann Arbor, MI – Sept. 6, 2011 – BioMedware, a leader in geohealth software development and research solutions, today announced its membership in the Esri Partner Network.

BioMedware’s President, Dr. Geoffrey Jacquez, also revealed that BioMedware’s flagship product, SpaceStat (formerly known as STIS), now links to Esri technology using the geodatabase file format. “Formalizing our partnership with Esri is an important step towards addressing unmet needs in data access and geohealth analysis techniques.” said Jacquez.  “By including geodatabase support in SpaceStat v3.0, Esri customers will easily be able to integrate SpaceStat’s advanced space-time analysis methods and visualization techniques into their workflows.” Jacquez added, “We’re looking forward to growing this new relationship and leveraging our Esri partner status in ways that will benefit both our customers and Esri’s clients, and improve data analysis.”

Dr. Jacquez, an expert in geohealth research and space-time analysis software development, is presenting a talk tomorrow entitled “Does geocoding positional error matter in health GIS studies?” at the Esri Health GIS Conference being held in Washington, D.C.

BioMedware counts over 600 academic and government organizations among its global customer base.

Esri customers can visit www.biomedware.com/esri and sign up for a free 14-day evaluation release of SpaceStat v3.0, with geodatabase file format support.

About Esri
Since 1969, Esri has been giving customers around the world the power to think and plan geographically. The market leader in geographic information system (GIS) technology, Esri software is used in more than 300,000 organizations worldwide including each of the 200 largest cities in the United States, most national governments, more than two-thirds of Fortune 500 companies, and more than 7,000 colleges and universities. Esri applications, running on more than one million desktops and thousands of Web and enterprise servers, provide the backbone for the world’s mapping and spatial analysis. Esri is the only vendor that provides complete technical solutions for desktop, mobile, server, and Internet platforms. Visit us at esri.com/news.

Esri, the Esri globe logo, GIS by Esri, ArcLogistics, esri.com, and @esri.com are trademarks, registered trademarks, or service marks of Esri in the United States, the European Community, or certain other jurisdictions. Other companies and products mentioned herein may be trademarks or registered trademarks of their respective trademark owners.

About BioMedware
BioMedware was founded in 1995 to develop statistical analysis methods and to make the methods more available to non-statisticians in user-friendly software. To date, BioMedware has received nearly ten million dollars in grants to develop new methods and software for the spatial analysis and modeling of disease. This research has resulted in three commercial software products: SpaceStat, ClusterSeer and BoundarySeer. For more information, visit www.biomedware.com or email [email protected]

SpaceStat, ClusterSeer and BoundarySeer are trademarks of BioMedware, Inc. All other trademarks are the property of their respective holders.

Copyright© 2011 BioMedware. All Rights Reserved.

For media inquiries or product information contact Susan Hinton
+1 734-913-1098 ext 201
[email protected]

The small numbers problem–Part 2

3.24.11

Using persistence in spatial time series as a diagnostic for extreme rates in small areas.

In my last blog on the small numbers problem (www.biomedware.com/blog/2010/small_numbers_problem1/) we found that rates calculated with small denominators (e.g. small at-risk populations) have high variance and we thus have little confidence in the rate estimate.  I presented a simple visual diagnostic process in SpaceStat for determining whether one should be concerned about the small numbers problem.  Now let’s suppose you observe a high rate in an area with a small population.  What might cause the high rate?  We now know the high rate might be attributable to the high variance because the area has a small population.  But we cannot preclude the possibility that the underlying risk in that area is actually high.  This raises an important question.  How might we distinguish between high rate estimates due to small numbers and high rate estimates attributable to high underlying risk?

To make this more concrete consider a simple example.  Suppose a superfund site has been emitting known carcinogens for childhood leukemia into the ground water, and that a small community adjacent to the site relies on groundwater as its drinking water source.   The rate for childhood leukemia in that town is quite high – 2-3 times the state average – but the town is small, and because of this the variance in the town childhood leukemia rate is high and is not statistically different from the state average.  

There are several additional issues to consider here, including exposure routes and mechanisms, whether biomarkers of exposure exist and are measurable, whether the suspected exposure is biologically plausible, and whether the suspected compound (if there is one) is actually present in household water supplies.  But as a first step we often are asked to address this question:  Is there actually an excess of disease in the town?  And this brings us back to the small numbers problem – how can we detect true, elevated risk when the population impacted is small?

Rate stabilization approaches (e.g. empirical Bayes) at first blush appear to be one possibility, but these involve something that has been called “shrinkage towards the mean”.  Here one assigns a weight to an area’s observed rate based on the size of the population at risk; when the population is small the weight is small, and when the population is large the corresponding weight is large.  One then stabilizes the rate by “borrowing” the rate estimate from surrounding areas (a local smoothing) or from a larger reporting area (e.g. the state average, a global smoothing).   Going back to our small town, one thus would give less weight to the observed leukemia rate in that town (because the population is small), and derive a new rate estimate that is composed primarily of the state average (shrinkage towards the mean).  But wait a moment.  What if the true rate in the small town is actually high?  This smoothing procedure would actually obscure that signal, and we might come to the incorrect conclusion that childhood leukemia isn’t elevated.  What can be done?
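To see how shrinkage can hide a truly high rate, here is a minimal sketch of the weighting idea described above. It is a simplified illustration, not the empirical Bayes estimator implemented in SpaceStat or any other package: the function name, the `prior_strength` constant, and all of the numbers are hypothetical, and real methods estimate the weight from the data rather than fixing it by hand.

```python
def shrink_rate(observed_rate, population, reference_rate, prior_strength):
    """Weighted average of an area's observed rate and a reference rate.

    The weight on the observed rate grows with the population at risk.
    prior_strength is a hypothetical tuning constant (in persons); real
    empirical Bayes methods estimate the weight from the data.
    """
    weight = population / (population + prior_strength)
    return weight * observed_rate + (1.0 - weight) * reference_rate


state_average = 4.0   # hypothetical rate per 100,000
town_rate = 12.0      # hypothetical: three times the state average

# Same observed rate, very different populations at risk.
small_town = shrink_rate(town_rate, population=2_000,
                         reference_rate=state_average, prior_strength=50_000)
large_city = shrink_rate(town_rate, population=2_000_000,
                         reference_rate=state_average, prior_strength=50_000)

print(f"small town, stabilized rate: {small_town:.2f}")
print(f"large city, stabilized rate: {large_city:.2f}")
```

With these made-up inputs the small town's stabilized rate lands close to the state average while the large city's stays close to what was observed, which is exactly the behavior that can mask a real excess in a small population.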

This brings us to the treatment of spatial time series as experimental systems.  The underlying concept is very simple.   When we observe a disease rate in a small area through time we can use repeated observations – the time series in that small area – to garner additional information on whether the underlying rate is elevated or not.  When the rate is truly elevated we will expect the observed rate through time to be high and to remain high, even though the variance about that rate may be large due to the small numbers problem.  This illustrates the concept of persistence in spatial time series, and how we can use persistence to garner additional information regarding whether or not an observed rate is truly unusual.

Notice information on persistence in small areas can be lost when the data are first smoothed since repeated, elevated rates in areas with small populations will each “shrink to the mean”.  I therefore prefer to inspect the spatial time series initially using the raw data themselves, and then make decisions about rate stabilization later.  
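The idea of persistence can be sketched outside SpaceStat with a few lines of code. The example below counts how many time periods an area's raw rate sits above a reference rate, and compares that count with what a fair coin flip per period would produce. The data, the function names, and the coin-flip yardstick (which assumes periods are independent) are all assumptions made for illustration; this is a crude screening aid, not a formal test.

```python
from math import comb


def periods_above(area_rates, reference_rates):
    """Count the time periods in which the area's rate exceeds the reference."""
    return sum(a > r for a, r in zip(area_rates, reference_rates))


def tail_probability(k, n):
    """P(at least k of n periods above) if each period were an independent
    fair coin flip. A crude yardstick, not a formal test."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


# Hypothetical raw (unsmoothed) rates per 100,000 for six 5-year periods.
state_average = [4.0, 4.1, 3.9, 4.2, 4.0, 4.1]
small_town = [9.5, 6.0, 12.1, 7.4, 10.3, 8.8]   # noisy but always high
noisy_town = [9.5, 1.2, 4.0, 0.0, 11.0, 2.5]    # high only now and then

for name, series in (("small_town", small_town), ("noisy_town", noisy_town)):
    k = periods_above(series, state_average)
    n = len(series)
    print(f"{name}: above the reference in {k} of {n} periods, "
          f"coin-flip tail probability {tail_probability(k, n):.3f}")
```

Both towns show the same high rate in the first period, but only the first stays high in every period. Working from the raw series keeps that contrast visible before any smoothing is applied.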

To try this out, download the SpaceStat project for this blog (“SmallNumbersBlog2”).  You can download and install a trial version of SpaceStat here (www.biomedware.com/?module=Page&sID=spacestat).

 

 

National Cancer Institute lung cancer mortality in white males in State Economic Areas (SEA’s) from 1970 to 1995

 

When you load the project you should see something like this (above). The data come from the National Cancer Institute and we’ll be working with lung cancer mortality in white males in State Economic Areas (SEA’s) from 1970 to 1995.  The rates are age-adjusted and reported per 100,000 population.  The data are recorded in 5 year periods.  I like to use the following protocol to inspect spatial time series for persistence that may indicate the existence of an underlying rate that is truly unusual.

  1.  Animate the maps and scatterplot by clicking on the play button on the animation toolbar.  This gives you some sense of whether the distribution of rates as a group is changing through time.  You notice on the scatterplot that the points tend to move to the right, reflecting population growth from 1970 through 1995.   You also might notice the map turns redder as time progresses, suggesting the lung cancer mortality in white males is increasing.
  2. These data form a spatial time series since they are recorded in five year periods, and we can inspect the time series using the time plot tool.  I chose the variable “WLUNG”.

WLUNG

Inspection of the time plot confirms our observation that lung cancer mortality overall is increasing through time.  Each line on the time plot is the lung cancer mortality rate in a given SEA.  You can click the lines in the time plot to identify the SEA on the map to which they correspond. 

3.  Do you think the two SEA’s with very low rates in 1970 had low rates that persisted through time, or did those rates bounce around a lot perhaps due to the small numbers problem?  To answer this question I brush select the two lowest rates on the scatterplot at 1970 (below).

 White Population vs WLUNG

Not only do those two SEA’s have the lowest rates in 1970, but those rates are persistently low over the next 25 years, all the way to 1995.  

4.  What might explain this persistence of low rates?  To address this question I inspect the map, and right click on the selected areas to see where they are and who might live there (below).   The SEA’s under consideration are Logan and Provo, Utah, which have large Mormon populations that do not smoke.  Based on this, do you think the low mortality rates in these two SEA’s might most reasonably be attributed to behavioral factors or to the small numbers problem?

I hope this blog has illustrated how we can treat spatial time series as repeated experiments to evaluate whether extreme rates (whether high or low) persist through time in local populations.  The health analyst now has at their disposal highly sophisticated tools for rate stabilization that typically involve local smoothing of rates and/or “shrinkage to the mean”.  These include popular techniques such as kernel smoothing and Bayesian rate stabilization.  Be aware, however, that these methods can “smooth out” truly high (or low) rates.  Evaluating persistence in spatial time series avoids this problem.

The small numbers problem: What you see is not necessarily what you get.

11.4.10

The ability to quickly create maps of health outcomes such as cancer incidence and mortality in counties, census areas and even Zip codes is now available through websites and data portals. (See for example Atlasplus, State Cancer Profiles, and Cardiovascular Disease, and many others.)

Often these maps appear with measures of the “uncertainty” in the underlying rate estimates, which, for example, might be the colon cancer incidence in Washtenaw County, Michigan.  Sometimes all we see are the maps themselves, and the eye is immediately and quite naturally drawn to areas with extremely high rates.   If you live there you are concerned; if you are a public health professional you might decide to evaluate the high rate, or put in place a health intervention to expand screening and access to care in that local area.  But is the rate (e.g. colon cancer incidence in a given county) truly high, or might it be explained by the “small numbers problem”?  What is the small numbers problem, and why should you care about it?

Rates are calculated from a numerator, such as the number of incident colon cancer cases, and a denominator, which here would be the population at risk (e.g. adults) for colon cancer.  The rate is calculated by dividing the numerator by the denominator, and here is where the small numbers problem arises.  It turns out that the variance in the rate depends critically on the size of the denominator.  When the denominator is small, variance in the rate is high; when the denominator is large, variance in the rate is small.  Hence the appearance of an apparently large rate might be due entirely or in part to the small numbers problem, and the true, underlying risk might be entirely unremarkable.
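To make the dependence on the denominator concrete, here is a minimal sketch. It assumes case counts follow a binomial distribution with the same underlying risk in every area, which is a modeling assumption of this illustration rather than something stated above; the function name and the example risk of 50 per 100,000 are hypothetical.

```python
import math


def rate_standard_error(risk, population):
    """Standard error of an observed rate (cases / population),
    assuming cases ~ Binomial(population, risk)."""
    return math.sqrt(risk * (1.0 - risk) / population)


risk = 50 / 100_000  # hypothetical underlying risk, identical in every area

for population in (1_000, 10_000, 100_000, 1_000_000):
    se = rate_standard_error(risk, population)
    # Express the standard error per 100,000 so it is comparable to the rate.
    print(f"population {population:>9,}: "
          f"standard error {se * 100_000:.1f} per 100,000")
```

Under this assumption the standard error falls with the square root of the population, so a hundredfold increase in the denominator shrinks it tenfold. The underlying risk never changes in the sketch; only our ability to estimate it does.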

So how can we quickly evaluate whether we need to worry about the small numbers problem?  I like to use a simple protocol in SpaceStat™.

Step 1. Create a map of the rate.

Step 2. Create a scatterplot of the population at risk (on the x-axis) and the rate (on the y-axis).

Step 3. Inspect the scatterplot for the “Greater Than” signature (i.e. “>”), in which variance in the rate is larger at small population sizes.

Step 4.  Brush select on the scatterplot to see where the areas with high rates and low population sizes appear on the map.  These are the places with apparent high rates that are unstable due to the small numbers problem.  You don’t have much confidence that these rates really are high!
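For readers who want to try the same screening logic outside SpaceStat, the steps above can be sketched in a few lines. Everything here is hypothetical: the area names, rates, populations, cutoffs, and the function name are made up for illustration, and the cutoffs stand in for the visual brush selection, which is a judgment call rather than a fixed rule.

```python
import math


def flag_unstable_high_rates(areas, rate_cutoff, population_cutoff):
    """Return names of areas whose rate is high but whose population is small.

    areas: list of (name, rate, population) tuples.
    Mirrors the brush selection in Step 4: high rate AND small denominator.
    """
    return [name for name, rate, population in areas
            if rate >= rate_cutoff and population <= population_cutoff]


# Hypothetical data: (area, rate per 100,000, population at risk).
areas = [
    ("A", 95.0, 1_200),
    ("B", 52.0, 850_000),
    ("C", 12.0, 2_000),
    ("D", 61.0, 40_000),
    ("E", 88.0, 3_500),
]

# Scatterplot coordinates. A square-root transform of population is one
# optional way to spread out points when a few areas are much larger.
for name, rate, population in areas:
    print(f"{name}: sqrt(population) = {math.sqrt(population):7.1f}, "
          f"rate = {rate:5.1f}")

flagged = flag_unstable_high_rates(areas, rate_cutoff=80.0,
                                   population_cutoff=5_000)
print("High rate, small population:", flagged)
```

The flagged areas are the ones to treat with caution: their rates look extreme, but the small denominators mean those rates are unstable.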

Let’s see how this works.  As an example, I used SpaceStat™ to explore the small numbers problem for lung cancer mortality in white males in state economic areas (SEA’s) in the 1970’s.  I transformed the white male population size using the square root since the population distribution has a long right tail, as is typical of study areas that include rural and urban places.  I then plotted lung cancer mortality as a function of the square root of population size (Figure 1, left).  Do you see the “greater than” signature (i.e. “>”) that indicates variance in lung cancer mortality is much higher at smaller population sizes?  I then brush selected on the scatterplot to identify areas with high lung cancer mortality rate and low population size; these are now circumscribed with heavy gold borders on the map.  Do you have much confidence that the areas outlined in gold have a true excess of lung cancer mortality, or do you think it might be explained by the small numbers problem?  Clearly, the variance in the rate is high in these areas, and the high rate we observe might be due entirely to small numbers.

Figure 1 Lung Cancer

A future post will describe what to do in SpaceStat™ to take the small numbers problem into account in order to accurately find areas of high risk.

Featured

SpaceStat is a comprehensive software package that enables visualization and analysis of space–time data, overcoming some of the constraints inherent to spatial-only Geographic Information System software.

Read More About our Software »

Methods

Time is an integral part of our analysis. All views of the data can be animated, from maps to histograms to scatter plots. Furthermore, these animations can be linked together to explore ideas and discover patterns.

Learn More About Our Methods »

Order Books

Compartmental Analysis in Biology and Medicine, 3rd Edition, by John A. Jacquez and Modeling with Compartments, by John A. Jacquez

Order Now »

© 2018 BioMedware | P.O. Box 1577, Ann Arbor, MI 48106 | Phone & Fax: (734) 913-1098
Privacy Policy