
cancer mortality

Thoughts from Austin: NAACCR Annual Meeting

6.12.13

This week I am attending the annual meeting of the North American Association of Central Cancer Registries (NAACCR), which is being held in Austin, Texas. The theme of this year’s conference is “Thinking big, the future of cancer surveillance”, and I’m involved in two activities. The first was a series of workshops held on Saturday and Sunday titled “Evaluation of Homomorphic Cryptography for Geospatial Studies with Human Subjects”. The workshops were convened as part of a grant funded by the National Library of Medicine that is evaluating the feasibility of using homomorphic cryptography to accelerate the pace of research and discovery for studies that use human subjects data. “Homomorphic” means mathematical operations can be conducted on the encrypted data (i.e. in the encrypted space) without decrypting it first, greatly reducing the risk to the privacy of confidential data.
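To make "operations in the encrypted space" concrete, here is a minimal sketch of one well-known additively homomorphic scheme, Paillier encryption. This is an illustration only: the post does not say which schemes the workshop evaluated, the tiny primes and the two case counts are made up, and the function names are my own. Paillier supports addition of hidden values, not arbitrary computation.

```python
import math
import random

# Toy key material (assumption for illustration). Real keys use primes that are
# hundreds of digits long, and a cryptographically secure random source.
P, Q = 293, 433
N = P * Q                      # public key
N_SQUARED = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAMBDA, -1, N)        # modular inverse, private (needs Python 3.8+)


def encrypt(message, rng=random):
    """Encrypt an integer 0 <= message < N using generator g = N + 1."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, message, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(ciphertext):
    """Recover the plaintext with the private values LAMBDA and MU."""
    x = pow(ciphertext, LAMBDA, N_SQUARED)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying two ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N_SQUARED


# Two hypothetical data providers encrypt their case counts. A third party
# pools the counts without ever seeing either one in the clear.
pooled = add_encrypted(encrypt(12), encrypt(30))
print(decrypt(pooled))  # 42
```

Only the key holder can read the pooled total, and the party doing the pooling handles nothing but ciphertexts.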

My co-organizer, Dr. Khaled El Emam of Privacy Analytics and the University of Ottawa e-health laboratory, and I were very happy with the recommendations that came out of the working group. These are being written up as a BioMedware report to the National Library of Medicine, and will be available in our Publications when the report is ready. But here is a preview of some of the “low-hanging fruit” that homomorphic cryptography may make possible.

First, increased data security greatly enhances data sharing, and hence participation in all manner of activities where data sharing plays an important role. It turns out a key bugaboo in the processing of disease registry data is deduplication: the removal of duplicate records that may appear in several databases. This arises, for example, when snowbirds flit between Michigan and Florida and have records of cancer treatment in both states. Data providers must be satisfied that the potential for unintentional release of their highly confidential patient records is absolutely minimal, which means, in practice, that two data providers may be reluctant to share data to search for duplicate records. Homomorphic encryption addresses this by having deduplication take place in the encrypted space, so even if data security is breached the records appear as complete gibberish.

Second, increased data sharing means data aggregation across data providers becomes far less of a concern. Hence activities that involve pooling data, such as determining the number of cases anticipated in projected enrollment reports for NIH grant applications, suddenly become very easy.
Other opportunities were identified – keep checking back for our release of the workshop report!

The small numbers problem: What you see is not necessarily what you get.

11.4.10

The ability to quickly create maps of health outcomes such as cancer incidence and mortality in counties, census areas and even ZIP codes is now available through websites and data portals. (See, for example, Atlasplus, State Cancer Profiles, Cardiovascular Disease, and many others.)

Often these maps appear with measures of the “uncertainty” in the underlying rate estimates, which, for example, might be the colon cancer incidence in Washtenaw County, Michigan. Sometimes all we see are the maps themselves, and the eye is immediately and quite naturally drawn to areas with extremely high rates. If you live there, you are concerned. If you are a public health professional, you might decide to evaluate the high rate, or put in place a health intervention to expand screening and access to care in that local area. But is the rate (e.g. colon cancer incidence in a given county) truly high, or might it be explained by the “small numbers problem”? What is the small numbers problem, and why should you care about it?

Rates are calculated from a numerator, such as the number of incident colon cancer cases, and a denominator, which here would be the population at risk for colon cancer (e.g. adults). The rate is calculated by dividing the numerator by the denominator, and here is where the small numbers problem arises. It turns out that the variance in the rate depends critically on the size of the denominator: when the denominator is small, variance in the rate is high; when the denominator is large, variance in the rate is low. Hence an apparently large rate might be due entirely or in part to the small numbers problem, and the true, underlying risk might be entirely unremarkable.
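A quick back-of-the-envelope calculation shows how strongly the denominator matters. The sketch below assumes the case count follows a binomial distribution (my assumption, not something stated above), so the standard error of the observed rate is sqrt(p(1 - p)/n) for true risk p and population n. The risk of 50 per 100,000 and the three population sizes are hypothetical.

```python
import math


def rate_standard_error(true_risk, population):
    """Standard error of an observed rate, assuming cases ~ Binomial(population, true_risk)."""
    return math.sqrt(true_risk * (1.0 - true_risk) / population)


true_risk = 50 / 100_000  # hypothetical: 50 cases per 100,000 people
for population in (1_000, 100_000, 10_000_000):
    se = rate_standard_error(true_risk, population)
    print(f"population {population:>10,}: rate = 50 +/- {1e5 * se:.1f} per 100,000")
```

Under this assumption the uncertainty is about 71 per 100,000 for a population of 1,000, about 7 for 100,000, and under 1 for 10 million. The same underlying risk can easily produce an observed rate of 0 or 200 in the smallest area.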

So how can we quickly evaluate whether we need to worry about the small numbers problem?  I like to use a simple protocol in SpaceStat™.

Step 1. Create a map of the rate.

Step 2. Create a scatterplot of the population at risk (on the x-axis) and the rate (on the y-axis).

Step 3. Inspect the scatterplot for the “Greater Than” signature (i.e. “>”), in which the variance in the rate is larger at small population sizes.

Step 4. Brush select on the scatterplot to see where the areas with high rates and low population sizes appear on the map. These are the places with apparently high rates that are unstable due to the small numbers problem. You can’t have much confidence that these rates really are high!
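The logic of Steps 2 to 4 can be imitated outside SpaceStat™ with a small simulation. This is a hedged sketch, not SpaceStat's interface: the areas are simulated, every area is given the same true risk, case counts are assumed to be Poisson, and the function names and quantile cut-offs are hypothetical choices of mine. Because the true risk is identical everywhere, any "high" rate that turns up is noise.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson count with Knuth's method (fine for means below ~700)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_areas(n_areas=300, true_risk=5e-4, seed=1):
    """Simulated areas that share the SAME true risk but differ in population."""
    rng = random.Random(seed)
    areas = []
    for area_id in range(n_areas):
        population = int(10 ** rng.uniform(3, 6))  # 1,000 to 1,000,000 people
        cases = poisson_draw(rng, true_risk * population)
        areas.append({"id": area_id, "population": population,
                      "rate": 1e5 * cases / population})  # per 100,000
    return areas


def brush_select(areas, rate_quantile=0.9, pop_quantile=0.5):
    """Mimic Step 4: keep areas with a high rate AND a small population."""
    rates = sorted(a["rate"] for a in areas)
    pops = sorted(a["population"] for a in areas)
    rate_cut = rates[int(rate_quantile * (len(rates) - 1))]
    pop_cut = pops[int(pop_quantile * (len(pops) - 1))]
    return [a for a in areas
            if a["rate"] > rate_cut and a["population"] < pop_cut]


def rate_spread(areas):
    """Range of observed rates, a crude stand-in for eyeballing the scatterplot."""
    rates = [a["rate"] for a in areas]
    return max(rates) - min(rates)


areas = simulate_areas()
by_size = sorted(areas, key=lambda a: a["population"])
small, large = by_size[:150], by_size[150:]
print("rate range, smaller areas:", round(rate_spread(small), 1))
print("rate range, larger areas: ", round(rate_spread(large), 1))
print("flagged as unstable:", len(brush_select(areas)))
```

The much wider spread of rates among the smaller areas is the “>” signature in numerical form, and the areas returned by `brush_select` correspond to the ones you would brush on the scatterplot.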

Let’s see how this works. As an example, I used SpaceStat™ to explore the small numbers problem for lung cancer mortality in white males in state economic areas (SEAs) in the 1970s. I transformed the white male population size using the square root, since the population distribution has a long right tail, as is typical of study areas that include both rural and urban places. I then plotted lung cancer mortality as a function of the square root of population size (Figure 1, left). Do you see the “greater than” signature (i.e. “>”) that indicates variance in lung cancer mortality is much higher at smaller population sizes? I then brush selected on the scatterplot to identify areas with a high lung cancer mortality rate and low population size; these are now circumscribed with heavy gold borders on the map. Do you have much confidence that the areas outlined in gold have a true excess of lung cancer mortality, or do you think it might be explained by the small numbers problem? Clearly, the variance in the rate is high in these areas, and the high rate we observe might be due entirely to small numbers.

Figure 1 Lung Cancer

A future post will describe what to do in SpaceStat™ to take the small numbers problem into account in order to accurately find areas of high risk.



© 2018 BioMedware | P.O. Box 1577, Ann Arbor, MI 48106 | Phone & Fax: (734) 913-1098