Missing data

In many remotely sensed files, individual pixels or entire regions are recorded as 'no data' using a no-data (missing value) code. In other data sets, such a code may indicate that a variable could not be measured at a location. BoundarySeer recognizes these codes and accounts for them in its algorithms.

Choosing a missing value code

The missing value code should be a value that could not possibly occur as a true data value in the data set. Codes such as "-9999" are common because they are easy to recognize when you scan a column of data. Any integer value can be used, including negative numbers. Currently, decimal values and text strings (such as "no data") cannot be used as missing value codes.
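The rules above can be checked before you assign a code. The sketch below is a hypothetical helper, not part of BoundarySeer: it assumes you know the physically possible range of your variable (for example 0 to 255 for 8-bit reflectance), and it accepts a code only if it is an integer lying outside that range.

```python
import numbers


def is_valid_missing_code(code, lo, hi) -> bool:
    """Hypothetical check for a candidate missing value code.

    lo and hi are the assumed lowest and highest values the variable can
    truly take. The code must be an integer (decimals and text are not
    allowed) and must fall outside [lo, hi] so it can never be confused
    with a real measurement.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(code, bool) or not isinstance(code, numbers.Integral):
        return False
    return not (lo <= code <= hi)


print(is_valid_missing_code(-9999, 0, 255))      # True
print(is_valid_missing_code(100, 0, 255))        # False: a real value
print(is_valid_missing_code("no data", 0, 255))  # False: text not allowed
```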

Missing data in boundary detection

With multivariate data sets, BoundarySeer calculates gradients and distance metrics using only those variables that have no missing values at any of the locations involved. As a result, locations with missing data are not included in boundary or cluster detection. In cluster output, locations with missing values appear as "holes" in the newly created polygon dataset. In difference boundary detection, locations with missing data are excluded from gradient calculations and edge detection. For multivariate data sets with many missing values, you may wish to run single-variable boundary detection instead.
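The variable-selection rule above can be sketched in Python. This is a minimal illustration, not BoundarySeer's implementation: the Euclidean distance, the function names, and the -9999 code are assumptions chosen for the example. A variable is used only if it is non-missing at every location involved; if no variable qualifies, the function returns the missing value code.

```python
import math
from typing import List, Sequence

MISSING = -9999  # assumed missing value code for this example


def usable_variables(locations: Sequence[Sequence[float]],
                     missing: int = MISSING) -> List[int]:
    """Indices of variables that are non-missing at every location involved."""
    n_vars = len(locations[0])
    return [j for j in range(n_vars)
            if all(loc[j] != missing for loc in locations)]


def distance(a: Sequence[float], b: Sequence[float],
             missing: int = MISSING) -> float:
    """Euclidean distance between two locations over usable variables only.

    Returns the missing value code when no variable can be used.
    """
    cols = usable_variables([a, b], missing)
    if not cols:
        return float(missing)
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))
```

For example, `distance([1.0, 2.0, -9999], [4.0, 6.0, 0.0])` ignores the third variable (missing at the first location) and returns 5.0.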

If a gradient or metric cannot be calculated because every variable has a missing value at one or more of the locations involved, BoundarySeer reports the missing value code (e.g., -9999) in place of the metric. In addition, when randomizing data for Monte Carlo procedures, BoundarySeer excludes any location at which every variable is coded as missing ('no data').
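The Monte Carlo exclusion can be sketched as one randomization step. This is a hypothetical illustration, not BoundarySeer's procedure: it assumes a permutation test that shuffles whole value vectors among locations (keeping each location's variables together), and it assumes excluded locations simply keep their all-missing values in place.

```python
import random
from typing import List, Optional, Sequence

MISSING = -9999  # assumed missing value code for this example


def all_missing(values: Sequence[float], missing: int = MISSING) -> bool:
    """True if every variable at this location is coded as missing."""
    return all(v == missing for v in values)


def randomize(data: Sequence[Sequence[float]], missing: int = MISSING,
              seed: Optional[int] = None) -> List[List[float]]:
    """One Monte Carlo realization that skips all-missing locations.

    Value vectors are shuffled only among eligible locations; locations
    whose variables are all missing neither give nor receive values.
    The input is not modified.
    """
    rng = random.Random(seed)
    eligible = [i for i, row in enumerate(data)
                if not all_missing(row, missing)]
    shuffled = [list(data[i]) for i in eligible]
    rng.shuffle(shuffled)
    result = [list(row) for row in data]
    for i, row in zip(eligible, shuffled):
        result[i] = row
    return result
```

Shuffling whole vectors preserves the correlation among variables at each location; shuffling each variable independently would be an alternative design.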
