Reconstructing Problem File Analyses

If your problem file contained Spatial ANOVA, trend surface regression, spatial regimes, or spatial expansion analyses, you will receive a message in the Log View that further manipulation is needed to fully reconstruct these analyses. Here we discuss how the user can carry out these methods in the new SpaceStat, built upon our space-time platform.

Spatial ANOVA

The spatial analysis of variance (ANOVA) method carries out regression of a continuous dependent variable against one or more categorical datasets (i.e. levels typically describing different possible states of a particular variable). As in conventional ANOVA, the individual levels of a particular variable are modeled by values of 0 or 1 according to whether an observation has a particular category as its value for that dataset. For example, suppose that "Country of origin" is a regression independent variable with three possible categories: France, U.S. or Germany. Then there will be three regression variables, one per category, in any regression model. For someone of French descent, the France variable will take on a value of 1 while the Germany and U.S. variables will be zero.

Because this scheme leads to redundant variables, regression calculations require one of the categories to be designated as the "reference" category. The estimated value for this category is the same as that of the intercept. Since most data imported into SpaceStat is either integer or decimal, categorical variables have to be created out of these (usually integer) values. In the old version of SpaceStat this was done automatically behind the scenes whenever the ANOVA option (setting "5" for flag #1 in the problem file) was invoked.
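The 0/1 coding with a reference category can be illustrated outside SpaceStat. The following is a minimal Python sketch, not SpaceStat code: the function name dummy_code and the sample data are hypothetical, chosen only to mirror the "Country of origin" example above.

```python
def dummy_code(values, reference):
    """Return (names, rows) of 0/1 indicators, omitting the reference category.

    Hypothetical helper for illustration; SpaceStat builds these
    variables internally when a categorical dataset is used.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference!r}")
    # The reference category gets no column of its own; its effect is
    # absorbed by the intercept.
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


origins = ["France", "U.S.", "Germany", "France"]
names, rows = dummy_code(origins, reference="France")
for origin, row in zip(origins, rows):
    print(origin, dict(zip(names, row)))
```

With France as the reference, a French observation is coded as all zeros, so the fitted value for France is the intercept alone and the other coefficients measure differences from France.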

In SpaceStat, the creation of categorical data from integer data is done by right-clicking on a dataset in the data view and selecting the "convert data type" option. A default name will appear for the new dataset with the possible choices of "Integer" and "String" for "New data type". Selecting the "String" option will create a categorical dataset with the new name. On opening up any regression tab, select a geography, then click on "Create". The new categorical dataset will appear in the list for independent variables. It can then be included in any regression model.

Spatial regimes

A straightforward extension of the use of categorical datasets is found in the Spatial regimes method, which allows for the fact that different regression models may be applicable to different regions of the geography. To estimate the regression coefficients in this model, start with a list of global independent explanatory variables that are valid over the entire geography. Next, decide on a variable that will serve as the distinguishing indicator for the different spatial regions. In the earlier version of SpaceStat this had to be an integer, but in the new version of SpaceStat any categorical variable will do (for example "Country of origin" as in the previous section).

The user then has to create the regression model by hand, using the regression model dialog. The model must contain (1) the regional indicator variable as a linear term, with the reference category chosen appropriately, and (2) for each of the global explanatory variables, an interaction term between that global variable and the regional indicator variable (select the global explanatory term, hold down the Control key, then select the regional indicator variable). Do not include the global variable on its own as a linear term in the model. Once the regression model is set up, it can be saved, spatial weight datasets can be specified, and any one of the three regression types (OLS, spatial lag and spatial error) can be invoked.
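To make the structure of such a model concrete, here is a minimal Python sketch of the columns it implies. This is an illustration only, not SpaceStat's internal representation: the function regimes_design, the column naming scheme, and the sample data are all hypothetical.

```python
def regimes_design(regimes, covariates, reference):
    """Build column names and rows for a spatial regimes model.

    regimes:    list of regime labels, one per observation
    covariates: dict mapping a global variable name to its list of values
    reference:  regime label used as the reference category
    Hypothetical helper for illustration only.
    """
    levels = sorted(set(regimes))
    non_ref = [r for r in levels if r != reference]
    # (1) Regime indicator as a linear term, reference category omitted.
    names = ["Intercept"] + [f"regime[{r}]" for r in non_ref]
    # (2) One interaction per global variable and regime. The global
    # variable is NOT included on its own, so each regime gets its own slope.
    names += [f"{var}:regime[{r}]" for var in covariates for r in levels]
    rows = []
    for i, regime in enumerate(regimes):
        row = [1.0]
        row += [1.0 if regime == r else 0.0 for r in non_ref]
        for values in covariates.values():
            row += [values[i] if regime == r else 0.0 for r in levels]
        rows.append(row)
    return names, rows


names, rows = regimes_design(
    regimes=["North", "South", "North"],
    covariates={"Income": [10.0, 20.0, 30.0]},
    reference="North",
)
print(names)
for row in rows:
    print(row)
```

Because the interaction columns cover every regime, adding the global variable as a separate linear term would make the columns linearly dependent, which is why it is left out of the model.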

The new version of SpaceStat does not output tests for structural instability or tests for the stability of individual coefficients. In addition, the Breusch-Pagan tests in the new version are against the squares of the original regression terms.

Trend surface regression

The regression variables in this case are the spatial coordinates of the observations themselves. For point geographies, the spatial coordinates usually appear in the data view. For polygon geographies, we can create these datasets by using the "Create centroid geography" option. This option can be found by right-clicking on the geography and selecting "Create centroid geography", or by locating the option under the Tools menu. Deselect the "Include parent geographies’ datasets" option on the dialog that appears and select OK. A new centroid geography will be created under the parent geography. The datasets "X" and "Y" denote the spatial coordinates of these centroids. They may be dragged up to the parent geography to appear in that geography’s dataset list.

Trend surface regression requires using the "X" and "Y" and powers of these as regression variables. The higher powers have to be created manually using the dataset calculator. Under the "Tools" menu select "Derive new dataset". Select the geography, assign a name, then from the "Insert dataset" dropdown, double click on "X" or "Y" and it will appear in the window as $("X"). Follow that with a multiplicative * from the "Insert function" dropdown and you can build up powers of the "X" or "Y" variables explicitly as well as interactions between them. These new variables can then be saved under the original geography and used in the regression model dialog. The resulting regression models can be saved as part of the project.
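The set of terms needed for a trend surface of a given order can be enumerated mechanically. The following minimal Python sketch lists the products of "X" and "Y" that would each be built in the dataset calculator. The function trend_surface_terms and the sample coordinates are hypothetical and serve only as an illustration.

```python
def trend_surface_terms(x, y, order):
    """Return (names, columns) for all X^i * Y^j with 1 <= i + j <= order.

    Names are written as explicit products (e.g. "X*X*Y") to mirror
    building powers by repeated multiplication. Hypothetical helper.
    """
    names, columns = [], []
    for total in range(1, order + 1):
        for j in range(total + 1):
            i = total - j
            names.append("*".join(["X"] * i + ["Y"] * j))
            columns.append([xv**i * yv**j for xv, yv in zip(x, y)])
    return names, columns


# Two made-up centroid coordinates, second-order (quadratic) surface.
names, columns = trend_surface_terms([1.0, 2.0], [3.0, 4.0], order=2)
for name, column in zip(names, columns):
    print(name, column)
```

A second-order surface therefore needs five variables (X, Y, X*X, X*Y, Y*Y), of which the last three have to be derived by hand.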

In the new version, the Breusch-Pagan tests are against the squares of the original regression terms.

Spatial expansion

This approach models regression coefficients that vary over space by including regression variables that are made up of interactions of non-spatial regression variables with powers of "X" and "Y". For example, modeling the dependence of crime on income would require the independent variables Income, Income*X and Income*Y, and possibly interactions with higher powers of "X" and "Y" as well. These models require creating the "X" and "Y" variables explicitly, following the procedure indicated above under trend surface regression, and then forming the interactions of these powers of "X" and "Y" with the other regression variables in the regression dialog.
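As an illustration of the Income example, the following minimal Python sketch forms the first-order expansion terms for one variable. The function expand_linear and the sample values are hypothetical. In SpaceStat itself, the interactions are formed in the regression dialog as described above.

```python
def expand_linear(name, values, x, y):
    """Return the first-order spatial expansion of one regression variable.

    Produces the variable itself plus its interactions with X and Y, so
    that its effective coefficient at (x, y) is b0 + b1*x + b2*y.
    Hypothetical helper for illustration only.
    """
    return {
        name: list(values),
        f"{name}*X": [v * xv for v, xv in zip(values, x)],
        f"{name}*Y": [v * yv for v, yv in zip(values, y)],
    }


# Made-up income values and centroid coordinates for two observations.
terms = expand_linear("Income", [10.0, 20.0], x=[1.0, 2.0], y=[3.0, 4.0])
for term_name, column in terms.items():
    print(term_name, column)
```

Higher-order expansions follow the same pattern, with Income multiplied by each higher power or product of "X" and "Y".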

The new version does not output tests on the expansion of the coefficients or tests on the expansion of individual coefficients. In addition, the Breusch-Pagan tests in the new version are against the squares of the original regression terms.