com.esri.arcgis.geoprocessing.tools.spatialstatisticstools
Class GeographicallyWeightedRegression
java.lang.Object
com.esri.arcgis.geoprocessing.AbstractGPTool
com.esri.arcgis.geoprocessing.tools.spatialstatisticstools.GeographicallyWeightedRegression
- All Implemented Interfaces:
- GPTool
public class GeographicallyWeightedRegression
- extends AbstractGPTool
Performs Geographically Weighted Regression (GWR), a local form of linear regression used to model spatially varying relationships.
The Geographically Weighted Regression tool is contained in the Spatial Statistics Tools tool box.
Usage tips:
- GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth depend on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors parameters, with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation.
- GWR should be applied to datasets with several hundred features for best results. It is not an appropriate method for small datasets. The tool does not work with multipoint data.
- The GWR tool also produces an Output feature class and a table containing the diagnostic values from the tool execution summary report. The name of this table is automatically generated using the output feature class name and the "_supp" suffix. The Output feature class is automatically added to the table of contents with a hot/cold rendering scheme applied to model residuals. A full explanation of each output is provided in the tool documentation.
- Using projected data is always recommended; it is especially important whenever distance is a component of the analysis, as it is for GWR when you select FIXED for Kernel type. It is strongly recommended that your data be projected using a projected coordinate system (rather than a geographic coordinate system).
- Some of the computations done by the GWR tool take advantage of multiple CPUs in order to increase performance and will automatically use up to 8 threads/CPUs for processing.
- You should always begin regression analysis with Ordinary Least Squares (OLS) regression. First find a properly specified OLS model, then use the same explanatory variables to run GWR (excluding any "dummy" explanatory variables representing different spatial regimes).
- Dependent and Explanatory variables should be numeric fields containing a variety of values. Linear regression methods, like GWR, are not appropriate for predicting binary outcomes (e.g., all of the values for the dependent variable are either 1 or 0).
- In global regression models, such as OLS, results are unreliable when two or more variables exhibit multicollinearity (when two or more variables are redundant or together tell the same "story"). GWR builds a local regression equation for each feature in the dataset. When the values for a particular explanatory variable cluster spatially, you will very likely have problems with local multicollinearity. The condition number field in the output feature class indicates when results are unstable due to local multicollinearity. As a rule of thumb, do not trust results for features with a condition number larger than 30, equal to Null or, for shapefiles, equal to -1.7976931348623158e+308.
- Caution should be used when including nominal/categorical data in a GWR model. Where categories cluster spatially, there is a strong risk of encountering local multicollinearity issues. The condition number included in the GWR output indicates when local collinearity is a problem (a condition number less than zero, greater than 30, or set to Null). Results in the presence of local multicollinearity are unstable.
- Do not use "dummy" explanatory variables to represent different spatial regimes in a GWR model (e.g., census tracts outside the urban core are assigned a value of 1, while all others are assigned a value of 0). Because GWR allows explanatory variable coefficients to vary, these spatial regime explanatory variables are unnecessary, and if included, will create problems with local multicollinearity.
- To better understand regional variation among the coefficients of your explanatory variables, examine the optional raster coefficient surfaces created by GWR. These raster surfaces are created in the Coefficient raster workspace, if you specify one. For polygon data, you can use graduated color or cold-to-hot rendering on each coefficient field in the Output feature class to examine changes across your study area.
- You may use GWR for prediction by supplying a Prediction locations feature class (often this feature class is the same as the Input feature class), the Prediction explanatory variables, and an Output prediction feature class. There must be a one-to-one correspondence between the fields used to calibrate the regression model (the values entered for the Explanatory variables field) and the fields used for prediction (the values entered for the Prediction explanatory variables field). The order of these variables must be the same. Suppose, for example, you are modeling traffic accidents as a function of speed limits, road conditions, number of lanes, and number of cars. You can predict the impact that changing speed limits or improving roads might have on traffic accidents by creating new variables with the amended speed limits and road conditions. The existing variables would be used to calibrate the regression model and would be used for the Explanatory variables parameter. The amended variables would be used for predictions and would be entered as your Prediction explanatory variables.
- If a Prediction locations feature class is provided, but no Prediction explanatory variables are specified, the Output prediction feature class is created with computed coefficients for each location only (no predictions).
- A regression model is misspecified if it is missing a key explanatory variable. Statistically significant spatial autocorrelation of the regression residuals and/or unexpected spatial variation among the coefficients of one or more explanatory variables suggests that your model is misspecified. You should make every effort (through OLS residual analysis and GWR coefficient variation analysis, for example) to discover what these key missing variables are so they may be included in the model.
- Always question whether or not it makes sense for an explanatory variable to be nonstationary. For example, suppose you are modeling the density of a particular plant species as a function of several variables including ASPECT. If you find that the coefficient for the ASPECT variable changes across the study area, you are likely seeing evidence of a key missing explanatory variable (perhaps prevalence of competing vegetation, for example). You should make every effort to include all key explanatory variables in your regression model.
- Whenever using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may, consequently, store null values as zero or as some very small negative number (-DBL_MAX = -1.7976931348623158e+308). This can lead to unexpected results.
- When the result of a computation is infinity or undefined, the result for nonshapefiles will be Null; for shapefiles the result will be -DBL_MAX = -1.7976931348623158e+308.
- When you select either the AICc or CV Bandwidth method, GWR will find the optimal distance (for FIXED kernel) or optimal number of neighbors (for ADAPTIVE kernel). Problems with local multicollinearity, however, will prevent both the AICc and CV Bandwidth methods from resolving an optimal distance/number of neighbors. If you get an error indicating severe model design problems, try specifying a particular distance or neighbor count, then examining the condition numbers in the Output feature class to see which features are associated with local multicollinearity problems.
- Severe model design errors, or errors indicating local equations do not include enough neighbors, often indicate a problem with global or local multicollinearity. To determine where the problem is, run your model using OLS and examine the Variance Inflation Factor (VIF) value for each explanatory variable. If some of the VIF values are large (above 7.5, for example), global multicollinearity is preventing GWR from solving. More likely, however, local multicollinearity is the problem. Try creating a thematic map for each explanatory variable. If the map reveals spatial clustering of identical values, consider removing those variables from the model or combining those variables with other explanatory variables in order to increase value variation. If, for example, you are modeling home values and have variables for both bedrooms and bathrooms, you may want to combine these to increase value variation, or to represent them as bathroom/bedroom square footage. Avoid using spatial regime dummy variables, spatially clustering categorical/nominal variables, or variables with very few possible values when constructing GWR models.
- GWR is a linear model subject to the same requirements as OLS. Review those requirements as a check that your GWR model is properly specified.
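The condition-number rule of thumb from the tips above can be sketched as a small screening helper. This is only an illustration: the class ConditionNumberScreen and its methods are hypothetical and are not part of the ArcObjects API, and the sketch assumes the condition numbers have already been read from the output feature class into a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the ArcObjects API) for screening GWR condition numbers. */
final class ConditionNumberScreen {

    /** Rule-of-thumb upper limit quoted in the usage tips. */
    static final double MAX_TRUSTED = 30.0;

    /**
     * Returns true only when the value is present, non-negative, and at most 30.
     * Null, NaN, and negative values (including the shapefile sentinel -DBL_MAX)
     * are treated as signs of unstable results.
     */
    static boolean isTrusted(Double conditionNumber) {
        if (conditionNumber == null || conditionNumber.isNaN()) {
            return false;
        }
        return conditionNumber >= 0.0 && conditionNumber <= MAX_TRUSTED;
    }

    /** Returns the zero-based positions of the values that fail the screen. */
    static List<Integer> untrustedIndexes(List<Double> conditionNumbers) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < conditionNumbers.size(); i++) {
            if (!isTrusted(conditionNumbers.get(i))) {
                flagged.add(i);
            }
        }
        return flagged;
    }
}
```

Features flagged this way are candidates for closer inspection, for example by mapping the explanatory variables around them to look for spatial clustering of identical values.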
GeographicallyWeightedRegression
public GeographicallyWeightedRegression()
- Creates the Geographically Weighted Regression tool with defaults.
Initializes the array of tool parameters with the default values specified when the tool was created.
GeographicallyWeightedRegression
public GeographicallyWeightedRegression(Object inFeatures,
Object dependentField,
Object explanatoryField,
Object outFeatureclass,
String kernelType,
String bandwidthMethod)
- Creates the Geographically Weighted Regression tool with the required parameters.
Initializes the array of tool parameters with the values as specified for the required parameters and with the default values for the other parameters.
- Parameters:
inFeatures
- the feature class containing the dependent and independent variables.
dependentField
- the numeric field containing values for what you are trying to model.
explanatoryField
- a list of fields representing independent explanatory variables in your regression model.
outFeatureclass
- the output feature class to receive dependent variable estimates and residuals.
kernelType
- specifies if the kernel is constructed as a fixed distance, or if it is allowed to vary in extent as a function of feature density.
bandwidthMethod
- specifies how the extent of the kernel is determined. With AICc or CV, the tool finds the optimal distance or number of neighbors; with BANDWIDTH PARAMETER, it uses the Distance or Number of neighbors value you supply.
getInFeatures
public Object getInFeatures()
- Returns the Input feature class parameter of this tool.
This parameter is the feature class containing the dependent and independent variables.
This is a required parameter.
- Returns:
- the Input feature class
setInFeatures
public void setInFeatures(Object inFeatures)
- Sets the Input feature class parameter of this tool.
This parameter is the feature class containing the dependent and independent variables.
This is a required parameter.
- Parameters:
inFeatures
- the feature class containing the dependent and independent variables.
getDependentField
public Object getDependentField()
- Returns the Dependent variable parameter of this tool.
This parameter is the numeric field containing values for what you are trying to model.
This is a required parameter.
- Returns:
- the Dependent variable
setDependentField
public void setDependentField(Object dependentField)
- Sets the Dependent variable parameter of this tool.
This parameter is the numeric field containing values for what you are trying to model.
This is a required parameter.
- Parameters:
dependentField
- the numeric field containing values for what you are trying to model.
getExplanatoryField
public Object getExplanatoryField()
- Returns the Explanatory variable(s) parameter of this tool.
This parameter is a list of fields representing independent explanatory variables in your regression model.
This is a required parameter.
- Returns:
- the Explanatory variable(s)
setExplanatoryField
public void setExplanatoryField(Object explanatoryField)
- Sets the Explanatory variable(s) parameter of this tool.
This parameter is a list of fields representing independent explanatory variables in your regression model.
This is a required parameter.
- Parameters:
explanatoryField
- a list of fields representing independent explanatory variables in your regression model.
getOutFeatureclass
public Object getOutFeatureclass()
- Returns the Output feature class parameter of this tool.
This parameter is the output feature class to receive dependent variable estimates and residuals.
This is a required parameter.
- Returns:
- the Output feature class
setOutFeatureclass
public void setOutFeatureclass(Object outFeatureclass)
- Sets the Output feature class parameter of this tool.
This parameter is the output feature class to receive dependent variable estimates and residuals.
This is a required parameter.
- Parameters:
outFeatureclass
- the output feature class to receive dependent variable estimates and residuals.
getKernelType
public String getKernelType()
- Returns the Kernel type parameter of this tool.
This parameter specifies whether the kernel is constructed as a fixed distance or is allowed to vary in extent as a function of feature density.
This is a required parameter.
- Returns:
- the Kernel type
setKernelType
public void setKernelType(String kernelType)
- Sets the Kernel type parameter of this tool.
This parameter specifies whether the kernel is constructed as a fixed distance or is allowed to vary in extent as a function of feature density.
This is a required parameter.
- Parameters:
kernelType
- specifies if the kernel is constructed as a fixed distance, or if it is allowed to vary in extent as a function of feature density.
getBandwidthMethod
public String getBandwidthMethod()
- Returns the Bandwidth method parameter of this tool.
This is a required parameter.
- Returns:
- the Bandwidth method
setBandwidthMethod
public void setBandwidthMethod(String bandwidthMethod)
- Sets the Bandwidth method parameter of this tool.
This is a required parameter.
- Parameters:
bandwidthMethod
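- specifies how the extent of the kernel is determined. With AICc or CV, the tool finds the optimal distance or number of neighbors; with BANDWIDTH PARAMETER, it uses the Distance or Number of neighbors value you supply.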
getDistance
public double getDistance()
- Returns the Distance parameter of this tool.
This is an optional parameter.
- Returns:
- the Distance
setDistance
public void setDistance(double distance)
- Sets the Distance parameter of this tool.
This is an optional parameter.
- Parameters:
distance
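- the distance to use for the kernel bandwidth when the kernel type is FIXED and the bandwidth method is BANDWIDTH PARAMETER.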
getNumberOfNeighbors
public int getNumberOfNeighbors()
- Returns the Number of neighbors parameter of this tool.
This parameter is an integer reflecting the exact number of neighbors to include in the local bandwidth of the Gaussian kernel whenever the kernel type is ADAPTIVE and the bandwidth method is BANDWIDTH PARAMETER.
This is an optional parameter.
- Returns:
- the Number of neighbors
setNumberOfNeighbors
public void setNumberOfNeighbors(int numberOfNeighbors)
- Sets the Number of neighbors parameter of this tool.
This parameter is an integer reflecting the exact number of neighbors to include in the local bandwidth of the Gaussian kernel whenever the kernel type is ADAPTIVE and the bandwidth method is BANDWIDTH PARAMETER.
This is an optional parameter.
- Parameters:
numberOfNeighbors
- an integer reflecting the exact number of neighbors to include in the local bandwidth of the Gaussian kernel whenever the kernel type is ADAPTIVE and the bandwidth method is BANDWIDTH PARAMETER.
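As a rough guide to which optional input matters for a given combination of Kernel type and Bandwidth method, the following sketch encodes the behaviour described on this page. The class BandwidthInputGuide is hypothetical, and the keyword spellings (FIXED, ADAPTIVE, AICc, CV, BANDWIDTH_PARAMETER) are assumptions based on the names used in the text; check the tool reference for the exact strings the tool accepts.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the ArcObjects API). */
final class BandwidthInputGuide {

    /** Which optional parameter the caller is expected to supply. */
    enum Needed { DISTANCE, NUMBER_OF_NEIGHBORS, NONE }

    static Needed neededInput(String kernelType, String bandwidthMethod) {
        // Normalize case and accept "BANDWIDTH PARAMETER" written with a space.
        String kernel = kernelType.trim().toUpperCase(Locale.ROOT);
        String method = bandwidthMethod.trim().toUpperCase(Locale.ROOT).replace(' ', '_');

        boolean fixed = kernel.equals("FIXED");
        boolean adaptive = kernel.equals("ADAPTIVE");
        if (!fixed && !adaptive) {
            throw new IllegalArgumentException("Unknown kernel type: " + kernelType);
        }

        if (method.equals("AICC") || method.equals("CV")) {
            // The tool searches for the optimal distance or neighbor count itself.
            return Needed.NONE;
        }
        if (method.equals("BANDWIDTH_PARAMETER")) {
            // The caller supplies the bandwidth: a distance for FIXED kernels,
            // a neighbor count for ADAPTIVE kernels.
            return fixed ? Needed.DISTANCE : Needed.NUMBER_OF_NEIGHBORS;
        }
        throw new IllegalArgumentException("Unknown bandwidth method: " + bandwidthMethod);
    }
}
```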
getWeightField
public Object getWeightField()
- Returns the Weights parameter of this tool.
This parameter is the numeric field containing a spatial weighting for individual features. This weight field allows some features to be more important in the model calibration process than others. It is primarily useful when the number of samples taken at different locations varies, values for the dependent and independent variables are averaged, and places with more samples are more reliable (should be weighted higher). If you have an average of 25 different samples for one location, but an average of only 2 samples for another location, you can use the number of samples as your weight field so that locations with more samples have a larger influence on model calibration than locations with few samples.
This is an optional parameter.
- Returns:
- the Weights
setWeightField
public void setWeightField(Object weightField)
- Sets the Weights parameter of this tool.
This parameter is the numeric field containing a spatial weighting for individual features. This weight field allows some features to be more important in the model calibration process than others. It is primarily useful when the number of samples taken at different locations varies, values for the dependent and independent variables are averaged, and places with more samples are more reliable (should be weighted higher). If you have an average of 25 different samples for one location, but an average of only 2 samples for another location, you can use the number of samples as your weight field so that locations with more samples have a larger influence on model calibration than locations with few samples.
This is an optional parameter.
- Parameters:
weightField
- the numeric field containing a spatial weighting for individual features. This weight field allows some features to be more important in the model calibration process than others. It is primarily useful when the number of samples taken at different locations varies, values for the dependent and independent variables are averaged, and places with more samples are more reliable (should be weighted higher). If you have an average of 25 different samples for one location, but an average of only 2 samples for another location, you can use the number of samples as your weight field so that locations with more samples have a larger influence on model calibration than locations with few samples.
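The sample-count weighting described above could be prepared along the following lines before the values are written to the input feature class. This is a minimal sketch of the data preparation only; the class LocationSummary and its members are hypothetical and do not exist in the ArcObjects API.

```java
/** Hypothetical value holder (not part of the ArcObjects API) for one sampled location. */
final class LocationSummary {

    /** Average of the raw samples; this is the value stored in the variable field. */
    final double meanValue;

    /** Number of samples averaged; a candidate value for the weight field. */
    final int sampleCount;

    private LocationSummary(double meanValue, int sampleCount) {
        this.meanValue = meanValue;
        this.sampleCount = sampleCount;
    }

    /** Averages the samples taken at one location and records how many there were. */
    static LocationSummary of(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("At least one sample is required");
        }
        double sum = 0.0;
        for (double sample : samples) {
            sum += sample;
        }
        return new LocationSummary(sum / samples.length, samples.length);
    }
}
```

A location summarized from 25 samples would then carry a weight of 25, and one summarized from 2 samples a weight of 2, so the better-sampled location has more influence on calibration.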
getCoefficientRasterWorkspace
public Object getCoefficientRasterWorkspace()
- Returns the Coefficient raster workspace parameter of this tool.
This is an optional parameter.
- Returns:
- the Coefficient raster workspace
setCoefficientRasterWorkspace
public void setCoefficientRasterWorkspace(Object coefficientRasterWorkspace)
- Sets the Coefficient raster workspace parameter of this tool.
This is an optional parameter.
- Parameters:
coefficientRasterWorkspace
- the workspace in which the optional coefficient raster surfaces are created.
getCellSize
public Object getCellSize()
- Returns the Output cell size parameter of this tool.
This parameter is the cell size (a number) or reference to the cell size (a pathname to a raster dataset) to use when creating the coefficient rasters. The default cell size is the shortest of the width or height of the extent specified in the Environment output coordinate system, divided by 250.
This is an optional parameter.
- Returns:
- the Output cell size
setCellSize
public void setCellSize(Object cellSize)
- Sets the Output cell size parameter of this tool.
This parameter is the cell size (a number) or reference to the cell size (a pathname to a raster dataset) to use when creating the coefficient rasters. The default cell size is the shortest of the width or height of the extent specified in the Environment output coordinate system, divided by 250.
This is an optional parameter.
- Parameters:
cellSize
- the cell size (a number) or reference to the cell size (a pathname to a raster dataset) to use when creating the coefficient rasters. The default cell size is the shortest of the width or height of the extent specified in the Environment output coordinate system, divided by 250.
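The default cell size rule stated above amounts to a one-line calculation. The sketch below reproduces that arithmetic for an extent given by its corner coordinates; the class DefaultCellSize is hypothetical, and how the tool itself obtains the extent is not shown here.

```java
/** Hypothetical helper (not part of the ArcObjects API). */
final class DefaultCellSize {

    /**
     * Returns the shorter of the extent's width and height, divided by 250.
     * Coordinates are assumed to be in the units of the output coordinate system.
     */
    static double fromExtent(double xMin, double yMin, double xMax, double yMax) {
        double width = xMax - xMin;
        double height = yMax - yMin;
        if (width <= 0.0 || height <= 0.0) {
            throw new IllegalArgumentException("Extent must have positive width and height");
        }
        return Math.min(width, height) / 250.0;
    }
}
```

For an extent 5000 units wide and 2500 units high, the shorter side is 2500, so the default works out to 2500 / 250 = 10 units per cell.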
getInPredictionLocations
public Object getInPredictionLocations()
- Returns the Prediction locations parameter of this tool.
This parameter is a feature class containing features representing locations where estimates should be computed. Each feature in this dataset should contain values for all of the explanatory variables specified; the dependent variable for these features will be estimated using the model calibrated for the input feature class data.
This is an optional parameter.
- Returns:
- the Prediction locations
setInPredictionLocations
public void setInPredictionLocations(Object inPredictionLocations)
- Sets the Prediction locations parameter of this tool.
This parameter is a feature class containing features representing locations where estimates should be computed. Each feature in this dataset should contain values for all of the explanatory variables specified; the dependent variable for these features will be estimated using the model calibrated for the input feature class data.
This is an optional parameter.
- Parameters:
inPredictionLocations
- a feature class containing features representing locations where estimates should be computed. Each feature in this dataset should contain values for all of the explanatory variables specified; the dependent variable for these features will be estimated using the model calibrated for the input feature class data.
getPredictionExplanatoryField
public Object getPredictionExplanatoryField()
- Returns the Prediction explanatory variable(s) parameter of this tool.
This parameter is a list of fields representing explanatory variables in the Prediction Locations feature class. These field names should be provided in the same order (a one-to-one correspondence) as those listed for the input feature class Explanatory variables parameter. If no prediction explanatory variables are given, the output prediction feature class will only contain computed coefficient values for each prediction location.
This is an optional parameter.
- Returns:
- the Prediction explanatory variable(s)
setPredictionExplanatoryField
public void setPredictionExplanatoryField(Object predictionExplanatoryField)
- Sets the Prediction explanatory variable(s) parameter of this tool.
This parameter is a list of fields representing explanatory variables in the Prediction Locations feature class. These field names should be provided in the same order (a one-to-one correspondence) as those listed for the input feature class Explanatory variables parameter. If no prediction explanatory variables are given, the output prediction feature class will only contain computed coefficient values for each prediction location.
This is an optional parameter.
- Parameters:
predictionExplanatoryField
- a list of fields representing explanatory variables in the Prediction Locations feature class. These field names should be provided in the same order (a one-to-one correspondence) as those listed for the input feature class Explanatory variables parameter. If no prediction explanatory variables are given, the output prediction feature class will only contain computed coefficient values for each prediction location.
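Because the correspondence between calibration fields and prediction fields is purely positional, it can help to pair the two lists explicitly before passing them to the tool. The following is a minimal sketch of such a check; the class PredictionFieldPairing and the field names in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the ArcObjects API). */
final class PredictionFieldPairing {

    /**
     * Pairs each calibration field with the prediction field at the same position.
     * An empty prediction list is allowed and yields an empty map, matching the
     * documented case where only coefficients are computed.
     */
    static Map<String, String> pair(List<String> explanatoryFields,
                                    List<String> predictionFields) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (predictionFields.isEmpty()) {
            return pairs;
        }
        if (explanatoryFields.size() != predictionFields.size()) {
            throw new IllegalArgumentException(
                "Expected " + explanatoryFields.size() + " prediction fields but got "
                    + predictionFields.size());
        }
        for (int i = 0; i < explanatoryFields.size(); i++) {
            // Position i in one list corresponds to position i in the other.
            pairs.put(explanatoryFields.get(i), predictionFields.get(i));
        }
        return pairs;
    }
}
```

Printing the resulting map, for example for the lists [SPEED_LIMIT, LANES] and [NEW_SPEED_LIMIT, LANES], makes it easy to spot a field that was listed in the wrong order, which a count check alone cannot detect.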
getOutPredictionFeatureclass
public Object getOutPredictionFeatureclass()
- Returns the Output prediction feature class parameter of this tool.
This is an optional parameter.
- Returns:
- the Output prediction feature class
setOutPredictionFeatureclass
public void setOutPredictionFeatureclass(Object outPredictionFeatureclass)
- Sets the Output prediction feature class parameter of this tool.
This is an optional parameter.
- Parameters:
outPredictionFeatureclass
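- the output feature class to receive the values computed for each prediction location (coefficients, plus predictions when Prediction explanatory variables are supplied).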
getOutTable
public Object getOutTable()
- Returns the Output table parameter of this tool (Read only).
This is a derived parameter.
- Returns:
- the Output table
getOutRegressionRasters
public Object getOutRegressionRasters()
- Returns the Output regression rasters parameter of this tool (Read only).
This is a derived parameter.
- Returns:
- the Output regression rasters
getToolName
public String getToolName()
- Returns the name of this tool.
- Returns:
- the tool name
getToolboxName
public String getToolboxName()
- Returns the name of the tool box containing this tool.
- Returns:
- the tool box name
getToolboxAlias
public String getToolboxAlias()
- Returns the alias of the tool box containing this tool.
- Returns:
- the tool box alias