Incremental Spatial Autocorrelation (Spatial Statistics)
Summary
Measures spatial autocorrelation for a series of distances and optionally creates a line graph of those distances and their corresponding z-scores. Z-scores reflect the intensity of spatial clustering, and statistically significant peak z-scores indicate distances where spatial processes promoting clustering are most pronounced. These peak distances are often appropriate values to use for tools with a Distance Band or Distance Radius parameter.
Usage
This tool can help you select an appropriate Distance Threshold or Radius for tools that have these parameters, such as Hot Spot Analysis or Point Density.
The Incremental Spatial Autocorrelation tool measures spatial autocorrelation for a series of distance increments and reports, for each distance increment, the associated Moran's Index, Expected Index, Variance, z-score and p-value. These values are accessible from the Results window by right-clicking on the Messages entry and selecting View. The tool also passes, as derived output, the first peak z-score and maximum peak z-score for potential use in models or scripts (see, for example, the sample script below).
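The statistics reported for each distance increment can be sketched in plain Python. This is a minimal sketch, not the tool's implementation. It assumes binary fixed-distance weights (a feature within the band gets weight 1). It uses the randomization-based variance that is commonly used for global Moran's I and a two-sided normal p-value. The function names `morans_i`, `distance_band_weights`, and `incremental_morans_i` are hypothetical and are not part of arcpy.

```python
import math


def morans_i(values, weights):
    """Global Moran's I for one set of spatial weights.

    values  -- n attribute values (n >= 4)
    weights -- n x n nested lists, weights[i][j] >= 0 with a zero diagonal
    Returns (index, expected_index, variance, z_score, p_value).
    """
    n = len(values)
    if n < 4:
        raise ValueError("at least four features are needed")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev)
    if m2 == 0:
        raise ValueError("all values are identical; Moran's I cannot be computed")
    s0 = sum(sum(row) for row in weights)
    if s0 == 0:
        raise ValueError("no feature has a neighbor at this distance")
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    index = (n / s0) * cross / m2
    expected = -1.0 / (n - 1)

    # Variance under the randomization assumption
    s1 = 0.5 * sum((weights[i][j] + weights[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(weights[i]) + sum(weights[j][i] for j in range(n))) ** 2 for i in range(n))
    b2 = n * sum(d ** 4 for d in dev) / m2 ** 2
    numerator = (n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2)
                 - b2 * ((n * n - n) * s1 - 2 * n * s2 + 6 * s0 ** 2))
    e_i2 = numerator / ((n - 1) * (n - 2) * (n - 3) * s0 ** 2)
    variance = e_i2 - expected ** 2

    z_score = (index - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))  # two-sided, standard normal
    return index, expected, variance, z_score, p_value


def distance_band_weights(points, threshold):
    """Binary weights: 1 if another feature lies within `threshold`, else 0."""
    return [[1.0 if i != j and math.dist(p, q) <= threshold else 0.0
             for j, q in enumerate(points)] for i, p in enumerate(points)]


def incremental_morans_i(points, values, beginning, increment, num_bands):
    """One row (distance, I, E[I], variance, z, p) per distance band."""
    rows = []
    for k in range(num_bands):
        distance = beginning + k * increment
        stats = morans_i(values, distance_band_weights(points, distance))
        rows.append((distance,) + stats)
    return rows
```

`morans_i` accepts any weight matrix, so row-standardized weights can be passed in place of the binary ones. The zero-variance check also shows why the statistic cannot be computed when every input value is the same.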
When more than one statistically significant peak is present, clustering is pronounced at each of those distances. Select the peak distance that best corresponds to the scale of analysis you are interested in; often this is the first statistically significant peak encountered.
The Input Field should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; it cannot be computed if, for example, all input values are 1. If you want to use this tool to analyze the spatial pattern of incident data, consider aggregating your incident data first.
When the Input Feature Class is not projected (that is, when coordinates are given in degrees, minutes, and seconds) or when the output coordinate system is set to a Geographic Coordinate System, distances are computed using chordal measurements. Chordal distances are used because they can be computed quickly and provide very good estimates of true geodesic distances, at least for points within about thirty degrees of each other. Chordal distances are based on a sphere rather than the true oblate ellipsoid shape of the earth. Given any two points on the earth's surface, the chordal distance between them is the length of a straight line, passing through the three-dimensional earth, that connects those two points. Chordal distances are reported in meters.
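Aggregating incidents turns many identical records into one feature per location with a count, which gives the statistic the variation it needs. The sketch below counts coincident points and can optionally snap them to a square grid first. The function name `count_incidents` is hypothetical. Grid snapping is a simplification and is not how Integrate or Collect Events work internally.

```python
from collections import Counter


def count_incidents(points, tolerance=0.0):
    """Aggregate incident points into sorted (x, y, count) records.

    With tolerance > 0, coordinates are first snapped to a grid of that
    cell size, so nearby incidents are counted at the same location.
    """
    def key(p):
        if tolerance > 0:
            return (round(p[0] / tolerance) * tolerance, round(p[1] / tolerance) * tolerance)
        return (p[0], p[1])

    counts = Counter(key(p) for p in points)
    return [(x, y, c) for (x, y), c in sorted(counts.items())]
```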
Caution: Be sure to project your data if your study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.
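The chordal measurement can be sketched by placing both points on a sphere and taking the straight-line distance between them in three dimensions. This sketch assumes a mean earth radius of 6,371,008.8 meters; the radius the tool actually uses is not stated here. It compares against the great-circle arc on the same sphere, not against a true ellipsoidal geodesic. All names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean spherical radius, in meters


def to_unit_xyz(lat_deg, lon_deg):
    """Unit-sphere Cartesian coordinates for a latitude/longitude in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def chordal_distance(p, q, radius=EARTH_RADIUS_M):
    """Straight-line distance through the sphere between (lat, lon) points p and q."""
    return radius * math.dist(to_unit_xyz(*p), to_unit_xyz(*q))


def great_circle_distance(p, q, radius=EARTH_RADIUS_M):
    """Arc length along the sphere's surface, for comparison."""
    a, b = to_unit_xyz(*p), to_unit_xyz(*q)
    dot = sum(x * y for x, y in zip(a, b))
    return radius * math.acos(max(-1.0, min(1.0, dot)))
```

On this sphere, the chord is 2R sin(θ/2) for a central angle θ, so it falls short of the arc by about 0.13 percent at 10 degrees, about 1.1 percent at 30 degrees, and about 4.5 percent at 60 degrees. That growing error is why projecting the data is recommended for larger study areas.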
When chordal distances are used in the analysis, the Beginning Distance and Distance Increment parameters, if specified, should be given in meters.
Prior to ArcGIS 10.2.1, you would see a warning message if the parameters and environment settings you selected would result in calculations being performed using Geographic Coordinates (degrees, minutes, seconds). This warning advised you to project your data into a Projected Coordinate System so that distance calculations would be accurate. Beginning at 10.2.1, however, this tool calculates chordal distances whenever Geographic Coordinate System calculations are required.
Caution: Because of this change, there is a small chance that you will need to modify models that incorporate this tool if your models were created prior to ArcGIS 10.2.1 and if your models include hard-coded Geographic Coordinate System parameter values. If, for example, a distance parameter is set to something like 0.0025 degrees, you will need to convert that fixed value from degrees to meters and resave your model.
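A rough degrees-to-meters conversion treats the angle as an arc on a sphere. This is a sketch that assumes a mean earth radius of 6,371,008.8 meters. It ignores the ellipsoid, and for east-west distances it scales by the cosine of latitude because parallels shrink toward the poles. The function name `degrees_to_meters` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean spherical radius, in meters


def degrees_to_meters(deg, latitude_deg=0.0, east_west=False):
    """Approximate ground length of an angular distance on a sphere."""
    meters = EARTH_RADIUS_M * math.radians(deg)
    if east_west:
        meters *= math.cos(math.radians(latitude_deg))  # parallels shrink toward the poles
    return meters
```

With these assumptions, 0.0025 degrees is roughly 278 meters north-south, and roughly 139 meters east-west at 60 degrees latitude.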
For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.
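The weighted mean center of a multipart feature can be sketched as follows. Each part contributes its own centroid, weighted by 1 for a point, by length for a line, and by area for a polygon. The part centroids here use the shoelace formula and length-weighted segment midpoints. These are standard choices assumed for illustration, and the helper names are hypothetical.

```python
import math


def polygon_centroid_area(ring):
    """Centroid and area of a simple polygon ring [(x, y), ...] (shoelace formula)."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a)), abs(a)


def line_centroid_length(path):
    """Length-weighted centroid and total length of a polyline [(x, y), ...]."""
    total = sx = sy = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        total += seg
        sx += seg * (x0 + x1) / 2
        sy += seg * (y0 + y1) / 2
    return (sx / total, sy / total), total


def weighted_mean_center(parts):
    """parts: list of ((x, y), weight); use weight 1 for point parts."""
    w = sum(wt for _, wt in parts)
    return (sum(c[0] * wt for c, wt in parts) / w,
            sum(c[1] * wt for c, wt in parts) / w)
```

For a polygon made of a unit square at the origin and a 2 by 2 square starting at x = 2, the larger part pulls the center toward itself: (2.5, 0.9) rather than the unweighted midpoint of the two part centroids.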
Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.
For polygon features, you will almost always want to choose ROW for the Row Standardization parameter. Row Standardization mitigates bias when the number of neighbors each feature has is a function of the aggregation scheme or sampling process, rather than reflecting the actual spatial distribution of the variable you are analyzing.
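Row standardization divides each feature's weights by the sum of that feature's weights, so every feature's neighbors sum to 1 regardless of how many neighbors it has. A minimal sketch, with the hypothetical helper name `row_standardize`:

```python
def row_standardize(weights):
    """Divide each weight by its row sum; rows with no neighbors stay all zero."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result
```

A feature with two neighbors gets weights of 0.5 each, while a feature with one neighbor gets a weight of 1. As a result, features with many neighbors do not dominate the statistic simply because of how the polygons were drawn.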
If no Beginning Distance is given, the default value is the minimum distance for which each feature in the dataset has at least one neighbor. This may not be the most appropriate beginning distance if your dataset includes locational outliers.
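The default described above is the largest nearest-neighbor distance in the dataset, because that is the smallest distance at which every feature has at least one neighbor. A brute-force sketch with hypothetical names:

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each feature to its closest other feature (brute force)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def default_beginning_distance(points):
    """Smallest distance at which every feature has at least one neighbor."""
    return max(nearest_neighbor_distances(points))
```

For points at x = 0, 1, and 2 the result is 1. Adding a single outlier at x = 10 raises it to 8, which illustrates why locational outliers can make the default a poor starting point.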
If no Distance Increment is given, the smaller of either the average nearest neighbor distance or (Td - B) / I is used, where Td is a maximum threshold distance, B is the Beginning Distance, and I is the Number of Distance Bands. This ensures that calculations are always performed for the specified Number of Distance Bands and that the largest distance bands are not so large that some features have all or almost all other features as neighbors.
If the Beginning Distance and/or Distance Increment you specify would produce a distance band larger than the maximum threshold distance, the Distance Increment is automatically scaled down. To avoid this adjustment, decrease the Distance Increment and/or the Number of Distance Bands.
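The default increment rule can be sketched as a function that returns the list of distances analyzed. This is a sketch under stated assumptions. How the tool derives the maximum threshold distance Td is not described here, so Td is simply an input. The scale-down rule for a user-supplied increment is also assumed: it reuses (Td - B) / I, which is not documented. The name `distance_bands` is hypothetical.

```python
def distance_bands(beginning, num_bands, max_threshold, increment=None, avg_nn_distance=None):
    """Return one distance per band, starting at `beginning` (hypothetical helper)."""
    if beginning >= max_threshold:
        raise ValueError("Beginning Distance must be smaller than the maximum threshold")
    fallback = (max_threshold - beginning) / num_bands  # (Td - B) / I
    if increment is None:
        if avg_nn_distance is None:
            raise ValueError("the average nearest neighbor distance is required")
        increment = min(avg_nn_distance, fallback)
    elif beginning + (num_bands - 1) * increment > max_threshold:
        increment = fallback  # assumed scale-down rule
    return [beginning + k * increment for k in range(num_bands)]
```

With B = 4, I = 5, Td = 10, and an average nearest neighbor distance of 2, the increment becomes min(2, 1.2) = 1.2. The bands are then 4, 5.2, 6.4, 7.6, and 8.8, and the last band stays below Td.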
It is possible to run out of memory when you run this tool. This generally occurs when the Beginning Distance and/or Distance Increment you specify gives features a very large number of neighbors. You generally do not want to create spatial relationships where your features have thousands of neighbors. Use a smaller value for the Distance Increment, and temporarily remove locational outliers so that you can start with a smaller Beginning Distance value.
Even if you let the tool calculate the Beginning Distance and Distance Increment for you, processing time can be long for large datasets. You can improve performance by:
- Temporarily removing locational outliers.
- Selecting features in a representative portion of the study area and running the analysis on just those features, instead of on all features.
- Taking a random sample of features from the dataset and running the analysis on just those sampled features.
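Counting neighbors at a candidate distance before running the analysis is a quick way to spot bands that would create very dense spatial relationships. Each neighbor pair has to be stored, so the total count is a rough proxy for memory use. A brute-force sketch with the hypothetical name `neighbor_counts`:

```python
import math


def neighbor_counts(points, distance):
    """Number of other features within `distance` of each feature (O(n^2) brute force)."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= distance)
            for i, p in enumerate(points)]
```

Checking `max(neighbor_counts(points, d))` for the largest band you plan to run shows whether any feature would end up with thousands of neighbors.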
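The first and third suggestions above can be sketched in a few lines. The outlier rule is hypothetical: it drops features whose nearest-neighbor distance is more than k standard deviations above the mean, and it is not a rule the tool applies. The random sample is a simple random sample without replacement, and a fixed seed makes it repeatable.

```python
import math
import random
import statistics


def drop_locational_outliers(points, k=3.0):
    """Drop points whose nearest-neighbor distance exceeds mean + k * std (hypothetical rule)."""
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    cutoff = statistics.mean(nn) + k * statistics.pstdev(nn)
    return [p for p, d in zip(points, nn) if d <= cutoff]


def random_subset(features, size, seed=None):
    """Simple random sample without replacement; a fixed seed makes runs repeatable."""
    return random.Random(seed).sample(features, size)
```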
Distances are always based on the Output Coordinate System environment setting. The default setting for the Output Coordinate System environment is Same as Input. Input features are projected to the output coordinate system prior to analysis.
The optional Output Table contains, for each iteration, the distance value, the Moran's I index value, the expected Moran's I index value, the variance, the z-score, and the p-value. A peak is an increase in the z-score value followed by a decrease. For example, if the tool computes z-scores of 2.95, 3.68, and 3.12 for distances of 50, 100, and 150 meters, respectively, the peak is at 100 meters.
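Finding the first peak and the maximum peak in a z-score series can be sketched as a scan for local maxima. This sketch makes two assumptions: it treats a peak as statistically significant when z exceeds 1.96 (the two-sided 95 percent level), and it never counts the first or last band as a peak. The tool's exact rules may differ, and the name `find_peaks` is hypothetical.

```python
def find_peaks(distances, z_scores, z_critical=1.96):
    """Return (first_peak, max_peak) distances among significant local maxima of z."""
    peaks = [
        (distances[i], z_scores[i])
        for i in range(1, len(z_scores) - 1)
        if z_scores[i - 1] < z_scores[i] > z_scores[i + 1] and z_scores[i] > z_critical
    ]
    if not peaks:
        return None, None  # no peak: both results are empty
    first = peaks[0][0]
    maximum = max(peaks, key=lambda p: p[1])[0]
    return first, maximum
```

Using the example above, `find_peaks([50, 100, 150], [2.95, 3.68, 3.12])` returns `(100, 100)`.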
The optional Output Report File is a PDF file summarizing the results; you can open it from the Results window by double-clicking the file name. PDF files do not automatically appear in the Catalog window. If you want PDF files to be displayed in Catalog, open the ArcCatalog application, select the Customize menu option, click ArcCatalog Options, and select the File Types tab. Click the New Type button and enter PDF for File Extension.
On machines configured with the ArcGIS language packages for Chinese or Japanese, you might notice missing text or formatting problems in the PDF Output Report File. These problems can be corrected by changing the font settings.
When no peak z-scores are identified, both the first peak z-score and maximum peak z-score derived output parameters return a blank.
When using this tool in Python scripts, the result object returned from tool execution has the following outputs:
Position | Description | Data Type |
0 | First Peak | Double |
1 | Max Peak | Double |
Syntax
Parameter | Explanation | Data Type |
Input_Features | The feature class for which spatial autocorrelation will be measured over a series of distances. | Feature Layer |
Input_Field | The numeric field used in assessing spatial autocorrelation. | Field |
Number_of_Distance_Bands | The number of times to increment the neighborhood size and analyze the dataset for spatial autocorrelation. The starting point and size of the increment are specified in the Beginning Distance and Distance Increment parameters, respectively. | Long |
Beginning_Distance (Optional) | The distance at which to start the analysis of spatial autocorrelation and the distance from which to increment. The value entered for this parameter should be in the units of the Output Coordinate System environment setting. | Double |
Distance_Increment (Optional) | The distance to increase after each iteration. The distance used in the analysis starts at the Beginning Distance and increases by the amount specified in the Distance Increment. The value entered for this parameter should be in the units of the Output Coordinate System environment setting. | Double |
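Distance_Method (Optional) | Specifies how distances are calculated from each feature to neighboring features. EUCLIDEAN: the straight-line distance between two points. MANHATTAN: the distance between two points measured along axes at right angles. | String |
Row_Standardization (Optional) | Specifies whether spatial weights are standardized. ROW_STANDARDIZATION: spatial weights are standardized by row; each weight is divided by its row sum. NO_STANDARDIZATION: no standardization of spatial weights is applied. | Boolean |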
Output_Table (Optional) | The table to be created with each distance band and associated z-score result. | Table |
Output_Report_File (Optional) | The PDF file to be created containing a line graph summarizing results. | File |
Code Sample
The following Python window script demonstrates how to use the IncrementalSpatialAutocorrelation tool.
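import arcpy
import arcpy.stats as SS

# Set the workspace that contains the input data
arcpy.env.workspace = r"C:\ISA"

# Analyze 20 distance bands; the empty strings let the tool compute
# the Beginning Distance and Distance Increment
SS.IncrementalSpatialAutocorrelation("911CallsCount.shp", "ICOUNT", 20, "", "", "EUCLIDEAN",
                                     "ROW_STANDARDIZATION", "outTable.dbf", "outReport.pdf")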
The following stand-alone Python script demonstrates how to use the IncrementalSpatialAutocorrelation tool.
# Hot Spot Analysis of 911 calls in a metropolitan area
# using the Incremental Spatial Autocorrelation and Hot Spot Analysis tools

# Import system modules
import arcpy
import arcpy.stats as SS

# Allow existing outputs to be overwritten
arcpy.env.overwriteOutput = True

# Local variables
workspace = r"C:\ISA"

try:
    # Set the current workspace (to avoid having to specify the full path to the feature classes each time)
    arcpy.env.workspace = workspace

    # Copy the input feature class and integrate the points to snap together at 30 feet
    # (Integrate modifies its input, so it is run on the copy)
    # Process: Copy Features and Integrate
    arcpy.CopyFeatures_management("911Calls.shp", "911Copied.shp")
    arcpy.Integrate_management("911Copied.shp", "30 Feet")

    # Use Collect Events to count the number of calls at each location
    # Process: Collect Events
    ce = SS.CollectEvents("911Copied.shp", "911Count.shp")

    # Use Incremental Spatial Autocorrelation to get the peak distance
    # Process: Incremental Spatial Autocorrelation
    isa = SS.IncrementalSpatialAutocorrelation(ce, "ICOUNT", 20, "", "", "EUCLIDEAN",
                                               "ROW_STANDARDIZATION", "outTable.dbf", "outReport.pdf")

    # Hot Spot Analysis of 911 Calls
    # Process: Hot Spot Analysis (Getis-Ord Gi*)
    distance = isa.getOutput(2)
    hs = SS.HotSpots(ce, "ICOUNT", "911HotSpots.shp", "FIXED_DISTANCE_BAND",
                     "EUCLIDEAN_DISTANCE", "NONE", distance)

except arcpy.ExecuteError:
    # If an error occurred when running a tool, print out the error messages.
    print(arcpy.GetMessages(2))
Environments
- Output Coordinate System
Feature geometry is projected to the Output Coordinate System prior to analysis. All mathematical computations are based on the Output Coordinate System spatial reference. When the Output Coordinate System is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances.