How Cluster and Outlier Analysis (Anselin Local Moran's I) works
Given a set of features (Input Feature Class) and an analysis field (Input Field), the Cluster and Outlier Analysis tool identifies spatial clusters of features with high or low values. The tool also identifies spatial outliers. To do this, the tool calculates a local Moran's I value, a z-score, a p-value, and a code representing the cluster type for each statistically significant feature. The z-scores and p-values represent the statistical significance of the computed index values.
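The index calculation can be illustrated with a minimal sketch. It uses the general form of the local Moran's I statistic from Anselin (1995) with NumPy; the function name, the sample values, and the scaling by the second moment are assumptions made for illustration, and the tool's exact scaling and its z-score calculation are not reproduced here.

```python
import numpy as np


def local_morans_i(values, weights):
    """Return one local Moran's I value per feature.

    values  : 1-D sequence holding the analysis field, length n
    weights : (n, n) spatial weights; weights[i, j] > 0 when j is a neighbor of i
    """
    x = np.asarray(values, dtype=float)
    w = np.array(weights, dtype=float)   # copy so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)             # a feature is not its own neighbor
    z = x - x.mean()                     # deviations from the global mean
    m2 = (z ** 2).sum() / len(x)         # second moment (assumed scaling)
    return (z / m2) * (w @ z)            # I_i = (z_i / m2) * sum_j w_ij * z_j


# Hypothetical data: five features along a line; neighbors are adjacent features.
values = [10.0, 9.0, 1.0, 2.0, 1.0]
weights = np.zeros((5, 5))
for i in range(4):
    weights[i, i + 1] = weights[i + 1, i] = 1.0
weights /= weights.sum(axis=1, keepdims=True)   # row standardization

local_i = local_morans_i(values, weights)
```

In this sample, the first two features (high values next to each other) and the last two (low values next to each other) produce positive index values, while the third feature, a low value whose neighbors average above the mean, produces a negative one.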
Interpretation
A positive value for I indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. For more information on determining statistical significance, see "What is a z-score? What is a p-value?". Note that the local Moran's I index (I) is a relative measure and can be interpreted only within the context of its computed z-score or p-value. The z-scores and p-values reported in the output feature class are uncorrected for multiple testing or spatial dependency.
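The link between a z-score and its p-value can be sketched as follows. This example assumes the z-score is compared against the standard normal distribution with a two-tailed test; the function names are hypothetical, and the tool's own p-value calculation is not described here.

```python
import math


def two_tailed_p_value(z_score):
    """Two-tailed p-value for a z-score, assuming a standard normal distribution."""
    return math.erfc(abs(z_score) / math.sqrt(2.0))


def is_significant(z_score, alpha=0.05):
    """True when the p-value is smaller than alpha (0.05 = 95 percent confidence)."""
    return two_tailed_p_value(z_score) < alpha
```

Under this assumption, a z-score near +1.96 or -1.96 corresponds to a p-value near 0.05, so larger absolute z-scores are needed before a cluster or outlier counts as statistically significant.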
The cluster/outlier type (COType) field distinguishes between a statistically significant cluster of high values (HH), cluster of low values (LL), outlier in which a high value is surrounded primarily by low values (HL), and outlier in which a low value is surrounded primarily by high values (LH). Statistical significance is set at the 95 percent confidence level. When no FDR correction is applied, features with p-values smaller than 0.05 are considered statistically significant. The FDR correction reduces this p-value threshold from 0.05 to a value that better reflects the 95 percent confidence level given multiple testing.
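The way the COType codes and the significance threshold fit together can be sketched in a few lines. Two assumptions are made here and are not confirmed by this documentation: the FDR threshold is computed with the standard Benjamini-Hochberg procedure, and "high" and "low" are judged against the global mean of the analysis field. All function names and sample p-values are hypothetical.

```python
def fdr_threshold(p_values, alpha=0.05):
    """Benjamini-Hochberg cutoff (assumed procedure).

    Returns the largest sorted p-value p_(k) with p_(k) <= alpha * k / n,
    or 0.0 when no p-value qualifies.
    """
    ranked = sorted(p_values)
    n = len(ranked)
    cutoff = 0.0
    for k, p in enumerate(ranked, start=1):
        if p <= alpha * k / n:
            cutoff = p
    return cutoff


def cluster_outlier_type(value, neighbor_mean, global_mean, significant):
    """Return 'HH', 'LL', 'HL', or 'LH'; return '' when not significant."""
    if not significant:
        return ""
    high = value > global_mean                    # feature itself is high
    neighbors_high = neighbor_mean > global_mean  # its neighbors are high
    if high and neighbors_high:
        return "HH"
    if not high and not neighbors_high:
        return "LL"
    return "HL" if high else "LH"


# Hypothetical p-values for six features.
p_values = [0.001, 0.012, 0.03, 0.04, 0.20, 0.60]
cutoff = fdr_threshold(p_values)                       # FDR-adjusted threshold
uncorrected = [p < 0.05 for p in p_values]             # no FDR correction
corrected = [0 < p <= cutoff for p in p_values]        # with FDR correction
```

With these sample p-values, four features pass the uncorrected 0.05 threshold but only two pass the lower FDR threshold, which shows how the correction reduces the number of features flagged as significant.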
Output
This tool creates a new output feature class with the following attributes for each feature in the input feature class: local Moran's I index, z-score, p-value, and COType.
When this tool runs in ArcMap, the output feature class is automatically added to the table of contents (TOC) with default rendering applied to the COType field. The rendering applied is defined by a layer file in <ArcGIS>/ArcToolbox/Templates/Layers. You can reapply the default rendering, if needed, by importing the template layer symbology.
Best practice guidelines
- Results are only reliable if the input feature class contains at least 30 features.
- This tool requires an input field such as a count, rate, or other numeric measurement. If you are analyzing point data, where each point represents a single event or incident, you might not have a specific numeric attribute to evaluate (a severity ranking, count, or other measurement). If you are interested in finding locations with many incidents (hot spots) and/or locations with very few incidents (cold spots), you will need to aggregate your incident data prior to analysis. The Hot Spot Analysis (Getis-Ord Gi*) tool is also effective for finding hot and cold spots. Only the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, however, will identify statistically significant spatial outliers (a high value surrounded by low values or a low value surrounded by high values).
- Select an appropriate conceptualization of spatial relationships.
  - When you select the SPACE_TIME_WINDOW conceptualization, you can identify space-time clusters and outliers. See Space-Time Analysis for more information.
- Select an appropriate distance band or threshold distance.
  - All features should have at least one neighbor.
  - No feature should have all other features as a neighbor.
  - Each feature should have about eight neighbors, especially if the values for the input field are skewed.
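The distance band guidelines above can be checked before running the tool. The following is a minimal sketch that counts neighbors within a candidate distance band using NumPy; it assumes planar coordinates and Euclidean distance, and the function names and the sample grid are hypothetical.

```python
import numpy as np


def neighbor_counts(coords, distance_band):
    """Count, for each point, the other points within distance_band."""
    pts = np.asarray(coords, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]     # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    within = dist <= distance_band
    np.fill_diagonal(within, False)              # a point is not its own neighbor
    return within.sum(axis=1)


def check_distance_band(coords, distance_band):
    """Summarize how well a distance band meets the neighbor guidelines."""
    counts = neighbor_counts(coords, distance_band)
    n = len(counts)
    return {
        "features_without_neighbors": int((counts == 0).sum()),
        "features_neighboring_all_others": int((counts == n - 1).sum()),
        "mean_neighbors": float(counts.mean()),
    }


# Hypothetical 5 x 5 grid of points with unit spacing.
coords = [(x, y) for x in range(5) for y in range(5)]
report = check_distance_band(coords, distance_band=1.5)
```

A band that leaves some features without neighbors is too small, and a band that makes every feature a neighbor of every other feature is too large. Trying several candidate distances and comparing the reports is one way to settle on a band that gives each feature roughly eight neighbors.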
Potential applications
The Cluster and Outlier Analysis (Anselin Local Moran's I) tool identifies concentrations of high values, concentrations of low values, and spatial outliers. It can help you answer questions such as these:
- Where are the sharpest boundaries between affluence and poverty in a study area?
- Are there locations in a study area with anomalous spending patterns?
- Where are the unexpectedly high rates of diabetes across the study area?
Applications can be found in many fields including economics, resource management, biogeography, political geography, and demographics.
Additional resources
Anselin, Luc. "Local Indicators of Spatial Association—LISA," Geographical Analysis 27(2): 93–115, 1995.
Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.