Raster data statistics
A raster dataset or mosaic dataset must have statistics before you can perform some geoprocessing operations or certain tasks in ArcGIS for Desktop applications (for example, ArcMap or ArcCatalog), such as applying a contrast stretch or classifying data.
For raster datasets, the statistical information, including a histogram, is stored internally when the format allows it; otherwise, it is stored in an associated auxiliary file. Once the auxiliary file has been created, the statistics in it are reused by later procedures that require statistical information. Statistical information for mosaic datasets is stored internally.
You do not always need to calculate statistics for a raster dataset in advance, because they are calculated automatically when they are needed. For example, in ArcMap, when a raster dataset without statistics is first added to the data frame, default statistics are calculated from a subset of the dataset when they are needed to render it. These statistics are temporary and are not stored with the raster dataset. Mosaic datasets behave differently: if statistics do not exist for a mosaic dataset, they are not calculated automatically when it is displayed in ArcMap. If statistics do exist for a mosaic dataset, more stretch methods are available (such as Percent Clip and Histogram Equalization).
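Percent Clip is one of the stretches that depends on the histogram. The following is a minimal sketch of the idea, assuming the band is a plain Python list of pixel values rather than a stored histogram: sort the values, cut off a given percentage at each tail, and map the remaining range linearly to 0 through 255. The function name and its `min_percent`/`max_percent` parameters are hypothetical, and the percentages used in the example are not ArcGIS defaults.

```python
def percent_clip_stretch(values, min_percent, max_percent, out_max=255):
    """Stretch pixel values to 0..out_max after clipping both tails.

    min_percent / max_percent: percentage of pixels to clip at the low and
    high ends (hypothetical parameter names for this sketch).
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank cutoffs: skip the lowest and highest requested share of pixels.
    lo_idx = min(n - 1, int(n * min_percent / 100.0))
    hi_idx = max(0, n - 1 - int(n * max_percent / 100.0))
    low, high = ordered[lo_idx], ordered[hi_idx]
    if high <= low:
        # Degenerate range (constant band or clipping too much): map everything to 0.
        return [0] * len(values)
    stretched = []
    for v in values:
        clipped = min(max(v, low), high)  # saturate values outside the cutoffs
        stretched.append(round((clipped - low) / (high - low) * out_max))
    return stretched


# Usage: one extreme outlier among 100 ordinary pixel values.
pixels = list(range(100)) + [10_000]
clipped = percent_clip_stretch(pixels, min_percent=1, max_percent=1)
unclipped = percent_clip_stretch(pixels, min_percent=0, max_percent=0)
```

Without clipping, the single outlier of 10,000 compresses every other pixel into the bottom few display values; clipping 1% at each tail spreads the ordinary pixels across the full 0 to 255 range.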
You can modify the stretch parameters on the Layer Properties dialog box to use the statistics from the current display extent, or you can generate the statistics for the dataset. You can also modify the stretch parameters in the Display section of the Image Analysis window. Calculating statistics for rasters before using them in ArcMap is recommended so that you don't have to wait for the statistics to be calculated when the raster dataset is displayed.
If the statistics do not exist, they can be created in ArcCatalog or the Catalog window or by using the Calculate Statistics tool. You can specify two sets of parameters when calculating statistics: a skip factor and values to ignore.
Setting a skip factor speeds up the calculation by skipping pixels. The default skip factor is 1 for both rows and columns, which means that every cell in the raster is used, resulting in the most accurate statistics. It is recommended that you use a larger skip factor (such as 100) when calculating statistics on a large raster stored in ArcSDE or on a large mosaic dataset; this saves time because not every cell is examined. Not all raster formats use a skip factor. The formats that calculate statistics and take advantage of the skip factor include TIFF, IMG, NITF, DTED, RAW, ADRG, CIB, CADRG/ECRG, DIGEST, GIS, LAN, CIT, COT, ERMapper, ENVI DAT, BIL, BIP, BSQ, and geodatabase.
You can also specify one or more ignore values: cell values to exclude from the statistics, such as background values (for example, the edges of some satellite data) or NoData values.
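The effect of the skip factor and ignore values can be sketched in plain Python. This is an illustration, not the Calculate Statistics implementation: it assumes a single band held as a list of rows, samples every `y_skip`-th row and every `x_skip`-th column, drops ignore values, and reports a population standard deviation (whether ArcGIS uses the population or the sample form is an assumption here). `band_statistics` is a hypothetical name.

```python
import math


def band_statistics(rows, x_skip=1, y_skip=1, ignore_values=()):
    """Compute min, max, mean, and standard deviation of a 2-D band.

    Every y_skip-th row and every x_skip-th column is sampled; cells whose
    value is in ignore_values (e.g., background or NoData) are excluded.
    """
    if x_skip < 1 or y_skip < 1:
        raise ValueError("skip factors must be at least 1")
    ignore = set(ignore_values)
    sample = [
        v
        for row in rows[::y_skip]   # skip rows
        for v in row[::x_skip]      # skip columns within each kept row
        if v not in ignore          # drop ignore values
    ]
    if not sample:
        raise ValueError("no cells left after skipping and ignoring values")
    n = len(sample)
    mean = sum(sample) / n
    # Population standard deviation of the sampled cells (assumed form).
    std = math.sqrt(sum((v - mean) ** 2 for v in sample) / n)
    return {"min": min(sample), "max": max(sample), "mean": mean, "std": std, "count": n}


# Usage: a 4 x 4 band whose cell values are 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
full = band_statistics(grid)                         # every cell
coarse = band_statistics(grid, x_skip=2, y_skip=2)   # 1 cell in 4
masked = band_statistics(grid, ignore_values=[0])    # treat 0 as background
```

With both skip factors at 1 and the NoData value in `ignore_values`, every cell except NoData contributes, which is the rule stated for grid datasets.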
You can also use the Set Raster Properties tool to define the statistics for a raster dataset or mosaic dataset if you do not want the application to calculate them. You can either enter the minimum, maximum, standard deviation, and mean values for each band, or extract the values from an XML file containing the statistics. This file can be created by exporting the statistics from another raster or mosaic dataset. The tool does not import the histogram that is also stored with the statistics in the .aux.xml file.
The statistics for a raster dataset or mosaic dataset can be viewed on the dataset's Properties dialog box. Below is an example showing the statistics for a thematic raster dataset, such as a land-use dataset. Statistics are calculated for each band, so a multiband raster dataset lists statistics for every band, along with the parameters used to build them. The calculated statistics include the minimum and maximum pixel values and the mean and standard deviation of the pixel values; for a thematic dataset, the number of classes is also listed. A continuous dataset has no classes.
You cannot recalculate statistics on a grid dataset because they are stored within that file format and are always present. They are calculated using every cell in the grid except NoData cells.
Mosaic dataset statistics
Statistics (and the histogram) are used to enable automated stretching of imagery and are important for some types of analysis. They can exist at three locations within a mosaic dataset:
- The mosaic dataset
- With each source raster dataset
- On each raster item in the mosaic dataset after the functions have been applied
Mosaic dataset statistics
These statistics are applied to the entire mosaic dataset when it is displayed.
When you calculate statistics for a mosaic dataset, the base pixels are examined; that is, the statistics are generated across the entire mosaic from the source raster datasets with the smallest pixel sizes. Because this can involve a very large number of pixels, using a skip factor is recommended. One way to choose a reasonable skip factor is to divide the number of columns by 1,000 and use the integer quotient. However, if your mosaic dataset has overviews, the statistics are generated from the overviews instead. Statistics are also generated automatically when overviews are built.
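The column-based heuristic is simple arithmetic. A minimal sketch, with `suggest_skip_factor` as a hypothetical helper; clamping the result to at least 1 for narrow datasets is an added assumption, since a skip factor of 1 already means every cell is used.

```python
def suggest_skip_factor(columns: int) -> int:
    """Integer quotient of the column count divided by 1,000.

    Clamped to a minimum of 1 (an assumption for datasets narrower
    than 1,000 columns).
    """
    if columns < 1:
        raise ValueError("columns must be positive")
    return max(1, columns // 1000)


# Usage: a mosaic dataset 120,000 columns wide.
skip = suggest_skip_factor(120_000)
```

For a mosaic dataset 120,000 columns wide, the heuristic gives 120; applying it to both rows and columns samples roughly one cell in 120 × 120 = 14,400.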
To calculate statistics on the mosaic dataset, right-click the mosaic dataset in the Catalog window and click Calculate Statistics; the Calculate Statistics tool opens. You can also open this tool directly.
Source raster dataset statistics
These are the statistics of the source raster datasets within the mosaic dataset. They are required if you plan to color balance the mosaic dataset.
Statistics are not automatically generated for each raster dataset in the mosaic dataset. When adding raster data to a mosaic dataset, you can check Calculate Statistics to calculate statistics for each source raster dataset that doesn't already have them. Alternatively, use the Build Pyramids And Statistics tool with the mosaic dataset as the input, and check the Calculate Statistics and Include Source Datasets options.
Raster item statistics
Each row in the mosaic dataset's attribute table represents a raster item. Raster datasets and raster items do not always have a one-to-one relationship, so they are considered separately; for example, a raster item may represent a pan-sharpened image created from two datasets. Each raster item can have its own function chain, which may alter the statistics significantly (and thereby affect rendering); for example, the NDVI, Arithmetic, or Stretch function can change the pixel values and therefore the statistics. As with the source raster datasets, statistics are not automatically generated for each raster item in the mosaic dataset.
To calculate statistics on the raster items in the mosaic dataset, do either of the following:
- Use the Build Pyramids And Statistics tool, check the Calculate Statistics option, then uncheck the Include Source Datasets option.
- Use the Synchronize Mosaic Dataset tool and check the Calculate Statistics option to calculate the statistics for each raster item. This tool honors selections, so statistics can be computed for a subset of the complete mosaic dataset.
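A small numeric example shows why raster item statistics can differ sharply from source statistics: a function chain produces new pixel values. The sketch below applies the standard NDVI formula, (NIR - Red) / (NIR + Red), to flat lists of pixel values. The ArcGIS NDVI function may scale its output differently, treating a zero denominator as missing (`None`) is an assumption, and `ndvi` and `summarize` are hypothetical helpers.

```python
def ndvi(nir, red):
    """Per-pixel NDVI, (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red == 0 are returned as None (assumed handling).
    """
    return [None if n + r == 0 else (n - r) / (n + r) for n, r in zip(nir, red)]


def summarize(values):
    """Return (min, max, mean) of the non-None values."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid pixels")
    return min(valid), max(valid), sum(valid) / len(valid)


# Hypothetical source bands for four pixels.
nir = [200, 150, 50, 100]
red = [50, 50, 150, 100]
source_red_stats = summarize(red)        # statistics of a source band
item_stats = summarize(ndvi(nir, red))   # statistics after the function chain
```

The source red band ranges from 50 to 150, while the NDVI output ranges from -0.5 to 0.6, so a stretch driven by the source statistics would not suit the raster item.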
Statistics function and Stretch function
The Statistics function is not related to the histogram and statistics discussed in this topic; it calculates focal statistics for each pixel based on a defined focal neighborhood.
The Stretch function can be used to enhance an image by changing properties such as brightness, contrast, and gamma, through multiple stretch types. By default, the function retrieves the statistics it uses from the data; however, you can enter your own statistics in the function's dialog box. If you do not specify your own statistics, make sure statistics have been calculated. Where the function is added determines which statistics are needed and therefore which tool you use to calculate them (as discussed above):
- If the Stretch function is added on the mosaic dataset, the statistics for the mosaic dataset need to be calculated.
- If the Stretch function is added as the first function in the raster's function chain, or is the first function in the chain to affect the pixel values, the raster dataset's statistics need to be calculated.
- If the Stretch function is added after functions that can affect the pixel values, the raster item's statistics need to be calculated.
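The three placement rules can be written as a small decision function. This is a sketch under assumptions: `RasterFunction` and its `alters_pixel_values` flag are hypothetical, and you would set that flag yourself for each function that precedes Stretch in the chain. The tool hints in `TOOLS` restate the options described earlier in this topic.

```python
from dataclasses import dataclass


@dataclass
class RasterFunction:
    name: str
    alters_pixel_values: bool  # set by you; hypothetical flag


# Where each set of statistics comes from, per the options in this topic.
TOOLS = {
    "mosaic dataset": "Calculate Statistics on the mosaic dataset",
    "source raster dataset": (
        "Calculate Statistics when adding rasters, or Build Pyramids And "
        "Statistics with Include Source Datasets checked"
    ),
    "raster item": (
        "Build Pyramids And Statistics with Include Source Datasets unchecked, "
        "or Synchronize Mosaic Dataset with Calculate Statistics"
    ),
}


def statistics_needed_for_stretch(on_mosaic_dataset, functions_before_stretch=()):
    """Return which statistics the Stretch function reads.

    functions_before_stretch: functions that precede Stretch in a raster
    item's chain (ignored when Stretch is applied on the mosaic dataset).
    """
    if on_mosaic_dataset:
        return "mosaic dataset"
    if any(f.alters_pixel_values for f in functions_before_stretch):
        return "raster item"
    # Stretch is first, or the first function that changes pixel values.
    return "source raster dataset"
```

Usage: `TOOLS[statistics_needed_for_stretch(False, [RasterFunction("NDVI", True)])]` points to the raster item options, because NDVI changes the pixel values before Stretch sees them.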
Color balancing attempts to remove trends across images to make them look more seamless. The rasters within a mosaic dataset must have statistics before you can color balance it. If you use the Color Balance Mosaic Dataset tool or the Mosaic Color Correction window on a mosaic dataset containing raster datasets that do not have statistics, the operation fails and reports that the statistics are missing.
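Because color balancing fails when statistics are missing, it can help to check first. The following is a hypothetical pre-flight check, assuming you have already collected each source raster's per-band statistics into a dictionary, with `None` meaning no statistics were found; how you collect them is outside this sketch, and `rasters_missing_statistics` is not an ArcGIS function.

```python
def rasters_missing_statistics(rasters):
    """Return the sorted names of rasters lacking statistics.

    rasters: mapping of raster name -> list of per-band statistics
    (each a dict, or None for a band without statistics), or None when
    the raster has no statistics at all.
    """
    missing = []
    for name, bands in rasters.items():
        # A raster qualifies only if every band has statistics.
        if not bands or any(band is None for band in bands):
            missing.append(name)
    return sorted(missing)


# Hypothetical inventory of three source rasters.
inventory = {
    "a.tif": [{"min": 0, "max": 255}],
    "b.tif": None,
    "c.tif": [{"min": 1, "max": 9}, None],
}
to_fix = rasters_missing_statistics(inventory)
```

Any names returned would need statistics calculated (for example, with Build Pyramids And Statistics and Include Source Datasets checked) before color balancing.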
Display properties (turning off a default stretch)
By default, when statistics exist, the application (for example, ArcMap) applies a stretch to enhance the imagery. If you have prestretched (enhanced) imagery in your mosaic dataset or you've used the Stretch function, you might not want the application to apply a default stretch. In this case, you can modify a property to turn off this default: open the mosaic dataset's properties, click the General tab, then set the Source Type property value to Processed.