Preprocessing

It is sometimes advisable to preprocess the imagery before creating a mosaic dataset. Preprocessing should generally not resample the imagery, since resampling changes data values. The following is an overview of such processes.

Pyramids

If images have more than about 2,000 rows and columns, it is advantageous to create pyramids. Pyramids are reduced-resolution versions of the imagery that enable faster access at smaller scales.

Pyramids can be internal to the files or external, in the form of .OVR or .RRD files. External overviews have the advantage that the original files are not modified and, if necessary, the overviews can easily be removed to save space.

The simplest way to create pyramids is to use the Build Pyramids And Statistics tool. The tool accepts a workspace and creates pyramids for all rasters within that workspace.

For each raster, external pyramids are stored in a single file that generally resides next to the source raster. The file is given the same name as the source with an .ovr extension. Internally, these are TIF files containing multiple 2:1 down-sampled resolutions. ArcGIS also supports TIF files with internal pyramids and older .RRD pyramids.
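The 2:1 down-sampling described above can be illustrated with a minimal sketch that lists the dimensions of each reduced-resolution level. The function name and the min_size stopping threshold are hypothetical choices made for illustration; they are not ArcGIS defaults, and the number of levels ArcGIS actually builds may differ.

```python
import math


def pyramid_levels(cols, rows, factor=2, min_size=512):
    """Return the (cols, rows) of each reduced-resolution level.

    min_size is a hypothetical stopping threshold chosen for this sketch,
    not a documented ArcGIS value.
    """
    levels = []
    while max(cols, rows) > min_size:
        # Each level reduces both dimensions by the sampling factor.
        cols = math.ceil(cols / factor)
        rows = math.ceil(rows / factor)
        levels.append((cols, rows))
    return levels


levels = pyramid_levels(20000, 16000)
for number, (c, r) in enumerate(levels, start=1):
    print(f"level {number}: {c} x {r}")
```

For a 20,000 by 16,000 image this sketch produces six levels, from 10,000 by 8,000 down to 313 by 250.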

In the majority of cases, pyramids can be compressed even if the original data sources are not compressed, since analysis is typically not performed on the pyramids. If neither the data source nor the pyramids are compressed, the pyramids take about one-third additional storage space. If the source is not compressed and the pyramids are compressed, the additional storage can be as small as a few percent of the original size. If the original data is compressed, then even when the same compression is used for the pyramids, the overviews are typically about 40% of the source size, because the overviews tend to contain higher-frequency image content, which compresses less effectively. For a table of pyramid sizes, see Raster Pyramids.
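The one-third figure follows from a geometric series: with 2:1 down-sampling each level holds one quarter of the pixels of the level below it, so the levels sum to 1/4 + 1/16 + ... = 1/3 of the source. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and the 10:1 compression ratio in the example is an assumed value used only to show how compressed pyramids can drop to a few percent of the source size.

```python
def pyramid_overhead(factor=2, levels=None):
    """Pixel count of all pyramid levels as a fraction of the source.

    With levels=None the infinite geometric series is summed in closed form.
    """
    per_level = 1.0 / (factor * factor)
    if levels is None:
        # r + r^2 + r^3 + ... = r / (1 - r)
        return per_level / (1.0 - per_level)
    return sum(per_level ** k for k in range(1, levels + 1))


def estimated_pyramid_size(source_size, factor=2, compression_ratio=1.0):
    """Estimate pyramid storage for an uncompressed source.

    compression_ratio is an assumed value (for example 10.0 for 10:1),
    not a measured one.
    """
    return source_size * pyramid_overhead(factor) / compression_ratio


print(pyramid_overhead(2))                  # about 0.333 of the source
print(estimated_pyramid_size(3.0))          # 3 GB source, uncompressed pyramids
print(estimated_pyramid_size(3.0, 2, 10.0)) # same source, assumed 10:1 compression
```

Under these assumptions a 3 GB uncompressed source needs about 1 GB of uncompressed pyramids, or about 0.1 GB (roughly 3% of the source) with the assumed 10:1 compression.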

An additional advantage of external pyramid files is that, if the higher-resolution data is no longer needed at a later stage of a project, it is relatively easy to archive the original source imagery and leave the smaller overviews in place to provide access to good representations of the imagery.

When creating pyramids, environment settings control how they are generated. These include:

Pyramid/compression – In most cases it is suitable to compress the overviews. For natural color 3-band imagery, JPEG YCBCR is recommended. For panchromatic or other imagery, JPEG is recommended. For elevation or categorical data, LZW is recommended. Typically the compression factor for JPEG can be set to 80. Note that even if the imagery is 16-bit (as is much satellite imagery), JPEG YCBCR or JPEG (RGB) can be used, since ArcGIS supports a 12-bit version of JPEG that is generally suitable for such imagery.

Sampling method – For optical imagery it is advisable to use bilinear sampling, since this generally provides better quality imagery when viewed at smaller scales. However, bilinear sampling can result in artifacts at the edges of images that include black (or white) pixels used to define NoData. ArcGIS handles such NoData pixels correctly and does not create the artifacts if the NoData pixels are correctly defined in the dataset. Therefore, for imagery that contains NoData, it is recommended that the NoData value be defined before the pyramids are created.

For categorical data, the nearest (or majority) sampling method should be used. For datasets such as elevation, more careful consideration must be given to the sampling method, but in most cases bilinear is still recommended. Note that nearest sampling with a factor of 2 results in a half-pixel shift at each overview level due to the alignment of the image extents. If nearest neighbor sampling is required, it is generally better to set the sampling factor to 3 to avoid such shifts, although this can reduce performance at smaller scales by about 20%.
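The recommendations above, and the reason a factor of 3 avoids the shift, can be sketched as follows. The data-kind keys and function names are hypothetical, and the compression and sampling strings are labels taken from the text rather than verified tool keywords. The offset calculation assumes that nearest sampling picks the single source pixel whose center is closest to the center of each factor-by-factor block; it is a simplified one-dimensional model, not a description of the ArcGIS implementation.

```python
def recommended_pyramid_settings(data_kind):
    """Return the settings suggested in the text for a given kind of data."""
    table = {
        "natural_color": ("JPEG YCBCR", "bilinear"),
        "panchromatic": ("JPEG", "bilinear"),
        "elevation": ("LZW", "bilinear"),
        "categorical": ("LZW", "nearest"),
    }
    compression, sampling = table[data_kind]
    settings = {
        "compression": compression,
        "sampling": sampling,
        # A factor of 3 is suggested only when nearest sampling is used.
        "factor": 3 if sampling == "nearest" else 2,
    }
    if compression.startswith("JPEG"):
        settings["jpeg_quality"] = 80
    return settings


def nearest_offset(factor):
    """Offset, in source pixels, between a block center and the chosen pixel center."""
    block_center = factor / 2.0
    # Source pixel k covers [k, k + 1), so its center is at k + 0.5.
    centers = [k + 0.5 for k in range(factor)]
    chosen = min(centers, key=lambda c: abs(c - block_center))
    return chosen - block_center


print(recommended_pyramid_settings("categorical"))
print(nearest_offset(2))  # half a source pixel away from the block center
print(nearest_offset(3))  # the middle pixel is centered on the block
```

With an even factor no source pixel is centered on the block, so the chosen pixel is always half a pixel off. With an odd factor the middle pixel lines up exactly.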

Statistics

Statistics are used by the system primarily to ensure a suitable default display of the images. If statistics exist for a raster dataset, ArcGIS applies a stretch to make the imagery appear brighter. If statistics are not present, then when displaying a single image the system attempts to approximate statistics by reading the central part of the imagery. As a general rule, statistics should be created for satellite scenes and for datasets such as elevation, where the range of valid data values can be large. If the imagery has been preprocessed, such as color-corrected imagery, statistics are not required, since such imagery should not be stretched further. Statistics should be created at the same time (and using the same tool) as pyramids.

Creating statistics can be time consuming on large datasets, as it requires the complete image to be read. An environment setting, called the skip factor, can be used to reduce the reading time so that not all the pixels are read. One way to identify a reasonable skip factor is to divide the number of columns by 1,000 and use the integer quotient as the skip factor. Using such a skip factor only reduces the time taken if pyramids exist.
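The rule of thumb above amounts to an integer division. The following is a minimal sketch with hypothetical function names. It clamps the result to 1 for images narrower than 1,000 columns, and it assumes the same skip factor is applied in both x and y, so roughly one pixel in every skip-by-skip block is read.

```python
def suggested_skip_factor(columns):
    """Integer quotient of columns / 1,000, never less than 1."""
    return max(1, columns // 1000)


def fraction_of_pixels_read(skip):
    """Approximate share of pixels read, assuming the same skip in x and y."""
    return 1.0 / (skip * skip)


skip = suggested_skip_factor(25000)
print(skip)
print(fraction_of_pixels_read(skip))
```

For an image with 25,000 columns the suggested skip factor is 25, which under this assumption means about 0.16% of the pixels are read.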

File format

As described in the 'Data Format' section above, it is sometimes advantageous, or even highly recommended, to convert the format of the files.

3/25/2013