Preprocessing

It is sometimes advisable to preprocess imagery before creating a mosaic dataset. Preprocessing should generally not resample the imagery, because resampling changes data values. The following is an overview of such processes.

Pyramids

If images have more than about 2,000 rows and columns, it is advantageous to create pyramids. Pyramids are reduced-resolution versions of the imagery that enable faster access at a smaller scale.

Pyramids can be internal to the files or external in the form of .ovr files or .rrd files. External overviews have the advantage that the original files are not modified, and the overviews can easily be removed if necessary to free storage space.

The simplest method to create pyramids is to use the Build Pyramids And Statistics tool. This allows a workspace to be entered and pyramids to be created for all rasters within the workspace.

The pyramids for each raster are stored in a single file that generally resides next to the source raster and has the same name as the source, with an .ovr extension. Internally, these are TIFF files that contain multiple reduced resolutions, each downsampled 2:1 from the previous one. ArcGIS also supports TIFF files with internal pyramids and older .rrd pyramids.

In most cases, pyramids can be compressed even if the original data source is not compressed, since analysis is typically not performed on the pyramids. If neither the data source nor the pyramids are compressed, the pyramids take about one-third additional storage space. If the source is not compressed and the pyramids are compressed, the additional storage can be as small as a few percent of the original size. If the original data is compressed, the overviews may be about 40 percent of the source size even when they use the same compression as the source, because overviews typically contain higher-frequency image content, which compresses less effectively. For a table of pyramid sizes, see Raster pyramids.
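The one-third figure follows from each pyramid level holding one-quarter of the pixels of the level below it when the downsampling factor is 2. The following Python sketch sums that series. The function name pyramid_overhead is hypothetical, and the sketch assumes uncompressed data with every level stored, so actual file sizes will differ somewhat.

```python
from typing import Optional


def pyramid_overhead(factor: int = 2, levels: Optional[int] = None) -> float:
    """Return pyramid storage as a fraction of the uncompressed source size.

    Each level has 1 / factor**2 of the pixels of the level below it.
    With levels=None the infinite series is used as an upper bound.
    """
    if factor < 2:
        raise ValueError("factor must be 2 or greater")
    ratio = 1.0 / factor ** 2
    if levels is None:
        # Sum of ratio + ratio**2 + ... = 1 / (factor**2 - 1)
        return 1.0 / (factor ** 2 - 1)
    # Finite number of levels: add up each level's share explicitly.
    return sum(ratio ** k for k in range(1, levels + 1))


if __name__ == "__main__":
    print(f"factor 2: {pyramid_overhead(2):.3f} of source size")
    print(f"factor 3: {pyramid_overhead(3):.3f} of source size")
```

With a factor of 3, the same calculation gives one-eighth of the source size rather than one-third.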

An additional advantage of external pyramid files is that at a later stage of a project, if the higher-resolution data is no longer necessary, it is relatively easy to archive the original source imagery, but leave the smaller overviews to still provide access to good representations of the imagery.

When creating pyramids, environment settings control how they are generated. These include the following:

Pyramid/Compression: In most cases it is suitable to compress the overviews. For natural color 3-band imagery, JPEG YCBCR is recommended. For panchromatic or other imagery, JPEG is recommended. For elevation or categorical data, LZW is recommended. Typically, the compression factor for JPEG can be set to 80. Note that even if the imagery is 16-bit (as much satellite imagery is), JPEG YCBCR or JPEG (RGB) can be used, since ArcGIS supports a 12-bit version of JPEG that is generally suitable for such imagery.

Sampling method: For optical imagery it is advisable to use bilinear sampling, since this generally provides better-quality imagery when viewed at smaller scales. However, bilinear sampling can result in artifacts at the edges of images that include black (or white) pixels used to represent NoData. ArcGIS handles such NoData pixels correctly and does not create the artifacts if the NoData pixels are correctly defined in the dataset. Therefore, for imagery that has NoData values, it is recommended that the NoData values be defined before the pyramids are created.

For categorical data, the nearest (or majority) sampling method should be used. For datasets such as elevation, the choice of sampling method needs more careful consideration, but in most cases bilinear is still recommended. Note that using nearest sampling with a factor of 2 results in a half-pixel shift at each overview level due to the alignment of the image extents. If nearest neighbor sampling is required, it is generally better to set the sampling factor to 3 to avoid such shifts, although this can reduce performance at smaller scales by about 20 percent.
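The half-pixel shift can be illustrated with a simplified one-dimensional model. The Python sketch below assumes that nearest sampling picks the single source pixel whose center is closest to the center of each overview cell, with ties going to the first pixel. This is an illustrative assumption, not a description of how ArcGIS implements resampling, and the function name nearest_sample_shift is hypothetical.

```python
def nearest_sample_shift(factor: int) -> float:
    """Return the offset, in source pixels, between the center of an
    overview cell and the center of the source pixel chosen for it.

    Simplified 1D model: an overview cell spans `factor` source pixels,
    and source pixel i has its center at i + 0.5.
    """
    if factor < 2:
        raise ValueError("factor must be 2 or greater")
    cell_center = factor / 2.0
    # Choose the source pixel whose center is nearest to the cell center.
    chosen = min(range(factor), key=lambda i: abs((i + 0.5) - cell_center))
    return abs((chosen + 0.5) - cell_center)


if __name__ == "__main__":
    for f in (2, 3):
        print(f"factor {f}: shift of {nearest_sample_shift(f)} source pixels")
```

In this model an even factor never has a source pixel centered on the overview cell, so a half-pixel offset appears at every level, whereas a factor of 3 has a middle pixel that lines up exactly.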

Statistics

Statistics are used by the system primarily to ensure a suitable default display of the images. If statistics exist for a raster dataset, ArcGIS applies a stretch to the imagery, which makes it appear brighter. If statistics are not present, then when displaying a single image, the system attempts to approximate statistics by reading the central part of the imagery. As a general rule, statistics should be created for satellite scenes and for datasets such as elevation, where the range of valid data values can be large. Imagery that has already been preprocessed, such as color-corrected imagery, does not require statistics, since such imagery should not be additionally stretched. Statistics should be created at the same time as pyramids, using the same tool.

Creating statistics can be time-consuming on large datasets, because the complete image must be read. To reduce the reading time, an environment setting called skip factor can be set so that not all the pixels are read. One way to identify a reasonable skip factor is to divide the number of columns by 1,000 and use the integer quotient. Such a skip factor only reduces the time taken if pyramids exist.
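The rule of thumb above can be expressed as a small helper. This is a minimal sketch: the function name suggest_skip_factor is hypothetical, and raising the result to at least 1 for images narrower than 1,000 columns is an assumption added here so that the value stays usable, not something stated by the rule itself.

```python
def suggest_skip_factor(columns: int) -> int:
    """Suggest a statistics skip factor from the image width in columns.

    Uses the integer quotient of columns / 1,000. The lower bound of 1
    (read every pixel) is an assumption for narrow images.
    """
    if columns < 1:
        raise ValueError("columns must be a positive integer")
    return max(1, columns // 1000)


if __name__ == "__main__":
    # Hypothetical image widths, for illustration only.
    for width in (800, 12500):
        print(f"{width} columns: skip factor {suggest_skip_factor(width)}")
```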

File format

As described in the Data format section above, it is sometimes advantageous, or even highly recommended, to convert the format of the files.

10/28/2013