The Histogram tool provides a univariate (one-variable) description of your data. The tool dialog box displays the frequency distribution for the dataset of interest and calculates summary statistics.
The frequency distribution is a bar graph that displays how often observed values fall within certain intervals or classes. You can specify the number of classes of equal width that are used in the histogram. The relative proportion of data that falls in each class is represented by the height of each bar. For example, the histogram below shows the frequency distribution (10 classes) for a dataset.
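As a rough illustration of how equal-width classes and bar heights relate, the following is a minimal sketch in plain Python. It is not the tool's implementation; the function name `class_frequencies`, the sample values, and the choice to place the maximum value in the last class are all assumptions made for this example.

```python
def class_frequencies(values, n_classes=10):
    """Count how many values fall in each of n_classes equal-width classes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; class width would be zero")
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        # Assumption: the maximum value is counted in the last class.
        k = min(int((v - lo) / width), n_classes - 1)
        counts[k] += 1
    # Relative proportion of the data in each class (the bar heights).
    proportions = [c / len(values) for c in counts]
    return counts, proportions


# Hypothetical sample data, chosen only to keep the arithmetic easy to follow.
data = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 7.0, 9.0, 11.0]
counts, proportions = class_frequencies(data, n_classes=5)
print(counts)
print(proportions)
```

With five classes of width 2, the proportions sum to 1, and the tallest bars correspond to the classes that contain the most values.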
The important features of a distribution can be summarized by statistics that describe its location, spread, and shape.
Measures of location
Measures of location provide you with an idea of where the center and other parts of the distribution lie.
- The mean is the arithmetic average of the data. The mean provides a measure of the center of the distribution.
- The median value corresponds to a cumulative proportion of 0.5. If the data was arranged in increasing order, 50 percent of the values would lie below the median, and 50 percent of the values would lie above the median. The median provides another measure of the center of the distribution.
- The first and third quartiles correspond to the cumulative proportion of 0.25 and 0.75, respectively. If the data was arranged in increasing order, 25 percent of the values would lie below the first quartile, and 25 percent of the values would lie above the third quartile. The first and third quartiles are special cases of quantiles. The quantiles are calculated as follows:
quantile = (i - 0.5) / N
where i is the rank of the ith ordered data value and N is the number of data values.
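The measures of location above can be sketched with Python's standard library. The cumulative proportion assigned to each ordered value follows the (i - 0.5) / N formula. Reading a quartile off those proportions by linear interpolation between neighboring ordered values is an assumption of this sketch, not a documented behavior of the tool, and the helper names `cumulative_proportions` and `value_at` are hypothetical.

```python
import statistics


def cumulative_proportions(values):
    """Pair each ordered value with (i - 0.5) / N, for i = 1..N."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]


def value_at(values, p):
    """Value at cumulative proportion p (linear interpolation is an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    # Invert p = (i - 0.5) / N to get a fractional 1-based rank, clamped to [1, N].
    rank = min(max(p * n + 0.5, 1.0), float(n))
    lower = int(rank)
    if lower >= n:
        return ordered[-1]
    frac = rank - lower
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


# Hypothetical sample data.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 15.0]
mean = statistics.mean(data)
median = statistics.median(data)
q1 = value_at(data, 0.25)
q3 = value_at(data, 0.75)
print(mean, median, q1, q3)
```

In this sample, the value at a cumulative proportion of 0.5 coincides with the median returned by `statistics.median`.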
Measures of spread
The spread of points around the mean value is another characteristic of the displayed frequency distribution.
- The variance of the data is the average squared deviation of all values from the mean. Because it involves squared differences, the calculated variance is sensitive to unusually high or low values. The variance is estimated by summing the squared deviations from the mean and dividing the sum by (N - 1), where N is the number of data values.
- The standard deviation is the square root of the variance, and it describes the spread of the data about the mean. The smaller the variance and standard deviation, the tighter the cluster of measurements about the mean value.
The diagram below shows two distributions with different standard deviations. The frequency distribution represented by the black line is more variable (wider spread) than the frequency distribution represented by the red line. The variance and standard deviation for the black frequency distribution are greater than those for the red frequency distribution.
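The variance and standard deviation calculations described above can be sketched as follows. The two small datasets are hypothetical and share the same mean, so only their spread differs; the function name `sample_variance` is likewise an assumption for this example.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (N - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Two hypothetical datasets with the same mean (10) but different spread.
tight = [9.0, 10.0, 10.0, 11.0]
wide = [4.0, 8.0, 12.0, 16.0]

var_tight = sample_variance(tight)
var_wide = sample_variance(wide)
std_tight = math.sqrt(var_tight)
std_wide = math.sqrt(var_wide)
print(var_tight, std_tight)
print(var_wide, std_wide)
```

The standard library's `statistics.variance` uses the same (N - 1) divisor, so it can serve as a cross-check for the hand-written function.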
Measures of shape
The frequency distribution is also characterized by its shape.
The coefficient of skewness is a measure of the symmetry of a distribution. For symmetric distributions, the coefficient of skewness is zero. If a distribution has a long right tail of large values, it is positively skewed, and if it has a long left tail of small values, it is negatively skewed. For positively skewed distributions, the mean is typically larger than the median; for negatively skewed distributions, it is typically smaller. The image below shows a positively skewed distribution.
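The text does not state which formula the tool uses for the coefficient of skewness. The sketch below assumes the common moment-based definition, the third central moment divided by the second central moment raised to the power 1.5, and applies it to two hypothetical datasets.

```python
import statistics


def skewness(values):
    """Moment-based coefficient of skewness: m3 / m2**1.5 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 20.0]

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
mean_right = statistics.mean(right_tailed)
median_right = statistics.median(right_tailed)
print(skew_symmetric, skew_right)
print(mean_right, median_right)
```

For the right-tailed sample, the single large value pulls the mean above the median and makes the coefficient positive, while the symmetric sample yields a coefficient of zero.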
Kurtosis is based on the size of the tails of a distribution and provides a measure of how likely it is that the distribution will produce outliers. The kurtosis of a normal distribution is equal to three. Distributions with relatively thick tails are termed leptokurtic and have kurtosis greater than three. Distributions with relatively thin tails are termed platykurtic and have a kurtosis less than three. In the following diagram, a normal distribution is given in red, and a leptokurtic (thick-tailed) distribution is given in black.
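To make the reference value of three concrete, the sketch below assumes the moment-based definition of kurtosis (fourth central moment divided by the squared second central moment) and estimates it for three simulated samples. The sample size, the seed, and the choice of a Laplace distribution as the thick-tailed example and a uniform distribution as the thin-tailed example are all assumptions made for illustration.

```python
import random


def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (assumed definition; normal is 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000

normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two independent exponentials follows a Laplace distribution.
thick = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
thin = [rng.uniform(-1.0, 1.0) for _ in range(n)]

k_normal = kurtosis(normal)
k_thick = kurtosis(thick)
k_thin = kurtosis(thin)
print(k_normal, k_thick, k_thin)
```

The normal sample should give a value near three, the thick-tailed (leptokurtic) sample a clearly larger value, and the thin-tailed (platykurtic) sample a clearly smaller one. The exact figures depend on the random sample.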
With the Histogram tool, you can examine the shape of the distribution by direct observation. By reviewing the mean and median statistics, you can determine the center location of the distribution. In the figure below, the distribution is bell-shaped and the mean and median values are very close, which suggests that it is close to normal. You can also highlight the extreme values in the tail of the histogram and see how they are spatially located in the displayed map.
If your data is highly skewed, you can test the effects of a transformation on your data. This figure shows a skewed distribution before a transformation is applied.
A log transformation is applied to the skewed data, and in this case, the transformation makes the distribution close to normal.
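The effect of a log transformation on skewed data can be checked numerically. The sketch below is an illustration under stated assumptions: the data is simulated from a lognormal distribution (so its logarithm is normal by construction), the seed and sample size are arbitrary, and skewness is computed with the moment-based definition. Real data will not generally transform this cleanly, and a log transformation requires strictly positive values.

```python
import math
import random


def skewness(values):
    """Moment-based coefficient of skewness: m3 / m2**1.5 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated, strictly positive, right-skewed data (hypothetical).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]
logged = [math.log(x) for x in raw]

skew_before = skewness(raw)
skew_after = skewness(logged)
print(skew_before, skew_after)
```

Before the transformation the coefficient is strongly positive; afterward it should be close to zero, consistent with a distribution that is close to normal.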
For more information on the transformations available with the Histogram tool, see Box-Cox, arcsine, and log transformations.