Using validation to assess models

Validation allows you to evaluate your predictions using a dataset that was not involved in creating the prediction model.

As with cross-validation, the goal is to assess how well the model predicts values at locations that were not used to build it. The general steps are as follows:

  1. Create the subsets that you will use to create and validate the model:
    1. Add the dataset to ArcMap that you want to subset.
    2. Click the drop-down arrow on the Geostatistical Analyst toolbar and click the Subset Features tool.

      The Subset Features tool opens. You can also open this tool directly from the Geostatistical Analyst Tools toolbox. It is located in the Working with Geostatistical Layers toolset.

    3. Specify the dataset that you want to subset.
    4. Specify names and locations for the training and test subsets that will be created.
    5. Specify the size of the training subset. The default is 50 percent of the data, but you can specify a different percentage or a specific number of features by switching Subset size units to ABSOLUTE_VALUE.
    6. Click Finish.
  2. Create an interpolation model:
    1. Click the Geostatistical Wizard button Geostatistical Wizard on the Geostatistical Analyst toolbar.
    2. Use the training dataset to create your interpolation model.
  3. Run validation using the test dataset:
    1. Once the output surface has been created, right-click the layer and choose Validation/Prediction.

      This opens the Layer To Points dialog box.

    2. The Input geostatistical layer box should already be populated with the layer you chose to validate.
    3. Under Point observation locations, specify the Test dataset.
    4. The Field to validate on must be the same field (attribute) that you used to create the interpolation model.
    5. Specify a location for the file in the Output statistics at point locations box.
    6. Click OK.
  4. Evaluate the results:
    1. Open the attribute table of the point feature layer created in the previous step by right-clicking the layer and choosing Open Attribute Table.
    2. Scroll to the right until you see the Included, Predicted, and Error fields. There may be others, such as Standard Error, Standardized Error, and Normal Value, if you are validating a kriging model.
    3. Right-click the Error column heading and click Statistics.

      The mean value should be close to 0.

    4. To calculate the root mean square error (RMSE), add a field of type double called Error_squared to the attribute table. Right-click this column heading, open Field Calculator, and set the expression to Error * Error. Use the Statistics tool to obtain the mean of these squared errors, then take the square root of that mean. This is the root mean square error, and it should be as low (close to 0) as possible.
    5. Right-click the Standard Error column heading and click Statistics.

      The mean standard error should be as small as possible, and it should be similar in magnitude to the root mean square error.

    6. If you are validating a kriging model, right-click the Standardized Error column heading and click Statistics. The mean value should be close to 0.
    7. It may also be useful to plot the predicted versus the measured (the original attribute) values in a scatterplot to see if the points fall close to a 45-degree line.
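Step 1 above can be pictured as a simple random partition of the features. The sketch below is not the Subset Features tool itself, only a minimal Python illustration of the 50/50 split it produces by default (the feature IDs are made up):

```python
import random

# Hypothetical feature IDs standing in for the input dataset;
# Subset Features performs an analogous random partition.
features = list(range(100))

random.seed(42)  # fixed seed so the split is repeatable
shuffled = features[:]
random.shuffle(shuffled)

train_size = len(shuffled) // 2  # default: 50 percent of the data
training = shuffled[:train_size]  # used to build the interpolation model
test = shuffled[train_size:]      # held back for validation

print(len(training), len(test))  # 50 50
```

Every feature ends up in exactly one of the two subsets, which is what makes the later error statistics an honest assessment: the test locations played no part in fitting the model.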
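The error statistics computed in step 4 can also be reproduced outside ArcMap. The following is a minimal Python sketch of the arithmetic only, using made-up measured and predicted values rather than actual output from the Validation/Prediction tool; in practice the values would come from the Measured, Predicted, and Error fields of the validation point layer:

```python
import math

# Hypothetical measured and predicted values at five test locations.
measured = [10.2, 11.5, 9.8, 12.1, 10.7]
predicted = [10.0, 11.9, 9.5, 12.4, 10.6]

# Error = predicted - measured, as in the layer's Error field.
errors = [p - m for p, m in zip(predicted, measured)]

# Mean error: should be close to 0 for an unbiased model.
mean_error = sum(errors) / len(errors)

# Root mean square error: mean of the squared errors, then square root.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"mean error: {mean_error:.3f}")
print(f"RMSE:       {rmse:.3f}")
```

A mean error near 0 indicates the predictions are not systematically high or low, while the RMSE summarizes how far, on average, individual predictions fall from the measured values.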
