Getting started with data validation and analysis
ArcGIS Data Reviewer is an ArcGIS extension that contains several tools for data validation and analysis. Checks allow you to perform various types of analysis on the data in a geodatabase, including evaluating feature extents and spatial relationships between features. Checks can be grouped into batch jobs so you can test for multiple conditions at once.
Before you start validation and analysis tasks, some setup work is required. The tips and guidance below cover setting up Data Reviewer for your quality assurance/quality control process.
Before data validation begins
Checks and batch jobs are the tools used to perform data validation with Data Reviewer. Before you configure checks and create batch jobs, decide which types of conditions you want to find in the data. The following questions can help you identify them:
- Is there a data specification the data must meet? For example, should built-up areas be a certain size to determine whether they are digitized as points or polygons?
- Are there spatial problems? For example, do you need to ensure that buildings have not been digitized on top of lakes or reservoirs?
These types of conditions can also be referred to as business rules for your data. Business rules can come from industry standards or product specifications, subject matter experts, or standard operating procedures. Checks can be configured to validate any of these conditions. You may need multiple checks to cover all the rules that apply to a single feature class or table.
Learn more about the checks available with Data Reviewer
If you are working with a large extent or plan to do visual quality control, you can create a polygon grid to divide the data into smaller, more manageable sections. These smaller sections can be used to systematically review the large extent and track the visual review process.
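The idea of dividing a large extent into review cells can be sketched in plain Python. This is only an illustration of the arithmetic, not a Data Reviewer or ArcGIS API: the function name `make_grid`, the extent values, and the status labels are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

# A cell is a bounding box: (xmin, ymin, xmax, ymax).
Cell = Tuple[float, float, float, float]


def make_grid(xmin: float, ymin: float, xmax: float, ymax: float,
              cell_width: float, cell_height: float) -> List[Cell]:
    """Split an extent into rectangular cells, row by row.

    Hypothetical helper: cells on the right and top edges are clipped
    so that no cell reaches beyond the original extent.
    """
    if cell_width <= 0 or cell_height <= 0:
        raise ValueError("cell size must be positive")
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("extent must have positive width and height")

    cols = math.ceil((xmax - xmin) / cell_width)
    rows = math.ceil((ymax - ymin) / cell_height)

    cells: List[Cell] = []
    for row in range(rows):
        for col in range(cols):
            x0 = xmin + col * cell_width
            y0 = ymin + row * cell_height
            cells.append((x0, y0,
                          min(x0 + cell_width, xmax),
                          min(y0 + cell_height, ymax)))
    return cells


# Illustrative extent: 10 x 5 units, reviewed in cells 4 units wide.
cells = make_grid(0, 0, 10, 5, 4, 5)

# Track visual review progress per cell (labels are made up).
review_status: Dict[int, str] = {i: "not reviewed" for i in range(len(cells))}
review_status[0] = "reviewed"
```

Each cell can then be visited in order, and the status table shows how much of the extent has been covered.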
Check and batch job configuration
Most Data Reviewer checks run on tables or on point, line, or polygon feature classes, but some have special requirements. The following checks and check types have specific requirements:
- The Connectivity check requires a geometric network with connectivity rules.
- The topology checks require that the geodatabase contain a topology.
- The Relationship check requires a relationship class.
- The z-value checks require that the feature classes being validated have a z coordinate system defined.
- The Metadata check requires that metadata is defined for the workspace to be validated.
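The requirements listed above can be kept as a simple lookup table and compared against what a geodatabase actually contains before checks are configured. The following is a minimal sketch in plain Python. The dictionary keys, the requirement strings, and the `missing_prerequisites` function are illustrative names, not part of Data Reviewer, and the set of available items would have to be gathered by whatever inventory method you use.

```python
from typing import Dict, Iterable, Set

# Requirements taken from the list above; the labels are illustrative.
CHECK_PREREQUISITES: Dict[str, str] = {
    "Connectivity": "geometric network with connectivity rules",
    "Topology": "topology in the geodatabase",
    "Relationship": "relationship class",
    "Z-value": "z coordinate system on the feature classes",
    "Metadata": "metadata defined for the workspace",
}


def missing_prerequisites(planned_checks: Iterable[str],
                          available: Set[str]) -> Dict[str, str]:
    """Return {check: requirement} for planned checks that cannot run yet.

    Checks that are not in the table are assumed to have no special
    requirement and are skipped.
    """
    missing: Dict[str, str] = {}
    for check in planned_checks:
        requirement = CHECK_PREREQUISITES.get(check)
        if requirement is not None and requirement not in available:
            missing[check] = requirement
    return missing


# Hypothetical plan: the geodatabase has a relationship class only.
gaps = missing_prerequisites(
    ["Connectivity", "Duplicate Geometry", "Relationship"],
    {"relationship class"},
)
```

Here `gaps` holds only the Connectivity entry, because the Relationship requirement is met and Duplicate Geometry has no special requirement in the table.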
Organizing checks in batch jobs
When you configure checks to add to batch jobs, you can organize them in several ways. In general, a batch job contains one or more groups, and each group contains checks. Several scenarios are presented below.
- You can create a large batch job that validates several conditions across a database. This batch job can be divided into groups that reflect the different conditions you want to validate. For example, you could have a batch job that contains both attribute and spatial checks.
- Checks can be organized based on a feature class so you can check several conditions at once within the same batch job. For example, you can set up several checks to run on the same line feature class and check for multiple conditions at once.
- If you are working with a geodatabase that contains a large schema, you can create smaller batch jobs that focus on a single check. This type of batch job can be created by duplicating checks. This method of organizing checks allows you to run specific checks on all the feature classes in your schema. You can also create batch jobs for each geometry type so you can have even more control over which feature classes are validated. For example, to see if any of the point feature classes contain duplicate features, you can run a batch job that contains Duplicate Geometry checks for all the point feature classes in the schema.
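The batch job, group, and check hierarchy described in these scenarios can be modeled with a few small data classes. This is a conceptual sketch only: it does not reflect the batch job file format or any Data Reviewer API, and the class names, the `duplicate_check` helper, and the feature class names are hypothetical. It shows the last scenario, where one check is duplicated across every point feature class.

```python
from dataclasses import dataclass, field, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Check:
    check_type: str      # for example "Duplicate Geometry"
    feature_class: str   # the feature class the check runs on


@dataclass
class Group:
    name: str
    checks: List[Check] = field(default_factory=list)


@dataclass
class BatchJob:
    name: str
    groups: List[Group] = field(default_factory=list)


def duplicate_check(template: Check,
                    feature_classes: Iterable[str]) -> List[Check]:
    """Copy one configured check onto each feature class in turn."""
    return [replace(template, feature_class=fc) for fc in feature_classes]


# Hypothetical point feature classes in the schema.
point_classes = ["Wells", "Hydrants", "Signs"]

template = Check("Duplicate Geometry", point_classes[0])
job = BatchJob(
    name="Point duplicates",
    groups=[Group("Duplicate Geometry",
                  duplicate_check(template, point_classes))],
)
```

The same structure covers the other scenarios: a large batch job would hold several groups (for example, one for attribute checks and one for spatial checks), and a feature-class-based job would hold several check types that share one `feature_class`.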
Organizing Reviewer sessions
Depending on your organization, it may be useful to set up several Reviewer sessions for data validation and analysis. At a minimum, you can keep separate sessions for automated and visual quality control. This separates manual quality control results from batch job results, which may be necessary for reporting data quality. Typically, automated validation must pass 100 percent, while manual visual quality control need not. By separating the results into different sessions, you can use the reports available in Data Reviewer to determine the level of quality of your data.
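To illustrate why separate sessions make reporting easier, the sketch below computes a pass percentage per session. The formula (records without errors divided by records validated) is an assumption made for this example and is not necessarily how Data Reviewer reports calculate quality. The session names and counts are invented.

```python
from typing import Dict, Tuple


def percent_passed(records_validated: int, records_with_errors: int) -> float:
    """Share of validated records that had no error, as a percentage."""
    if records_validated <= 0:
        raise ValueError("records_validated must be positive")
    if not 0 <= records_with_errors <= records_validated:
        raise ValueError("records_with_errors is out of range")
    return 100.0 * (records_validated - records_with_errors) / records_validated


# Hypothetical counts: (records validated, records with errors).
sessions: Dict[str, Tuple[int, int]] = {
    "Automated QC": (2000, 0),
    "Visual QC": (2000, 50),
}

summary = {name: percent_passed(*counts) for name, counts in sessions.items()}

# Automated validation is typically expected to pass completely.
automated_ok = summary["Automated QC"] == 100.0
```

Keeping the two sessions apart means the 100 percent expectation can be applied to the automated results alone, without the visual review results lowering the figure.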