What is geoprocessing?
Geoprocessing is for everyone who uses ArcGIS. Whether you're a new or advanced user, geoprocessing is likely an essential part of your day-to-day work with ArcGIS. The fundamental purpose of geoprocessing is to provide tools and a framework for performing analysis and managing your geographic data. The modeling and analysis capabilities geoprocessing provides make ArcGIS a complete geographic information system.
Geoprocessing provides a large suite of tools for performing GIS tasks that range from simple buffers and polygon overlays to complex regression analysis and image classification. The kinds of tasks to be automated can be mundane—for example, to wrangle herds of data from one format to another. Or the tasks can be quite creative, using a sequence of operations to model and analyze complex spatial relationships—for example, calculating optimum paths through a transportation network, predicting the path of wildfire, analyzing and finding patterns in crime locations, predicting which areas are prone to landslides, or predicting flooding effects of a storm event.
Geoprocessing is based on a framework of data transformation. A typical geoprocessing tool performs an operation on an ArcGIS dataset (such as a feature class, raster, or table) and produces a new dataset as the result of the tool. Each geoprocessing tool performs a small yet essential operation on geographic data.
Geoprocessing allows you to chain together sequences of tools, feeding the output of one tool into another, as illustrated in the examples below. You can use this ability to compose an infinite number of geoprocessing models (tool sequences) that help you automate your work and solve complex problems. You can share your work with others by packaging your workflow into an easily shared geoprocessing package. You can also create web services from your geoprocessing workflows.
Automating data management tasks: Project and Clip
Suppose you've received 20 shapefiles from a colleague, and they're in different map projections and contain lots of features that are outside your study area. Your task is to change the map projection of each of the 20 datasets, remove the extraneous features ("clip" the datasets), and put them all into a file geodatabase.
By far the easiest way to accomplish this task is to use geoprocessing. First, you would use the geoprocessing Project tool, which applies a new projection to an input feature class to create a new output feature class. The illustration below shows the Project tool dialog box with its input features shown in the upper left and the projected features in the upper right. The projected coordinate system is Albers equal area conic.
The second step is to use the geoprocessing Clip tool to clip the data that falls outside your study area. The Clip tool takes two inputs, a feature class of any type (point, polyline, polygon) and a polygon feature class (the clip feature class), and creates a new feature class of just those features that fall inside the clip polygons.
Both the Project and Clip tools can be used in batch mode, which lets you input the list of your 20 feature classes, and the tools automatically execute once for each of the 20 feature classes. You can create the list by dragging the feature classes from the Catalog window onto the tool dialog box.
Or better yet, you can quickly create a geoprocessing model that chains together the Project and Clip tools, feeding the output of Project into the input of Clip, and use the model in batch mode. The model you create becomes a new tool in your geoprocessing environment.
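The same Project-then-Clip chain can also be scripted. The following is a minimal Python sketch, not a definitive implementation. It assumes the arcpy site package that is installed with ArcGIS is available, that the input file names are already valid feature class names, and that you supply your own output coordinate system and study-area polygons. The function names plan_outputs and project_and_clip and the _prj and _clip suffixes are hypothetical. arcpy is imported only inside the function that runs the tools, so the path-planning helper can be tried without ArcGIS.

```python
import os


def plan_outputs(shapefiles, out_gdb):
    """Pair each input shapefile with projected and clipped output paths.

    Pure path handling: nothing is read or written here.
    """
    plan = []
    for shp in shapefiles:
        # Feature class names in a geodatabase carry no file extension.
        name = os.path.splitext(os.path.basename(shp))[0]
        projected = os.path.join(out_gdb, name + "_prj")
        clipped = os.path.join(out_gdb, name + "_clip")
        plan.append((shp, projected, clipped))
    return plan


def project_and_clip(shapefiles, out_gdb, out_coord_sys, study_area):
    """Run Project, then Clip, once per input (requires ArcGIS)."""
    import arcpy  # deferred: only available where ArcGIS is installed

    for shp, projected, clipped in plan_outputs(shapefiles, out_gdb):
        # The output of Project becomes the input of Clip.
        arcpy.management.Project(shp, projected, out_coord_sys)
        arcpy.analysis.Clip(projected, study_area, clipped)
```

Calling project_and_clip with the list of 20 shapefile paths, the target file geodatabase, a spatial reference, and the study-area polygon feature class would run both tools once per dataset, much like batch mode does in the tool dialog box.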
Modeling and analysis: Finding suitable sites for parks
Spatial analysis is one of the more interesting and remarkable aspects of GIS. Using spatial analysis, you can combine information from many independent sources and derive a new set of information (results) by applying a large, rich, and sophisticated set of spatial operators. These spatial operations are all part of the suite of geoprocessing tools.
For example, here is a slightly more complex use of geoprocessing that performs a simplified site selection for parks and produces a dataset of potential park sites that can be further evaluated. The site selection logic is to find areas that are close to densely populated areas but are not close to any existing parks, the logic being that you want parks close to people, but you don't want parks clustered tightly together. Furthermore, it is deemed more important to have parks close to populated areas than it is to place a new park farther away from existing parks. As noted, this is very simple logic and only serves to identify potential sites for further evaluation (such as compatible land use, site availability, and site qualities).
In the illustration below, the Potential park sites map shows more suitable locations in dark purple, while less suitable areas are shown in lighter shades. Gray areas mark the locations of existing parks. The illustration also shows that population density is a more influential factor, that is, has a higher influence (60), in site selection than distance to parks (40). (These weights are entirely arbitrary.)
The following geoprocessing model illustrates the preceding logic. There are five steps in this model, each labeled with a blue circle.
- Step 1 calculates the population density from an input point feature class containing population centroids and outputs a raster dataset containing population density for every cell.
- Step 2 calculates the distance to parks from a raster of existing parks and outputs a raster dataset with distance to existing parks as the value for each cell.
- Step 3 reclassifies the Population Density raster, and step 4 reclassifies the Distance to Parks raster. Both reclassification processes transform the raw cell values into values between 0 and 100. The reclassified values score usefulness, where 0 is least useful and 100 is most useful. For example, a cell that is close to an existing park scores lower than a cell farther away, and a cell that has high population density scores higher than a cell with low population density.
- Step 5 takes the output data from the two reclassifications and inputs the data to the Weighted Overlay tool, where the weights (60 and 40) are applied. The output raster, Potential Park Sites, contains a suitability score, as shown above. The most suitable areas have a higher value in the output cell and are shown in dark purple.
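The arithmetic behind steps 3 through 5 can be sketched in plain Python on tiny grids. This is an illustration only and does not call the ArcGIS tools: the linear rescaling to 0-100 is an assumed stand-in for whatever reclassification table the model uses, the function names are hypothetical, and the input values are made up. Steps 1 and 2 (computing density and distance) are not reproduced; their outputs are simply given as small lists.

```python
def rescale_0_100(grid):
    """Linearly rescale raw cell values to scores from 0 to 100.

    Assumed stand-in for the model's reclassification step.
    """
    cells = [v for row in grid for v in row]
    lo, hi = min(cells), max(cells)
    if hi == lo:
        # A constant grid carries no information; score every cell 0.
        return [[0.0 for _ in row] for row in grid]
    return [[100.0 * (v - lo) / (hi - lo) for v in row] for row in grid]


def weighted_overlay(layers_and_weights):
    """Combine equally sized score grids using weights that sum to 100."""
    if sum(weight for _, weight in layers_and_weights) != 100:
        raise ValueError("weights must sum to 100")
    first_grid = layers_and_weights[0][0]
    result = [[0.0] * len(row) for row in first_grid]
    for grid, weight in layers_and_weights:
        for r, row in enumerate(grid):
            for c, score in enumerate(row):
                result[r][c] += score * weight / 100.0
    return result


# Hypothetical 2 x 2 outputs of steps 1 and 2.
population_density = [[0, 50], [100, 200]]   # people per unit area
distance_to_parks = [[0, 1000], [500, 2000]]  # distance to nearest park

# Steps 3 and 4: denser cells and cells farther from parks score higher.
pop_scores = rescale_0_100(population_density)
dist_scores = rescale_0_100(distance_to_parks)

# Step 5: apply the 60/40 weights.
potential_park_sites = weighted_overlay([(pop_scores, 60), (dist_scores, 40)])
print(potential_park_sites)  # [[0.0, 35.0], [40.0, 100.0]]
```

The lower-right cell scores highest because it is both the most densely populated and the farthest from an existing park.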
This weighted overlay approach to screening for potential sites has been around since before the advent of computers and GIS. Geoprocessing makes weighted overlay easy and accessible. For example, you could change the weights from 60 and 40 to something else and execute the model again to help determine the sensitivity to the weights. Likewise, you could change the reclassification values.
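A sensitivity check of this kind can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the three candidate cells, their reclassified scores, the trial weights, and the function names are all hypothetical, and the scoring is a plain weighted average rather than a call to the Weighted Overlay tool.

```python
def suitability(pop_score, dist_score, pop_weight):
    """Weighted score for one cell; the two weights sum to 100."""
    return (pop_score * pop_weight + dist_score * (100 - pop_weight)) / 100.0


def best_site_by_weight(cells, pop_weights):
    """Return {population weight: name of the top-scoring cell}."""
    winners = {}
    for w in pop_weights:
        winners[w] = max(
            cells,
            key=lambda name: suitability(cells[name][0], cells[name][1], w),
        )
    return winners


# Hypothetical reclassified scores: (population density, distance to parks).
cells = {"A": (90, 20), "B": (50, 80), "C": (70, 55)}

winners = best_site_by_weight(cells, [80, 60, 40])
print(winners)  # {80: 'A', 60: 'C', 40: 'B'}
```

With these made-up scores the top-ranked cell changes at every trial weight, which would suggest that the result is sensitive to the weights and that they deserve closer scrutiny.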
Sharing your workflows
Geoprocessing models you create, and the data that they use, can be shared using geoprocessing packages. The package you create can be e-mailed to your colleagues or uploaded to arcgis.com, where it can reach a broad audience. You can also create and publish web services from your models to be consumed by web-based clients such as ArcGIS for Desktop, ArcGIS Explorer, and custom web applications.
Developing your own tools
You can create your own tools using ModelBuilder or Python. Tools you create are called custom tools and become an integral part of geoprocessing, just like system tools (those installed with ArcGIS Desktop). You can open and run your tools from the Search, Catalog, or ArcToolbox window, use them in ModelBuilder and the Python window, call them from another script, or add them as toolbar buttons.