Using geoprocessing to develop applications

Summary

This topic discusses the facts, misconceptions, and misstatements about geoprocessing.

About using geoprocessing to develop applications

Geoprocessing is a term that merges "geographic" with "data processing." At its most basic level, geoprocessing is how you compute with geographic data, connecting geographic data to useful functionality. There are several different ways to view geoprocessing, and the usual progression of knowledge is as follows:

Geoprocessing is a suite of tools for ArcGIS for Desktop users. ArcGIS for Desktop users open tool dialog boxes, supply tool parameters, and execute the tool by clicking the OK button on the user interface (UI). Using tool dialog boxes is the most common interaction users have with geoprocessing.
Geoprocessing can be accessed in Python script code via the ArcPy site package. Every tool has a scripting syntax that, with few exceptions, has the same signature as the tool's dialog box. Python users can run tools using the scripting syntax on the ArcGIS for Desktop Python window or in a Python script.
Geoprocessing is a framework for creating and deploying application software—on a desktop or server—using ModelBuilder (visual programming language) or Python (text-based programming language).
Geoprocessing is a suite of functions that can be accessed by any language that supports the creation of the geoprocessor object. The geoprocessor object has methods to call any tool using the scripting syntax, as well as useful methods for interrogating datasets and interacting with ArcGIS applications.
Geoprocessing is an easy-to-use coarse-grained library of functionality, as opposed to a fine-grained library like most of ArcObjects. For example, to add a field to a table takes one line of code using geoprocessing (after a one-time creation of the geoprocessor object), while from 20 to 30 lines of code are needed for a simplistic (that is, not robust) implementation using ArcObjects. To be precise, geoprocessing is a part of ArcObjects, but its function signatures are at a higher level (coarse-grained). For details, see Geoprocessing and ArcObjects in Geoprocessing overview.
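
As a rough illustration of the one-line claim above, the following Python sketch calls the Add Field tool through ArcPy. The function name, table path, and field name are hypothetical, and the sketch assumes an ArcGIS installation that provides the arcpy site package.

```python
def add_status_field(table_path):
    """Add a 20-character text field named STATUS (hypothetical) to a table."""
    # Imported inside the function so this module still loads where ArcGIS is absent.
    import arcpy

    # A single tool call does the work: Add Field in the Data Management toolbox.
    arcpy.AddField_management(table_path, "STATUS", "TEXT", field_length=20)
    return table_path
```

A call such as add_status_field("C:/data/demo.gdb/parcels") would add the field to that table; the path is only a placeholder.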

Geoprocessing facts

The following are some factual statements about geoprocessing:

Fact—You will significantly reduce development costs by using geoprocessing. This is due to the coarse-grained nature of geoprocessing and the fact that geoprocessing tools have been extensively tested daily by thousands of users.
Fact—Most geographic information system (GIS) analysts are familiar with geoprocessing tools, and these analysts are a valuable development resource for you. In your code, you can call model and script tools that they develop in the same way you call any of the system tools Esri supplies. For example, you or your geoprocessing analyst can create a model or Python script tool, and you can call that tool from your .NET code. Finally, you can draw on the analyst's knowledge to find the right combination of tools to use.
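
The second fact describes calling an analyst's tool from .NET code. The same idea can be sketched in Python, assuming the analyst delivers a custom toolbox. The toolbox path, the alias sitetools, and the tool name FindSites are all hypothetical.

```python
def run_analyst_tool(toolbox_path, in_features, out_features):
    """Call a model or script tool from a custom toolbox like a system tool."""
    # Imported inside the function so this module still loads where ArcGIS is absent.
    import arcpy

    # Register the analyst's toolbox under a hypothetical alias.
    arcpy.ImportToolbox(toolbox_path, "sitetools")

    # After the import, the tool is callable as <ToolName>_<alias>.
    # FindSites is a hypothetical model name, not a system tool.
    return arcpy.FindSites_sitetools(in_features, out_features)
```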

Geoprocessing misconceptions

The following are some misconceptions about geoprocessing:

Misconception—Geoprocessing is not suitable for intensely interactive applications where the user is interacting with map elements.

While this is generally true, geoprocessing does give you the ability to input interactively created features into tools. You can easily build applications that capture all types of interactive input from the user and pass these inputs to geoprocessing tools.

Misconception—Geoprocessing is slow due to the overhead it incurs for validation and messaging, which makes it unsuitable for application development.

While it is true that there is overhead, it is probably less than you realize. However, the real issue is the trade-off between application development time and execution speed. Is it worth giving up a few seconds of execution speed to build, test, and deploy your application in a quarter of the time? Definitely. Even if execution speed is critical, you can still develop and test using geoprocessing, find the delays, and then improve only the critical sections of code.

Geoprocessing misstatements

The following are some misstatements about geoprocessing:

Misstatement—Geoprocessing tools only take datasets on disk as input and only write datasets to disks. Another way this has been stated is that geoprocessing is "pathname to pathname" only. Only pathnames to datasets can be used as input and output parameters.

In fact, you can use equivalent ArcObjects anywhere feature classes are expected. For example, you can do the following:

Pass an object with IFeatureClass or IDataset as input to a tool instead of a pathname string.
Create in-memory feature classes, manipulate them, and use them in geoprocessing tools.
Use the special in-memory FeatureSet and RecordSet objects instead of feature classes and tables. These two objects behave like their on-disk counterparts.
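
As one possible illustration of the in-memory option, the following Python sketch writes a tool's output to the in_memory workspace, reads a result from it, and then discards it. The function name and the dataset name in_memory/buffered are hypothetical, and the sketch assumes an ArcGIS installation that provides arcpy.

```python
def count_buffered_features(in_features, distance="100 Meters"):
    """Buffer features into an in-memory feature class and return its row count."""
    # Imported inside the function so this module still loads where ArcGIS is absent.
    import arcpy

    temp_fc = "in_memory/buffered"  # hypothetical name; nothing is written to disk
    arcpy.Buffer_analysis(in_features, temp_fc, distance)
    try:
        # Get Count returns a result object; its first output is the count as text.
        return int(arcpy.GetCount_management(temp_fc).getOutput(0))
    finally:
        # Release the memory held by the temporary feature class.
        arcpy.Delete_management(temp_fc)
```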

Misstatement—Geoprocessing is not for processing individual features.

This is a corollary to the preceding misstatement. For example, suppose you have a single point geometry and you need to select nearby polygon features. You can insert this single point geometry into an empty IFeatureClass and use it as input to the Select Layer By Location tool, with a layer of the polygon features created by the Make Feature Layer tool. The output will be a new selection set on the layer that you can persist as a feature class (in-memory or on disk) using the Copy Features tool.
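
The workflow just described uses ArcObjects interfaces. A rough Python counterpart, assuming arcpy is available, might look like the following sketch. The function name, the layer name, and the in-memory dataset name are hypothetical. The point is written to an in-memory feature class first, mirroring the step of inserting the geometry into an empty feature class.

```python
def select_polygons_near_point(polygons, x, y, distance, out_features):
    """Copy the polygons within a distance of one point to a new feature class."""
    # Imported inside the function so this module still loads where ArcGIS is absent.
    import arcpy

    # Build the point in the polygons' coordinate system.
    spatial_ref = arcpy.Describe(polygons).spatialReference
    point = arcpy.PointGeometry(arcpy.Point(x, y), spatial_ref)

    point_fc = "in_memory/search_point"  # hypothetical temporary dataset
    layer = "nearby_polygons_lyr"        # hypothetical layer name

    # Copy Features accepts a list of geometry objects as its input.
    arcpy.CopyFeatures_management([point], point_fc)
    try:
        arcpy.MakeFeatureLayer_management(polygons, layer)
        arcpy.SelectLayerByLocation_management(
            layer, "WITHIN_A_DISTANCE", point_fc, distance)
        # Persist the selection; out_features may be in memory or on disk.
        arcpy.CopyFeatures_management(layer, out_features)
    finally:
        arcpy.Delete_management(point_fc)
    return out_features
```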

Misstatement—Geoprocessing is just for data conversion.

This is not true. Unfortunately, the foundation for this misstatement comes from the word "geoprocessing." It sounds like uninteresting (but important) data conversion for GIS datasets. The other source of this misstatement is that most users' first exposure to geoprocessing comes when they need to convert data, since most data conversion tasks are found only in geoprocessing tools and not on the main ArcGIS for Desktop UI. The fact is, you can do amazing things with geoprocessing. Examples include calculating optimum paths through a transportation network, predicting the path of wildfire, analyzing and finding patterns in crime locations, predicting which areas are prone to landslides, and mitigating the flooding effects of a storm event, to mention just a few things users do every day.

Misstatement—Geoprocessing consists only of tools.

In fact, the geoprocessor object has many other useful methods to help you write applications. These include methods for checking whether a dataset exists, listing the datasets in a workspace, describing the properties of datasets, and posting and handling errors.
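
In ArcPy, these non-tool methods appear as ordinary functions. The following is a minimal sketch, assuming an ArcGIS installation that provides arcpy; the function name and the returned dictionary layout are hypothetical.

```python
def summarize_workspace(workspace):
    """Return {feature class name: shape type} for a workspace, or {} if missing."""
    # Imported inside the function so this module still loads where ArcGIS is absent.
    import arcpy

    # Existence check.
    if not arcpy.Exists(workspace):
        # Post an error message to the geoprocessing message stream.
        arcpy.AddError("Workspace not found: {0}".format(workspace))
        return {}

    # Listing functions work against the current workspace environment.
    arcpy.env.workspace = workspace
    summary = {}
    for name in arcpy.ListFeatureClasses() or []:
        # Describe exposes dataset properties such as shapeType.
        summary[name] = arcpy.Describe(name).shapeType
    return summary
```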

Misstatement—Geoprocessing is only supported for ArcGIS for Desktop applications. You must have ArcGIS for Desktop installed to use geoprocessing.

In fact, geoprocessing is low-level functionality in ArcGIS Engine, so applications built on ArcGIS Engine can use it without ArcGIS for Desktop installed.

For more information about geoprocessing, see Using geoprocessing.

Developing geoprocessing tools

So far, the discussion has been about how you can use geoprocessing tools and methods in your applications. However, you might be tasked with developing a geoprocessing tool that is in a toolbox and can be accessed like any other tool, that is, via its dialog box, as a process in ModelBuilder, as a function in scripting, or as a custom tool in ArcObjects code.

Geoprocessing tools are implemented using the IGPFunction2 interface. However, you are taking risks if you rush and start implementing with the IGPFunction2 interface. It is not that implementing IGPFunction2 is particularly difficult (the mechanics are straightforward). The real issue is to understand the geoprocessing framework and how tools behave in interactive mode (on the tool's dialog box), in scripting mode (in Python), and especially in ModelBuilder mode, where tools fully describe their output before they are executed to allow for chaining.

Avoiding risks

Do the following to avoid risks when developing tools:

Prototype your tool by creating geoprocessing script tools with Python. Your prototype tools do not have to perform actual work. They just need to implement the correct validation and describe their output for chaining in ModelBuilder.
Use your prototyped tools in a variety of scenarios in ModelBuilder and in scripts. That is, use your prototype tools like a user would.
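
A prototype's validation logic lives in the script tool's ToolValidator class. The following is a minimal sketch for a hypothetical tool with three parameters: input features (index 0), a numeric distance (index 1), and a derived output (index 2). The parameter layout and the error text are assumptions for illustration only.

```python
class ToolValidator(object):
    """Validation for a hypothetical script tool: input, distance, derived output."""

    def __init__(self):
        # Imported here so this sketch still loads where ArcGIS is absent.
        import arcpy
        self.params = arcpy.GetParameterInfo()

    def initializeParameters(self):
        # Describe the output before execution so ModelBuilder can chain it:
        # the output depends on the input and starts as a copy of its schema.
        self.params[2].parameterDependencies = [0]
        self.params[2].schema.clone = True

    def updateParameters(self):
        # No parameter values are adjusted in this prototype.
        return

    def updateMessages(self):
        # Reject a non-positive distance before the tool is allowed to run.
        distance = self.params[1].value
        if distance is not None and distance <= 0:
            self.params[1].setErrorMessage("Distance must be greater than zero.")
```

With validation like this in place, the tool's execution script can remain a stub while you test how the tool behaves on its dialog box and in ModelBuilder.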

Once you have prototyped and tested your script tools in real-world scenarios, you are ready to answer the following question—why not fully implement your tool in Python?

Python is easy to learn, and if you think you do not have time to learn another language, consider that Python is the preferred language for ArcGIS and you will benefit from learning it. For example, you can debug Python script tools directly from ArcGIS.

Python script tool misstatements

The following are some misstatements about Python script tools:

Misstatement—Python is slow.

Actually, Python performs quite well compared to compiled languages (you can search the Web for benchmarks of Python compared to other languages). Part of this misstatement comes from how ArcGIS originally implemented Python. In ArcGIS 9.2, Python code executed out-of-process, and Python was slow when invoked from ArcGIS. However, since ArcGIS 9.3, Python runs in-process.

Misstatement—You cannot protect your intellectual property with Python scripts.

Starting at ArcGIS 10, you can embed your Python script in the toolbox and password-encrypt your code (your intellectual property is safe).

Caveat—Only the script tool code is protected. If you have additional Python libraries, they will not be encrypted.

Python script tool facts

The following are some factual statements about Python script tools:

Fact—Cursors are slow in Python compared with a cursor in a compiled language. If you need to quickly cycle through each record in a feature class or table, you will be better off using the core ArcObjects ICursor or IFeatureCursor interfaces. The big caveat here has to do with programming approaches. Geoprocessing works extremely well with sets of data. For example, if you want to do a point-in-polygon analysis, where you have one point and many polygons, your first approach might be to cursor through each polygon and do a point-in-polygon test (using IGeometry methods). While this certainly works, you would be better off using the Identity tool and letting it do its job of intelligently and quickly identifying point and polygon intersections. If all you need to do is update attributes, use the Calculate Field tool.
Fact—Python and geoprocessing do not give you access to low-level IGeometry methods. For more information, see Using ArcObjects as tool input. However, there are a lot of tools that manipulate geometries. For example, review some of the tools in the Features and Fields toolsets in the Data Management toolbox, and the tools in the Editing toolbox.
Fact—You cannot do detailed cartographic rendering with Python like you can, for example, with IRendererClasses. However, you might find tools in the Cartography toolbox that make your workflow more concise by using coarse-grained geoprocessing.
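
The set-based approach recommended in the first fact above can be sketched as follows. Both function names are hypothetical. The area expression assumes the data's linear unit is meters, and the sketch assumes an ArcGIS installation that provides arcpy.

```python
def tag_points_with_polygons(points, polygons, out_features):
    """Let the Identity tool find which polygon each point falls in."""
    # Imported inside the function so this module still loads where ArcGIS is absent.
    import arcpy

    # One call replaces a hand-written loop of point-in-polygon tests.
    arcpy.Identity_analysis(points, polygons, out_features)
    return out_features


def fill_area_hectares(feature_class, field_name):
    """Update an existing numeric field for every row without a cursor."""
    import arcpy

    # Assumes square meters; 10,000 square meters equal one hectare.
    arcpy.CalculateField_management(
        feature_class, field_name, "!shape.area! / 10000.0", "PYTHON_9.3")
    return field_name
```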

Python, as a scripting language, started with the design goal of making it easy to join applications together. You can call executables (.exes) and libraries (in the form of .dlls) directly from Python. For more information, see How to create a script tool that runs an EXE. This means that if you have existing executables or libraries that implement algorithms you need, you can call them directly from Python, passing arguments you received and validated in your script tool. These executables and libraries can even use ArcObjects.
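
One way to call an existing executable from a script is the standard subprocess module. The sketch below uses a hypothetical helper name, and the executable path and arguments are whatever your script tool received and validated.

```python
import subprocess
import sys


def run_executable(exe_path, arguments):
    """Run an executable with the given arguments and return its text output."""
    # Build the command as a list so arguments with spaces are passed intact.
    command = [exe_path] + [str(argument) for argument in arguments]
    # Raises subprocess.CalledProcessError if the program exits with an error code.
    return subprocess.check_output(command, universal_newlines=True)


# Usage example: the Python interpreter stands in for a real executable here.
output = run_executable(sys.executable, ["-c", "print('done')"])
```

For a .dll, the standard ctypes module is one option for loading the library and calling its exported functions.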

Finally, in situations where you must create your tool using the low-level fine-grained ArcObjects, see Custom geoprocessing function tools.