Performance tips for geoprocessing services
Clients want and expect fast service, so your geoprocessing services need to be fast and efficient. Since ArcGIS for Server can accommodate multiple clients at once, inefficient services can overload your server. The more efficient your services, the more clients can be served with the same computing resources.
Below are tips and techniques to increase the performance of your services. The techniques are presented roughly in order of impact: those that offer larger performance boosts come first. The last few tips shave only a few tenths of a second off your execution time, but some tasks need that saving.
Use layers for project data
When you run your geoprocessing tool to create the result you will publish, use layers as input rather than paths to datasets on disk. A layer references a dataset on disk and caches properties of that dataset; this matters most for network dataset layers and raster layers. When the service starts, it creates the layer from the dataset, caches the dataset's basic properties, and keeps the dataset open. When the service executes, those properties are immediately available and the dataset is already open, which saves time on every execution.
For example, the Viewshed service on the Esri SampleServer and the ArcGIS Network Analyst extension examples that create drive-time polygons all make use of layers. Depending on the size of the dataset, this can save 1 to 2 seconds or more per service execution.
Use data local to ArcGIS Server
The project data required by your geoprocessing services should be local to ArcGIS Server. Data accessed over a network share (UNC path) is slower to read and write than data stored on the same machine. Performance numbers vary widely, but it is not uncommon for reading and writing data across a LAN to take twice as long as using a local disk.
Write intermediate data to memory
You can write intermediate (scratch) data to memory. Writing data to memory is faster than writing data to disk.
You can also write output data to memory as long as you are not using a result map service.
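As a minimal sketch of writing scratch data to memory, the script below buffers features into the `in_memory` workspace and clips the buffer to a study area, so only the final output touches disk. The function names, the `buffered` scratch name, and the parameters are hypothetical; `arcpy` is imported inside the function because it is only available where ArcGIS is installed.

```python
def in_memory_path(name):
    """Return the path of a dataset in the geoprocessing in_memory workspace."""
    return "in_memory/" + name


def buffer_then_clip(in_features, clip_features, out_features, distance):
    """Buffer in_features, keep the buffer in memory, and clip it to disk.

    All argument values are supplied by the caller; nothing here is tied to a
    particular service.
    """
    import arcpy  # requires an ArcGIS installation

    scratch = in_memory_path("buffered")  # hypothetical scratch dataset name
    try:
        # Intermediate result goes to memory rather than a scratch folder.
        arcpy.Buffer_analysis(in_features, scratch, distance)
        # Only the final output is written to disk.
        arcpy.Clip_analysis(scratch, clip_features, out_features)
    finally:
        # Release the memory held by the intermediate dataset.
        if arcpy.Exists(scratch):
            arcpy.Delete_management(scratch)
```

Deleting the in-memory dataset when the task finishes matters for a long-running service process, because memory that is not released stays allocated between executions.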
Preprocess data used by your tasks
Most geoprocessing services are intended to be focused applications providing answers to specific spatial queries posed by web clients. Since tasks tend to be specific operations on known data, there is almost always an opportunity to preprocess data to optimize the operation. For example, adding an attribute or spatial index is a simple preprocess to optimize spatial or attribute selection operations. Other examples:
- The tutorial Geoprocessing service example: Watershed preprocesses hydrologic data by creating flow accumulation and flow direction rasters.
- You can precompute distances from known locations using the Near or Generate Near Table tools. For example, suppose your service allows clients to select vacant parcels that are a user-defined distance from the Los Angeles River. You could use the Select Layer By Location tool to perform this selection, but it would be much faster to precompute the distance of every parcel from the Los Angeles River (using the Near tool) and store the computed distance as an attribute of the parcels. You would index this attribute using the Add Attribute Index tool. Now when the client issues a query, your task can perform a simple and fast attribute selection on the distance attribute rather than a less efficient spatial query.
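The parcel example above could be sketched in Python as follows, assuming the parcels are in a feature class that the Near tool can update in place. The function names and the index name `near_dist_idx` are hypothetical; `NEAR_DIST` is the field the Near tool adds to its input. The first function is run once, outside the service; the second is what the task would do on each request.

```python
def distance_where_clause(max_distance, field="NEAR_DIST"):
    """Build the attribute query that replaces the spatial selection."""
    return "{0} <= {1}".format(field, float(max_distance))


def precompute_river_distance(parcels, river):
    """One-time preprocessing, run outside the model or script that is published."""
    import arcpy  # requires an ArcGIS installation

    # Near writes NEAR_FID and NEAR_DIST fields to the input parcels.
    arcpy.Near_analysis(parcels, river)
    # Index the distance field so attribute selections on it are fast.
    arcpy.AddIndex_management(parcels, "NEAR_DIST", "near_dist_idx")


def select_parcels_near_river(parcel_layer, max_distance):
    """Per-request work: an attribute selection instead of a spatial query."""
    import arcpy  # requires an ArcGIS installation

    arcpy.SelectLayerByAttribute_management(
        parcel_layer, "NEW_SELECTION", distance_where_clause(max_distance))
```

The stored distances are in the linear units of the parcel data's coordinate system, so the client's distance must be expressed in (or converted to) the same units before the query is built.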
Add attribute indexes
If your task is selecting data using attribute queries, create an attribute index for each attribute used in queries. You can use the Add Attribute Index tool. You only need to create the index once, and you do so outside of your model or script.
Add spatial indexes
If your model or script does spatial queries on shapefiles, create a spatial index for the shapefile using the Add Spatial Index tool. If you are using geodatabase feature classes, spatial indexes are automatically created and maintained for you. In some circumstances, recalculating a spatial index may improve performance, as described in Setting spatial indexes.
Use synchronous rather than asynchronous execution
You can specify whether your geoprocessing service runs synchronously or asynchronously. In asynchronous mode, the server incurs a small amount of overhead, which means that asynchronous tasks rarely complete in under a second. Executing the same task in synchronous mode is a tenth of a second or so faster than executing it in asynchronous mode.
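From the client's side, the two modes correspond to different REST operations on the task: `execute` for a synchronous service, and `submitJob` followed by polling the job resource for an asynchronous one. The helper below only builds those URLs, as a sketch; the function name and the server URL in the usage example are hypothetical.

```python
def task_endpoints(service_url, task_name):
    """Return the REST URLs a client uses for each execution mode."""
    task_url = "{0}/{1}".format(service_url.rstrip("/"), task_name)
    return {
        # Synchronous: one request, the response carries the results.
        "synchronous": task_url + "/execute",
        # Asynchronous: submit a job, then poll its status resource.
        "asynchronous": task_url + "/submitJob",
        "job_status": task_url + "/jobs/{job_id}",
    }


# Hypothetical service URL, for illustration only.
urls = task_endpoints(
    "https://gis.example.com/arcgis/rest/services/Demo/GPServer", "Viewshed")
print(urls["synchronous"])
print(urls["job_status"].format(job_id="<jobId returned by submitJob>"))
```

The asynchronous path needs at least two round trips (submit, then one or more status checks), which is part of why a short task feels slower in that mode.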
Avoid unneeded coordinate transformations
If your task uses datasets that are in different coordinate systems, the geoprocessing tools used by your task may need to transform coordinates into a single common coordinate system during execution. Depending on the size of your datasets, transforming coordinates from one coordinate system to another can slow down your task. Be aware of the coordinate system of each dataset and of whether the tools used by your task need to perform coordinate transformations. You may want to transform all datasets used by your task into one coordinate system ahead of time. Refer to the help topics on coordinate systems for more information on how they affect geoprocessing tools.
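One way to audit a task's project data is to list the coordinate system of each dataset and group the datasets by it; more than one group means some tool may be transforming coordinates at run time. This is a sketch with hypothetical function names; the `arcpy.Describe` call is isolated in its own function because `arcpy` is only available where ArcGIS is installed.

```python
from collections import defaultdict


def group_by_coordinate_system(sr_by_dataset):
    """Group dataset names by coordinate system name.

    sr_by_dataset maps a dataset name to its coordinate system name.
    """
    groups = defaultdict(list)
    for dataset, sr_name in sorted(sr_by_dataset.items()):
        groups[sr_name].append(dataset)
    return dict(groups)


def describe_coordinate_systems(datasets):
    """Look up the coordinate system name of each dataset with arcpy."""
    import arcpy  # requires an ArcGIS installation

    return {ds: arcpy.Describe(ds).spatialReference.name for ds in datasets}


# Usage with made-up values standing in for describe_coordinate_systems():
groups = group_by_coordinate_system({
    "parcels": "NAD_1983_StatePlane_California_V_FIPS_0405_Feet",
    "rivers": "GCS_WGS_1984",
    "roads": "NAD_1983_StatePlane_California_V_FIPS_0405_Feet",
})
if len(groups) > 1:
    print("Datasets use more than one coordinate system:", groups)
```

Datasets that fall outside the main group are candidates for a one-time run of the Project tool before the service is published.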
Reduce data size
Any software that processes data works faster when the dataset is small. There are a couple of ways you can reduce the size of your geographic data:
- Remove unnecessary attributes on your project data with the Delete Field tool.
- Remove unnecessary vertices. Line and polygon features have vertices that define their shape, and each vertex is an x,y coordinate. Your features may have more vertices than they need, which unnecessarily increases the size of your dataset. This typically happens in two situations:
  - The data comes from an external source and contains duplicate vertices or vertices that are so close together that they do not contribute to the definition of the feature.
  - The number of vertices does not fit the scale of analysis. For example, your features contain details that are appropriate at large scales, but your analysis or presentation is at a small scale.
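To make the vertex case concrete, the pure-Python sketch below drops vertices that lie within a tolerance of the previously kept vertex, which removes duplicates and near-duplicates. It is an illustration of the idea only, not the algorithm any ArcGIS tool uses; in practice you would more likely run a generalization tool such as Simplify Line or Simplify Polygon on the data. The function name and the tolerance value are hypothetical.

```python
import math


def drop_close_vertices(coords, tolerance):
    """Remove vertices closer than tolerance to the previously kept vertex.

    coords is a sequence of (x, y) pairs for one line or ring. The first and
    last vertices are always preserved so the feature keeps its end points.
    """
    points = [(float(x), float(y)) for x, y in coords]
    if not points:
        return []
    kept = [points[0]]
    for x, y in points[1:]:
        last_x, last_y = kept[-1]
        if math.hypot(x - last_x, y - last_y) > tolerance:
            kept.append((x, y))
    # If the true end point was dropped, restore it.
    if kept[-1] != points[-1]:
        if len(kept) > 1:
            kept[-1] = points[-1]
        else:
            kept.append(points[-1])
    return kept


line = [(0, 0), (0, 0), (0.001, 0), (1, 0), (1, 1)]
print(drop_close_vertices(line, tolerance=0.01))
```

The tolerance should be chosen from the scale of the analysis: a value well below the smallest detail the task needs to resolve.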
Differences between 10.0 and 10.1
If you authored geoprocessing services in 10.0, there were specific performance techniques you used to author services, noted below. You no longer need to use these techniques in 10.1.
Prior to 10.1, if your ArcGIS Server configuration was made up of multiple machines, or you used UNC paths to your arcgisjobs directory, it was recommended that you set up a local job directory. A local job directory significantly improved execution time because each task was processed on the local server and the final result was transferred to the client. At 10.1, setting up local job directories is a GIS server administrator task. As a task author, you no longer have to specify that your task use the local job directory; it is used automatically if the server participates in a cluster of more than one machine or the directories are referenced using a UNC path.
Prior to 10.1, if your geoprocessing service operated against rasters, it was recommended that you store them in the GRID format. The GRID format was generally faster because some tools were optimized to work against GRIDs. At 10.1, all raster tools can read and write the source format without loss of performance.