Working with geodatabase replicas
This topic applies to ArcGIS for Desktop Standard and ArcGIS for Desktop Advanced only.
Various workflows require you to distribute your data to different geodatabases, and synchronize changes made to the data in each geodatabase. The following is a guide to help you determine how best to use distributed data, geodatabase replicas, and synchronization for your system.
To start, review the understanding distributed data topic, which describes geodatabase replication as well as other methods for distributing data. The scenarios topic also lists a number of common use cases for which geodatabase replication can be used. If geodatabase replication seems the most appropriate method for your system, your next step is to start creating replicas.
Creating replicas
The following will help you determine the best way to create replicas for your system.
- Determine what replicas are needed: In some cases, you may need to create only one or two replicas, while in others, many replicas are needed. For example, many replicas are needed if you are distributing data to field crews to work with on their mobile devices on-site. In cases where you want to keep two enterprise geodatabases synchronized, you may need only one replica. To understand what a replica is and how it works within a geodatabase, read the replicas and geodatabases topic.
- Decide on the type of replication: The replication types topic describes each of the three replication types available. Your system may require you to use one type of replica in one case and another type in another case. For example, you may want to use two-way replication to synchronize with another office and one-way replication to update your map publishing geodatabase.
- Choose which set of tools to use to create the replicas: ArcGIS provides several environments in which to work with geodatabase replication. Each environment offers different advantages. The following describes what each environment has to offer.
- The Create Replica wizard: The Create Replica wizard is available on the Distributed Geodatabase toolbar in ArcMap. The wizard has many options and a well-described user interface that is tightly integrated with ArcMap. It is recommended that you use the Create Replica wizard when first experimenting with creating replicas or if you plan to only create a small number of replicas.
- The Create Replica geoprocessing tool: The Create Replica geoprocessing tool can also be used to create replicas. The tool has many options but does not offer some of the more advanced options from the Create Replica wizard.
The Create Replica geoprocessing tool is ideal in cases where you need to create replicas on a regular basis. Models and scripts that can be run repeatedly are easy to build in the geoprocessing environment. For example, a model can be built to create checkout replicas on a daily basis for each of your field crews. See the Create Replica geoprocessing tool help for more information.
- ArcObjects API: An ArcObjects API is also available to support writing code to create replicas in any of several languages. This is useful when you want to customize the create replica experience or need to create replicas with complex options on a regular basis.
- Integrate replication into your versioning workflows: Geodatabase replication is built on top of versioning. At replica creation time, a replica version is defined in both the parent and child replica. This is the version from which you will send and receive changes during synchronization. See the replica creation and versioning topic for more information.
Since the replica version is the conduit through which changes are synchronized, plan how you will work with the replica versions before creating replicas. For example, you may plan to run some validation on the changes received during synchronization before integrating them into your main workflow. This can be done by analyzing the contents of the replica version after a synchronization, then reconciling and posting it into your regular working version. Alternatively, the default version can be used as the replica version. This is helpful in cases where you want the changes to go directly to default when synchronizing.
- Define the data to replicate: Geodatabase replication allows you to replicate some or all of the datasets in your ArcSDE geodatabase. It also allows you to define what features or rows to replicate using filters and relationship classes. During creation, filters are always applied first, then relationship classes are used to append additional features and rows. See preparing for replication for more information.
Consider your future needs when defining the data to replicate. For example, two-way and one-way replicas are created once and synchronized many times. The filters you define at replica creation time are also applied at synchronization time. Over time, your needs may change to require a larger replica area. It is also important to consider the type of data that you are replicating. To maintain data integrity, additional rules are applied when replicating complex data types such as geometric networks and topologies. The following help topics describe these rules and show examples: Geometric Networks, Topologies, Relationship Classes, Raster Data, and Terrains and Network Datasets.
- Consider replica creation options: Some options have been added to make the replica creation process as efficient as possible. These options are designed to work for specific cases and may or may not be applicable to your workflow. Review the following list to see if you can take advantage of these options:
- Re-use schema: With Re-use schema, you specify a target geodatabase that already has the schema for the data you're replicating. This saves time, since schema creation can be skipped when creating a replica. This option only applies for checkout replicas but should be used whenever possible.
- Schema only: The Schema only option allows you to create a replica where no rows are replicated; only the schema is copied during replica creation. This option only applies for checkout replicas. It is useful, for example, when you are creating a replica for a field crew that plans only to input new information. Using this option saves you the time of setting each dataset to schema only in the wizard.
- Register existing data: If you are replicating a very large amount of data, you may want to consider using the Register existing data option. The option allows you to bypass the data copying step of replica creation and simply register a new replica. To use this option successfully, a specific set of steps must be taken before replica creation. Note that this option is not available when using the geoprocessing tools.
- Replicate related data: During replica creation, filters are applied first, then relationship classes are processed to determine the data to replicate. You can choose to turn off relationship class processing, which will save time. If you choose to turn off relationship class processing, the relationship classes are still included but are not processed during creation and synchronization. An option is available to turn off all relationship class processing in the advanced sections of the Create Replica wizard and geoprocessing tool. The Create Replica wizard also allows you to turn off processing for specific relationship classes.
- Use archiving to track changes: When using archiving to track changes instead of the delta tables associated with versioning, no system versions are created. Therefore, the reconcile, post, and compress processes are not affected, making version management and replication management independent. This also allows the synchronization schedule to be more flexible.
- Consider whether to use a connected or disconnected environment: Replicas can be created in both a connected and disconnected environment. In a connected environment, creation and synchronization are done while connected on the same network. In a disconnected environment, the network is not used. Instead, creation and synchronization are done by exporting files, such as XML documents, sending them to the target, and importing them there. See connected and disconnected replication for more information.
If the network is available but not reliable, you may still want to consider using disconnected replication. A replica creation process running over a slow network may be time consuming and unreliable. With disconnected replication, you can export to a file and continue working without having to wait for the information to be sent over the network. In this case, however, you will want to create backups of these files in case they are lost before being imported on the target.
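The daily field-crew checkout example mentioned under the Create Replica geoprocessing tool could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. The crew names, paths, and the helper functions build_checkout_jobs and create_replicas are hypothetical. The keyword names and values passed to arcpy.CreateReplica_management are assumptions based on the tool's documented syntax, so check them against the Create Replica geoprocessing tool help for your release. The planning step is plain Python; arcpy is imported only when the replicas are actually created.

```python
from datetime import date


def build_checkout_jobs(crews, datasets, out_folder, day=None, reuse_schema=False):
    """Return one dict of Create Replica arguments per field crew.

    crews, datasets, and out_folder are hypothetical inputs. Each target
    geodatabase is assumed to exist already before the tool is run.
    """
    stamp = (day or date.today()).strftime("%Y%m%d")
    jobs = []
    for crew in crews:
        jobs.append({
            "in_data": list(datasets),
            "in_type": "CHECK_OUT",  # checkout replica
            "out_geodatabase": f"{out_folder}/{crew}_{stamp}.gdb",
            "out_name": f"{crew}_{stamp}",
            # Re-use schema only helps if the target already holds the schema.
            "reuse_schema": "REUSE" if reuse_schema else "DO_NOT_REUSE",
        })
    return jobs


def create_replicas(jobs):
    """Run the Create Replica tool once per job (needs an ArcGIS install)."""
    import arcpy  # imported lazily so the planning step works without ArcGIS

    for job in jobs:
        # Keyword names are assumed to match the tool's parameter names.
        arcpy.CreateReplica_management(**job)


if __name__ == "__main__":
    # Print the plan only; call create_replicas(plan) where arcpy is available.
    plan = build_checkout_jobs(
        ["crew_a", "crew_b"],
        ["C:/data/parent.sde/gis.owner.parcels"],  # hypothetical dataset path
        "C:/checkouts",
    )
    for job in plan:
        print(job["out_name"], "->", job["out_geodatabase"])
```

Keeping the plan separate from the tool call makes it easy to review what will be created before anything is written, and the same script can be attached to scheduling software to run each morning.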
Synchronizing replicas
Once a replica is created, you can start synchronizing changes between the replica geodatabases. See about synchronization to learn more. To make your system work effectively, it is important to devise a strategy for synchronizing changes. The following should be considered when determining the best strategy for your system.
- Synchronization methods: First determine the best synchronization method for your needs. The following lists some options:
- Manual synchronization: If you are working with only a small number of replicas and plan to synchronize changes only occasionally, consider using the tools provided by ArcGIS. The Distributed Geodatabase toolbar and the distributed geodatabase context menu in the Catalog tree provide wizards for performing synchronizations. These wizards are available for geodatabase connections as well as geodata server objects exposed through ArcGIS for Server in the Catalog tree. This allows you to synchronize both local connections and remote connections over the Internet. There are also distributed geodatabase geoprocessing tools that provide the same functionality.
- Automated synchronization using agents: In a system where there are many replicas and/or frequent synchronizations, you should consider building a replication agent. Replication agents work by automatically connecting to replicated geodatabases and performing synchronizations. In this case, end users do not have to explicitly synchronize their databases, as synchronization happens automatically. In a connected environment, the following techniques can be used to build synchronization agents:
- Synchronization using geoprocessing tools: With geoprocessing tools, you can easily build models to synchronize replicas using either local geodatabase connections or connections to geodata server objects running on the Internet. These models can be exported to Python scripts and executed through Python. The commands to execute the scripts can be added to scheduling software, such as the Windows scheduler, so that they can be run on a regular basis. For example, you may want to schedule a synchronization between two enterprise geodatabases once a week at a nonpeak time.
- Synchronization using ArcObjects: Synchronization is fully supported through the ArcObjects API. The API allows you to build more sophisticated synchronization agents than those built using geoprocessing tools. For example, you can add functionality to synchronize a field laptop when the operating system detects that the laptop is connected to the network.
- Synchronization and conflicts: If edits made to a replica's data conflict with edits being synchronized from the relative replica, you will need to determine how to resolve the conflict. A reconcile policy can be applied to automatically resolve the conflicts or enable manual conflict resolution at a later time. Review synchronization and versioning to see if this is a concern for your system. One alternative for working with conflicts is to use the ArcObjects API to build a system to process conflicts. In this system, synchronizations use a manual reconcile policy but have a secondary process that runs automatically afterward to resolve any conflicts that may have arisen.
- The data being synchronized: For checkout replicas, all data changes made in the child replica are synchronized. For two-way and one-way replicas, only changes that meet the requirements of the filters and relationship classes are applied. The replica manager can be used to determine the filters and relationship class rules that have been applied to each replicated dataset. You can also create a replica footprint to store this information locally and visualize each replica's spatial filter.
To maintain data integrity, additional rules are applied when synchronizing complex data types such as geometric networks and topologies. Relationship class processing may also add to the data that gets synchronized. You should review the following topics to become familiar with synchronizing different types of data: Synchronizing topology, Synchronizing related data, and Synchronizing geometric networks.
Metadata for the data you choose to replicate is copied during the replica creation process. However, changes to the metadata are not applied during replica synchronization.
- Data volume: When you synchronize, only changes made since the last synchronization are applied. ArcGIS filters out any changes that have already been sent and acknowledged. Also, once a change has been sent, it is never returned to the original replica. In this way, data volumes are trimmed to just what is needed.
Plan the frequency at which you synchronize to correspond with the rate at which changes are applied to your data. If you do not synchronize frequently enough for the volume of changes, the process may be time consuming. It is also recommended that you synchronize during off-peak hours. In a disconnected environment, you should always use ZIP files instead of uncompressed formats, such as XML files, when exporting data changes. Adopting a practice where you regularly send acknowledgment messages is also recommended.
- The order in which replicas are synchronized: If you are working with several replicas, the order in which they are synchronized may be important. For example, consider the case where you create several two-way replicas from a single ArcSDE geodatabase. One strategy for synchronizing these replicas would be for each child replica to synchronize in both directions with the parent. Here the child sends changes to the parent, then the parent sends changes to the child. Another strategy is for each child replica to first send its changes to the parent. The parent incorporates all the changes, then sends changes back to each child. In the first case, the parent is sending only its changes, while in the second case, it is additionally sending changes incorporated from other replicas. Depending on the requirements of your system, one strategy may be more appropriate than the other.
- Schema changes: Geodatabase replication is designed to allow schema changes. This means that synchronizations will continue to work even if schema changes are made to the replicated data. To a certain degree, you can also apply schema changes across replicas. See working with schema changes for more details.
In general, it is best to keep schema changes to a minimum. If you want to apply schema changes across replicas, it is best to do so in a structured way. For example, to add a field across replicas, first add the field to the feature class in the top-level parent replica. Then start a process that applies the schema change to each replica further down the hierarchy. See Schema changes and replicas for more information.
- Working through errors: Errors can occur during the synchronization process for a number of reasons. In a connected system, a computer network may fail or you may try to synchronize a replica that is in conflict. In a disconnected system, it is possible to lose messages or you may mistakenly try to import the messages in the incorrect order. In all these cases, the system is designed to remain in a consistent state. Changes are rolled back, and inappropriate messages are rejected. The replica activity log can be used to find any errors that have occurred and determine what to do, if anything, to recover. In most cases, the system will automatically recover from errors if you simply continue synchronizing changes. Replicas also contain generation information, which indicates how many change sets have been sent and how many have been received. See managing replicas for more information.
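The two ordering strategies described above for several two-way replicas, combined with script-based synchronization, can be sketched as follows. This is an illustrative sketch only. The function names build_sync_plan and run_sync_plan, the geodatabase paths, and the replica names are hypothetical. The keyword names and direction values passed to arcpy.SynchronizeChanges_management are assumptions based on the tool's documented syntax, so verify them in the Synchronize Changes tool help before use. Each child is treated as geodatabase 1 and the parent as geodatabase 2.

```python
UP = "FROM_GEODATABASE1_TO_2"    # child sends its changes to the parent
DOWN = "FROM_GEODATABASE2_TO_1"  # parent sends its changes to the child


def build_sync_plan(parent, children, strategy="pairwise"):
    """Return Synchronize Changes argument dicts in execution order.

    children is a list of (child_geodatabase, replica_name) tuples.
    "pairwise": each child sends to the parent, then receives from it,
    before the next child is processed.
    "collect_then_distribute": every child sends to the parent first, then
    the parent sends to every child, so each child also receives changes
    that the parent incorporated from the other replicas.
    """
    def step(child, replica, direction):
        return {
            "geodatabase_1": child,
            "in_replica": replica,
            "geodatabase_2": parent,
            "in_direction": direction,
        }

    if strategy == "pairwise":
        plan = []
        for child, replica in children:
            plan.append(step(child, replica, UP))
            plan.append(step(child, replica, DOWN))
        return plan
    if strategy == "collect_then_distribute":
        ups = [step(c, r, UP) for c, r in children]
        downs = [step(c, r, DOWN) for c, r in children]
        return ups + downs
    raise ValueError(f"unknown strategy: {strategy}")


def run_sync_plan(plan):
    """Execute each step with the Synchronize Changes tool (needs ArcGIS)."""
    import arcpy  # imported lazily so the plan can be built without ArcGIS

    for args in plan:
        # Conflict options are left at the tool defaults in this sketch.
        arcpy.SynchronizeChanges_management(**args)


if __name__ == "__main__":
    # Hypothetical connections; print the order without touching any data.
    kids = [("C:/conn/north.sde", "gis.north_rep"),
            ("C:/conn/south.sde", "gis.south_rep")]
    for s in build_sync_plan("C:/conn/hq.sde", kids, "collect_then_distribute"):
        print(s["in_replica"], s["in_direction"])
```

A script like this can be registered with scheduling software so that it runs at a nonpeak time. If a step fails, the replica activity log remains the place to look for details about what happened.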