Topology basics

This topic applies to ArcGIS for Desktop Standard and ArcGIS for Desktop Advanced only.

Topology is a collection of rules that, coupled with a set of editing tools and techniques, enables the geodatabase to more accurately model geometric relationships. ArcGIS implements topology through a set of rules that define how features may share a geographic space and a set of editing tools that work with features that share geometry in an integrated fashion. A topology is stored in a geodatabase as one or more relationships that define how the features in one or more feature classes share geometry. The features participating in a topology are still simple feature classes—rather than modifying the definition of the feature class, a topology serves as a description of how the features can be spatially related.

Why topology?

Topology has long been a key GIS requirement for data management and integrity. In general, a topological data model manages spatial relationships by representing spatial objects (point, line, and area features) as an underlying graph of topological primitives—nodes, faces, and edges. These primitives, together with their relationships to one another and to the features whose boundaries they represent, are defined by representing the feature geometries in a planar graph of topological elements.

Example topological line graph of nodes, faces, and edges
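As a rough illustration of the primitives described above, the following sketch models two adjacent squares as a tiny planar graph. The `Edge` record, the `UNIVERSE` label, and the helper functions are hypothetical names chosen for this example; they are not part of any ArcGIS API. For brevity, each edge records only its end nodes and the faces on either side, not its intermediate vertices.

```python
from dataclasses import dataclass

UNIVERSE = "universe"  # hypothetical label for the unbounded outer face


@dataclass(frozen=True)
class Edge:
    edge_id: str
    from_node: str
    to_node: str
    left_face: str   # face on the left when walking from_node -> to_node
    right_face: str  # face on the right when walking from_node -> to_node


# Two unit squares, A (west) and B (east), that share the boundary x = 1.
NODES = {"n1": (1.0, 0.0), "n2": (1.0, 1.0)}
EDGES = [
    Edge("e1", "n1", "n2", left_face="A", right_face="B"),       # shared boundary
    Edge("e2", "n2", "n1", left_face="A", right_face=UNIVERSE),  # rest of A's outline
    Edge("e3", "n1", "n2", left_face="B", right_face=UNIVERSE),  # rest of B's outline
]


def edges_of_face(face):
    """Return the IDs of all edges that bound the given face."""
    return {e.edge_id for e in EDGES if face in (e.left_face, e.right_face)}


def adjacent_faces(face):
    """Return the faces that share at least one edge with the given face."""
    neighbors = set()
    for e in EDGES:
        if e.left_face == face:
            neighbors.add(e.right_face)
        elif e.right_face == face:
            neighbors.add(e.left_face)
    neighbors.discard(UNIVERSE)
    return neighbors
```

Because each edge knows the face on either side, adjacency questions such as "which polygons border A?" can be answered without comparing any coordinates.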

Topology is fundamentally used to ensure data quality of the spatial relationships and to aid in data compilation. Topology is also used for analyzing spatial relationships in many situations, such as dissolving the boundaries between adjacent polygons with the same attribute values or traversing a network of the elements in a topology graph.

Topology can also be used to model how the geometry from a number of feature classes can be integrated. Some refer to this as vertical integration of feature classes.

Ways that features share geometry in a topology

Features can share geometry within a topology. Here are some examples among adjacent features:

In addition, shared geometry can be managed between feature classes using a geodatabase topology. For example:

Note:

Parcels have commonly been managed using simple feature classes and geodatabase topology, so that the set of feature classes needed to model parcels, boundaries, corner points, and control points obey the required coincidence rules. Another way to manage parcels is with a parcel fabric, which automatically provides these layers for you. A fabric manages its internal topology, with no requirement to maintain a geodatabase topology or perform any topological editing for the set of layers used by parcels.

A key difference between parcels modeled as simple features and parcels in a fabric is that fabric parcel boundaries (lines in a fabric) are not shared—there is a complete set of lines on the boundary of each parcel; fabric lines for adjacent parcels overlap and are coincident.

Parcel fabrics may still participate in geodatabase topology; where overlapping boundary lines have differing geometry, the lines are cracked, and the topology graph is built as usual.

Two views: Features and topological elements

A layer of polygons can be described and used:

This means that there are two alternatives for working with features—one in which features are defined by their coordinates and another in which features are represented as an ordered graph of their topological elements.

The evolution of geodatabase topology from coverages

Note:

You do not need to read this lengthy section to implement geodatabase topologies. However, it may be worth your time if you are interested in the history of, and the motivations for, how topology is managed in the geodatabase.

The genesis of Arc-node and Georelational

ArcInfo Workstation coverage users have a long history with, and appreciation for, the role that topology plays in maintaining the spatial integrity of their data.

Here are the elements of the coverage data model.

The feature classes in a coverage

In a coverage, the feature boundaries and points were stored in a few main files that were managed and owned by ArcInfo Workstation. The ARC file held the linear or polygon boundary geometry as topological edges, which were referred to as arcs. The LAB file held point locations, which were used as label points for polygons or as individual point features such as for a wells feature layer. Other files were used to define and maintain the topological relationships between each of the edges and the polygons.

For example, one file called the PAL file (which stands for Polygon-arc list) listed the order and direction of the arcs in each polygon. In ArcInfo Workstation, software logic was used to assemble the coordinates for each polygon for display, analysis, and query operations. The ordered list of edges in the PAL file was used to look up and assemble the edge coordinates held in the ARC file. The polygons were assembled during runtime when needed.
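The runtime assembly described above can be sketched in a few lines. This is only an illustration of the idea, not the actual coverage file format: the in-memory `ARCS` and `PAL` dictionaries are hypothetical, and the use of a negative arc ID to mean "traverse this arc in reverse" is a convention adopted for this sketch.

```python
# Hypothetical stand-in for the ARC file: arc ID -> ordered coordinates.
ARCS = {
    1: [(0, 0), (1, 0), (1, 1)],  # lower-right outline of a unit square
    2: [(1, 1), (0, 1), (0, 0)],  # upper-left outline of a unit square
    3: [(0, 0), (1, 1)],          # diagonal shared by both polygons
}

# Hypothetical stand-in for the PAL file: polygon -> ordered, signed arc IDs.
# In this sketch a negative ID means the arc is traversed in reverse.
PAL = {
    "lower": [1, -3],
    "upper": [3, 2],
}


def assemble_ring(signed_arc_ids, arcs):
    """Build a closed ring of coordinates from an ordered list of arcs."""
    ring = []
    for signed_id in signed_arc_ids:
        coords = arcs[abs(signed_id)]
        if signed_id < 0:
            coords = list(reversed(coords))
        # Skip the first vertex when it repeats the end of the previous arc.
        if ring and ring[-1] == coords[0]:
            coords = coords[1:]
        ring.extend(coords)
    return ring


lower_ring = assemble_ring(PAL["lower"], ARCS)
upper_ring = assemble_ring(PAL["upper"], ARCS)
```

Note that the diagonal (arc 3) is stored once but contributes to both rings, which is the edge sharing that the coverage model relied on.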

The coverage model had several advantages:

  • It used a simple structure to maintain topology.
  • It enabled edges to be digitized and stored only once and shared by many features.
  • It could represent polygons of enormous size (with thousands of coordinates) because polygons were really defined as an ordered set of edges (arcs).
  • The topology storage structure of the coverage was intuitive. Its physical topological files were readily understood by ArcInfo Workstation users.

Legacy:

An interesting historical fact: Arc, when coupled with the table manager Info, was the genesis of the product name ArcInfo Workstation, which led to all subsequent Arc products in the Esri product family—ArcInfo, ArcIMS, ArcGIS, and so on.

Coverages also had some disadvantages:

  • Some operations were slow because many features had to be assembled on the fly when they needed to be used. This included all polygons and multipart features such as regions (the coverage term for multipart polygons) and routes (the term for multipart line features).
  • Topological features (such as polygons, regions, and routes) were not ready to use until the coverage topology was built. If edges were edited, the topology had to be rebuilt. (Note: Partial processing was eventually used, which required rebuilding only the changed portions of the coverage topology.) In general, when edits are made to features in a topological dataset, a geometric analysis algorithm must be executed to rebuild the topological relationships regardless of the storage model.
  • Coverages were limited to single-user editing. Because of the need to ensure that the topological graph was in synchronization with the feature geometries, only a single user could update a topology at a time. Users would tile their coverages and maintain a tiled database for editing. This enabled individual users to lock down and edit one tile at a time. For general data use and deployment, users would append copies of their tiles into a mosaicked data layer. In other words, the tiled datasets they edited were not directly used across the organization. They had to be converted, which meant extra work and extra time.

Shapefiles and simple geometry storage

In the early 1980s, coverages were seen as a major improvement over the older polygon and line-based systems in which polygons were held as complete loops. In these older systems, which predated the coverage and ArcInfo Workstation, all the coordinates for a feature were stored in each feature's geometry. These data structures were simple but had the disadvantage of double-digitized boundaries. That is, the coordinates along a shared edge were stored twice, once in the geometry of each adjacent polygon. The main disadvantage was that GIS software at the time could not maintain shared edge integrity. In addition, storage costs were enormous, and each byte of storage came at a premium. During the early 1980s, a 300 MB disk drive was the size of a washing machine and cost $30,000. Holding two or more representations of coordinates was expensive, and the computations took too much compute time. Thus, the use of a coverage topology had real advantages.

During the mid-1990s, interest in simple geometric structures grew because disk storage and hardware costs in general were coming down while computational speed was growing. At the same time, existing GIS datasets were more readily available, and the work of GIS users was evolving from primarily data compilation activities to include data use, analysis, and sharing.

Users wanted faster performance for data use. Rather than spending computer time deriving polygon geometries each time they were needed, the system should simply deliver the feature coordinates of, say, 1,200 polygons as fast as possible. Having the full feature geometry readily available was more efficient. Thousands of geographic information systems were in use, and numerous datasets were readily available.

Around this time, Esri developed and published its shapefile format. Each shapefile represented a single feature class (of points, lines, or polygons) and used a very simple storage model for the feature coordinates. Shapefiles could be easily created from coverages as well as from many other geographic information systems. They were widely adopted as a de facto standard and are still widely used and deployed to this day.

A few years later, ArcSDE pioneered a similar simple storage model in relational database tables. A feature table could hold one feature per row with the geometry in one of its columns along with other feature attribute columns.

A sample feature table of state polygons is shown below. Each row represents a state. The shape column holds the polygon geometry of each state.

Feature class table showing the shape column
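The one-feature-per-row idea can be approximated with any relational database. The sketch below uses Python's built-in `sqlite3` module purely for illustration. The table name, column names, placeholder state names, and toy rectangles are all hypothetical, and storing the geometry as well-known text is an assumption made for readability; it is not a description of how ArcSDE encodes the shape column.

```python
import sqlite3

# In-memory database standing in for an RDBMS feature table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states ("
    "objectid INTEGER PRIMARY KEY, "
    "name TEXT, "
    "shape TEXT)"  # geometry held as well-known text (an assumption of this sketch)
)

# Toy rows: each row is one feature with its complete geometry.
rows = [
    (1, "State A", "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"),
    (2, "State B", "POLYGON ((1 0, 2 0, 2 1, 1 1, 1 0))"),
]
conn.executemany("INSERT INTO states VALUES (?, ?, ?)", rows)


def fetch_shape(connection, name):
    """Return the full geometry for one feature; no assembly step is needed."""
    row = connection.execute(
        "SELECT shape FROM states WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

A single `SELECT` returns the whole polygon, which is the property that made this model a good fit for SQL engines.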

This simple features model fits the SQL processing engine very well. Through the use of relational databases, we began to see GIS data scale to unprecedented sizes and numbers of users without degrading performance. We were beginning to leverage RDBMS for GIS data management.

Shapefiles became ubiquitous, and using ArcSDE, this simple features mechanism became the fundamental feature storage model in RDBMSs. (To support interoperability, Esri was the lead author of the OGC and ISO simple features specification.)

Simple feature storage had clear advantages:

  • The complete geometry for each feature is held in one record. No assembly is required.
  • The data structure (physical schema) is very simple, fast, and scalable.
  • It is easy for programmers to write interfaces.
  • It is interoperable. Many wrote simple converters to move data in and out of these simple geometries from numerous other formats. Shapefiles were widely applied as a data use and interchange format.

The main disadvantage of simple feature storage was that the data integrity readily provided by topology was harder to maintain. As a consequence, users applied one data model for editing and maintenance (such as coverages) and used another for deployment (such as shapefiles or ArcSDE layers).

Users began to use this hybrid approach for editing and data deployment. For example, users would edit their data in coverages, CAD files, or other formats. Then, they would convert their data into shapefiles for deployment and use. Thus, even though the simple features structure was an excellent direct use format, it did not support the topological editing and data management of shared geometry. Direct use databases would use the simple structures, but another topological form was used for editing. This had advantages for deployment. But the disadvantage was that data would become out-of-date and have to be refreshed. It worked, but there was a lag time for information update. Bottom line—topology was missing.

What GIS required and what the geodatabase topology model implements now is a mechanism that stores features using the simple feature geometry but enables topologies to be used on this simple, open data structure. This means that users can have the best of both worlds—a transactional data model that enables topological query, shared geometry editing, rich data modeling, and data integrity, but also a simple, highly scalable data storage mechanism that is based on open, simple feature geometry.

This direct use data model is fast, simple, and efficient. It can also be directly edited and maintained by any number of simultaneous users.

The topology framework in ArcGIS

In effect, topology has been considered as more than a data storage problem. The complete solution includes the following:

  • A complete data model (objects, integrity rules, editing and validation tools, a topology and geometry engine that can process datasets of any size and complexity, and a rich set of topological operators, map display, and query tools)
  • An open storage format using a set of record types for simple features and a topological interface to query simple features, retrieve topological elements, and navigate their spatial relationships (for example, find adjacent areas and their shared edge, or route along connected lines)
  • The ability to provide the features (points, lines, and polygons) as well as the topological elements (nodes, edges, and faces) and their relationships to one another
  • A mechanism that can support the following:
    • Massively large datasets with millions of features
    • Ability to perform editing and maintenance by many simultaneous editors
    • Ready-to-use, always available feature geometry
    • Support for topological integrity and behavior
    • A system that goes fast and scales for many users and many editors
    • A system that is flexible and simple
    • A system that leverages the RDBMS SQL engine and transaction framework
    • A system that can support multiple editors, long transactions, historical archiving, and replication

In a geodatabase topology, the validation process identifies shared coordinates between features (both in the same feature class and across feature classes). A clustering algorithm is used to ensure that the shared coordinates have the same location. These shared coordinates are stored as part of each feature's simple geometry.
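The effect of clustering can be illustrated with a deliberately simple sketch. This is not the algorithm ArcGIS uses; it is a greedy stand-in in which the first vertex seen in a neighborhood becomes the shared location, and any later vertex within a hypothetical `tolerance` distance is moved onto it. The function and parameter names are assumptions made for this example.

```python
import math


def cluster_vertices(features, tolerance):
    """Snap vertices that lie within `tolerance` of each other to one location.

    `features` maps a feature ID to its list of (x, y) vertices. The result has
    the same shape, with nearby vertices replaced by a shared representative.
    """
    representatives = []  # one coordinate per cluster, in first-seen order
    snapped = {}
    for feature_id, coords in features.items():
        new_coords = []
        for point in coords:
            for rep in representatives:
                if math.dist(point, rep) <= tolerance:
                    new_coords.append(rep)  # reuse the existing location
                    break
            else:
                representatives.append(point)  # start a new cluster
                new_coords.append(point)
        snapped[feature_id] = new_coords
    return snapped


# Two lines whose endpoints nearly, but not exactly, coincide.
lines = {
    "a": [(0.0, 0.0), (1.0, 0.0)],
    "b": [(1.0004, 0.0003), (2.0, 0.0)],
}
clean = cluster_vertices(lines, tolerance=0.001)
```

After clustering, the end of line "a" and the start of line "b" hold identical coordinates, and each feature still stores its own complete geometry.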

This enables very fast and scalable lookup of topological elements (nodes, edges, and faces). This has the added advantage of working quite well and scaling with the RDBMS's SQL engine and transaction management framework.

During editing and update, as features are added, they are directly usable. The updated areas on the map, dirty areas, are flagged and tracked as updates are made to each feature class. At any time, users can choose to topologically analyze and validate the dirty areas to generate clean topology. Only the topology for the dirty areas needs rebuilding, saving processing time.
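The dirty-area bookkeeping can be sketched as follows. The class and method names are hypothetical, the dirty area is simplified to the bounding box of each edited feature, and the `check` callback stands in for whatever rule validation would actually run; none of this reflects the internal ArcGIS implementation.

```python
class DirtyAreaTracker:
    """Record the extents touched by edits and validate only those extents."""

    def __init__(self):
        self.dirty_areas = []

    def record_edit(self, coords):
        """Flag the bounding box of an added or modified feature as dirty."""
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        self.dirty_areas.append((min(xs), min(ys), max(xs), max(ys)))

    def validate(self, check):
        """Run `check` on each dirty area, clear it, and return the count."""
        processed = 0
        while self.dirty_areas:
            envelope = self.dirty_areas.pop()
            check(envelope)
            processed += 1
        return processed


tracker = DirtyAreaTracker()
tracker.record_edit([(0, 0), (2, 0), (2, 1)])
tracker.record_edit([(5, 5), (6, 7)])

checked = []
first_pass = tracker.validate(checked.append)
second_pass = tracker.validate(checked.append)  # nothing left to process
```

The second call does no work because no edits occurred in between, which mirrors the point that only changed areas need to be revalidated.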

The results are that topological primitives (nodes, edges, and faces) and their relationships to one another and their features can be efficiently discovered and assembled. This has several advantages:

  • Simple feature geometry storage is used for features. This storage model is open, efficient, and scales to large sizes and numbers of users.
  • This simple features data model is transactional and is multiuser. By contrast, the older topological storage models will not scale and have difficulties supporting multiple editor transactions and numerous other GIS data management workflows.
  • Geodatabase topologies fully support all the long transaction and versioning capabilities of the geodatabase. Geodatabase topologies need not be tiled, and many users can simultaneously edit the topological database—even their individual versions of the same features if necessary.
  • Feature classes can grow to any size (hundreds of millions of features) with very strong performance.
  • This topology implementation is additive. You can typically add this to an existing schema of spatially related feature classes. The alternative is that you must redefine and convert all your existing feature classes to new data schemas holding topological primitives.
  • There need only be one data model for geometry editing and data use, not two or more.
  • It is interoperable because all feature geometry storage adheres to simple features specifications from the Open Geospatial Consortium and ISO.
  • Data modeling is more natural because it is based on user features (such as parcels, streets, soil types, and watersheds) instead of topological primitives (such as nodes, edges, and faces). Users will begin to think about the integrity rules and behavior of their actual features instead of the integrity rules of the topological primitives. For example, how do parcels behave? This will enable stronger modeling for all kinds of geographic features. It will improve our thinking about streets, soil types, census units, watersheds, rail systems, geology, forest stands, land forms, physical features, and so on.
  • Geodatabase topologies provide the same information content as maintained topological implementations—either you store a topological line graph and discover the feature geometry (like coverages) or you store the feature geometry and discover the topological elements and relationships (like geodatabases).
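The "store the feature geometry and discover the topological elements" approach in the last item can be illustrated with a small sketch. It assumes that coincident vertices already have identical coordinates, and it only detects shared boundary segments between polygons; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def shared_segments(polygons):
    """Find boundary segments that appear in more than one polygon.

    `polygons` maps a feature ID to a closed ring of (x, y) vertices. Each
    segment is keyed by its sorted endpoints so that direction is ignored.
    """
    owners = defaultdict(set)
    for feature_id, ring in polygons.items():
        for start, end in zip(ring, ring[1:]):
            owners[tuple(sorted((start, end)))].add(feature_id)
    return {seg: ids for seg, ids in owners.items() if len(ids) > 1}


# Two simple-feature squares whose rings both contain the segment x = 1.
parcels = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}
shared = shared_segments(parcels)
```

Each polygon keeps its full ring, yet the shared edge and the adjacency between "A" and "B" can still be derived when needed.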

In cases where users want to store the topological primitives, it is easy to create and post topologies and their relationships to tables for various analytic and interoperability purposes (such as users who want to post their features into an Oracle Spatial warehouse that stores tables of topological primitives).

At a pragmatic level, the ArcGIS topology implementation works. It scales to extremely large geodatabases and multiuser systems without loss of performance. It includes validation and editing tools for building and maintaining topologies in geodatabases. It includes rich and flexible data modeling tools that enable users to assemble practical, working systems on file systems, in any relational database, and on any number of schemas.

6/20/2012