Representing the real world as data

How would you create an information system to organize and manage the huge variety of geographic stuff in the world? One approach is to think of all that stuff in terms of discrete objects.

The discrete-object view of the world

If you conceive geography in terms of objects, you can sort these objects on the basis of similarities. Shape is a fundamental sorting principle: every object can be drawn—in two dimensions—as either a point, a line, or a polygon. Theme, or type, is another principle: every object can be classified as a school, a road, a park, or a something-or-other.

Applying these sorting principles of shape and theme, we can come up with collections of things we would recognize on a map: schools represented as points, roads represented as lines, parks represented as polygons, and so on.

Points, lines, and polygons representing geographic objects on a map

Each object in a collection has a unique location, specified by a pair of spatial coordinates (for points) or a list of coordinate pairs (for lines and polygons).

A polygon shown next to the list of coordinate pairs that defines its location
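
As a rough illustration of how coordinates define location, the sketch below stores a point as one coordinate pair and a line and a polygon as lists of pairs. The coordinate values, the variable names, and the two helper functions are hypothetical and chosen only for illustration; they are not ArcGIS data or ArcGIS functions.

```python
import math

# Hypothetical coordinates in arbitrary map units (not from a real dataset).
school = (3.0, 4.0)                                       # point: one coordinate pair
road = [(0.0, 0.0), (2.0, 1.0), (5.0, 1.0)]               # line: ordered list of pairs
park = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # polygon: list of vertices


def line_length(path):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def polygon_area(ring):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(road))
print(polygon_area(park))
```

Because the shape is nothing more than its coordinate list, measurements such as length and area can be computed directly from the stored pairs.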

Besides a unique location, every object in a collection has a set of facts that pertain to it: a name, address, description, or whatever bits of information have been gathered about it. These facts are the object's attributes.

A table of park attributes, including park name, category, and address

In ArcGIS, such a collection of objects—with a common shape, common theme, common attributes, and spatial coordinates—is called a feature class. An individual object in the collection is a feature. The feature class is the basic storage unit for GIS data created according to the discrete-object model of the world, or the vector data model, as it's usually called.
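
The idea of a feature class can be sketched with a small data structure: a named collection whose members share one shape type and one set of attribute names. This is a minimal sketch, not the ArcGIS implementation; the class names `Feature` and `FeatureClass`, the `add` method, and the sample park record are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    geometry: list     # list of (x, y) coordinate pairs
    attributes: dict   # facts about the object, keyed by attribute name


@dataclass
class FeatureClass:
    name: str
    shape_type: str            # "point", "line", or "polygon"
    attribute_names: tuple     # attributes shared by every feature
    features: list = field(default_factory=list)

    def add(self, geometry, **attributes):
        """Add a feature, rejecting it if its attributes don't match the class."""
        if set(attributes) != set(self.attribute_names):
            raise ValueError("attributes must match the feature class")
        self.features.append(Feature(list(geometry), attributes))


# Hypothetical example: a polygon feature class of parks.
parks = FeatureClass("Parks", "polygon", ("name", "category", "address"))
parks.add(
    [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    name="Example Park",
    category="Neighborhood",
    address="1 Example St",
)
print(len(parks.features), parks.features[0].attributes["name"])
```

Checking attribute names when a feature is added mirrors the point that every feature in the collection carries the same set of attributes.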

Feature classes can be stored in either of two formats: geodatabase or shapefile. The geodatabase format is the newer and more sophisticated of the two.

A side-by-side view of geodatabase and shapefile feature classes
A directory view of feature classes in geodatabase format (left) and shapefile format (right). Distinct icons are used for point, line, and polygon feature classes.

The continuous-surface view of the world

Although it's a very powerful model, the discrete-object view is not an intuitive way to think of certain aspects of geography, like elevation or climate, that don't really have shapes or boundaries and that cover the world everywhere. It's possible to represent these geographic states-of-being with shapes (for example, contour lines can be used to represent elevation), but a more natural way to think of them is in terms of continuous expanses, or surfaces.

Surfaces can be modeled in different ways, but the most common way is as a matrix of square cells, or pixels. Each cell represents a unit of area, such as a square meter, and stores a measured value for some geographic condition at that location. In this example, each cell represents thirty square meters of ground and stores an elevation value.

A matrix of cells with a number inside each cell
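
The cell matrix can be sketched as a nested list, together with the two pieces of information needed to tie it to the ground: the location of one corner and the size of a cell. Everything in this sketch is an assumption for illustration, including the elevation values, the upper-left origin, the cell edge length of 10 map units, and the function name `cell_value`.

```python
# Hypothetical elevation grid: 3 rows by 4 columns, one value per cell.
elevation = [
    [120, 122, 125, 130],
    [118, 121, 124, 128],
    [115, 119, 123, 127],
]

cell_size = 10.0                        # assumed edge length of a cell, in map units
origin_x, origin_y = 1000.0, 2030.0     # assumed upper-left corner of the grid


def cell_value(x, y):
    """Return the value stored in the cell that contains map location (x, y)."""
    col = int((x - origin_x) // cell_size)   # columns count rightward from the origin
    row = int((origin_y - y) // cell_size)   # rows count downward from the origin
    if not (0 <= row < len(elevation) and 0 <= col < len(elevation[0])):
        raise IndexError("location is outside the raster")
    return elevation[row][col]


print(cell_value(1015.0, 2025.0))
```

Unlike a feature, a cell stores no coordinates of its own; its location follows from its row and column position in the matrix.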

This way of modeling surfaces is called the raster data model. It's commonly used for elevation and elevation-related data (slope, aspect); for temperature, precipitation, and land cover; for statistically derived data, such as densities and means; and, especially, for imagery.

The raster dataset is the basic storage unit for GIS data created according to the continuous-surface model of the world. Raster datasets can also be stored in geodatabase format or in a variety of standard image file formats, such as TIFF and JPEG.

A side-by-side view of raster datasets in a geodatabase among feature classes and in a folder among shapefiles

Feature classes and rasters are complementary. In many maps, rasters are used for background display, while feature classes are used for foreground display and analysis.