Configuration keywords for file geodatabases

When you create a dataset in a file geodatabase, you can choose a configuration keyword to customize how the data is stored. Each keyword optimizes storage for a particular type of data, slightly improving storage efficiency and performance. There are seven keywords available. These keywords cannot be customized.

In most cases, you will specify the DEFAULTS keyword when you create a feature class or raster in a file geodatabase. The exceptions, described in the sections below, are datasets that need UTF16 text storage, a different maximum size, or out-of-line storage of geometry or BLOB attributes.

If you do not specify any configuration keyword, DEFAULTS is used.

| This keyword | How it affects data storage |
| --- | --- |
| DEFAULTS | Stores data up to 1 TB in size. Text is stored in UTF8 format. |
| TEXT_UTF16 | Stores data up to 1 TB in size. Text is stored in UTF16 format. |
| MAX_FILE_SIZE_4GB | Limits data size to 4 GB. Text is stored in UTF8 format. |
| MAX_FILE_SIZE_256TB | Stores data up to 256 TB in size. Text is stored in UTF8 format. |
| GEOMETRY_OUTOFLINE | Stores data up to 1 TB in size. Text is stored in UTF8 format. Stores the geometry attribute in a file separate from the nonspatial attributes. |
| BLOB_OUTOFLINE | Stores data up to 1 TB in size. Text is stored in UTF8 format. Stores BLOB attributes in a file separate from the rest of the attributes. |
| GEOMETRY_AND_BLOB_OUTOFLINE | Stores data up to 1 TB in size. Text is stored in UTF8 format. Stores both geometry and BLOB attributes in files separate from the rest of the attributes. |

Configuration keywords available for file geodatabase datasets
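The table can also be read as a lookup: given the size, text format, and out-of-line needs of a dataset, which keywords fit? The following is a minimal sketch of that lookup, not part of any ArcGIS API. The names `KeywordInfo`, `KEYWORDS`, and `candidates` are hypothetical, and the 4 GB limit is converted to terabytes assuming 1 TB = 1024 GB.

```python
from typing import NamedTuple


class KeywordInfo(NamedTuple):
    max_size_tb: float      # maximum dataset size in terabytes
    text_format: str        # "UTF8" or "UTF16"
    out_of_line: frozenset  # attribute types stored in separate files


# Values transcribed from the table above.
# Assumption: 1 TB = 1024 GB, so 4 GB is 4 / 1024 TB.
KEYWORDS = {
    "DEFAULTS": KeywordInfo(1, "UTF8", frozenset()),
    "TEXT_UTF16": KeywordInfo(1, "UTF16", frozenset()),
    "MAX_FILE_SIZE_4GB": KeywordInfo(4 / 1024, "UTF8", frozenset()),
    "MAX_FILE_SIZE_256TB": KeywordInfo(256, "UTF8", frozenset()),
    "GEOMETRY_OUTOFLINE": KeywordInfo(1, "UTF8", frozenset({"geometry"})),
    "BLOB_OUTOFLINE": KeywordInfo(1, "UTF8", frozenset({"blob"})),
    "GEOMETRY_AND_BLOB_OUTOFLINE": KeywordInfo(
        1, "UTF8", frozenset({"geometry", "blob"})
    ),
}


def candidates(min_size_tb=0.0, text_format="UTF8", out_of_line=()):
    """Return the keywords whose table row satisfies all the given needs."""
    wanted = frozenset(out_of_line)
    return [
        name
        for name, info in KEYWORDS.items()
        if info.max_size_tb >= min_size_tb
        and info.text_format == text_format
        and info.out_of_line == wanted
    ]


print(candidates(min_size_tb=2))          # datasets larger than 1 TB
print(candidates(out_of_line=["blob"]))   # BLOB attributes out of line
```

Because each keyword fixes all three properties at once, some combinations (for example, UTF16 text with out-of-line geometry) return an empty list.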

Text storage: UTF8 vs. UTF16

UTF8 is the most efficient storage format if your text data is in English, another Western European language, or any other language that uses the Latin alphabet, such as Polish, Turkish, or Indonesian. UTF8 stores each unaccented Latin character in just 1 byte and every other character, including accented Latin characters, in 2 to 4 bytes. Because the vast majority of characters in these languages take just 1 byte, UTF8 results in lower storage requirements and improved performance for them.

UTF16 is the most efficient storage format for text data in a non-Latin script, such as Chinese, Japanese, Korean, Russian, Greek, or Arabic. For these languages, this format uses just 2 bytes for almost every character. The UTF8 representation of the same character might use up to 4 bytes, which would increase storage requirements and slightly slow performance for these languages. UTF16 text storage is available only with the TEXT_UTF16 keyword, which has a 1 TB size limit.
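To get a feel for the difference, you can compare the encoded size of sample strings from your own data. The sketch below uses Python's built-in encoders, so it measures the encodings themselves and not the file geodatabase's on-disk format. The helper name `encoded_sizes` and the sample strings are illustrative assumptions.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for a string.

    "utf-16-le" is used so that the 2-byte byte-order mark added by
    Python's plain "utf-16" codec is not counted.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples: one Latin-alphabet string, two CJK strings.
for sample in ("Main Street", "道路", "東京都"):
    utf8, utf16 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

Running this on a representative sample of your attribute values gives a rough indication of whether TEXT_UTF16 is worth considering.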

MAX_FILE_SIZE_4GB

This keyword stores datasets smaller than 4 GB slightly more efficiently than the DEFAULTS keyword. The saving is small: 1 byte per record, or about 1 MB per million records. For example, all the roads in California (2,092,079 records) take 312 MB with the DEFAULTS keyword and 310 MB with the MAX_FILE_SIZE_4GB keyword.

This keyword restricts a dataset to a maximum size of 4 GB, so specify it only if you know a feature class or raster dataset will never need to grow beyond this size.
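The per-record saving quoted above makes it easy to estimate whether the keyword is worth using for a given dataset. This is a back-of-the-envelope sketch; the function name `estimated_saving_mb` is hypothetical, and it assumes 1 MB = 1,000,000 bytes.

```python
BYTES_SAVED_PER_RECORD = 1  # figure quoted in the text above


def estimated_saving_mb(record_count, bytes_per_mb=1_000_000):
    """Estimate the space saved by MAX_FILE_SIZE_4GB, in megabytes."""
    return record_count * BYTES_SAVED_PER_RECORD / bytes_per_mb


california_roads = 2_092_079  # record count from the example above
print(f"{estimated_saving_mb(california_roads):.1f} MB")
```

For the California roads example, the estimate of roughly 2 MB agrees with the difference between the 312 MB and 310 MB figures.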

MAX_FILE_SIZE_256TB

Specifying the MAX_FILE_SIZE_256TB configuration keyword allows you to create a dataset that is up to 256 TB in size. You would normally only specify this keyword to store a large raster dataset.

Note:

Although the file geodatabase allows you to store datasets of this size, you must have enough disk space to do so.
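Before loading a very large dataset, you may want to confirm that the target volume has room for it. The following sketch uses Python's standard `shutil.disk_usage`. The helper name `has_free_space`, the folder, and the 2 TB figure are illustrative assumptions, and the `TB` constant assumes binary terabytes.

```python
import shutil

TB = 1024 ** 4  # assumption: binary terabytes


def has_free_space(folder, required_bytes):
    """Return True if the volume holding `folder` has at least
    `required_bytes` of free space."""
    return shutil.disk_usage(folder).free >= required_bytes


# Example: is there room for a hypothetical 2 TB raster on the current volume?
print(has_free_space(".", 2 * TB))
```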

Inline vs. out-of-line storage

Storing data inline means all the attributes are in the same file or virtual table in the file geodatabase. When you store an attribute out of line, it is stored in a separate file.

If all the data is stored inline, all of it is loaded into memory when you query or edit the feature class. A feature class whose attributes use a lot of storage space can therefore take a long time to load and requires a larger memory buffer.

Geometry and BLOB attributes can hold large amounts of data. For example, if many of the features in the feature class contain thousands of vertices, you might want to store the geometry out of line. Likewise, if your attribute data is large (several text columns or large BLOB columns), you might want to store the geometry out of line so that accessing the geometry doesn't automatically pull all the attribute information into memory. Geometry or BLOB attributes stored out of line are loaded into memory only when the application requests them. For example, if you select features in ArcMap based on a BLOB value, the BLOB attributes are loaded into memory.

If your feature class will contain large BLOB attributes, you can specify the BLOB_OUTOFLINE keyword when you create the feature class. Then, the BLOB attribute will only be loaded if you query it.
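In a Python script, the keyword can be passed through the `config_keyword` parameter of the Create Feature Class geoprocessing tool. The sketch below is one possible way to structure this: the helper names, geodatabase path, and feature class name are hypothetical, and the `arcpy` call runs only on a machine with ArcGIS installed. Check the tool reference for your ArcGIS version before relying on the exact call form.

```python
def create_args(gdb_path, name, geometry_type="POLYGON", config_keyword="DEFAULTS"):
    """Build the keyword arguments for the Create Feature Class tool."""
    return {
        "out_path": gdb_path,
        "out_name": name,
        "geometry_type": geometry_type,
        "config_keyword": config_keyword,
    }


def create_feature_class(gdb_path, name, **options):
    """Create the feature class. Requires an ArcGIS installation."""
    import arcpy  # imported here so create_args works without ArcGIS

    return arcpy.management.CreateFeatureclass(
        **create_args(gdb_path, name, **options)
    )


# Hypothetical path and name, for illustration only.
args = create_args(
    r"C:\data\parcels.gdb", "parcel_photos", config_keyword="BLOB_OUTOFLINE"
)
print(args["config_keyword"])
```

Keeping the argument construction separate from the `arcpy` call makes it possible to check the chosen keyword without ArcGIS available.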

Note:

The GEOMETRY_AND_BLOB_OUTOFLINE keyword is always used when terrain datasets are created to improve performance. This is internal to the software and cannot be altered.

7/30/2013