Cache creation in Amazon EC2

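Creating an ArcGIS map, image, or globe service cache in the Amazon Elastic Compute Cloud (EC2) differs from caching outside the cloud in several ways: you choose an instance size and price, decide how many service instances to devote to caching, choose between manual scaling and auto scaling, and decide where to place the cache.

This topic discusses these factors in more detail.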

Choosing an instance size and price

Amazon EC2 offers a variety of instance sizes and specifications, each with its own hourly price. The larger instances, especially those with a lot of memory, can generate tiles very quickly. The smaller instances generate tiles more slowly but cost less per hour.

You can create your cache on an attached Amazon Elastic Block Store (EBS) volume using a powerful instance. When the caching completes, you can detach the EBS volume and attach it to your regular instance (which may be smaller and less expensive). You can then terminate the powerful instance that you used for caching. In this way, you can use the power of the cloud to cache while not committing to a relatively expensive instance for any longer than necessary.
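The detach, reattach, and terminate sequence can be scripted. The following is a minimal sketch that assumes you use the AWS SDK for Python (boto3) and pass in an EC2 client created with `boto3.client("ec2")`; the function name `move_cache_volume`, the device name `xvdf`, and the volume and instance IDs are hypothetical placeholders. Both instances must be in the same Availability Zone as the volume, because an EBS volume can only be attached to an instance in its own zone.

```python
def move_cache_volume(ec2, volume_id, caching_instance_id, target_instance_id, device="xvdf"):
    """Hypothetical helper: move a finished cache volume to the regular instance.

    `ec2` is assumed to be a boto3 EC2 client (boto3.client("ec2")).
    The default device name "xvdf" is an assumption; use one valid for your OS.
    """
    # Detach the EBS volume holding the finished cache from the powerful instance
    # and wait until EC2 reports it as available.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=caching_instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume to the smaller instance that will serve the cache.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=target_instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    # Only after the cache is safely on the other instance, stop paying for the
    # caching instance. The detached volume is not affected by this termination.
    ec2.terminate_instances(InstanceIds=[caching_instance_id])
```

Waiting for each state change before the next call avoids attaching a volume that is still detaching. Depending on the operating system, you may also need to bring the attached disk online before ArcGIS Server can read the cache.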

You may need to balance economy against speed. A low-power instance with a low hourly price is not always the most economical choice, because the total cost of the cache depends on the number of hours spent creating tiles. On the other hand, the most powerful instances can also produce a higher total cost: even though you spend fewer hours caching, you pay a higher price per hour.
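The trade-off can be written as a simple product. If N is the number of tiles in the cache, r the number of tiles an instance type generates per hour, and p its price per hour, the caching time is T = N / r and the total cost is C. The worked figures below are hypothetical, chosen only to illustrate the arithmetic for a cache of 10^7 tiles: a small instance at $0.25 per hour producing 5 × 10^4 tiles per hour, and a large instance at $1.00 per hour producing 2.5 × 10^5 tiles per hour.

```latex
\begin{aligned}
C &= T \cdot p = \frac{N}{r}\, p \\[4pt]
C_{\text{small}} &= \frac{10^{7}}{5 \times 10^{4}} \times 0.25 = 200 \times 0.25 = \$50 \\
C_{\text{large}} &= \frac{10^{7}}{2.5 \times 10^{5}} \times 1.00 = 40 \times 1.00 = \$40
\end{aligned}
```

In this example the large instance costs four times as much per hour but is five times faster, so it is cheaper overall. If it were only three times faster (1.5 × 10^5 tiles per hour), the same cache would take about 66.7 hours and cost about $66.67, and the small instance would win.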

Using a small test cache (perhaps the size of a medium-sized city) as well as a custom Amazon Machine Image (AMI) or site template, you can perform relatively inexpensive tests with different instance types to find out which is most economical for your cache.
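Once you have timed the test cache on a few instance types, you can extrapolate each result to the full cache and compare. The sketch below assumes that tiles are produced at the same rate for the full cache as for the test area, which only holds if the test area is representative of your data; the class and function names and all measurements are hypothetical, and the hourly prices are values you would look up yourself.

```python
from dataclasses import dataclass


@dataclass
class TrialRun:
    """Timing from one small test cache on one EC2 instance type (values you measure)."""
    instance_type: str      # label for the instance type you tested
    price_per_hour: float   # hourly price you looked up for that type
    test_tiles: int         # number of tiles generated by the test cache
    test_hours: float       # wall-clock hours the test cache took

    @property
    def tiles_per_hour(self) -> float:
        return self.test_tiles / self.test_hours


def estimated_cost(run: TrialRun, full_tiles: int) -> float:
    """Linear extrapolation: assumes the full cache renders at the test-cache rate."""
    hours = full_tiles / run.tiles_per_hour
    return hours * run.price_per_hour


def most_economical(runs: list[TrialRun], full_tiles: int) -> TrialRun:
    """Return the tested instance type with the lowest extrapolated total cost."""
    return min(runs, key=lambda run: estimated_cost(run, full_tiles))


# Hypothetical measurements, for illustration only.
runs = [
    TrialRun("small-type", price_per_hour=0.25, test_tiles=100_000, test_hours=2.0),
    TrialRun("large-type", price_per_hour=1.00, test_tiles=100_000, test_hours=0.4),
]
best = most_economical(runs, full_tiles=10_000_000)
```

With these made-up numbers the large instance type comes out cheaper for a 10,000,000-tile cache, matching the cost formula; real results depend entirely on your own measurements.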

Powerful EC2 instance types are also well suited to scheduled cache updates, since many update workflows are time-sensitive.

Choosing the number of map service instances to use when caching

Each EC2 instance has a certain number of virtual CPU cores, which is shown when you choose the instance type in the Launch Instance wizard. This number can help you determine how many instances of the CachingTools geoprocessing service to devote to caching. Using too many service instances will overwork your CPUs, while too few will leave them underutilized.

Although the best number may take some trial and error to find, a good starting point is a maximum of 2n + 1 instances of the CachingTools service, where n is the number of virtual cores on a single EC2 instance in your site. This differs from the recommendation in the ArcGIS Server Help for on-premises environments, which suggests n + 1 instances as a starting point.
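The two starting points above reduce to a one-line rule. This is a minimal sketch; the function name `max_caching_instances` is hypothetical, and the result is only the initial value you would enter as the CachingTools service's maximum number of instances before tuning.

```python
import os


def max_caching_instances(vcpus: int, on_ec2: bool = True) -> int:
    """Starting value for the maximum number of CachingTools service instances.

    Uses 2n + 1 on EC2 and n + 1 on premises, where n is the number of
    virtual cores on one machine in the site. Tune from here by trial and error.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be a positive integer")
    return 2 * vcpus + 1 if on_ec2 else vcpus + 1


# When run on one of the site's EC2 instances, os.cpu_count() reports the
# logical processors (vCPUs) the operating system sees, or None if unknown.
n = os.cpu_count() or 1
print(f"{n} vCPUs -> start with a maximum of {max_caching_instances(n)} instances")
```

For example, an instance type with 4 virtual cores gives a starting maximum of 9 instances on EC2, compared with 5 for an on-premises machine with the same core count.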

Manual scaling and auto scaling

When building a large cache, you might be tempted to set up auto scaling triggers that automatically increase the number of EC2 instances working on the cache as the CPU usage increases. However, auto scaling is better suited to handling unexpected spikes in traffic. When creating caches, you already know that you will need a great amount of computing power; therefore, it makes more sense to launch all your needed instances before you build the cache, rather than waiting for them to launch sequentially via auto scaling triggers.
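Launching every caching machine up front can be done with a single request. The sketch below assumes a boto3 EC2 client and a custom AMI prepared for your site; the function name, AMI ID, and instance type are hypothetical. Setting `MinCount` and `MaxCount` to the same value makes the request all-or-nothing: if EC2 cannot launch the minimum number of instances, it launches none. How the new machines join your ArcGIS Server site depends on how your AMI or site template is configured and is not shown.

```python
def launch_caching_instances(ec2, ami_id, instance_type, count):
    """Hypothetical helper: launch all caching machines at once and wait for them.

    `ec2` is assumed to be a boto3 EC2 client; `ami_id` is your custom AMI.
    Returns the IDs of the launched instances.
    """
    # One request for the whole fleet instead of waiting on auto scaling triggers.
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    instance_ids = [instance["InstanceId"] for instance in response["Instances"]]

    # Block until every instance reports the "running" state.
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    return instance_ids
```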

Deciding where to place the cache

As described in Strategies for data transfer to Amazon Web Services, there are several types of locations where you can place your data. When you first create the cache, you'll write it to an EBS volume that's attached to your EC2 instance. This volume is attached at the time you build your site, and it's a good place to put the cache if the volume is large enough. If the volume is too small, you need to create and attach another volume and register a server cache directory on it.
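Before caching, a rough estimate can tell you whether the attached volume is large enough. This sketch assumes you know, or can measure from a test cache, the expected number of tiles and the average tile size in bytes; the function name and the 20% default headroom are hypothetical. The headroom partly allows for the fact that every small tile file occupies at least one file-system allocation unit, so actual disk usage can exceed the sum of tile sizes.

```python
import shutil


def cache_fits(cache_root: str, tile_count: int, avg_tile_bytes: int, headroom: float = 0.2) -> bool:
    """Rough check of whether an estimated cache fits on the volume holding cache_root.

    cache_root: any existing path on the volume you plan to cache to.
    headroom: extra fraction reserved for file-system overhead and growth (assumed).
    """
    estimated_bytes = tile_count * avg_tile_bytes * (1 + headroom)
    free_bytes = shutil.disk_usage(cache_root).free
    return estimated_bytes <= free_bytes
```

If the check fails, that is the signal to create and attach a larger EBS volume and register a server cache directory on it, as described above.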

Do not build a cache on the C drive of your EC2 instance. If the instance is ever terminated, the cache will be lost.

Ultimately, you might want to move or place a copy of the cache onto Amazon Simple Storage Service (Amazon S3). If you're just interested in keeping a backup on Amazon S3, you can create an EBS snapshot. A snapshot effectively backs up your drive on Amazon S3, and you can quickly use the snapshot to create a new EBS volume if your existing volume fails for any reason.
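Taking the snapshot and later restoring it to a new volume can both be scripted. The sketch below assumes a boto3 EC2 client; the function names, description text, and IDs are hypothetical. Taking the snapshot after caching completes means it captures a finished cache.

```python
def snapshot_cache_volume(ec2, volume_id, description="ArcGIS cache backup"):
    """Hypothetical helper: back up the cache volume as an EBS snapshot.

    `ec2` is assumed to be a boto3 EC2 client. Returns the snapshot ID.
    """
    response = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = response["SnapshotId"]
    # Wait until the snapshot is complete before relying on it as a backup.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id


def restore_cache_volume(ec2, snapshot_id, availability_zone):
    """Hypothetical helper: create a new EBS volume from a cache snapshot.

    The volume is created in `availability_zone`, which must match the
    instance you plan to attach it to. Returns the new volume ID.
    """
    response = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=availability_zone)
    volume_id = response["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return volume_id
```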

You can also serve the tiles from Amazon S3 and access them as a custom tile layer using a JavaScript, Flex, or Silverlight application. The advantage of this is that your tiles do not depend on a running service, and you can optionally use Amazon CloudFront to speed the delivery of tiles across the Internet to all parts of the world. If you want to move tiles to Amazon S3 for this purpose, you can transfer the tiles from your EBS volume using either the Amazon Web Services APIs or a third-party front-end application for Amazon S3. You could also do this if you created the cache outside the cloud.
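Copying tiles from the EBS volume to Amazon S3 through the AWS APIs might look like the sketch below. It assumes a boto3 S3 client and a cache stored in the exploded format, where each tile is its own image file; a compact cache keeps tiles inside bundle files, so this file-by-file approach would not apply. The function names, bucket name, and key prefix are hypothetical. Keys keep the cache's relative folder layout so that a custom tile layer can build tile URLs from it, and each object gets an image content type so browsers treat it as an image.

```python
import mimetypes
from pathlib import Path


def tile_keys(cache_dir, prefix=""):
    """Yield (local_path, s3_key) pairs, keeping the cache's relative folder layout."""
    root = Path(cache_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, prefix + path.relative_to(root).as_posix()


def upload_tiles(s3, cache_dir, bucket, prefix=""):
    """Hypothetical helper: upload every tile file under cache_dir to an S3 bucket.

    `s3` is assumed to be a boto3 S3 client (boto3.client("s3")).
    Returns the number of files uploaded.
    """
    count = 0
    for path, key in tile_keys(cache_dir, prefix):
        # Set the content type from the file extension (for example, .png -> image/png).
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        s3.upload_file(str(path), bucket, key, ExtraArgs={"ContentType": content_type})
        count += 1
    return count
```

This uploads one file at a time, which keeps the sketch simple but is slow for caches with millions of tiles; a production transfer would likely run uploads in parallel. Bucket permissions and any CloudFront distribution are configured separately.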

12/29/2014