Sizing Up the SimpleDB Feature Set
The SimpleDB API exposes a limited set of features. Here is a list of what you get:
- You can create named domains within your account. At the time of this writing, the initial allocation allows you to create up to 100 domains. You can request a larger allocation on the AWS website.
- You can delete an existing domain at any time without first deleting the data stored in it.
- You can store a data item, whether for the first time or as a later update, with a call to PutAttributes. An update does not need to include the full item; you can pass only the attributes that have changed.
- A batch call, BatchPutAttributes, lets you put up to 25 items at once.
- You can retrieve the data with a call to GetAttributes.
- You can query for items using criteria that span multiple attributes of an item.
- You can store any type of data. SimpleDB treats it all as string data, and you are free to format it as you choose.
- You can store different types of items in the same domain, and items of the same type can vary in which attributes have values.
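The write-side features above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the helper names `to_attributes` and `batches` are hypothetical, and the Name/Value/Replace dictionary shape is an assumption modeled on what the boto3 `sdb` client accepts for PutAttributes. No network calls are made.

```python
from itertools import islice

MAX_BATCH_ITEMS = 25  # batch put limit stated in the text


def to_attributes(changed, replace=True):
    """Turn a dict of changed attributes into name/value pairs.

    Only the attributes passed in are included, which matches the
    "send just what changed" style of update. Every value goes through
    str(), because SimpleDB treats all data as strings.
    """
    return [
        {"Name": name, "Value": str(value), "Replace": replace}
        for name, value in changed.items()
    ]


def batches(items, size=MAX_BATCH_ITEMS):
    """Yield successive lists of at most `size` items for a batch put."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    print(to_attributes({"price": 12, "color": "red"}))
    print([len(chunk) for chunk in batches(range(60))])
```

With a real client, each list returned by `to_attributes` would be passed along with a domain name and an item name, and each chunk from `batches` would become one batch call.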
Benefits of Using SimpleDB
When you use SimpleDB, you give up some features you might otherwise have, but as a trade-off, you gain some important benefits, as follows:
- Availability—When you store your data in SimpleDB, it is automatically replicated across multiple storage nodes and across multiple data centers in the same region.
- Simplicity—There are not a lot of knobs or dials, and there are not any configuration parameters. This makes it a lot harder to shoot yourself in the foot.
- Scalability—The service is designed for scalability and concurrent access.
- Flexibility—Store the data you need to store now, and if the requirements change, store it differently without changing the database.
- Low latency within the same region—Access to SimpleDB from an EC2 instance in the same region has the latency of a typical LAN.
- Low maintenance—Most of the administrative burden is transferred to Amazon. They maintain the hardware and the database software.
Database Features SimpleDB Doesn't Have
There are a number of common database features noticeably absent from Amazon SimpleDB. Programs based on relational database products typically rely on these features. You should be aware of what you will not find in SimpleDB, as follows:
- Full SQL support—A query language similar to SQL is supported for queries only. However, it only applies to "select" statements, and there are some syntax differences and other limitations.
- Joins—You can issue queries, but there are no foreign keys and no joins.
- Auto-incrementing primary keys—You have to create your own primary keys in the form of an item name.
- Transactions—There are no explicit transaction boundaries that you can mark or isolation levels that you can define. There is no notion of a commit or a rollback. There is some implicit support for atomic writes, but it only applies within the scope of each individual item being written.
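Two of these gaps are easy to illustrate from the client side. The sketch below generates its own item names in place of an auto-incrementing key and builds a simple select expression. The function names are hypothetical, random UUIDs are just one possible naming scheme, and the quoting rules (backticks around the domain, single quotes doubled inside values) are an assumption to verify against the SimpleDB Developer Guide.

```python
import uuid


def new_item_name(prefix=""):
    """Return a unique item name; a random UUID stands in for an auto-increment key."""
    return f"{prefix}{uuid.uuid4().hex}"


def quote_value(value):
    """Quote a string literal for a select expression, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def select_by_attribute(domain, attribute, value):
    """Build a single-condition select expression (no joins are available)."""
    return f"select * from `{domain}` where {attribute} = {quote_value(value)}"


if __name__ == "__main__":
    print(new_item_name("book-"))
    print(select_by_attribute("books", "author", "O'Reilly"))
```

Because there are no joins, related data that a relational schema would split across tables is typically fetched with separate queries or stored together on one item.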
Higher-Level Framework Functionality
The simplicity of what SimpleDB offers on the server side is matched by the simplicity of the officially supported SimpleDB clients that AWS provides. There is a one-to-one mapping of service features to client calls. A lot of functionality can be built atop the basic SimpleDB primitives, and a number of third-party SimpleDB clients have already begun to include such advanced features. Popular persistence frameworks used as an abstraction layer above relational databases are prime candidates for this.
Some features normally included within the database server can be written into SimpleDB clients for automatic handling. Third-party client software is constantly improving, so some of the following features may already be present in your client, or you may have to write them yourself:
- Data formatting—Integers, floats, and dates require special formatting in some cases.
- Object mapping—It can be convenient to map programming language objects to SimpleDB attributes.
- Sharding—The domain is the basic unit of horizontal scalability in SimpleDB. However, there is no explicit support for automatically distributing data across domains.
- Cache integration—Caching is an important aspect of many applications, and caching popular data objects is a well-understood optimization. Configurable caching that is well integrated with a SimpleDB client is an important feature.
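Data formatting and sharding are the two items above that fit most easily into a short example. Because every value is stored and compared as a string, numbers are commonly shifted by an offset and zero-padded so that string order matches numeric order. The sketch below shows that idea along with a hash-based choice of domain. The offset, width, function names, and hashing scheme are all illustrative assumptions, not part of any SimpleDB client.

```python
import hashlib

INT_OFFSET = 10 ** 9  # assumed offset; makes values down to -10**9 non-negative
INT_WIDTH = 10        # assumed fixed width of every encoded integer


def encode_int(number, offset=INT_OFFSET, width=INT_WIDTH):
    """Encode an integer so that string comparison agrees with numeric comparison."""
    shifted = number + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError(f"{number} is outside the encodable range")
    return str(shifted).zfill(width)


def decode_int(text, offset=INT_OFFSET):
    """Reverse encode_int."""
    return int(text) - offset


def domain_for_item(item_name, domains):
    """Pick a domain for an item by hashing its name (a simple sharding scheme)."""
    digest = hashlib.sha256(item_name.encode("utf-8")).hexdigest()
    return domains[int(digest, 16) % len(domains)]


if __name__ == "__main__":
    print([encode_int(n) for n in (-5, 3, 42, 1000)])
    print(domain_for_item("order-1234", ["orders_0", "orders_1", "orders_2"]))
```

One consequence of the modulo scheme is that changing the number of domains remaps most items, so the domain count is best chosen up front or replaced with a scheme that tolerates resizing.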
Service Limits
There are quite a few limitations on what you are allowed to do with SimpleDB. Most of these are size and quantity restrictions. There is an underlying philosophy that small and quickly serviced units of work provide the greatest opportunity for load balancing and maintaining uniform service levels. AWS maintains a current listing of the service limitations within the latest online SimpleDB Developer Guide at the AWS website. At the time of this writing, the limits are as follows:
- Max storage per domain: 10GB
- Max attribute values per domain: 1 billion
- Initial max domains per account: 100
- Max attribute values per item: 256
- Max length of item name, attribute name, or value: 1024 bytes
- Max query execution time: 5 seconds
- Max query results: 2500
- Max query response size: 1MB
- Max comparisons per query: 20
These limits may seem restrictive when compared to the unlimited nature of data sizes you can store in other database offerings. However, there are two things to keep in mind about these limits. First, SimpleDB is not a general-purpose data store suitable for everything. It is specifically designed for storing small chunks of data. For larger data objects that you want to store in the cloud, you are advised to use Amazon S3. Second, consider the steps that need to be taken with a relational database at higher loads when performance begins to degrade. Typical recommendations include offloading processing from the database, reducing long-running queries, and applying selective de-normalization of the data. SimpleDB's limits push you toward similar practices from the outset, and they help enable efficient and automatic background replication, high concurrency, and high availability. Some of these limits can be worked around to a degree, but no workaround makes SimpleDB appropriate for all data storage needs.
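A client can check the per-item limits before sending a request rather than waiting for the service to reject it. The following is a minimal sketch under the limits listed above (256 attribute values per item, 1024 bytes per item name, attribute name, or value). The function name `check_item` and the representation of attributes as (name, value) string pairs are assumptions made for illustration.

```python
MAX_VALUES_PER_ITEM = 256  # from the limits listed in the text
MAX_FIELD_BYTES = 1024     # item name, attribute name, or value


def check_item(item_name, attributes):
    """Return a list of limit violations for one item (empty if none).

    `attributes` is a list of (name, value) string pairs. Sizes are
    measured in UTF-8 bytes, since the limit is stated in bytes.
    """
    problems = []
    if len(item_name.encode("utf-8")) > MAX_FIELD_BYTES:
        problems.append("item name exceeds 1024 bytes")
    if len(attributes) > MAX_VALUES_PER_ITEM:
        problems.append("more than 256 attribute values")
    for name, value in attributes:
        if len(name.encode("utf-8")) > MAX_FIELD_BYTES:
            problems.append(f"attribute name too long: {name[:20]}...")
        if len(value.encode("utf-8")) > MAX_FIELD_BYTES:
            problems.append(f"value too long for attribute: {name[:20]}")
    return problems


if __name__ == "__main__":
    print(check_item("item-1", [("title", "x" * 2000)]))
```

A value that fails this check is a candidate for storage in Amazon S3, with only a pointer to it kept in the item.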