Comparing SimpleDB to Other Products and Services
Numerous new types of products and services are now available or will soon be available in the database/data service space. Some of these are similar to SimpleDB, and others are tangential. A few of them are listed here, along with a brief description and comparison to SimpleDB.
Windows Azure Platform
The Windows Azure Platform is Microsoft's entry into the cloud-computing fray. Azure defines a raft of service offerings that includes virtual computing, cloud storage, and reliable message queuing. Most of these services are counterparts to Amazon services. At the time of this writing, the Azure services are available as a Community Technology Preview. To date, Microsoft has been struggling to gain its footing in the cloud services arena.
There have been numerous, somewhat confusing, changes in product direction and product naming. Although Microsoft's cloud platform has been lagging behind AWS a bit, it seems that customer feedback is driving the recent Azure changes. There is every reason to suspect that once Azure becomes generally available, it will be a solid alternative to AWS.
Among the services falling under the Azure umbrella, there is one (currently) named Windows Azure Table. Azure Table is a distributed key-value store with explicit support for partitioning across storage nodes. It is designed for scalability and is in many ways similar to SimpleDB. The following is a list of similarities between Azure Table and SimpleDB:
- All access to the service is in the form of web requests. As a result, any programming language can be used.
- Requests are authenticated with cryptographic signatures.
- Consistency is loosened to some degree.
- Unique primary keys are required for each data entity.
- Data within each entity is stored as a set of properties, each of which is a name-value pair.
- There is a limit of 256 properties per entity.
- A flexible schema allows different entities to have different properties.
- There is a limit on how much data can be stored in each entity.
- The number of entities you can get back from a query is limited and a query continuation token must be used to get the next page of results.
- Service versioning is in place so older versions of the service API can still be used after new versions are rolled out.
- Scalability is achieved through the horizontal partitioning of data.
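The query continuation token mentioned in the list above can be illustrated with a minimal sketch. It uses an in-memory list as a stand-in for either service; `ENTITIES`, `query_page`, and `query_all` are hypothetical names chosen for illustration and do not correspond to the real SimpleDB or Azure Table client APIs. Real services return an opaque token, whereas this sketch assumes a plain integer offset.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a hosted data store (not a real client API).
# Entities have a flexible schema: only some of them carry a "size" property.
ENTITIES = [
    {"id": f"item{i:03d}", "color": "red" if i % 2 else "blue"} for i in range(7)
]
ENTITIES[0]["size"] = "large"

def query_page(page_size: int, token: Optional[int] = None):
    """Return one page of results plus a continuation token (None when done)."""
    start = token or 0
    page = ENTITIES[start:start + page_size]
    next_start = start + len(page)
    next_token = next_start if next_start < len(ENTITIES) else None
    return page, next_token

def query_all(page_size: int):
    """Follow continuation tokens until the service reports no more pages."""
    results, token = [], None
    while True:
        page, token = query_page(page_size, token)
        results.extend(page)
        if token is None:
            return results
```

Calling `query_all(3)` issues three page requests in this sketch and returns all seven entities; the caller never needs to know how the token is built, only that it must be passed back to get the next page.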
There are also differences between the services, as listed here:
- Azure Table uses a composite key composed of a partition key followed by a row key, whereas SimpleDB uses a single item name.
- Azure Table keeps all data with the same partition key on a single storage node. Entities with different partition keys may be automatically spread across hundreds of storage nodes to achieve scalability. With SimpleDB, items must be explicitly placed into multiple domains to get horizontal scaling.
- The only index in Azure Table is based on the composite key. Any properties you want to query or sort must be included as part of the partition key or row key. In contrast, SimpleDB creates an index for each attribute name, and a SQL-like query language allows query and sort on any attribute.
- To resolve conflicts resulting from concurrent updates with Azure Table, you have a choice of either last-write-wins or resolving on the client. With SimpleDB, last-write-wins is the only option.
- Transactions are supported in Azure Table at the entity level as well as for entity groups with the same partition key. SimpleDB applies updates atomically only within the scope of a single item.
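The two conflict-handling choices in the list above can be contrasted with a small in-memory sketch. Everything here is hypothetical: the `Store` class, its method names, and the integer version counter are assumptions made for illustration, standing in for whatever version token a real service exposes.

```python
class VersionConflict(Exception):
    """Raised when a conditional write sees a newer version than expected."""

class Store:
    """Hypothetical in-memory store; not a real SimpleDB or Azure Table client."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._data.get(key, (0, None))

    def put_last_write_wins(self, key, value):
        """Unconditional write: whoever writes last silently overwrites."""
        version, _ = self.get(key)
        self._data[key] = (version + 1, value)

    def put_if_version(self, key, value, expected_version):
        """Conditional write: reject the update if someone else wrote first."""
        version, _ = self.get(key)
        if version != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {version}")
        self._data[key] = (version + 1, value)
```

With the conditional write, a client that read a stale version gets a `VersionConflict` and can re-read, merge, and retry; that is the "resolving on the client" option. With the unconditional write, the earlier update is simply lost.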
Windows Azure Table overall is very SimpleDB-like, with some significant differences in the scalability approach. Neither service has reached maturity yet, so we may still see enhancements aimed at easing the transition from relational databases.
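The difference in scalability approach can be sketched as follows. With SimpleDB, the application chooses a domain for each item itself, for example by hashing the item name; with Azure Table, the application supplies a partition key and the service decides where that partition lives. The names `DOMAIN_COUNT`, `domain_for`, `table_key`, and the `orders` prefix are hypothetical, and the hash-based scheme is one possible assumption rather than a method prescribed by either service.

```python
import hashlib

DOMAIN_COUNT = 4  # assumed number of SimpleDB domains the application created

def domain_for(item_name: str, prefix: str = "orders") -> str:
    """Pick a domain for an item by hashing its name (client-side partitioning).

    MD5 is used only to get a stable, evenly spread number, not for security.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return f"{prefix}_{int(digest, 16) % DOMAIN_COUNT}"

def table_key(partition_key: str, row_key: str) -> tuple:
    """Build an Azure Table style composite key; placement is left to the service."""
    return (partition_key, row_key)
```

A query that must span all items has to be sent to every domain in the SimpleDB scheme, which is part of the cost of partitioning explicitly.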
It is worth noting that Microsoft also has another database service in the Windows Azure fold. Microsoft SQL Azure is a cloud database service with full replication across physical servers, transparent automated backups, and support for the full relational data model. This technology is based on SQL Server, and it includes support for T-SQL, stored procedures, views, and indexes. This service is intended to enable direct porting of existing SQL-based applications to the Microsoft cloud.
Google App Engine
App Engine is a service offered by Google that lets you run web applications, written in Java or Python, on Google's infrastructure. As an application-hosting platform, App Engine includes many non-database functions, but the App Engine data store has similarities to SimpleDB. The non-database functions include a number of different services, all of which are available via API calls. The APIs include service calls to Memcached, email, XMPP, and URL fetching.
App Engine includes an API for data storage based on Google Bigtable and in some ways is comparable to SimpleDB. Although Bigtable is not directly accessible to App Engine applications, there is support in the data store API for a number of features not available in SimpleDB. These features include data relations, object mapping, transactions, and a user-defined index for each query.
App Engine also has a number of restrictions, some of which are similar to SimpleDB restrictions, such as the limit on query run time. By default, the App Engine data store is strongly consistent: once a transaction commits, all subsequent reads will reflect the changes in that transaction. Strong consistency also means that if the primary storage node you are using goes down, App Engine will fail any update attempts you make until a suitable replacement takes over. To alleviate this issue, App Engine has recently added support for the same type of eventual consistency that SimpleDB has had all along. This move in the direction of SimpleDB gives App Engine apps the same ability as SimpleDB apps to run with strong consistency, with the option to fall back on eventual consistency to continue with a degraded level of service.
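The fallback from strong to eventual consistency can be pictured with a minimal in-memory sketch. The `ReplicatedStore` class, its `allow_eventual` flag, and the explicit `replicate` step are hypothetical and are not the App Engine or SimpleDB APIs; they only model a primary node that can go down and a replica that may lag behind it.

```python
class PrimaryUnavailable(Exception):
    """Raised when an operation needs the primary node and it is down."""

class ReplicatedStore:
    """Hypothetical model of one primary and one lagging replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.primary_up = True

    def write(self, key, value):
        """Updates need the primary, so they fail while it is down."""
        if not self.primary_up:
            raise PrimaryUnavailable("cannot write while the primary is down")
        self.primary[key] = value

    def replicate(self):
        """Copy the primary's state to the replica (asynchronous in real systems)."""
        self.replica = dict(self.primary)

    def read(self, key, allow_eventual=False):
        """Read strongly when possible; optionally fall back to the replica."""
        if self.primary_up:
            return self.primary.get(key), "strong"
        if allow_eventual:
            return self.replica.get(key), "eventual"
        raise PrimaryUnavailable("strong read requested while the primary is down")
```

In this sketch a caller that passes `allow_eventual=True` keeps getting answers during an outage, but those answers may be stale if the last write had not yet been replicated.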
Apache CouchDB
Apache CouchDB is a document database where a self-contained document with metadata is the basic unit of data. CouchDB documents, like SimpleDB items, consist of a group of named fields. Each document has a unique ID in the same way that each SimpleDB item has a unique item name. CouchDB does not use a schema to define or validate documents. Different types of documents can be stored in the same database. For querying, CouchDB uses a system of JavaScript views and map-reduce. The loosely structured data in CouchDB documents is similar to SimpleDB data but does not place limits on the amount of data you can store in each document or on the size of the data fields.
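To give a feel for the map-reduce style of querying, here is a minimal sketch. CouchDB views are written in JavaScript and run inside the database; the Python below only mimics the map and reduce steps in memory. The sample documents and the names `map_posts_by_author`, `reduce_count`, and `run_view` are made up for illustration.

```python
from collections import defaultdict

# Schema-free documents of different types stored side by side (sample data).
docs = [
    {"_id": "a1", "type": "post", "author": "ann"},
    {"_id": "a2", "type": "post", "author": "bob"},
    {"_id": "c1", "type": "comment", "author": "ann"},
    {"_id": "a3", "type": "post", "author": "ann"},
]

def map_posts_by_author(doc):
    """Map step: emit (key, value) pairs for the documents the view cares about."""
    if doc.get("type") == "post":
        yield doc["author"], 1

def reduce_count(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def run_view(documents, map_fn, reduce_fn):
    """Apply the map function to every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}

post_counts = run_view(docs, map_posts_by_author, reduce_count)
# post_counts == {"ann": 2, "bob": 1}
```

The comment document is skipped by the map step, which shows how a view can select one kind of document from a database that holds several kinds.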
CouchDB is an open-source product that you install and manage yourself. It allows distributed replication among peer servers and has full support for robust clustering. CouchDB was designed from the start to handle high levels of concurrency and to maintain high levels of availability. It seeks to solve many of the same problems as SimpleDB, but from the standpoint of an open-source product offering rather than a pay-as-you-go service.
Dynamo-Like Products
Amazon Dynamo is a data store used internally within Amazon that is not available to the public. Amazon has published information about Dynamo that includes design goals, run-time characteristics, and examples of how it is used. From the published information, we know that SimpleDB has some things in common with Dynamo, most notably the eventual consistency.
Since the publication of Dynamo information, a number of distributed key-value stores have been developed that are in the same vein as Dynamo. Three open-source products that fit into this category are Project Voldemort, Dynomite, and Cassandra. Each of these projects takes a different approach to the technology, but when you compare them to SimpleDB, they generally fall into the same category. They give you highly available key-value access distributed across machines. You get more control over the servers and the implementation, but that control comes with the maintenance cost of managing the setup and the machines. If you are looking for something in this class of data storage, SimpleDB is a likely touch-free hosted option, and these projects are hands-on self-hosted alternatives.
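One technique described in the published Dynamo material for spreading keys across machines is consistent hashing. The sketch below is a simplified, assumed illustration of that idea and is not taken from any of the projects named above; `HashRing`, `node_for`, the `vnodes` parameter, and the node names are hypothetical.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes, vnodes=8):
        # Each node is placed at several positions to even out the key spread.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

The property that matters is that removing a machine moves only the keys that machine owned; keys owned by the remaining machines stay where they are, which keeps rebalancing work small when the cluster changes.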