DynamoDB – Scalable NoSQL DB on the cloud

by Jason Simon
What is DynamoDB?

DynamoDB is a highly scalable, enterprise-level, cloud-based NoSQL database service provided by Amazon. It uses a document-based structure, but includes some features of RDBMS-like databases, such as tables and keys. It is built for scalability and offers single-digit-millisecond response times, which makes it suitable for some of the largest internet companies; it has been adopted by the likes of Netflix, Lyft, IMDb, Duolingo, Shazam, and many more.

Unlike some other NoSQL databases, such as MongoDB, getting started is relatively straightforward; there’s little learning curve (at least at first). It uses a simple dashboard that should give you little trouble if you are already familiar with other AWS products. It uses a standard JSON format, which makes it easy to integrate with your existing web-based applications.

NoSQL

Before we get into a discussion of DynamoDB, it’s best to clarify what NoSQL means. Traditional relational SQL databases rely heavily on structured data. There are predefined relationships between tables, with clearly defined primary keys which can act as links between one table and another. These work extremely well if you wish to maintain full control over your data. Well-defined structures and enforced foreign key relationships ensure that data integrity is maintained.

However, in the age of Big Data, when we have less control over our information or perhaps have only partial information about an entity, the weaknesses of relational databases become apparent. Add to this the fact that SQL databases can require a lot of processing power. As a result, querying huge amounts of data can be resource-intensive, resulting in increased lag time.

In contrast, NoSQL databases rely on a different set of data models instead of tabular data.

It’s easiest to explain this by using the document model. With document-driven NoSQL databases, data can be created and served on the fly. They allow data to be delivered with as few as one or two keys, but can also allow nearly unlimited extra information. Instead of relying on numerical primary keys, fields like “name” can serve as the key. Relationships between data records are not enforced, and as a result NoSQL databases are much more lightweight. With databases such as MongoDB, data is stored in documents in the widely used JSON format, making it easily accessible for querying. NoSQL provides a number of lightweight formats which can handle the vast amounts of data that current storage and access needs involve.

What is the difference between DynamoDB and other NoSQL databases?

Before we go any further, it’s important to understand that there are several types of NoSQL databases. Not all of them use the document type described above. NoSQL databases are really only defined by what they are not, namely SQL (though under some definitions the name can mean “not only SQL”). Below is a summary of the different types.

  • Document, which store data as JSON-like tagged elements (examples: MongoDB, DynamoDB, Couchbase)
  • Key-value, which store a simple set of key-value pairs (examples: Redis, DynamoDB)
  • Wide-column, which are typically used to query extremely large datasets using queries against individual columns (examples: Cassandra, Bigtable, HBase, DynamoDB)
  • Graph-based, which store information in graph-type formats, such as networks (examples: AllegroGraph, Apache Giraph)

The most popular NoSQL database is MongoDB, which uses a document-driven data structure. Like MongoDB, DynamoDB uses a document-based format, but it also uses elements of the wide-column and key-value models, and stores records in tables.

To better understand how this works, let’s look at a couple of simple entries in a document-driven database, like MongoDB:

{
	name: "Jane",
	age: "38",
	friends: ["John", "Susan", "Sarah"]
},
{
	name: "Horace",
	age: "39",
	height: "5 feet"
}

You’ll notice that it uses a simple JSON-type format. You will also notice that each entry has a different set of keys. Unlike rows in a SQL table, not every entity has to include the same keys.
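To make the "different keys per entry" point concrete, here is a minimal Python sketch that holds the two entries above as plain dictionaries. The helper names `field_names` and `get_field` are made up for illustration and are not part of MongoDB or DynamoDB.

```python
# The two example entries as plain dictionaries; neither follows a fixed schema.
people = [
    {"name": "Jane", "age": "38", "friends": ["John", "Susan", "Sarah"]},
    {"name": "Horace", "age": "39", "height": "5 feet"},
]


def field_names(documents):
    """Map each document's "name" to the set of keys that document uses."""
    return {doc["name"]: set(doc) for doc in documents}


def get_field(documents, name, field, default=None):
    """Return one field of the document whose "name" matches, or a default."""
    for doc in documents:
        if doc["name"] == name:
            return doc.get(field, default)
    return default


keys_by_person = field_names(people)
jane_height = get_field(people, "Jane", "height")      # Jane has no "height" key
horace_height = get_field(people, "Horace", "height")
```

Because a key may simply be absent, code that reads such data has to decide what a missing field means, which is why the lookup takes a default.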

DynamoDB uses a similar structure, but also borrows some concepts from the RDBMS model in that it defines tables, items and attributes, and uses primary keys to identify unique items. Like MongoDB, DynamoDB allows an on-the-fly, loosely defined structure; however, items are stored in a tabular format. It also allows the use of secondary indexes for more flexibility. A secondary index is optional; it only comes into play when querying the database.

In DynamoDB the above might exist in a table named “customer,” with a key “person.” For example:

TableName: "customer",	
		person: 	
		{
			name: "Jane",
			age: "38",
			friends: ["John", "Susan", "Sarah"]
		},
		person:
		{
			name: "Horace",
			age: "39",
			height: "5 feet"
		}
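The ideas of a table, a primary key and an optional secondary index can be sketched with a toy in-memory model. This is only an illustration of the concepts, not how DynamoDB is implemented or called; the `Table` class, its method names, the choice of "name" as the primary key, and the extra "Maya" record are all assumptions made for the example.

```python
class Table:
    """Toy in-memory table: items are dicts identified by one primary-key attribute."""

    def __init__(self, name, key_attr):
        self.name = name
        self.key_attr = key_attr
        self.items = {}      # primary-key value -> item
        self.indexes = {}    # indexed attribute -> {attribute value -> set of primary keys}

    def add_index(self, attr):
        """Build a secondary index on one attribute; items lacking it are skipped."""
        index = {}
        for key, item in self.items.items():
            if attr in item:
                index.setdefault(item[attr], set()).add(key)
        self.indexes[attr] = index

    def put_item(self, item):
        """Insert or replace an item and keep every secondary index up to date."""
        key = item[self.key_attr]
        old = self.items.get(key)
        for attr, index in self.indexes.items():
            if old is not None and attr in old:
                index[old[attr]].discard(key)
            if attr in item:
                index.setdefault(item[attr], set()).add(key)
        self.items[key] = item

    def get_item(self, key):
        """Look up a single item by its primary key."""
        return self.items.get(key)

    def query_index(self, attr, value):
        """Return all items whose indexed attribute equals the value."""
        keys = self.indexes[attr].get(value, set())
        return [self.items[k] for k in sorted(keys)]


customer = Table("customer", key_attr="name")
customer.put_item({"name": "Jane", "age": "38", "friends": ["John", "Susan", "Sarah"]})
customer.put_item({"name": "Horace", "age": "39", "height": "5 feet"})
customer.add_index("age")                          # the index is optional and added later
customer.put_item({"name": "Maya", "age": "39"})   # hypothetical extra record

jane = customer.get_item("Jane")
aged_39 = [item["name"] for item in customer.query_index("age", "39")]
```

The table works without any index at all; the index only matters once you want to find items by something other than the primary key.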

Advantages

DynamoDB has a number of features that provide some advantages over some other NoSQL databases.

DynamoDB Streams

One of DynamoDB’s main distinguishing features is its streams. DynamoDB Streams captures item-level changes to a table as an ordered sequence of small change records, which can be shared rapidly with multiple consumers. Changes made by any user can be relayed across the network to all users while maintaining the order of activity. Applications can read the stream to keep replica tables in sync with the original table, making it relatively easy to build applications that use this information and take action based on the changes being passed. Below is a diagram of how endpoints in DynamoDB Streams work.


Source: Endpoints for DynamoDB Streams

The uses of this are many, ranging from real-time inventory management to complicated game play.
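The idea of an ordered change log that keeps a replica in sync can be sketched in a few lines of Python. This is a toy in-memory model, not the DynamoDB Streams API: the `StreamedTable` class, the `replay` function and the record fields are invented for illustration, although the event names INSERT, MODIFY and REMOVE mirror the ones DynamoDB Streams uses.

```python
class StreamedTable:
    """Toy table that appends one ordered change record per write."""

    def __init__(self):
        self.items = {}
        self.stream = []    # change records, oldest first

    def _emit(self, event, key, old, new):
        self.stream.append({
            "sequence": len(self.stream) + 1,   # strictly increasing order marker
            "event": event,
            "key": key,
            "old": old,
            "new": new,
        })

    def put(self, key, item):
        old = self.items.get(key)
        self.items[key] = dict(item)
        self._emit("MODIFY" if old is not None else "INSERT", key, old, dict(item))

    def delete(self, key):
        old = self.items.pop(key, None)
        if old is not None:
            self._emit("REMOVE", key, old, None)


def replay(records, replica, after_sequence=0):
    """Apply records newer than after_sequence to a replica dict, in order.

    Returns the last sequence number applied, to be used as a checkpoint.
    """
    last = after_sequence
    for record in records:
        if record["sequence"] <= after_sequence:
            continue
        if record["event"] == "REMOVE":
            replica.pop(record["key"], None)
        else:
            replica[record["key"]] = record["new"]
        last = record["sequence"]
    return last


# A small inventory example.
inventory = StreamedTable()
inventory.put("widget", {"stock": 5})
inventory.put("gadget", {"stock": 2})
inventory.put("widget", {"stock": 4})
inventory.delete("gadget")

replica = {}
checkpoint = replay(inventory.stream, replica)
events = [record["event"] for record in inventory.stream]
```

Because every consumer replays the same records in the same order, each replica converges on the same state, and the checkpoint lets a consumer resume where it left off.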

Auto Scaling

While all NoSQL databases are scalable almost by definition, DynamoDB has a few specific options that make it particularly appealing. Being part of the AWS universe, it shares some features with Amazon’s other database products.

Planning the amount of capacity you need to handle your database at peak loads has always been an issue in database management. With DynamoDB (as with Amazon’s RDBMS product), you can configure your database to scale up automatically if usage suddenly increases. Such increases may follow a predictable pattern at certain times of day, but they could also be due to a sudden burst of activity on your database for any reason.

Of course, you could simply increase your provisioned capacity to handle high loads; however, as with other Amazon products, the amount you pay is tied to usage, and you may not wish to pay for unused capacity. With auto scaling, your provisioned capacity can expand to maintain speed without throttling, and it can also shrink in periods of low usage. By configuring auto scaling to use target tracking, you can manage the throughput of your tables and secondary indexes to handle growth without any slowdown.

The diagram below can help demonstrate how the process works:


Source: How DynamoDB Auto Scaling Works
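The arithmetic behind target tracking can be sketched as follows. This is a simplified illustration only: the real service reacts to CloudWatch alarms through Application Auto Scaling rather than calling a function like this, and the function name, parameters and sample numbers are assumptions.

```python
def target_tracking_capacity(consumed_units, target_percent, min_capacity, max_capacity):
    """Return the provisioned capacity that puts consumption at the target utilization.

    consumed_units: capacity units actually being consumed (integer)
    target_percent: desired utilization of provisioned capacity, e.g. 70
    min_capacity, max_capacity: bounds the result is clamped to
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    desired = (consumed_units * 100 + target_percent - 1) // target_percent
    return max(min_capacity, min(max_capacity, desired))


steady = target_tracking_capacity(70, 70, min_capacity=5, max_capacity=400)
spike = target_tracking_capacity(350, 70, min_capacity=5, max_capacity=400)
quiet = target_tracking_capacity(1, 70, min_capacity=5, max_capacity=400)
```

With a 70 percent target, 70 consumed units call for 100 provisioned units; a spike is capped at the configured maximum, and a quiet period falls back to the minimum rather than to zero.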

Serverless Web Applications

One of the most useful features of DynamoDB is that it is designed to integrate well with serverless Backend as a Service (BaaS) applications, meaning that it can be run entirely in the cloud without any requirement for on-premises hardware. You can use AWS Lambda functions to work with it.
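As a rough sketch of how a Lambda function might react to table changes, the Python handler below summarizes a batch of stream records. It assumes the table’s stream has been connected to the function as an event source; the function body, the summary format and the sample event are illustrative, though the `Records`, `eventName`, `dynamodb` and `Keys` fields follow the shape of the events Lambda receives from DynamoDB Streams.

```python
def handler(event, context):
    """Group the keys of changed items by the kind of change."""
    summary = {"INSERT": [], "MODIFY": [], "REMOVE": []}
    for record in event.get("Records", []):
        change = record["dynamodb"]
        # Keys arrive in DynamoDB's typed JSON, e.g. {"name": {"S": "Jane"}};
        # unwrap each one to its plain value.
        key = {attr: next(iter(typed.values())) for attr, typed in change["Keys"].items()}
        summary[record["eventName"]].append(key)
    return summary


# A hand-written sample event for local experimentation (not real output).
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"name": {"S": "Jane"}},
                "NewImage": {"name": {"S": "Jane"}, "age": {"S": "38"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"name": {"S": "Horace"}}},
        },
    ]
}

result = handler(sample_event, None)
```

Calling the handler locally with a hand-built event like this is a cheap way to check the logic before deploying anything.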

Possible Drawbacks

We’ve covered many of the positive features: the ease of entry, the power and speed of the service, and its flexibility for working with web applications. However, there are a few possible drawbacks associated with using DynamoDB.

While, like other Amazon products, you pay only for what you use, the expense of running DynamoDB can grow extremely fast, especially if you use a lot of tables and have a lot of users querying your data. This can be partially handled with correctly configured auto scaling settings, but it’s not hard to make a mistake and end up increasing your costs considerably.

While, as mentioned earlier, it’s relatively easy to get started, the documentation is very dense. It is easy to make mistakes early on, so it’s a good idea to read and absorb the manuals before you do anything at real scale, as mistakes can have an impact on your costs as well.

One of the most important steps is to determine whether it’s right for your specific needs. For example, if you have a relatively small dataset and you need to manage it very carefully, you are probably better off sticking with an RDBMS. Another important factor is whether you intend to do a lot of writing to your database; reads are a lot less expensive than writes.

There are, of course, ways around this, but you need to work this out before you end up slowing down your entire process. You are prompted during the setup process to set the ratio of reads to writes, and it will show you the estimated cost of these changes. As stated before, though, read the manuals before you begin.
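To see why writes weigh more heavily than reads, it helps to work out the capacity units a workload needs. The sketch below uses the unit sizes AWS documents for provisioned mode (one read unit covers a strongly consistent read per second of an item up to 4 KB, an eventually consistent read costs half as much, and one write unit covers a write per second of an item up to 1 KB). The function names and the sample workload are assumptions, and no prices are included because they vary by region and over time.

```python
import math

READ_UNIT_BYTES = 4 * 1024    # item size covered by one read capacity unit
WRITE_UNIT_BYTES = 1024       # item size covered by one write capacity unit


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent=True):
    """Read capacity units needed for a steady rate of single-item reads."""
    units = math.ceil(item_bytes / READ_UNIT_BYTES) * reads_per_second
    if not strongly_consistent:
        units = math.ceil(units / 2)   # eventually consistent reads cost half
    return units


def write_capacity_units(item_bytes, writes_per_second):
    """Write capacity units needed for a steady rate of single-item writes."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


# Hypothetical workload: 6,000-byte items, 10 operations per second of each kind.
strong_reads = read_capacity_units(6000, 10)
eventual_reads = read_capacity_units(6000, 10, strongly_consistent=False)
writes = write_capacity_units(6000, 10)
```

For the same item size and request rate, the writes in this example need several times as many units as the reads, which is the imbalance to keep in mind when estimating cost.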

Getting Started

If you would like to try it out, there are few barriers to entry; all you need is an AWS account. Follow the straightforward prompts provided in the AWS NoSQL services: create a table and start entering data at the prompts.
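If you prefer code to console prompts, the same first step can be sketched in Python. The helper below only builds the request for a table with a single string primary key; the helper name, the table name "customer" and the key "name" are assumptions carried over from the earlier example, and on-demand billing is chosen only to keep the sketch short. Sending the request requires the boto3 library and configured AWS credentials, so that part is left as a comment.

```python
def create_table_request(table_name, key_attr, key_type="S"):
    """Build the parameters for creating a table with one partition (HASH) key."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_attr, "AttributeType": key_type},  # "S" means string
        ],
        "KeySchema": [
            {"AttributeName": key_attr, "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",   # on-demand; no capacity planning up front
    }


request = create_table_request("customer", "name")

# With boto3 installed and AWS credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("dynamodb").create_table(**request)
```

Only the key attribute is declared up front; every other attribute can vary from item to item, as described earlier.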

You can get started with DynamoDB here.

Jason Simon is a professional back-end web developer, database consultant, and systems librarian, with over 20 years of experience helping businesses, non-profit organizations, and educational institutions organize and convert complex data into clear and easy-to-understand information.
