DynamoDB – Scalable NoSQL DB on the cloud

by Vijay K

DynamoDB is a highly scalable, enterprise-level, cloud-based NoSQL database service provided by Amazon. It uses a document-based structure but includes some features of RDBMS-like databases, such as tables and keys. It offers a high level of scalability and single-digit millisecond response times, which makes it suitable for some of the largest internet companies; it has been adopted by the likes of Netflix, Lyft, IMDb, Duolingo, Shazam, and many more.

Unlike some other NoSQL databases, such as MongoDB, getting started is relatively straightforward; there is little learning curve (at least at first). It uses a simple dashboard which you should have little trouble using if you are already familiar with other AWS products. It also uses a standard JSON format, which makes it easy to integrate with your existing web-based applications.

NoSQL

Before we get into a discussion of DynamoDB it’s best if we clarify NoSQL. Traditional relational SQL databases rely heavily on structured data. There are predefined relationships between tables, with clearly defined primary keys which can act as links between one table and another. These work extremely well if you wish to maintain full control over your data. Well-defined structures and enforced foreign key relationships ensure that data integrity is maintained.

However, in the age of Big Data, when we have less control over our information, or perhaps only partial information about an entity, the weaknesses of relational databases become apparent. In addition, SQL databases can require a lot of processing power. As a result, querying huge amounts of data can be resource-intensive, resulting in increased lag time.

In contrast, NoSQL databases, instead of relying on tabular data, rely on a different set of data models.

It’s easiest to explain this by using the document model. With document-driven NoSQL databases, data can be created and served on-the-fly. They allow data to be delivered with as few as one or two keys, but can also allow nearly unlimited extra information. Instead of relying on numerical primary keys, fields like “name” can serve as the key. Relationships between other data records are not enforced and as a result, NoSQL databases are much more lightweight. With databases such as MongoDB, data is stored in documents in the widely used JSON format, making data easily accessible for querying. NoSQL provides a number of lightweight formats which can handle the vast amount of data necessary for current data storage and access.

What is the difference between DynamoDB and other NoSQL databases?

Before we go any further, it’s important to understand that there are several types of NoSQL databases. Not all of them use the document type above. NoSQL databases are really only defined by what they are not, namely SQL (though under some definitions the term means “not only SQL”). Below is a summary of the different types.

  • Document – stores data as JSON-like tagged elements (examples: MongoDB, DynamoDB, Couchbase)
  • Key-value – stores data as a simple set of key-value pairs (examples: Redis, DynamoDB)
  • Wide-column – typically used to query extremely large datasets by running queries against individual columns (examples: Cassandra, Bigtable, HBase, DynamoDB)
  • Graph-based – stores information in graph-type formats, such as networks (examples: AllegroGraph, Apache Giraph)

The most popular NoSQL database is MongoDB, which uses a document-driven data structure. Like MongoDB, DynamoDB uses a document-based format, but it also uses elements of wide-column and key-value stores, and it stores records in tables.

To better understand how this works, let’s look at a couple of simple entries in a document-driven database, like MongoDB:

{
  name: "Jane",
  age: "38",
  friends: ["John", "Susan", "Sarah"]
},
{
  name: "Horace",
  age: "39",
  height: "5 feet"
}

You’ll notice that it uses a simple JSON-type format. You will also notice that each row has a different set of keys. Unlike in SQL, not every entity has to include the same keys.
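To make the point about differing keys concrete, here is a minimal Python sketch that holds the two records above as plain dictionaries. It is only an illustration and does not use MongoDB's API; the names `people` and `find_by_name` are assumptions made for this example.

```python
# Two schema-less "documents" held as plain Python dictionaries.
# Illustration only: this does not talk to MongoDB or any database.
people = [
    {"name": "Jane", "age": "38", "friends": ["John", "Susan", "Sarah"]},
    {"name": "Horace", "age": "39", "height": "5 feet"},
]


def find_by_name(documents, name):
    """Return the first document whose "name" field matches, or None."""
    for doc in documents:
        if doc.get("name") == name:
            return doc
    return None


jane = find_by_name(people, "Jane")
horace = find_by_name(people, "Horace")

# Each document carries its own set of keys, so optional fields are
# read with .get() and a default instead of being assumed to exist.
print(sorted(jane.keys()))        # Jane has "friends" but no "height"
print(horace.get("friends", []))  # Horace has no "friends" key
```

Because nothing enforces a shared schema, the calling code has to decide what a missing field means, which is the trade-off for the flexibility described above.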

DynamoDB uses a similar structure but also borrows some concepts from the RDBMS model in that it defines tables, items, and attributes, and also uses primary keys to identify unique items. Like MongoDB, DynamoDB allows the on-the-fly, loosely defined structure; however, items are instead stored in a tabular format. It also allows the use of secondary indexes for more flexibility. A secondary index is optional; it only comes into play when you query the table by attributes other than the primary key.

In DynamoDB the above might exist in a table named “customer,” with a key “person.” For example:

TableName: "customer",
person:
{
  name: "Jane",
  age: "38",
  friends: ["John", "Susan", "Sarah"]
},
person:
{
  name: "Horace",
  age: "39",
  height: "5 feet"
}
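Under the hood, DynamoDB's low-level API tags every attribute with a type descriptor such as S (string), N (number), L (list), or M (map). The sketch below shows roughly what a request to store Jane's record could look like in that form. The helpers `to_attribute_value` and `build_put_item_request` are hypothetical names written for this example (the AWS SDKs ship their own serializers), and storing the age as a number is an assumption made to show the N type.

```python
def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's typed JSON form.

    Hypothetical helper covering only strings, numbers, booleans,
    None, lists, and dictionaries.
    """
    if isinstance(value, bool):            # check bool before int
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}           # numbers are sent as strings
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_put_item_request(table_name, item):
    """Build PutItem-style parameters as a plain dictionary (no network call)."""
    return {
        "TableName": table_name,
        "Item": {k: to_attribute_value(v) for k, v in item.items()},
    }


request = build_put_item_request(
    "customer",
    {"name": "Jane", "age": 38, "friends": ["John", "Susan", "Sarah"]},
)
print(request)
```

The sketch only builds the request; sending it would require an AWS SDK client and a table whose primary key matches one of the attributes (here, presumably "name").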

Advantages

DynamoDB has a number of features that provide some advantages over some other NoSQL databases.

DynamoDB Streams

One of the main features of DynamoDB is its streams. DynamoDB Streams captures a time-ordered sequence of item-level changes made to a table and keeps that record for up to 24 hours. Applications can read these changes in the order they occurred and react to them, for example to notify other users or to keep a replica table in sync with the original. This makes it relatively easy to create applications that take action based on the information being passed. Below is a diagram of how endpoints in DynamoDB streams work.

DynamoDB - Application
Source: Endpoints for DynamoDB Streams

There are numerous uses for this, ranging from real-time inventory management to complicated gameplay.

Auto Scaling

While all NoSQL databases are scalable almost by definition, DynamoDB has a few specific options that can make it particularly appealing. Being part of the AWS universe, it shares some features in common with some of Amazon’s other database products.

Planning the capacity you need to handle your database at peak loads has always been an issue in database management. With DynamoDB (as with Amazon’s RDBMS product) you can configure your database to scale up automatically if usage suddenly increases. The increase may follow a predictable pattern at certain times of the day, but it could also be due to a sudden, unexpected spike in activity on your database.

Of course, you could simply increase your provisioned capacity to handle high loads. However, as with other Amazon products, the amount you pay is tied to usage, and you may not wish to pay for unused capacity. With auto scaling, your provisioned capacity can expand to maintain speed without throttling, and it can also shrink in periods of low usage. By configuring target tracking, you set a target utilization, and DynamoDB adjusts the provisioned throughput of your tables and global secondary indexes to stay near it as traffic grows.

The diagram below can help demonstrate how the process works:

DynamoDB scaling
Source: How DynamoDB Auto Scaling Works
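The arithmetic behind target tracking can be sketched in a few lines of Python. This is a simplified model and not the service's actual algorithm, which reacts to monitoring alarms over time; the function name and parameters are assumptions made for this example. The idea is to pick the capacity at which current consumption would sit at the target utilization, then clamp it to the configured minimum and maximum.

```python
def target_provisioned_capacity(consumed_units, target_percent,
                                min_units, max_units):
    """Return the capacity that puts utilization at the target, within bounds.

    Simplified model: integer arithmetic is used so that, for example,
    700 consumed units at a 70% target gives exactly 1000.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in the range 1-100")
    # Ceiling division: the smallest capacity whose utilization <= target.
    desired = (consumed_units * 100 + target_percent - 1) // target_percent
    return max(min_units, min(max_units, desired))


# Busy period: 700 units consumed, aiming for 70% utilization.
print(target_provisioned_capacity(700, 70, min_units=5, max_units=4000))
# Quiet period: consumption drops, so capacity can shrink.
print(target_provisioned_capacity(50, 70, min_units=5, max_units=4000))
# Spike beyond the configured maximum: capacity is capped.
print(target_provisioned_capacity(5000, 70, min_units=5, max_units=4000))
```

The minimum and maximum bounds are what protect you at both ends: the maximum caps the bill during a spike, and the minimum keeps a baseline available when traffic returns.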

Serverless Web Applications

One of the most useful features of DynamoDB is that it is designed to integrate well with serverless Backend as a Service (BaaS) applications, meaning that it can be run entirely in the cloud with no requirement for on-premises hardware. You can use AWS Lambda functions to work with it.

Possible Drawbacks

We’ve addressed many of the positive features regarding the ease of entry, the power and speed of this application, and its flexibility for working with web applications. However, there are a few possible drawbacks associated with using DynamoDB.

Although, as with other Amazon products, you pay only as you go, the expense of running DynamoDB can grow extremely fast, especially if you use a lot of tables and have a lot of users querying your data. While this can be partially handled with correctly configured auto-scaling settings, it’s not hard to make a mistake and end up increasing your costs considerably.

As mentioned earlier, it is relatively easy to get started but the documentation is very dense. It is very easy to make some mistakes early on, so it’s a good idea to read and absorb the manuals before you start doing anything with any real scale, as this can have an impact on the bottom line as well.

One of the most important factors is to determine whether it’s right for your specific needs. For example, if you have a relatively small dataset and you need to be able to manage it very carefully, you are probably better off sticking with an RDBMS. Another important factor is whether you intend to do a lot of writing to your database; reads are a lot less expensive than writes.

There are, of course, ways around this, but you do need to work it out before you end up slowing down your entire process. During setup you are prompted for the ratio of reads to writes and shown the estimated cost of your choices. Still, as stated before, read the manuals before you begin.
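The read/write imbalance is easier to see with the capacity-unit rules for provisioned tables: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read needs half as much), while one write capacity unit covers one write per second of an item up to only 1 KB. The Python sketch below applies those rules. The function names are assumptions made for this example, and it estimates capacity units only, not prices.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb,
                        strongly_consistent=True):
    """Estimate read capacity units: reads are metered in 4 KB steps."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads consume half the units.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_second, item_size_kb):
    """Estimate write capacity units: writes are metered in 1 KB steps."""
    return writes_per_second * math.ceil(item_size_kb)


# Same workload shape on both sides: 100 operations per second on 3 KB items.
print(read_capacity_units(100, 3))                             # strongly consistent
print(read_capacity_units(100, 3, strongly_consistent=False))  # eventually consistent
print(write_capacity_units(100, 3))
```

For the same item size and request rate, the write side needs several times more units than the read side, which is why a write-heavy design deserves a closer look before you commit to it.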

Getting Started

If you would like to try it out, there are few barriers to entry; all you need is an Amazon account, after which you can follow the straightforward prompts provided in the AWS NoSQL services. Simply create a table and start entering data at the prompts.

Amazon DynamoDB

You can get started with DynamoDB here.
