Other DynamoDB Design Patterns for Scaling

This is part 3 of a 3-part series

  1. Performance and scaling with partitions, indexes, and read/write capacity units
  2. Data modeling in DynamoDB
  3. Additional design patterns for DynamoDB

In addition to the content covered in the prior two articles, there are other concepts and design patterns that can affect the scaling potential of a DynamoDB database. In this third and final article of the series, we will walk through several of them, presenting a Scenario and a Possible Solution for each.

Scenario:
Some tables sustain a high average volume of read requests, or experience sudden bursts of reads. This is common for event-based data, product sales promotions, and other short-lived, time-based data.

Possible Solution:
Use a separate caching layer, plus a mechanism to update the cache when needed. On AWS, two services that can help here are Amazon ElastiCache and AWS Lambda.
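As a rough illustration of the cache-aside flow this implies, here is a minimal sketch. The in-memory dict stands in for ElastiCache, and the fetch function stands in for a DynamoDB GetItem call; all names and the TTL are illustrative, not from any AWS SDK:

```python
import time

cache = {}        # stand-in for Amazon ElastiCache (Redis/Memcached)
CACHE_TTL = 60    # seconds; tune to how stale the data may be

db_reads = 0      # counts simulated DynamoDB reads, for illustration

def fetch_from_dynamodb(pk):
    """Stand-in for table.get_item(Key={'pk': pk}) via boto3."""
    global db_reads
    db_reads += 1
    return {"pk": pk, "payload": f"item-{pk}"}

def get_item_cached(pk):
    entry = cache.get(pk)
    if entry and time.time() - entry["ts"] < CACHE_TTL:
        return entry["item"]           # cache hit: no RCUs consumed
    item = fetch_from_dynamodb(pk)     # cache miss: one read from the table
    cache[pk] = {"item": item, "ts": time.time()}
    return item

get_item_cached("order#1")
get_item_cached("order#1")   # second call is served from the cache
```

In a real deployment, a Lambda function triggered by DynamoDB Streams is one way to invalidate or refresh cache entries when the underlying items change.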


Scenario:
Some tables hold bulky items, such as an order history table. Each item might carry several attributes describing the order, plus a large attribute holding a view of the order receipt. The receipt data is not needed in all cases.

Possible Solution:
Create a table to store the receipt data, with the order history id as the partition key. Then create a separate GSI that holds the rest of the order data and includes the receipt id attribute. It is important to understand that the right implementation for your system is based entirely on how your data is accessed, along with other specific considerations. The overall objective is to separate large items across multiple partitions based on utilization.
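One way to picture this split is through the request parameters each access pattern would use. The sketch below builds boto3-style low-level parameters; the table, index, and attribute names are illustrative assumptions, not from the article:

```python
# Receipts live in their own table, keyed by order id; order metadata is
# queried through a GSI that projects only the small attributes.

def receipt_get_request(order_id):
    """Parameters for a GetItem against the (hypothetical) receipts table."""
    return {
        "TableName": "OrderReceipts",
        "Key": {"order_id": {"S": order_id}},
    }

def order_summary_query(customer_id):
    """Parameters for a Query against a lightweight order-summary GSI."""
    return {
        "TableName": "Orders",
        "IndexName": "customer-summary-index",
        "KeyConditionExpression": "customer_id = :c",
        "ExpressionAttributeValues": {":c": {"S": customer_id}},
        # The GSI projects only small attributes (order id, total, receipt id),
        # so listing orders never pays the read cost of the large receipt blob.
    }
```

Listing a customer's orders touches only small items; the bulky receipt is fetched separately, and only when it is actually needed.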


Scenario:
You will often need to filter on attributes that are not part of the partition key or sort key.

Possible Solution:
a) Using a filter expression, it is possible to return only a subset of the items matched by the original query. This works, but you still incur the cost of the first query, which may produce a large result set only to discard much of it when the filter is applied.
b) Using a composite sort key that combines multiple fields, you can first match on the partition key and then use the begins_with condition on the sort key to exclude any items whose sort key does not start with a given string. With this approach a smaller subset of data is read from the datastore in the first place, consuming fewer RCUs.
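Approach (b) can be sketched as follows. The sort key packs several fields into one string, and the query's key condition uses begins_with; the table and attribute names are illustrative:

```python
# Composite sort key, e.g. "US#CA#SanFrancisco", queried with begins_with
# so the filtering happens in the key condition and consumes fewer RCUs.

def build_sort_key(*parts):
    """Pack multiple fields into one composite sort key."""
    return "#".join(parts)

def query_params(pk, sk_prefix):
    """boto3-style low-level Query parameters using begins_with."""
    return {
        "TableName": "Customers",
        "KeyConditionExpression": "pk = :p AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":p": {"S": pk},
            ":prefix": {"S": sk_prefix},
        },
    }

full_key = build_sort_key("US", "CA", "SanFrancisco")   # "US#CA#SanFrancisco"
params = query_params("customer", build_sort_key("US", "CA"))
```

Matching on the "US#CA" prefix returns only California customers directly from the key condition, instead of reading all "US" items and filtering afterwards.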


Scenario:
You may have many customers, for example, but only a small percentage of them opt in to a specific set of classifications. You still need to be able to filter on those classifications without querying the entire customer table.

Possible Solution:
Create a GSI with the classification as the partition key, along with a customer id attribute and any other needed attributes. This gives you a separate index of just the opted-in customers, queryable by their type. Because only items that carry the classification attribute appear in the index, this is known as a sparse index; it requires less capacity and can support full index scans.
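The key to the sparse index is that items only appear in the GSI when they carry the index's partition key attribute. A minimal sketch of that behavior (DynamoDB performs the selection automatically; the list comprehension below just simulates it, and the attribute names are illustrative):

```python
# Only opted-in customers get the "classification" attribute, so only they
# appear in a GSI whose partition key is "classification".

def customer_item(customer_id, classification=None):
    item = {"customer_id": customer_id, "name": f"cust-{customer_id}"}
    if classification is not None:
        # Setting this attribute is what places the item in the sparse GSI.
        item["classification"] = classification
    return item

items = [
    customer_item("c1"),
    customer_item("c2", "premium"),
    customer_item("c3"),
]

# Simulates what DynamoDB materializes into the sparse index.
indexed = [i for i in items if "classification" in i]
```

With thousands of customers and a handful of opt-ins, the index stays tiny, so queries and even full scans against it are cheap.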

Scenario:
At times a table can accumulate massive amounts of data in a short amount of time, and keep growing. An example is IoT metric data, where thousands of items are logged for each device every day.

Possible Solution:
Using logic to append a random suffix to the end of the partition key value (the device id, for example), you can split the data into multiple sub-partitions within the table to achieve sharded aggregation. If you appended _a, _b, …, _n (any randomized algorithm with a fixed set of values will work), the data would automatically be sharded across the a–n partitions. To read the data back, you would run parallel or concurrent queries against each unique partition key and merge the results together. The result is much faster writes at the cost of slightly more complicated reads. Again, you need to decide whether this is a suitable solution for your needs.
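The write-sharding logic above can be sketched in a few lines. The suffix scheme (_a through _n) follows the example in the text; the helper names are ours:

```python
import random
import string

SHARDS = list(string.ascii_lowercase[:14])   # "a" through "n", 14 shards

def sharded_key(device_id):
    """Append a random shard suffix so writes spread across the shards."""
    return f"{device_id}_{random.choice(SHARDS)}"

def all_shard_keys(device_id):
    """Every partition key to query when reading the data back."""
    return [f"{device_id}_{s}" for s in SHARDS]

def read_all(device_id, query_one):
    """Fan out one query per shard (serially here; in practice, in
    parallel) and merge the results back together."""
    results = []
    for key in all_shard_keys(device_id):
        results.extend(query_one(key))
    return results
```

Each write lands on one of 14 partition key values, spreading the write load; reads pay for it by fanning out one query per shard and combining the results.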

We hope you have enjoyed this 3-part series on DynamoDB, and have picked up some valuable insights for your next DynamoDB project.


Written for ServerlessArchitecture.com
Written by Jeff Mangan
