Skip to content

Data modeling in DynamoDB

This is part 2 of a 3 part series

  1. Performance and scaling with partitions, indexes, and read/write capacity units
  2. Data modeling in DynamoDB
  3. Additional design patterns for DynamoDB

 

NoSQL data modeling is a concept that is fairly new when compared to doing this with more established RDBMS that have been around for much longer.  NoSQL databases are great for many reasons, however don’t interpret this as claiming there is always a place for them.  The CAP theorem provides a base starting point in understanding some of the nun-functional criteria to consider when choosing between a NoSQL database to a traditional RDBMS system.  In addition to this, there are also plenty other of considerations as well that involve  many aspects around how the data will be used and accessed.  We will not go into these details in this article, however what we will cover is typical data modeling scenarios once you have chosen to implement a DyamoDB database.

A key factor of DyamoDB (and NoSQL in general) is that there is no place for the age old table joins we were are used to in our RDBMS. DynamoDB uses a completely different approach and therefore required completely different implementations of how the data is stored and accessed.  Below you will find a list of a few data modeling scenarios and their possible solutions.  Note the word “possible”, because there are usually always many ways to implement a solution for the same problem, and “your mileage may vary”.

 

One to One Relationship Example
Scenario:
When you have 2 unique attributes in the same table, and want to be able to filter on both of these separately.  Say you are a grocery store and have a product that has both a product SKU and an ID from the manufacturer, where both of these attributes are unique and you want to be able to query on both of them.

Possible Solution:
Create a table with SKU as the partition key, and then use a GSI with the ID attribute as its partition key.  Both of these would then have the remaining attributes respectively. Doing this would allow you to query the data by either of these 2 attributes.  If you are concerned about data usage and cost, you may also consider having only the 2 unique attributes in the Table or GSI, which would then require you to query both to get the full data set, but reduce the overall amount of data being stored. See the first article on “Covered Indexes” for more information on this. Just be aware that this has the potential to conflict with one of the main purposes of a NoSQL datastore, which is reducing overhead and complexity through duplication of data.

One to Many Relationship Example
Scenario:
When you have 1 unique attribute in the same table, and multiple items share a common attribute as well (parent id for example), and want to be able to filter on both of these separately.  Using the grocery store example,  and have a product that has many ingredients.

Possible Solution:
Create an ingredient table with the product id as the partition key, and then a sort key of the ingredient id.  use a GSI with the ID field as it’s partition key.  You would then add the rest of the attributes to the table as standard table attributed.  The result would be a table partitioned by product id and can be queries on both product id and ingredient id.

 

Many to Many Relationship Example
Scenario:
Continuing with the above examples, and adding a new table called Customer, consider that a multiple customers can purchase multiple of the same products and that the grocery store wants to be able to determine what customers bought what products and also what products were sold to what customers.

Possible Solution:
There would obviously be a customer table that has a partition key of customer id more than likely. You could also add a sort key such as where they are located, but we don’t get into that for this example.  If you then created a new table called CustomerProducts, using the customer id as the partition key and the product id as the sort key.  Next you could then create a GSI called ProductCustomers using the product id as the partition key and the customer id as the sort key.   Doing so would support both queries, one to filter on customer and one to filter on product.

 

There are other scenarios in addition the 3 listed above,  we’ll try to write about those in future post.

Written for ServerlessArchitecture.com
Written by Jeff Mangan

Leave a Reply