AWS DynamoDB Cheat Sheet

As the lead architect for a project that involves building a fairly large Ecommerce application, with the entire backend running on an “AWS Serverless” infrastructure, I have been using AWS DynamoDB quite a bit lately. As a result, I keep finding myself looking up the same things over and over when deciding how to model tables, indexes, streams, and so on. I am a big fan of short and simple reference sheets, A.K.A “cheat” sheets, so I decided to take a free evening and compile a list of what I feel are many of the important aspects of working with DynamoDB. So, here is my list. Does it include everything? Of course not, and it may become obsolete at some point given how frequently AWS updates its product line. However, I will try to keep it updated as things change, and if you choose to refer to it, I would strongly suggest linking to it so you always view the latest content.

 

Tables

  • Data is stored across partitions, and partitions are stored on multiple servers.
  • Spread data across partitions evenly to optimize read requests
  • ListTables operations can only return 100 tables per response, so use pagination if more than 100 exist.
  • DescribeTable is used to return the structure of a table

Items

  • Must be <= 400kb each, which includes both attribute name binary length (UTF-8 length) and attribute value lengths (again binary length). The attribute name counts towards the size limit.
  • 1 RCU is used for a consistent read even if the item is smaller than 4kb
  • Optimize RCU by keeping attribute names short
  • Reduce item size (RCU) by storing infrequently used attributes in a separate table to reduce the returned payload
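Since attribute names and values both count toward the 400kb limit, a rough item size can be estimated client-side. A minimal sketch (assuming simple string/number attributes; DynamoDB’s real accounting adds per-type overhead this ignores):

```python
# Rough client-side estimate of a DynamoDB item's size in bytes.
# Assumption: string/number attributes only; per-type overhead not modeled.
MAX_ITEM_BYTES = 400 * 1024  # the 400kb item size limit

def estimate_item_size(item: dict) -> int:
    size = 0
    for name, value in item.items():
        size += len(name.encode("utf-8"))        # attribute names count too
        size += len(str(value).encode("utf-8"))  # UTF-8 length of the value
    return size

item = {"pk": "customer#123", "email": "jeff@example.com"}
print(estimate_item_size(item) <= MAX_ITEM_BYTES)  # True
```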

Attributes

  • A Null or Bool attribute is the length of the name + 1
  • List or Map requires 3 bytes of overhead in all cases
  • Types (see the AWS documentation for more details)
    • String
    • Binary
    • Number
    • StringSet — Array of strings
    • NumberSet — Array of numbers
    • BinarySet — Array of blobs
    • Map — Unordered collection of name/value pairs accessible by pair name.
    • List — Array of attribute values, which can be of different types (complex data storage), accessed by position index, not name.
    • Boolean
    • Null — Represents an attribute with an unknown or undefined state

Primary Keys

  • Partition Key
    • When used alone, it must be unique across all items in the table. An internal hash function generates a hash of the key, which determines which partition the item is stored on.  It is also known as the “Hash Key”
    • Supports data type of String, Binary, or Number
  • Partition Key and Sort Key
    • When a sort key is used along with the partition key, all items with the same partition key are stored in the same partition, sorted by the sort key.  It is also known as the “Range Key”, since items within a partition are stored in sorted order by this key.
    • Supports data type of String, Binary, or Number
    • The combination of the Partition Key and Sort Key must be unique across all items in the table
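DynamoDB’s internal hash function is not public, but the mechanism is easy to illustrate with an ordinary hash taken modulo the partition count. This is purely illustrative of the idea, not how DynamoDB actually maps keys:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative only: hash the partition key, then map the hash
    to one of the partitions. DynamoDB's real hash is internal."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Items with the same partition key always land on the same partition,
# which is why a composite key's items sort together within it.
print(partition_for("customer#123", 4) == partition_for("customer#123", 4))  # True
```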

Capacity Units

  • General
    • Use the DescribeTable operation to see the provisioned throughput settings for a table and all of its indexes
    • Capacity units are based on 4kb per read and 1kb per write
    • Use a monitoring tool such as CloudWatch and set alerts to notify when a certain level has been reached so that you can adjust as needed before running into performance problems.
  • Read
    • It is important to predetermine data usage patterns: how many reads per second and how large the returned data will be. This helps determine your RCU needs.
    • RCU consumed by a read request is the size of all attributes and items returned by a query, rounded up to the next 4kb boundary.
    • Reads for nonexistent items still use 1 RCU
    • An RCU is one strongly consistent 4kb read per second. For example, a provision of 5 RCU allows 5 strongly consistent reads of up to 4kb each, per second.  For eventually consistent reads, an RCU goes twice as far: 2 reads per second instead of 1.
    • A query response cannot be larger than 1 megabyte in size, including all items and attributes returned.
    • BatchGetItem gets each item separately, so each item's size is rounded up individually before being summed.
    • Query results are returned as one read operation and rounded only once, after the sizes of all items are summed.
    • Scan cost is calculated from the items scanned, not the items returned, so be careful how you use a Scan operation.  Only 1mb can be returned at most.
    • Example of determining provisioning needs:
Item Size  Consistency            Reads/Second  RCU Required
8kb        Strongly consistent    100           200
16kb       Strongly consistent    100           400
8kb        Eventually consistent  100           100
16kb       Eventually consistent  100           200

 

    • Exceeding provisioned read capacity will cause requests to be throttled
    • When writing to a table whose GSI has insufficient write capacity, the write on the table will be throttled
  • Write
    • 1 WCU is one 1kb write per second, so a provision of 20 WCU equates to twenty 1kb writes per second.
    • Conditional writes that evaluate to false still consume 1 WCU
    • Writes are done in 1kb increments, so sizes such as 1.7kb are rounded up to the next whole kb (2kb)
    • It is important to predetermine data usage patterns: how many writes per second and how large the payloads will be. This helps determine your WCU needs.
    • Put with a duplicate partition key will update the existing item, and the WCU consumed is based on whichever item is larger.
    • Update will consume WCU based on whichever item is larger, before or after the update.
    • Example of determining provisioning needs:
Expected Writes / Second Required
8kb 100 800
16kb 100 1600
1kb 100 100
2kb 100 200
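The provisioning math from both tables can be captured in a small helper. A sketch, assuming the rules above: reads round up to the next 4kb boundary at 1 RCU per strongly consistent read per second (half for eventually consistent), and writes round up to the next 1kb increment at 1 WCU per write per second:

```python
import math

def required_rcu(item_kb: float, reads_per_sec: int, consistent: bool = True) -> float:
    """RCU needed: item size rounds up to the next 4kb boundary;
    eventually consistent reads cost half as much."""
    rcu = math.ceil(item_kb / 4) * reads_per_sec
    return rcu if consistent else rcu / 2

def required_wcu(item_kb: float, writes_per_sec: int) -> int:
    """WCU needed: writes round up to the next 1kb increment."""
    return math.ceil(item_kb) * writes_per_sec

print(required_rcu(8, 100, consistent=True))    # 200
print(required_rcu(16, 100, consistent=False))  # 200.0
print(required_wcu(8, 100))                     # 800
print(required_wcu(1.7, 100))                   # 200 (1.7kb rounds up to 2kb)
```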

 

Indexes

  • General
    • Store only the fields you need to improve performance and data consumption
    • Items smaller than 1kb will still use 1 entire write capacity unit when writing to the index; adding more attributes up to 1kb will not cost any extra.
    • Sparse indexes can be beneficial when an attribute does not appear in all items.  Add an extra attribute to mark qualifying items and index on that attribute, removing it when it no longer applies. For example, an inactive customer could carry an “inactive” attribute only while they are not active.
    • Item collections (all items and local indexes) cannot exceed 10gb. It is important to take this into consideration when modeling your partition key and index attributes.
    • The total item collection size can be reduced by splitting it across multiple partitions, such as appending a random number 1-100 to the partition key, and then concurrently querying all partitions instead of just one.
    • Key Specification
      • Keys_Only – Only the table partition and sort key are included, creating the smallest possible index
      • Include – Specify additional non-key attributes you want to project into the index
      • All – Every attribute is projected, creating the largest possible index.
  • Local
    • Must be created at the time of table creation, not after
    • Combination of partition key and range key must be unique (composite key)
    • Attributes not projected in an LSI can still be retrieved, but DynamoDB performs a table fetch as part of the query and reads the entire item, resulting in extra latency, I/O operations, and a higher throughput cost.
    • Updates are synchronous as part of the put/delete/update
    • Read and write capacity units are shared with the table
  • Global
    • Can be created after table has been created
    • Updates are asynchronous (eventually consistent) as part of the put/delete/update
    • Eventually consistent reads consume ½ of an RCU, so one RCU covers an 8kb read (2 × 4kb)
    • Uses its own read and write capacity units, not shared with the table
    • Partition key can be different from the table's partition key, since a GSI is stored completely separately from the table's partitions
    • Put/Delete/Update operations on a table can also consume WCU on the table's GSIs
    • Combination of partition key and range key do not need to be unique
    • Can only return attributes projected into the index; cannot retrieve attributes from the parent table
    • Space consumed by a global secondary indexed item is the sum of:
      • Byte size of the key (partition and sort)
      • Byte size of the index key attribute
      • Byte size of the projected attributes (if any)
      • 100 bytes of overhead per index item
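Two of the index points above lend themselves to a short sketch: the suffix-sharding trick for large item collections (the shard count of 100 and the key format here are arbitrary illustrative choices), and the GSI item size accounting (a rough client-side estimate, assuming string attributes only):

```python
import random

NUM_SHARDS = 100  # arbitrary shard count, per the 1-100 example above

def sharded_key(base_key: str) -> str:
    """On write: append a random shard suffix so the item collection
    spreads across partitions instead of filling one."""
    return f"{base_key}#{random.randint(1, NUM_SHARDS)}"

def all_shard_keys(base_key: str) -> list:
    """On read: build the full list of shard keys so every partition
    can be queried concurrently and the results merged."""
    return [f"{base_key}#{n}" for n in range(1, NUM_SHARDS + 1)]

GSI_OVERHEAD_BYTES = 100  # fixed per-index-item overhead

def estimate_gsi_item_size(table_key: dict, index_key: dict, projected: dict) -> int:
    """Rough GSI item size: UTF-8 bytes of the table key + index key
    attributes + projected attributes + 100 bytes of overhead."""
    def size_of(attrs):
        return sum(len(k.encode("utf-8")) + len(str(v).encode("utf-8"))
                   for k, v in attrs.items())
    return size_of(table_key) + size_of(index_key) + size_of(projected) + GSI_OVERHEAD_BYTES

print(len(all_shard_keys("order#2024")))  # 100
print(estimate_gsi_item_size({"pk": "c#1"}, {"status": "inactive"}, {}))  # 119
```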

Written for ServerlessArchitecture.com
Written by Jeff Mangan
