Selecting the right database in Amazon Web Services (AWS)

Robert John
11 min read · Jan 18, 2021


Databases in AWS

Deciding on the database to use for our application or workload can be very tricky. Since I joined AWS Community Builders, I have spent at least an hour every day exploring AWS through use cases. Amazon Web Services (AWS) provides several options for databases, so it is easy to be confused about the right one to choose. This article documents what I learned and the resources I used to understand the various databases in AWS and how to decide when to use them. I hope it will be of value to you. I would like to have feedback on what you think I should add, remove, or improve as I continue exploring AWS and other cloud services.

There are many criteria that could help us select the right database in AWS. To make it easier, I summarize them into the following four criteria:

  1. Type of Data
  2. Size of Data
  3. Structure or Shape of Data
  4. Activities that will be done on the Data

Now that we have an idea of the criteria we can use to select the right database, let us dive into each of these databases. All databases in AWS share the following properties:

  • fully managed by AWS
  • scalable; that is, capacity can increase and decrease based on demand
  • highly available; that is, the databases are designed to keep running with minimal downtime

Amazon Relational Database Service (RDS)

Relational Databases in Amazon

Amazon RDS is not a database itself but a service used to set up, operate, and scale relational databases in the cloud. It enables us to provision Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. It acts like an administrator for these databases: it automates failover, backups, restore, disaster recovery, access management, encryption, security, monitoring, and performance optimization. It has two major backup solutions, automated backups and manual snapshots. It supports a maximum of five replicas, which can be Multi-AZ (multi-Availability Zone) standby replicas, cross-Region replicas, or read replicas. However, resources aren't replicated across AWS Regions by default unless you configure this explicitly.
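As an illustration, here is a minimal sketch of provisioning an RDS instance programmatically with boto3, the AWS SDK for Python. The instance identifier, instance class, and credentials below are hypothetical placeholders.

```python
# A minimal sketch of provisioning an RDS MySQL instance with boto3.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

response = rds.create_db_instance(
    DBInstanceIdentifier="demo-mysql-db",    # hypothetical name
    Engine="mysql",                          # any of the six supported engines
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                     # storage in GiB
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # use AWS Secrets Manager in practice
    MultiAZ=True,                            # standby replica in another AZ
    BackupRetentionPeriod=7,                 # keep automated backups for 7 days
)
print(response["DBInstance"]["DBInstanceStatus"])
```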

When to use Amazon Relational Database Service (RDS)

  • If we need to use any of these six database engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, or SQL Server.

Pricing

Amazon Relational Database Service (RDS) pricing depends on whether we use On-Demand or Reserved Instances.

Official Resources

Other Resources

Amazon Aurora

Amazon Aurora is compatible with MySQL and PostgreSQL; AWS states that it offers up to five times the throughput of standard MySQL and three times the throughput of standard PostgreSQL, at about one-tenth the cost of commercial databases. It has a maximum size of 128 TB. Amazon Aurora supports up to 15 Aurora Replicas. Aurora backup and failover are automatic. Amazon Aurora supports cross-Region replication. Aurora MySQL and Aurora PostgreSQL DB clusters are created using the Amazon Relational Database Service (RDS) console. Aurora Serverless gives Amazon Aurora the ability to automatically scale up, scale down, start up, and shut down (auto scaling).

When to use Amazon Aurora

Aurora Serverless is best used when you are building an application that is used infrequently, a new application, or an application with variable and unpredictable workloads.
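Below is a minimal sketch of what creating an Aurora Serverless (v1) cluster with boto3 could look like; the cluster identifier, credentials, and capacity range are hypothetical placeholders.

```python
# A minimal sketch of creating an Aurora Serverless (v1) cluster with boto3.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="demo-aurora-serverless",
    Engine="aurora-mysql",                 # or "aurora-postgresql"
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
    ScalingConfiguration={
        "MinCapacity": 1,                  # minimum Aurora Capacity Units (ACUs)
        "MaxCapacity": 8,                  # maximum ACUs
        "AutoPause": True,                 # pause the cluster when it is idle
        "SecondsUntilAutoPause": 300,
    },
)
```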

Parallel Query for Amazon Aurora

Pricing

Amazon Aurora pricing depends on whether we select the MySQL-compatible or the PostgreSQL-compatible edition. Aurora Serverless is charged per Aurora Capacity Unit (ACU).

Official Resources

Other Resources

Amazon Redshift

Amazon Redshift is a columnar data store used for data warehousing. It is used to analyze and quickly get insight from large datasets by executing complex queries on them. This data is usually historical data at rest. A Redshift cluster contains nodes and can run in single-node or multi-node mode. There are two types of nodes in Amazon Redshift, namely the leader node and compute nodes. The leader node exposes the SQL endpoint, stores metadata, and coordinates parallel SQL processing. Compute nodes store the data and execute the queries. Amazon Redshift stores data in a single Availability Zone. Amazon Redshift Spectrum is used to query Amazon Simple Storage Service (Amazon S3) directly. Amazon Redshift federated queries enable us to query and analyze live data across databases, data warehouses, and data lakes.
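As a sketch, here is how a query could be submitted through the Amazon Redshift Data API with boto3; the cluster identifier, database, user, and SQL statement are hypothetical.

```python
# A minimal sketch of running a query via the Amazon Redshift Data API.
import time
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

# Submit the query; the Data API runs it asynchronously.
submitted = client.execute_statement(
    ClusterIdentifier="demo-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT region, SUM(revenue) FROM sales GROUP BY region;",
)

# Poll until the statement finishes, then fetch the result set.
while True:
    status = client.describe_statement(Id=submitted["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if status == "FINISHED":
    rows = client.get_statement_result(Id=submitted["Id"])["Records"]
    print(rows)
```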

Amazon Redshift Architecture

When to use Amazon Redshift

  • for Online Analytical Processing (OLAP)
  • if we need to run queries across multiple data sources. For instance, we can copy data from sources like Amazon EMR and Amazon S3 into Amazon Redshift.
  • Amazon Redshift is suitable for generating reports for business intelligence

Pricing

Amazon Redshift pricing — the basic price for Amazon Redshift starts from $0.25 per hour. Several other features can influence the price, such as Amazon Redshift Spectrum pricing, Concurrency Scaling pricing, Redshift managed storage pricing, and Redshift ML pricing.

AWS Redshift Data lake Integration

Official Resources

Other Resources

DynamoDB

DynamoDB is a NoSQL key/value and document database; that is, it supports both document and key/value data structures. DynamoDB's major components are tables, items, attributes, keys, and values. A table is a collection of items, and an item is a collection of attributes. Items are similar to rows, while attributes are similar to columns in a traditional database. A key is used to identify an item, and the value is the data itself. The major API components in DynamoDB are the control plane, the data plane, DynamoDB Streams, and transactions. On-Demand and Provisioned are the read/write capacity modes in DynamoDB. Amazon DynamoDB lets us specify provisioned capacity in Read Capacity Units (RCUs) and Write Capacity Units (WCUs). Amazon DynamoDB creates partitions based on size, Read Capacity Units, and Write Capacity Units: a partition holds up to 10 GB of data, 3,000 RCUs, and 1,000 WCUs.

DynamoDB encrypts data at rest, that is, inactive data that is not moving from one device to another or from one network to another. DynamoDB has a point-in-time recovery feature, meaning we can restore our data to any point in time. Amazon DynamoDB Accelerator (DAX) provides a managed write-through cache for DynamoDB, reducing response times from milliseconds to microseconds. Amazon DynamoDB uses SSD storage and stores its data across three different Availability Zones.
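To make the table/item/attribute vocabulary concrete, here is a minimal boto3 sketch that creates a table and writes and reads one item; the table name and attributes are hypothetical.

```python
# A minimal sketch of creating a DynamoDB table and reading/writing an item.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

# Create a table with a partition key; on-demand mode avoids provisioning RCUs/WCUs.
table = dynamodb.create_table(
    TableName="UserProfiles",
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# An item is a collection of attributes identified by its key.
table.put_item(Item={"user_id": "u-123", "name": "Ada", "clicks": 42})
item = table.get_item(Key={"user_id": "u-123"})["Item"]
print(item)
```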

Data in Amazon DynamoDB

When to use Amazon DynamoDB

  • for Online Transaction Processing (OLTP).
  • to store real-time data from IoT devices.
  • to store activities and events on a web application, such as clicks.
  • to store items in a web application, like user profiles and user events, used by advertising, gaming, retail, finance, and media companies.
  • for data that requires a high request rate (millions of requests per second).
  • for situations that require high consistency.

Pricing

Amazon DynamoDB pricing depends on whether we use on-demand capacity mode or provisioned capacity mode.

Hands-On

Creating Tables and Loading Data

Sample Code

Create a ToDo Web App Storing your data in Amazon DynamoDB

Official Resources

Other Resources

Amazon DocumentDB

This is a document database with MongoDB compatibility. It can easily store, query, and index JSON data. It supports up to 15 read replicas and scales vertically with very low impact. It has a flexible schema and ad hoc query capability. It is easy to index and can be used for both operational and analytical workloads.
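Because Amazon DocumentDB is MongoDB-compatible, standard MongoDB drivers can be used against it. Below is a minimal sketch using pymongo; the cluster endpoint, credentials, and CA bundle path are hypothetical placeholders.

```python
# A minimal sketch of connecting to an Amazon DocumentDB cluster with pymongo.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://myuser:mypassword@"
    "demo-docdb.cluster-xxxxxxxx.us-east-1.docdb.amazonaws.com:27017/"
    "?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
)

db = client["catalog"]

# Store and query flexible JSON-like documents.
db.products.insert_one({"sku": "A-100", "name": "Keyboard", "tags": ["office", "usb"]})
print(db.products.find_one({"sku": "A-100"}))
```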

Use case of Amazon DocumentDB

Pricing

Amazon DocumentDB pricing is based on On-demand instances, database I/O, database storage and backup storage.

When to use Amazon DocumentDB

  • it is Amazon's MongoDB-compatible service; use it when you need to run MongoDB workloads at scale
  • best for profile management, content management, and catalog management.

Official Resources

Other Resources

DynamoDB vs AWS DocumentDB vs MongoDB

Amazon Neptune

Amazon Neptune is a graph database; it works with highly connected datasets. It checks for relations or similarities in data, for instance, the similarity between the movies a user watches on Netflix. Its components are nodes (data entities), edges (relationships), and properties. Amazon Neptune supports the property graph model with the open-source Apache TinkerPop Gremlin query language, as well as the Resource Description Framework (RDF) with SPARQL. Amazon Neptune replicates data six times across three Availability Zones. Amazon Neptune Streams can be used to capture changes in a graph.
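As a sketch of the property graph model, here is how vertices, an edge, and a traversal could look with the open-source gremlinpython driver; the Neptune endpoint is a hypothetical placeholder.

```python
# A minimal sketch of traversing a Neptune property graph with gremlinpython.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection(
    "wss://demo-neptune.cluster-xxxxxxxx.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# Nodes (vertices) are data entities, edges are relationships.
user = g.addV("user").property("name", "alice").next()
movie = g.addV("movie").property("title", "Inception").next()
g.V(user).addE("watched").to(__.V(movie)).next()

# Find everything alice has watched.
titles = g.V().has("user", "name", "alice").out("watched").values("title").toList()
print(titles)

conn.close()
```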

Knowledge graph use case diagram

Pricing

Amazon Neptune pricing is influenced by On-demand instances, database I/O, database storage, backup storage and data transfer.

When to use Amazon Neptune

  • Amazon Neptune is best used when we have relationships in the data.
  • for recommendation engines, fraud detection, and drug discovery.
  • for knowledge base applications such as Wikidata.
  • for identity graphs that show a unified view of customers and prospects based on their interactions with a product or a website.
  • for social networking applications to store user profiles and interactions.

Official Resources

Other Resources

Amazon ElastiCache

Amazon ElastiCache is used to manage in-memory caches. Caching is storing data in a temporary storage area. The data is stored in RAM, which is volatile, meaning the data can be lost easily but can be accessed very fast. ElastiCache stores frequently accessed data to improve performance, which reduces application latency and increases throughput. It caches data from the database, which is different from Amazon CloudFront (a content delivery network): Amazon ElastiCache keeps important, frequently used data in memory, whereas Amazon CloudFront stores static files, for example, HTML, audio, video, and media files required by a web app. Amazon ElastiCache can only be accessed by resources in the same VPC.


Amazon ElastiCache has two engines:

  1. Amazon ElastiCache for Redis
  2. Amazon ElastiCache for Memcached.
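As an illustration of how an application typically uses ElastiCache, here is a minimal cache-aside sketch with the Redis engine and the redis-py client; the endpoint and the load_user_from_db helper are hypothetical placeholders.

```python
# A minimal cache-aside sketch against an ElastiCache for Redis endpoint.
import json
import redis

# Hypothetical primary endpoint of an ElastiCache for Redis cluster.
cache = redis.Redis(host="demo-cache.xxxxxx.0001.use1.cache.amazonaws.com", port=6379)

def load_user_from_db(user_id):
    # Placeholder for the real database lookup (e.g. RDS or DynamoDB).
    return {"user_id": user_id, "name": "Ada"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit: served from memory
    user = load_user_from_db(user_id)          # cache miss: go to the database
    cache.set(key, json.dumps(user), ex=300)   # keep the entry for 5 minutes
    return user

print(get_user("u-123"))
```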

Pricing

Amazon ElastiCache pricing is based on the node types, backup storage, and data transfer.

When to use Amazon ElastiCache

  • it is best used when you need microsecond latency, key-based queries, and specialized data structures.
  • for situations like leaderboards and real-time caching
  • if the same data is needed on every page load or every request.

Official Resources

Other Resources

Amazon Timestream

Amazon Timestream is a serverless time-series database for IoT and operational applications. Time-series data is recorded over a period of time, such as stock prices or the temperature of a device. Amazon Timestream can be used to store and analyze trillions of events per day, up to 1,000 times faster and at as little as 1/10th the cost of relational databases. One major advantage of Amazon Timestream is its ability to move historical data to low-cost storage (the magnetic store) while retaining recent data (hot data) in the memory store, and queries can be run on both historical and recent data. In addition, Amazon Timestream has built-in time-series analytics functions such as smoothing, approximation, and interpolation, which help in detecting patterns in data. The major concepts in Amazon Timestream are records, dimensions, measures, timestamps, tables, and databases. Records cannot be deleted or updated.
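To make the record/dimension/measure vocabulary concrete, here is a minimal sketch of writing one record with boto3's timestream-write client; the database, table, and dimension values are hypothetical.

```python
# A minimal sketch of writing one time-series record to Amazon Timestream.
import time
import boto3

writer = boto3.client("timestream-write", region_name="us-east-1")

writer.write_records(
    DatabaseName="iot",
    TableName="sensor_readings",
    Records=[
        {
            "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
            "MeasureName": "temperature",
            "MeasureValue": "22.5",
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # timestamp in milliseconds
        }
    ],
)
```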

Pricing

Amazon Timestream pricing is based on writes, memory store, magnetic store, data transfer, and queries.

When to use Amazon Timestream

  • for time-series data from IoT devices
  • for collecting and analyzing operational metrics
  • for analytical applications

Sample code

Getting started with Amazon Timestream with Python

Official Resources

Other Resources

Amazon Quantum Ledger Database (QLDB)

Amazon QLDB is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority. It is used to track data changes in applications and can be used for storing audit logs. It uses a SQL-like query language called PartiQL. It is immutable, transparent, cryptographically verifiable, and serverless.
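As a sketch of working with PartiQL transactions, here is what creating a table, inserting a document, and querying it could look like with the pyqldb driver for Python. The ledger and table names are hypothetical, and this assumes the driver accepts native Python values as statement parameters.

```python
# A minimal sketch of writing to and reading from a QLDB ledger with pyqldb.
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="demo-ledger")  # hypothetical ledger

# Create a table in one transaction, then insert a document in another.
driver.execute_lambda(lambda txn: txn.execute_statement("CREATE TABLE Payments"))
driver.execute_lambda(
    lambda txn: txn.execute_statement(
        "INSERT INTO Payments ?", {"employee": "e-7", "amount": 1200, "type": "bonus"}
    )
)

# Query the current state with PartiQL; materialize the cursor inside the transaction.
rows = driver.execute_lambda(
    lambda txn: list(
        txn.execute_statement("SELECT * FROM Payments WHERE employee = ?", "e-7")
    )
)
print(rows)
```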

Pricing

Amazon Quantum Ledger Database (QLDB) pricing is based on data transfer, data storage, and I/O.

When to use Amazon QLDB

  • best suited for economic and financial records
  • for application data
  • used in finance to track credit and debit transactions
  • for HR systems to track employee payroll, bonuses, benefits, and other details
  • for manufacturing to track product manufacturing history

Official Resources

Other Resources

Other resources on selecting the right database

Whoa, so many databases and terminologies. I am sure you need a break. I hope you now understand the different databases in AWS and when to use them, and that the linked resources give you a deeper dive.

Originally published at https://trojrobert.github.io on January 18, 2021.
