Thursday, June 15, 2017

AWS — When to use Amazon Aurora instead of DynamoDB

Amazon DynamoDB as managed database will work for you if you prefer code-first methodology. You will be able to easily scale it if your application inserts data and reads data by your hash key or primary key (hash+sort key). It is also good if your application is doing some queries on the data as long as the resultset of these queries returns less than 1Mb of data. Basically if you stick to functionality that is typically required by websites in real-time, then DynamoDB will perform for you. Obviously you will need to provision the reads and writes properly and you will need to implement some auto-scaling on DynamoDB WCUs and RCUs, but after you do all of the homework, it will be smooth for you without needing to manage much.
However, there are cases when you will need to go back to relational databases in order to accomplish your business requirements and technical requirements.
For example, let’s assume that your website calls one of your microservices which in turn inserts data into its table. Then let’s assume that you need to search the data in this table and perform big extracts which then have to be sent to a 3rd party that deals with your data in a batch-oriented way. If you need to for example query and extract 1 million records from your DynamoDB table, it will take you up to 4.7 hours based on my prototypes using standard AWS DynamoDB library from Python or C# application. The way you read this amount of data is by using LastEvaluatedKey within DynamoDB where you query/scan and get 1Mb (due to the cutoff) and then if the LastEvaluatedKey is not the end of resultset, you need to loop through and continue fetching more results until you exhaust the list. This is feasible but not fast and not scalable.
My test client was outside VPC and obviously if you run it within the VPC, you will almost double your performance, but it comes to bigger extracts, it still takes long. If you are dealing with less than 100,000 records, it is manageable within DynamoDB, but when you exceed 1 million records, it gets unreasonable.
So what do you do in this case? I am sure that you can improve the performance of the extract by using Data Pipeline and similar approaches that are more optimized, but you are still limited.
Basically, your solution would be to switch to a relational database where you can manage your querying much faster and you have a concept of transaction that helps with any concurrency issues you might have been challenged with. If you want to stay within the Amazon managed world, then Amazon Aurora looks very attractive. It has limitations on the amount of data, but most likely those limits are not low enough for your business. As for the big extract performance challenge, your extracts will go from hours (within DynamoDB) to minutes with Aurora.
Please consider this in your designs. Performing big extracts is opposite of the event driven architecture, but these type of requirements still exist due to a need to support legacy systems that you need to interact with or systems that have not adjusted their architecture to your methodologies.
Thank you for reading.
Almir Mustafic.




No comments:

Post a Comment