Friday, June 10, 2016

AWS DynamoDB and limitations with GSIs (Global Secondary Indexes)

How many of you have run into DynamoDB’s limit on the number of GSIs (Global Secondary Indexes)?
Let’s step back. You may have come into the world of NoSQL relatively recently, and things that we took for granted before now need to be carefully analyzed. It is no longer about how the data is stored, but rather about how you search that data. As of this writing, AWS DynamoDB has a limit of 5 global secondary indexes per table.
Let’s assume you have a customer-profile table in DynamoDB with the following attributes:
  • customerId (hashkey)
  • firstName
  • lastName
  • middleName
  • city
  • county
  • zipCode
  • streetName
  • streetNumber
Let’s assume you set up the following five GSIs:
  • firstName
  • lastName
  • city
  • county
  • zipCode
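To make this setup concrete, here is a minimal sketch of what the table definition could look like as parameters for a `create_table` call. The table name, the `<attribute>-index` naming pattern, the string attribute types and the throughput numbers are my assumptions for illustration, not something you must use.

```python
# Sketch only: index names, attribute types and throughput values are assumptions.
GSI_ATTRIBUTES = ("firstName", "lastName", "city", "county", "zipCode")


def build_create_table_params(table_name="customer-profile",
                              gsi_attributes=GSI_ATTRIBUTES):
    """Build create_table parameters with one GSI per searchable attribute."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    # Only attributes used as keys (table key or index key) are declared here.
    key_attributes = ["customerId"] + list(gsi_attributes)
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"}
            for name in key_attributes
        ],
        "KeySchema": [{"AttributeName": "customerId", "KeyType": "HASH"}],
        "ProvisionedThroughput": throughput,
        "GlobalSecondaryIndexes": [
            {
                "IndexName": f"{name}-index",
                "KeySchema": [{"AttributeName": name, "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": throughput,
            }
            for name in gsi_attributes
        ],
    }


params = build_create_table_params()
print([gsi["IndexName"] for gsi in params["GlobalSecondaryIndexes"]])
```

If you use boto3, a dictionary like this could be passed as `boto3.client("dynamodb").create_table(**params)`.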
Let’s assume that this setup has been working for you and then you get a requirement to support searching by streetName. For the sake of argument, let’s say your product will have this cool new feature that allows you to search for other people who live on the same street. You don’t have any more GSIs available. What do you do?
There are three solutions that you can apply.
(1) Re-think how you set up the table and the indexes that I summarized above. Instead of having separate indexes on firstName and lastName, you can keep an index on lastName only. This will work for you if you typically search on lastName and firstName together to find a specific customer. You end up doing a query with a filter expression against this lastName index, where lastName is the partition key and the filter expression is on the firstName attribute. That frees up one GSI, which you can use for streetName.
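Here is a rough sketch of what that query could look like, written as the parameters for the low-level DynamoDB `query` operation. The index name `lastName-index` and the helper function name are hypothetical; use whatever your index is actually called.

```python
def build_name_query(last_name, first_name,
                     table_name="customer-profile",
                     index_name="lastName-index"):  # hypothetical index name
    """Build query parameters: key condition on lastName, filter on firstName."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # The key condition selects the partition in the lastName index.
        "KeyConditionExpression": "#ln = :ln",
        # The filter is applied after the matching items have been read.
        "FilterExpression": "#fn = :fn",
        "ExpressionAttributeNames": {"#ln": "lastName", "#fn": "firstName"},
        "ExpressionAttributeValues": {
            ":ln": {"S": last_name},
            ":fn": {"S": first_name},
        },
    }


query_params = build_name_query("Smith", "John")
print(query_params["KeyConditionExpression"], "|", query_params["FilterExpression"])
```

With boto3 this could be run as `boto3.client("dynamodb").query(**query_params)`. Keep in mind that the filter expression is applied after the items are read, so you still consume read capacity for every item with that lastName, not only for the ones that match firstName.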
(2) Stream data to Redshift and perform the search on Redshift:
You can stream your table into Redshift and define the relational structure on the Redshift side. Once the data is in Redshift, you can run SQL queries against it. The downside is the latency introduced by streaming data from DynamoDB into Redshift.
(3) Create a secondary table to hold name/value pairs, but there is a catch:
You could create a secondary (child) table to hold the information that you want to search on. This table would basically have a parentId pointing to the master table plus generic Name and Value attributes. For every record in the master customer-profile table, you will have X items (rows) in the secondary table, because the other values are stored vertically there. This means that for every write into the main customer-profile table, you have to do X more writes into the secondary table. DynamoDB should be able to handle this and the search should be efficient, but the COST will increase due to the increased number of reads/writes. As long as you can afford that extra cost, and this adds a lot of business value, then you are on the right path.
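To show what "stored vertically" means, here is a small sketch that turns one customer-profile record into the items you would write to the secondary table. The attribute names `parentId`, `attrName` and `attrValue`, and the function name, are assumptions I picked for illustration.

```python
def to_name_value_items(profile, key="customerId"):
    """Expand one profile record into one child item per non-key attribute."""
    parent_id = profile[key]
    return [
        {"parentId": parent_id, "attrName": name, "attrValue": str(value)}
        for name, value in profile.items()
        if name != key and value is not None  # skip the key and missing values
    ]


profile = {
    "customerId": "c-100",
    "firstName": "John",
    "lastName": "Smith",
    "streetName": "Main St",
}
items = to_name_value_items(profile)
print(len(items))  # 3 child items for 1 profile write
```

One profile with three searchable attributes becomes three child items, so each write to the main table comes with three more writes. That is the X I mentioned above, and it is where the extra cost comes from.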
I hope this gives you some ideas. You can follow me here (almirsCorner.com) and I also blog on Medium.com/@almirx101 
Almir M.
