How To Put DynamoDB Put Requests In A Queue

I have seen a lot of people mention that one way to deal with limited WCUs in DynamoDB is to send your requests to a queue and let the queue insert the data into DynamoDB at a rate that will not exceed your allocated WCUs.

Does anyone have any examples of this? I am currently working with AWS Lambda in Python and Node.js.

What I understand:

If a Lambda function wants to put 2,000 items into DynamoDB and we only have 100 WCUs, then instead of having Lambda retry or wait between requests,

we can send the items to an SQS queue, which will then insert them into DynamoDB at a rate of 100 WCUs per second.

Is this the right workflow?

Solution:

You might use Redis, especially if you are inserting into the same partition over and over again (a "hot partition"): Redis provides random selection from a set. Simply queue (insert) the items into Redis, then have another process read them continuously and insert them into the database. Redis has thousands of examples for Node.js and Python and is pretty handy to use 🙂

There is an AWS blog entry on caching that also recommends this pattern. Have a look at it:

Although the number of use cases for caching is growing, especially
for Redis, the predominant workload is to support read-heavy workloads
for repeat reads. Additionally, developers also use Redis to better
absorb spikes in writes. One of the more popular patterns is to write
directly to Redis and then asynchronously invoke a separate workflow
to de-stage the data to a separate data store (for example, DynamoDB).

There are probably other ways of achieving what you want, maybe even simpler ones, but Redis works with practically everything, and you may even be using it already 😉
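A minimal sketch of the de-stage pattern in Python, under a few assumptions: an in-memory deque stands in for the Redis list, and the `WcuThrottle` token bucket is a hypothetical helper that caps writes at the table's provisioned WCU rate. A real worker would `BRPOP` from Redis and call DynamoDB's `put_item` instead of appending to a list:

```python
import time
from collections import deque

class WcuThrottle:
    """Token bucket: refills `rate` write units per second, burst up to `rate`."""
    def __init__(self, rate):
        self.rate = rate
        self.tokens = rate
        self.last = time.monotonic()

    def acquire(self, cost=1):
        while True:
            now = time.monotonic()
            # Refill tokens based on elapsed time, capped at the bucket size.
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return
            time.sleep((cost - self.tokens) / self.rate)

# Stand-in for the Redis list: real code would LPUSH here from the producer
# and BRPOP in the de-stage worker.
queue = deque({"pk": str(i)} for i in range(10))

def drain(throttle, sink):
    while queue:
        item = queue.popleft()
        throttle.acquire(1)   # one WCU per item (writes up to 1 KB)
        sink.append(item)     # real worker: table.put_item(Item=item)

written = []
drain(WcuThrottle(rate=100), written)
```

The point of the token bucket is that the de-stage worker, not the producer, absorbs the WCU limit; the producer can burst writes into Redis as fast as it likes.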

Can SQS scale up to 1,000,000 queues for a single account?

I need a messaging service that allows me to create a channel for each user in order to facilitate real-time notifications. If I have somewhere between 100,000 and 1 million users, does it make sense to create an SQS queue for each of these users?

According to the SQS pricing documentation it would only cost $0.40 to create 1 million queues, but would I run into scaling problems?

Also, is there a way to set an expiration date on a queue? If a user deletes their account, then their queue no longer needs to exist.

Solution:

Creating the queues is not the issue here; polling them (even with long polling) is going to be really expensive for you. To process notifications in real time, you would need to poll every queue, all 1 million of them, say every 5 seconds.

Based on SQS pricing, requests after the free tier cost $0.40 per million, i.e. $0.00000040 per request.

That means you will be calling the ReceiveMessage API for about:

1,000,000 queues × 17,280 polls per queue per day (86,400 seconds in a day ÷ 5 seconds) = 17,280,000,000 requests.

That is about $6,912.00 per day in the worst case.
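The arithmetic above can be checked in a few lines:

```python
queues = 1_000_000
polls_per_day = 24 * 60 * 60 // 5       # one ReceiveMessage every 5 seconds
requests = queues * polls_per_day       # total API calls per day
cost = requests * 0.40 / 1_000_000      # $0.40 per million requests
print(requests, round(cost, 2))         # prints: 17280000000 6912.0
```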

You need to architect the solution in a better way.

How to find supported versions for Amazon SQS?

How do you find the supported API versions for an AWS service, not just the latest one?

For example, for Amazon SQS?

Solution:

The current API version for SQS is 2012-11-05, as noted at the top of each page of the SQS API Reference.

Most services list their current API Version this way — at the top of each page in the API Reference for that service.

The AWS service APIs are usually very stable, so AWS doesn’t always bump the version when enhancements come out. That means the date 2012-11-05 for SQS doesn’t imply that the API is completely unchanged for 5+ years. Instead, it means that no breaking changes have occurred to the API, and libraries written against any iteration of the 2012-11-05 SQS API will continue to work for all the features that particular library implements, going forward.

There is essentially never a need to specify an older version, nor to change the version you reference in a project to a newer one, unless you are trying to use a new feature that is only available via a newer API release… which generally means your supporting libraries/SDK would need upgrades as well. So once you configure this, there is rarely a need to change it.

How to get consistent CPU utilization on AWS

I’ve now learnt that when I start a new EC2 instance it has a certain number of CPU credits, so its performance is high when it starts processing but gradually degrades as the credits run out. Past that point, the instance runs at what appears to be the baseline CPU utilisation rate. To illustrate: when I started the EC2 instance (t2.nano), CloudWatch reported around 80% CPU utilisation, gradually decreasing to 5%.

Now I’m happy to use one of the better instance types pending the instance limit request. But whilst that is in progress, I’d like to know whether the issue of reducing performance over time will still hold even with the better instance type?

Would I require a dedicated host setup if I wish to ensure I get consistent CPU utilisation? The only problem I can see here is that I’m running an SQS worker queue, and Elastic Beanstalk allows us to easily set up a worker environment which reads messages from the queue. From what I’ve read, and from looking at the configuration options available in Elastic Beanstalk, I don’t think I’ll be able to launch instances into a dedicated host directly. Most of my reading has led me to believe that I’ll have to learn how to use a VPC. Would that be correct?

So I guess my questions are: would simply moving to a more powerful instance type guarantee consistent CPU performance, or is a dedicated host required? If so, is it possible to set one up with Elastic Beanstalk, or would it have to be set up manually? And if it is set up manually, can it be configured to work with an SQS queue automatically?

Solution:

If you want consistent CPU performance, you should avoid the burstable performance instances (the T2 family). All other families of instances (M5, C5, etc) will have consistent CPU performance over time. You can use any instance family with Elastic Beanstalk. No need for a dedicated host.
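The ~5% floor observed in the question falls out of the T2 credit arithmetic. Assuming the documented t2.nano figure of 3 CPU credits earned per hour (worth checking against the current AWS docs for your instance type), where one credit is one vCPU at 100% for one minute:

```python
# t2.nano credit arithmetic (figures from the AWS T2 burstable-instance docs;
# verify against current documentation for your instance type).
credits_per_hour = 3            # credits a t2.nano earns per hour
minutes_per_hour = 60
# One credit = one vCPU at 100% for one minute, so the sustainable
# baseline utilisation is earnings spread over the whole hour:
baseline_pct = credits_per_hour / minutes_per_hour * 100
print(baseline_pct)             # prints: 5.0
```

That 5.0% matches the CloudWatch floor the question describes, which is why moving off the T2 family removes the effect entirely.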

Are AWS SQS queues isolated from each other?

Say that I have two separate services, A and B, with SQS queues that are both subscribed to SNS topic “topic-foo”. Then I publish a message m1 to the SNS topic “topic-foo”.

If the SQS queue owned by service A (sqs-A) sees message m1 and processes it (i.e. pops it off the queue and processes the message so that it’s no longer on sqs-A), will I still be guaranteed that the separate SQS queue owned by service B (sqs-B) will always be able to see and process message m1? (in other words, does AWS SNS publishing guarantee multiple delivery to SQS queues and isolation of separate SQS queue processing?)

Solution:

In your situation, you have two SQS queues, each one is subscribed to an SNS topic.

In this case, when you send a message to the SNS topic, an item is added to each of the SQS queues. The two queues are distinct and independent, so processing the item in one queue will not affect the item in the other queue.

This has nothing to do with SNS specifically; it is purely because your two SQS queues are two separate queues. The fact that SNS is publishing to them doesn’t change how the queues behave.
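A toy model of the fan-out semantics makes the isolation concrete. The `Topic` class and plain deques below are illustrative stand-ins, not the AWS API: the topic delivers an independent copy of each message to every subscribed queue, so consuming from one queue leaves the other untouched.

```python
from collections import deque

class Topic:
    """Toy SNS topic: copies each published message to every subscriber."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        for q in self.subscribers:
            q.append(message)   # each queue receives its own copy

topic = Topic()
sqs_a, sqs_b = deque(), deque()
topic.subscribe(sqs_a)
topic.subscribe(sqs_b)

topic.publish("m1")
sqs_a.popleft()                 # service A consumes its copy of m1...
# ...service B's copy is unaffected and still waiting:
assert list(sqs_b) == ["m1"]
```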