AWS SQS with a single worker?












I'm struggling to set up a queue in an AWS environment where the tasks are consumed by a single Lambda worker.

AWS Lambda scales automatically, but I don't want that here. The trouble is that the function makes several complex changes to a database, and concurrent invocations can cause race conditions. Unfortunately that part is out of my control.

It is therefore easier to guarantee a single worker than to solve the complex SQL issues. What I want is: whenever there are messages in the queue, a single worker receives them and completes the tasks sequentially. Order does not matter.










amazon-web-services aws-lambda amazon-sqs serverless






asked Nov 20 '18 at 8:15









hendry














  • In theory, SQS messages are consumed by just one consumer, aren't they?

    – Héctor
    Nov 20 '18 at 8:21











  • When I connect it to my lambda, it just horizontally scales and the messages are processed concurrently... or is it in parallel? This will cause race conditions for me.

    – hendry
    Nov 20 '18 at 8:37



















2 Answers

Set the concurrency limit on the Lambda function to 1.
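
You can do this from the function's concurrency settings in the console or programmatically. A minimal sketch using boto3, where the function name is a placeholder:

    import boto3

    lambda_client = boto3.client("lambda")

    # Reserve exactly one concurrent execution for the function, so the SQS
    # trigger can never run more than one invocation of it at a time.
    lambda_client.put_function_concurrency(
        FunctionName="my-queue-worker",      # placeholder function name
        ReservedConcurrentExecutions=1,
    )

Note that reserved concurrency is both a guarantee and a cap: the function will never run more than one invocation at a time, across all of its triggers.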







answered Nov 20 '18 at 14:23









Mark B














  • Good point- although I'd always understood that if you do this SQS still dispatches the messages at 5/time, and 4 of these will fail and be re-driven, which (depending on your configuration) will end up with them just being dumped into the DLQ (ref: jeremydaly.com/…)

    – thomasmichaelwallace
    Nov 20 '18 at 15:22



















As you've noticed, the 'built-in' SQS integration starts with a minimum of five workers and scales up.

I have two suggestions for you, however (sketches of both follow this list):

  • If you only have one shard, then Kinesis (with a batch size of one item) will give you sequential, ordered execution. This is because Kinesis is parallel per shard (and one shard can take 1,000 records/second, so it's probably fine to have only one!) and the built-in Lambda trigger takes a customisable batch size (which can be 1) and waits for each batch to complete before taking the next.

  • If you need to use SQS, then the "old" way of integrating (prior to the SQS trigger) will give you "most likely one" worker and sequential execution. Here you trigger your Lambda from a scheduled CloudWatch Event, so a single Lambda checks the queue every X (configured by you). The challenge is that if X is shorter than the time it takes to process a message, a second Lambda will run in parallel (there are patterns such as setting X to your Lambda's timeout and having the Lambda run for 5 minutes, working through the queue one message at a time).
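
A minimal sketch of the first option with boto3, assuming a single-shard Kinesis stream already exists; the stream ARN and function name below are placeholders (the console's Kinesis trigger sets up the same mapping for you):

    import boto3

    lambda_client = boto3.client("lambda")

    # Attach the single-shard stream with a batch size of 1, so the built-in
    # trigger hands records to the function one at a time, in shard order,
    # waiting for each invocation to finish before sending the next record.
    lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:kinesis:eu-west-1:123456789012:stream/task-stream",  # placeholder
        FunctionName="my-queue-worker",                                              # placeholder
        BatchSize=1,
        StartingPosition="TRIM_HORIZON",
    )

And a rough sketch of the second option: a handler fired by a scheduled CloudWatch Event that drains the queue one message at a time (the queue URL and process() are placeholders, and error handling is left out):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/task-queue"  # placeholder


    def handler(event, context):
        # Work through the queue strictly one message at a time, stopping when
        # the queue looks empty or the function is close to its timeout.
        while context.get_remaining_time_in_millis() > 30_000:
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL,
                MaxNumberOfMessages=1,
                WaitTimeSeconds=5,
            )
            messages = resp.get("Messages", [])
            if not messages:
                break
            message = messages[0]
            process(message["Body"])  # placeholder for your existing task logic
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
            )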







answered Nov 20 '18 at 9:40









thomasmichaelwallace














  • Thanks, I am now looking into Kinesis Data Stream. Another question I have is how to avoid duplicate records in the stream?

    – hendry
    Nov 20 '18 at 10:16











  • Btw the lambda could take a large batch to process one by one, as I think it is a better approach than each task triggering a lambda execution. Lambda's 15 minute timeout should be more than enough for the typical workloads expected.

    – hendry
    Nov 20 '18 at 10:21











  • That's up to you (there's unlikely to be much change in sum execution time, thus cost)- but you should notice that Kinesis cannot "acknowledge" so you either retry (or dump) the whole batch on error; which makes 1-by-1 sound more suitable for what I understand of your use case.

    – thomasmichaelwallace
    Nov 20 '18 at 10:23











  • As for duplicates- if you need to de-duplicate, possibly the best pattern (noting that neither SNS nor SQS do this either) is to use DynamoDb. Given that a task can be uniquely identified by an id, you can write the task to DDb and then use the 'INSERT' transactions on the dynamo stream (i.e. ignore the UPDATE/DELETE), which will only occur the first time the unique task id is written.

    – thomasmichaelwallace
    Nov 20 '18 at 10:26











  • There's no need to switch from SQS to Kinesis for this, or to stop using the built-in SQS/Lambda integration. You simply need to set the concurrency limit to 1 in the Lambda function's settings.

    – Mark B
    Nov 20 '18 at 15:10
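
A minimal sketch of the DynamoDB de-duplication pattern described in the comments above, assuming each task carries a unique id; the table name and attribute names are placeholders:

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")


    def submit_task(task_id, payload):
        """Write the task; a duplicate id becomes a silent no-op."""
        try:
            dynamodb.put_item(
                TableName="tasks",  # placeholder table with 'id' as the partition key
                Item={"id": {"S": task_id}, "payload": {"S": payload}},
                ConditionExpression="attribute_not_exists(id)",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # only the duplicate case is swallowed


    def stream_handler(event, context):
        """Triggered by the table's stream: act only on first-time INSERTs."""
        for record in event["Records"]:
            if record["eventName"] != "INSERT":
                continue  # skip MODIFY/REMOVE so repeats never re-trigger the work
            task_id = record["dynamodb"]["Keys"]["id"]["S"]
            # ... run the actual task for task_id here ...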


















