AWS SQS with a single worker?












I'm struggling to set up a queue in an AWS environment where the tasks are consumed by a single Lambda worker.

AWS Lambda scales automatically, but I don't want that here. The trouble is that the function makes several complex changes to a database, and concurrent invocations can cause race conditions. Unfortunately that part is out of my control.

It is therefore easier to guarantee a single worker than to solve the complex SQL issues. What I want is: whenever there are messages in the queue, a single worker receives them and completes the tasks sequentially. Order does not matter.










amazon-web-services aws-lambda amazon-sqs serverless






asked Nov 20 '18 at 8:15









hendry














  • In theory, SQS messages are consumed by just one consumer, aren't they?

    – Héctor
    Nov 20 '18 at 8:21











  • When I connect it to my lambda, it just horizontally scales and the messages are processed concurrently... or is it in parallel? This will cause race conditions for me.

    – hendry
    Nov 20 '18 at 8:37



















2 Answers

Set the concurrency limit on the Lambda function to 1.
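
You can do this from the function's concurrency settings in the console or programmatically. A minimal sketch using boto3, where the function name is a placeholder:

    import boto3

    lambda_client = boto3.client("lambda")

    # Reserve exactly one concurrent execution for the function, so the SQS
    # trigger can never run more than one invocation of it at a time.
    lambda_client.put_function_concurrency(
        FunctionName="my-queue-worker",      # placeholder function name
        ReservedConcurrentExecutions=1,
    )

Note that reserved concurrency is both a guarantee and a cap: the function will never run more than one invocation at a time, across all of its triggers.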







answered Nov 20 '18 at 14:23









Mark B














  • Good point- although I'd always understood that if you do this SQS still dispatches the messages at 5/time, and 4 of these will fail and be re-driven, which (depending on your configuration) will end up with them just being dumped into the DLQ (ref: jeremydaly.com/…)

    – thomasmichaelwallace
    Nov 20 '18 at 15:22



















As you've noticed, the 'built-in' SQS integration starts with a minimum of five workers and scales up.

I have two suggestions for you, however (sketches of both follow this list):

  • If you only have one shard, then Kinesis (with a batch size of one item) will give you sequential, ordered execution. This is because Kinesis is parallel per shard (and one shard can take 1,000 records/second, so it's probably fine to have only one!) and the built-in Lambda trigger takes a customisable batch size (which can be 1) and waits for each batch to complete before taking the next.

  • If you need to use SQS, then the "old" way of integrating (prior to the SQS trigger) will give you "most likely one" worker and sequential execution. Here you trigger your Lambda from a scheduled CloudWatch Event, so a single Lambda checks the queue every X (configured by you). The challenge is that if X is shorter than the time it takes to process a message, a second Lambda will run in parallel (there are patterns such as setting X to your Lambda's timeout and having the Lambda run for 5 minutes, working through the queue one message at a time).
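
A minimal sketch of the first option with boto3, assuming a single-shard Kinesis stream already exists; the stream ARN and function name below are placeholders (the console's Kinesis trigger sets up the same mapping for you):

    import boto3

    lambda_client = boto3.client("lambda")

    # Attach the single-shard stream with a batch size of 1, so the built-in
    # trigger hands records to the function one at a time, in shard order,
    # waiting for each invocation to finish before sending the next record.
    lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:kinesis:eu-west-1:123456789012:stream/task-stream",  # placeholder
        FunctionName="my-queue-worker",                                              # placeholder
        BatchSize=1,
        StartingPosition="TRIM_HORIZON",
    )

And a rough sketch of the second option: a handler fired by a scheduled CloudWatch Event that drains the queue one message at a time (the queue URL and process() are placeholders, and error handling is left out):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/task-queue"  # placeholder


    def handler(event, context):
        # Work through the queue strictly one message at a time, stopping when
        # the queue looks empty or the function is close to its timeout.
        while context.get_remaining_time_in_millis() > 30_000:
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL,
                MaxNumberOfMessages=1,
                WaitTimeSeconds=5,
            )
            messages = resp.get("Messages", [])
            if not messages:
                break
            message = messages[0]
            process(message["Body"])  # placeholder for your existing task logic
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
            )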







answered Nov 20 '18 at 9:40









thomasmichaelwallace














  • Thanks, I am now looking into Kinesis Data Stream. Another question I have is how to avoid duplicate records in the stream?

    – hendry
    Nov 20 '18 at 10:16











  • Btw the lambda could take a large batch to process one by one, as I think it is a better approach than each task triggering a lambda execution. Lambda's 15 minute timeout should be more than enough for the typical workloads expected.

    – hendry
    Nov 20 '18 at 10:21











  • That's up to you (there's unlikely to be much change in sum execution time, thus cost)- but you should notice that Kinesis cannot "acknowledge" so you either retry (or dump) the whole batch on error; which makes 1-by-1 sound more suitable for what I understand of your use case.

    – thomasmichaelwallace
    Nov 20 '18 at 10:23











  • As for duplicates- if you need to de-duplicate, possibly the best pattern (noting that neither SNS nor SQS do this either) is to use DynamoDb. Given that a task can be uniquely identified by an id, you can write the task to DDb and then use the 'INSERT' transactions on the dynamo stream (i.e. ignore the UPDATE/DELETE), which will only occur the first time the unique task id is written.

    – thomasmichaelwallace
    Nov 20 '18 at 10:26











  • There's no need to switch from SQS to Kinesis for this, or to stop using the built-in SQS/Lambda integration. You simply need to set the concurrency limit to 1 in the Lambda function's settings.

    – Mark B
    Nov 20 '18 at 15:10
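
A minimal sketch of the DynamoDB de-duplication pattern described in the comments above, assuming each task carries a unique id; the table name and attribute names are placeholders:

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")


    def submit_task(task_id, payload):
        """Write the task; a duplicate id becomes a silent no-op."""
        try:
            dynamodb.put_item(
                TableName="tasks",  # placeholder table with 'id' as the partition key
                Item={"id": {"S": task_id}, "payload": {"S": payload}},
                ConditionExpression="attribute_not_exists(id)",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # only the duplicate case is swallowed


    def stream_handler(event, context):
        """Triggered by the table's stream: act only on first-time INSERTs."""
        for record in event["Records"]:
            if record["eventName"] != "INSERT":
                continue  # skip MODIFY/REMOVE so repeats never re-trigger the work
            task_id = record["dynamodb"]["Keys"]["id"]["S"]
            # ... run the actual task for task_id here ...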


















