EventHub ForEach Parallel Async























I always manage to confuse myself when working with async, so I'm after a bit of validation/confirmation that I'm doing what I think I'm doing in the following scenarios.

Given the following trivial example:

    // pretend / assume these are JSON messages or something ;)
    var strEvents = new List<string> { "event1", "event2", "event3" };

I can post each event to an Event Hub simply as follows:

    foreach (var e in strEvents)
    {
        // Do some things
        outEventHub.Add(e); // ICollector
    }

The foreach will run on a single thread and execute everything inside it sequentially. I guess the posting to the Event Hub will also stay on that same thread?



Changing ICollector to IAsyncCollector, I can achieve the following:

    foreach (var e in strEvents)
    {
        // Do some things
        await outEventHub.AddAsync(e);
    }

I think I am right in saying that the foreach will run on a single thread, and the actual sending to the Event Hub will be pushed off elsewhere? Or at least it will not block that same thread.



Changing to Parallel.ForEach, as these events will be arriving 100+ or so at a time:

    Parallel.ForEach(strEvents, async (e) =>
    {
        // Do some things
        await outEventHub.AddAsync(e);
    });

I'm starting to get a bit hazy now, as I am not sure what is really going on. AFAIK each event has its own thread (within the bounds of the hardware) and the steps within that thread do not block it, this trivial example aside.



Finally, I thought I could turn them all into Tasks:

    private static async Task DoThingAsync(string e, IAsyncCollector<string> outEventHub)
    {
        await outEventHub.AddAsync(e);
    }

    var t = new List<Task>();

    foreach (var e in strEvents)
    {
        t.Add(DoThingAsync(e, outEventHub));
    }

    await Task.WhenAll(t);

Now I am really hazy. I think this is prepping everything on a single thread, and then running everything at exactly the same time, on any thread available?

I appreciate that benchmarking is required to determine which is right for the job at hand, but an explanation of what the framework is doing in each situation would be super helpful for me right now.




























  • 1




    Kind of off-topic, but assuming we're talking about Azure Event Hubs here, may I suggest you bundle the events and send them in a batch?
    – Peter Bons
    Nov 7 at 18:57










  • it's not that much off topic @PeterBons, but yeah that is a good idea and something we don't really do enough of tbh.. i will certainly look in to it :)
    – m1nkeh
    Nov 8 at 12:13






















c# multithreading async-await
















asked Nov 7 at 18:41









m1nkeh





















1 Answer




















accepted










Parallel != async



This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:



Simple foreach



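This is non-parallel and non-async. Each Add call blocks the calling thread until it returns, and the events are processed one after another. Nothing more to talk about.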



Await inside foreach



This is async code that is non-parallel.



    foreach (var e in strEvents)
    {
        // Do some things
        await outEventHub.AddAsync(e);
    }


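The loop body only ever runs on one thread at a time. It takes an event, starts adding it to your event hub, and while that operation is in flight (I'm guessing it does some sort of network IO) it hands the thread back to the thread pool (or to the UI message loop if it was called on a UI thread), so the thread can do other work while waiting for AddAsync to complete. When there is no synchronization context to return to, the code after the await may resume on a different pool thread, but the next iteration never starts before the previous AddAsync has finished. So, as you said, it is not parallel at all.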
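To make the "async but not parallel" point concrete, here is a minimal sketch. FakeCollector is a hypothetical stand-in for IAsyncCollector<string> (it is not the WebJobs type), and Task.Delay only simulates the network call. It records how many AddAsync calls were ever in flight at the same moment.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for IAsyncCollector<string>; Task.Delay simulates network IO.
public sealed class FakeCollector
{
    private readonly object _gate = new object();
    private int _inFlight;

    public int MaxInFlight { get; private set; }
    public List<string> Sent { get; } = new List<string>();

    public async Task AddAsync(string item)
    {
        lock (_gate)
        {
            _inFlight++;
            MaxInFlight = Math.Max(MaxInFlight, _inFlight);
        }

        await Task.Delay(50); // no thread is blocked during this wait

        lock (_gate)
        {
            _inFlight--;
            Sent.Add(item);
        }
    }
}

public static class SequentialDemo
{
    public static async Task<FakeCollector> SendOneByOneAsync(IEnumerable<string> events)
    {
        var collector = new FakeCollector();
        foreach (var e in events)
        {
            // The next iteration does not start until this send has completed.
            await collector.AddAsync(e);
        }
        return collector;
    }

    public static async Task Main()
    {
        var c = await SendOneByOneAsync(new[] { "event1", "event2", "event3" });
        Console.WriteLine($"sent={c.Sent.Count}, max in flight={c.MaxInFlight}");
    }
}
```

With three events this should report three sends and never more than one in flight, because each iteration awaits the previous send before starting the next.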



Parallel.ForEach (async)

This one is a trap! In short, Parallel.ForEach is designed for synchronous workloads. We'll get back to this, but first let's assume you used it with the non-async code.

Parallel.ForEach (sync)



A.k.a. Parallel but not async.



    Parallel.ForEach(strEvents, (e) =>
    {
        // Do some things
        outEventHub.Add(e);
    });


The events do not each get their own thread. Creating threads is expensive, and for CPU-bound work there is little point in having more running threads than CPU cores. Instead, Parallel.ForEach splits the items into chunks and processes them with tasks that run on the ThreadPool (plus the calling thread, which takes part in the work and is blocked until the whole loop has finished). The pool starts out with roughly one worker thread per core and only adds more gradually when its threads stay busy. Each thread takes an item, works on it, then takes another one, etc.

You can think of it as having, on a 4-core machine, 4 workers around a pile of tasks, so about 4 of them are being run at a time. You can imagine that this is not ideal for IO-bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send an event out, while they could be doing useful work. This leads us to...
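Here is a rough sketch of that "4 workers around a pile" picture. BlockingAdd is a hypothetical stand-in for ICollector<string>.Add, with Thread.Sleep playing the part of slow network IO, and MaxDegreeOfParallelism is set explicitly (an assumption for the demo, not something the original code does) so the worker count is predictable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSyncDemo
{
    // Hypothetical blocking send, standing in for ICollector<string>.Add.
    private static void BlockingAdd(string item, ConcurrentBag<string> sent)
    {
        Thread.Sleep(50); // the worker thread is stuck here, like on slow network IO
        sent.Add(item);
    }

    public static (int Sent, int MaxBusyWorkers) Run(int eventCount, int maxWorkers)
    {
        var events = Enumerable.Range(1, eventCount).Select(i => $"event{i}").ToList();
        var sent = new ConcurrentBag<string>();
        var gate = new object();
        int busy = 0, maxBusy = 0;

        // Cap the loop at maxWorkers loop bodies running at the same time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        // Parallel.ForEach blocks the caller until every item has been processed.
        Parallel.ForEach(events, options, e =>
        {
            lock (gate) { busy++; maxBusy = Math.Max(maxBusy, busy); }
            BlockingAdd(e, sent);
            lock (gate) { busy--; }
        });

        return (sent.Count, maxBusy);
    }

    public static void Main()
    {
        var (sent, maxBusy) = Run(eventCount: 20, maxWorkers: 4);
        Console.WriteLine($"sent={sent}, max busy workers={maxBusy}");
    }
}
```

Every busy worker here is a thread that does nothing but sleep, which is exactly the waste the async version avoids.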



Tasks



Async and potentially parallel (depends on the usage).



Your description is mostly correct here, too. The foreach kicks off all the tasks one after another on the calling thread: each DoThingAsync call runs synchronously up to its first await that actually has to wait, then returns a Task. When you reach await Task.WhenAll(t), the calling thread is released and can do other work as needed. As in the Parallel.ForEach case, many events are in flight at the same time. But:

What happens is that the thread running a task does the necessary preprocessing, then sends out the network request asynchronously. This means the task does not block a thread while waiting for the network; the thread is released to pick up another work item. When the network request completes, the task's continuation (the remaining code after the await) is scheduled to run, on a ThreadPool thread unless a synchronization context was captured.

In theory this is the most efficient approach, because no thread sits idle waiting on IO. It starts every send at once, though, so with many events you have to be careful not to flood your network.
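If flooding the network is a concern, one common way to cap the number of concurrent sends is a SemaphoreSlim around each call. The sketch below is only an illustration under assumptions: Task.Delay stands in for outEventHub.AddAsync, and the names SendAllAsync and maxConcurrent are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledWhenAllDemo
{
    public static async Task<(int Sent, int MaxInFlight)> SendAllAsync(
        IReadOnlyList<string> events, int maxConcurrent)
    {
        var sent = new List<string>();
        var gate = new object();
        int inFlight = 0, maxInFlight = 0;
        using var throttle = new SemaphoreSlim(maxConcurrent);

        async Task SendOneAsync(string e)
        {
            await throttle.WaitAsync(); // wait for a free slot without blocking a thread
            try
            {
                lock (gate) { inFlight++; maxInFlight = Math.Max(maxInFlight, inFlight); }
                await Task.Delay(50); // hypothetical stand-in for outEventHub.AddAsync(e)
                lock (gate) { inFlight--; sent.Add(e); }
            }
            finally
            {
                throttle.Release();
            }
        }

        // Start every task up front, then wait for all of them together.
        var tasks = events.Select(e => SendOneAsync(e)).ToList();
        await Task.WhenAll(tasks);

        return (sent.Count, maxInFlight);
    }

    public static async Task Main()
    {
        var events = Enumerable.Range(1, 100).Select(i => $"event{i}").ToList();
        var (sent, maxInFlight) = await SendAllAsync(events, maxConcurrent: 10);
        Console.WriteLine($"sent={sent}, max in flight={maxInFlight}");
    }
}
```

All 100 tasks are created immediately, but only maxConcurrent of them get past the semaphore at once; the right limit is something to benchmark rather than guess.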



Back to Parallel.ForEach and async

At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e); } does is kick off the work; it returns as soon as it hits an await that has to wait. (Remember that async/await releases the thread while waiting.) Parallel.ForEach expects a synchronous Action, so the lambda is compiled as an async void method and there is no Task for the loop to observe. Parallel.ForEach therefore returns right after it has started all of them, and nothing is awaiting the sends! They become fire and forget, which is usually a bad practice: your method can finish before the events have been sent, and exceptions are not reported back to the caller. It is as if you deleted the await Task.WhenAll call from your task example.
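The effect is easy to reproduce. In this sketch Task.Delay is a hypothetical stand-in for outEventHub.AddAsync, and the counter shows how many sends had finished when each loop returned. The second method uses Parallel.ForEachAsync, which assumes .NET 6 or later and was not available when the question was asked; on older frameworks the Task.WhenAll pattern is the way to go.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    // Returns how many "sends" had completed when Parallel.ForEach returned.
    public static int CompletedWhenLoopReturns(int eventCount)
    {
        int completed = 0;
        var events = Enumerable.Range(1, eventCount).ToList();

        // The async lambda is converted to Action<int>, so it is async void.
        Parallel.ForEach(events, async e =>
        {
            await Task.Delay(500); // hypothetical stand-in for outEventHub.AddAsync
            Interlocked.Increment(ref completed);
        });

        // The loop is done, but the delays above are still pending.
        return Volatile.Read(ref completed);
    }

    // Requires .NET 6+: the loop itself returns a Task that can be awaited.
    public static async Task<int> CompletedWithForEachAsync(int eventCount)
    {
        int completed = 0;
        var events = Enumerable.Range(1, eventCount).ToList();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        await Parallel.ForEachAsync(events, options, async (e, ct) =>
        {
            await Task.Delay(50, ct); // hypothetical stand-in for outEventHub.AddAsync
            Interlocked.Increment(ref completed);
        });

        return completed;
    }

    public static async Task Main()
    {
        Console.WriteLine($"done when Parallel.ForEach returned: {CompletedWhenLoopReturns(5)}");
        Console.WriteLine($"done when ForEachAsync returned: {await CompletedWithForEachAsync(5)}");
    }
}
```

The first number is normally zero, since the loop returns long before the half-second delays elapse, while the awaited version only returns once every send has completed.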



I hope this cleared up most things for you; if not, let me know what to improve.



























  • that is an absolutely superb explanation.. there were a couple of things in there that i had to stop, and think about... but it really is clear, and SO much better than attempting to piece together bits of info from a myriad of documents online.. re: my 'await inside foreach' example to confirm that won't do multiple AddAsync() jobs concurrently... it will still do them in sequence? And that pattern would only become helpful when you have other code/activities that can be getting on with things in the same foreach iteration..? 👍
    – m1nkeh
    Nov 8 at 12:29












  • Thank you for your kind response.:) Regarding await inside foreach: (regular foreach, right?) No it will not. It will run them one by one. However that pattern is always useful, for IO bound tasks you should always use the Async method if you can. This way the CPU thread is freed while waiting for the network response. It can do any other task from the "global task list", it doesn't have to be related to your foreach. Most trivial example is in a GUI app it can update the GUI (so it doesn't freeze), in a web app it can process another request, and so on
    – Marcell Tóth
    Nov 8 at 22:27











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53195776%2feventhub-foreach-parallel-async%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted










Parallel != async



This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:



Simple foreach



This is non-parallel and non-async. Nothing to talk about.



Await inside foreach



This is async code that is non-parallel.



foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}


This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while it is being completed (I'm guessing it does some sort of network IO) it hands back the thread to the thread pool (or UI if it was called on a UI thread) so it can do other work while wating on AddAsync to return. But as you said, is is not parallel at all.



Parallel Foreach (async)



This one is a trap! In short, Parallel.Foreach is designed for synchronous workloads. We'll get back to this but first let's assume you used it with the non-async code.



Parallel foreach (sync)



A.k.a. Parallel but not async.



Parallel.ForEach(events, (e) =>
{
// Do some things
outEventHub.Add(e);
});


Each item will get its own "Task", but they won't spawn a thread. Creating threads is expensive, and in an optimal case there is no point in having more threads than CPU cores. Instead these tasks run on a ThreadPool, which has just as many Threads as optimal. Each thread takes a task, works on it, then takes another one, etc.



You can think of it as - on a 4 core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal in case of IO bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send the event out, while they could be doing useful work. This leads us to...



Tasks



Async and potentially parallel (depends on the usage).



Your description is correct here, too, except for the ThreadPool, it is kikking off all the tasks at once (on the main thread), which then run on the pool's threads. While they are running, the main thread is released, which then can do other work, as needed. Up to this point it is the same as the Parallel.Foreach case. But:



What happens is that a TaskPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means that this task will not block while waiting for the network, but rather it releases the ThreadPool thread to pick up another workitem. When the network request completes, the tasks continuation (the remaining code lines after the network request) is scheduled back to the list of tasks.



You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.



Back to Parallel.Foreach and async



At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e);} is doing is to kick off the work, it will return right after it hits the await. (Remember that async/await is releasing threads while waiting.) Parallel.Foreach returns right after it started all of them. But nothing is awaiting these tasks! These become fire and forget, which is usually a bad practice. It is like you deleted the await Task.WhenAll call from your task example.



I hope this cleared most things for you, if not, let me know what to improve on.






share|improve this answer





















  • that is an absolutely superb explanation.. there were a couple of things in there that i had to stop, and think about... but it really is clear, and SO much better than attempting to piece together bits of info from a myriad of documents online.. re: my 'await inside foreach' example to confirm that won't do multiple AddAsync() jobs concurrently... it will still do them in sequence? And that pattern would only become helpful when you have other code/activities that can be getting on with things in the same foreach iteration..? 👍
    – m1nkeh
    Nov 8 at 12:29












  • Thank you for your kind response.:) Regarding await inside foreach: (regular foreach, right?) No it will not. It will run them one by one. However that pattern is always useful, for IO bound tasks you should always use the Async method if you can. This way the CPU thread is freed while waiting for the network response. It can do any other task from the "global task list", it doesn't have to be related to your foreach. Most trivial example is in a GUI app it can update the GUI (so it doesn't freeze), in a web app it can process another request, and so on
    – Marcell Tóth
    Nov 8 at 22:27















up vote
2
down vote



accepted










Parallel != async



This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:



Simple foreach



This is non-parallel and non-async. Nothing to talk about.



Await inside foreach



This is async code that is non-parallel.



foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}


This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while it is being completed (I'm guessing it does some sort of network IO) it hands back the thread to the thread pool (or UI if it was called on a UI thread) so it can do other work while wating on AddAsync to return. But as you said, is is not parallel at all.



Parallel Foreach (async)



This one is a trap! In short, Parallel.Foreach is designed for synchronous workloads. We'll get back to this but first let's assume you used it with the non-async code.



Parallel foreach (sync)



A.k.a. Parallel but not async.



Parallel.ForEach(events, (e) =>
{
// Do some things
outEventHub.Add(e);
});


Each item will get its own "Task", but they won't spawn a thread. Creating threads is expensive, and in an optimal case there is no point in having more threads than CPU cores. Instead these tasks run on a ThreadPool, which has just as many Threads as optimal. Each thread takes a task, works on it, then takes another one, etc.



You can think of it as - on a 4 core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal in case of IO bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send the event out, while they could be doing useful work. This leads us to...



Tasks



Async and potentially parallel (depends on the usage).



Your description is correct here, too, except for the ThreadPool, it is kikking off all the tasks at once (on the main thread), which then run on the pool's threads. While they are running, the main thread is released, which then can do other work, as needed. Up to this point it is the same as the Parallel.Foreach case. But:



What happens is that a TaskPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means that this task will not block while waiting for the network, but rather it releases the ThreadPool thread to pick up another workitem. When the network request completes, the tasks continuation (the remaining code lines after the network request) is scheduled back to the list of tasks.



You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.



Back to Parallel.Foreach and async



At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e);} is doing is to kick off the work, it will return right after it hits the await. (Remember that async/await is releasing threads while waiting.) Parallel.Foreach returns right after it started all of them. But nothing is awaiting these tasks! These become fire and forget, which is usually a bad practice. It is like you deleted the await Task.WhenAll call from your task example.



I hope this cleared most things for you, if not, let me know what to improve on.






share|improve this answer





















  • that is an absolutely superb explanation.. there were a couple of things in there that i had to stop, and think about... but it really is clear, and SO much better than attempting to piece together bits of info from a myriad of documents online.. re: my 'await inside foreach' example to confirm that won't do multiple AddAsync() jobs concurrently... it will still do them in sequence? And that pattern would only become helpful when you have other code/activities that can be getting on with things in the same foreach iteration..? 👍
    – m1nkeh
    Nov 8 at 12:29












  • Thank you for your kind response.:) Regarding await inside foreach: (regular foreach, right?) No it will not. It will run them one by one. However that pattern is always useful, for IO bound tasks you should always use the Async method if you can. This way the CPU thread is freed while waiting for the network response. It can do any other task from the "global task list", it doesn't have to be related to your foreach. Most trivial example is in a GUI app it can update the GUI (so it doesn't freeze), in a web app it can process another request, and so on
    – Marcell Tóth
    Nov 8 at 22:27













up vote
2
down vote



accepted







up vote
2
down vote



accepted






Parallel != async



This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:



Simple foreach



This is non-parallel and non-async. Nothing to talk about.



Await inside foreach



This is async code that is non-parallel.



foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}


This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while it is being completed (I'm guessing it does some sort of network IO) it hands back the thread to the thread pool (or UI if it was called on a UI thread) so it can do other work while wating on AddAsync to return. But as you said, is is not parallel at all.



Parallel Foreach (async)



This one is a trap! In short, Parallel.Foreach is designed for synchronous workloads. We'll get back to this but first let's assume you used it with the non-async code.



Parallel foreach (sync)



A.k.a. Parallel but not async.



Parallel.ForEach(events, (e) =>
{
// Do some things
outEventHub.Add(e);
});


Each item will get its own "Task", but they won't spawn a thread. Creating threads is expensive, and in an optimal case there is no point in having more threads than CPU cores. Instead these tasks run on a ThreadPool, which has just as many Threads as optimal. Each thread takes a task, works on it, then takes another one, etc.



You can think of it as - on a 4 core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal in case of IO bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send the event out, while they could be doing useful work. This leads us to...



Tasks



Async and potentially parallel (depends on the usage).



Your description is correct here, too, except for the ThreadPool, it is kikking off all the tasks at once (on the main thread), which then run on the pool's threads. While they are running, the main thread is released, which then can do other work, as needed. Up to this point it is the same as the Parallel.Foreach case. But:



What happens is that a TaskPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means that this task will not block while waiting for the network, but rather it releases the ThreadPool thread to pick up another workitem. When the network request completes, the tasks continuation (the remaining code lines after the network request) is scheduled back to the list of tasks.



You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.



Back to Parallel.Foreach and async



At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e);} is doing is to kick off the work, it will return right after it hits the await. (Remember that async/await is releasing threads while waiting.) Parallel.Foreach returns right after it started all of them. But nothing is awaiting these tasks! These become fire and forget, which is usually a bad practice. It is like you deleted the await Task.WhenAll call from your task example.



I hope this cleared most things for you, if not, let me know what to improve on.






share|improve this answer





























answered Nov 7 at 19:22









Marcell Tóth













  • that is an absolutely superb explanation.. there were a couple of things in there that i had to stop, and think about... but it really is clear, and SO much better than attempting to piece together bits of info from a myriad of documents online.. re: my 'await inside foreach' example to confirm that won't do multiple AddAsync() jobs concurrently... it will still do them in sequence? And that pattern would only become helpful when you have other code/activities that can be getting on with things in the same foreach iteration..? 👍
    – m1nkeh
    Nov 8 at 12:29












  • Thank you for your kind response. :) Regarding await inside foreach (a regular foreach, right?): no, it will not. It will run them one by one. That pattern is still useful, though: for IO-bound work you should use the async method whenever you can, because the thread is freed while waiting for the network response. It can then pick up any other work from the "global task list"; it doesn't have to be related to your foreach. The most trivial examples: in a GUI app it can update the GUI (so it doesn't freeze), in a web app it can process another request, and so on.
    – Marcell Tóth
    Nov 8 at 22:27




































 
