OpenMP - store results in vector [duplicate]
up vote
0
down vote
favorite
This question already has an answer here:
OpenMP multiple threads update same array
2 answers
I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}
This the does not work properly with the #pragma omp parallel for
.
So what's the best practice to achieve that?
Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.
Something like that is not practical because it consumes to much space
int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}
// remove all unused slots in results array
c++ openmp
marked as duplicate by Zulan, Community♦ Nov 7 at 14:20
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
add a comment |
up vote
0
down vote
favorite
This question already has an answer here:
OpenMP multiple threads update same array
2 answers
I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}
This the does not work properly with the #pragma omp parallel for
.
So what's the best practice to achieve that?
Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.
Something like that is not practical because it consumes to much space
int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}
// remove all unused slots in results array
c++ openmp
marked as duplicate by Zulan, Community♦ Nov 7 at 14:20
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
1
#pragma omp critical
?
– LogicStuff
Nov 7 at 13:48
Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51
But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
This question already has an answer here:
OpenMP multiple threads update same array
2 answers
I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}
This the does not work properly with the #pragma omp parallel for
.
So what's the best practice to achieve that?
Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.
Something like that is not practical because it consumes to much space
int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}
// remove all unused slots in results array
c++ openmp
This question already has an answer here:
OpenMP multiple threads update same array
2 answers
I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}
This the does not work properly with the #pragma omp parallel for
.
So what's the best practice to achieve that?
Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.
Something like that is not practical because it consumes to much space
int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}
// remove all unused slots in results array
This question already has an answer here:
OpenMP multiple threads update same array
2 answers
c++ openmp
c++ openmp
edited Nov 7 at 13:49
asked Nov 7 at 13:44
flappix
997
997
marked as duplicate by Zulan, Community♦ Nov 7 at 14:20
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
marked as duplicate by Zulan, Community♦ Nov 7 at 14:20
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
1
#pragma omp critical
?
– LogicStuff
Nov 7 at 13:48
Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51
But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54
add a comment |
1
#pragma omp critical
?
– LogicStuff
Nov 7 at 13:48
Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51
But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54
1
1
#pragma omp critical
?– LogicStuff
Nov 7 at 13:48
#pragma omp critical
?– LogicStuff
Nov 7 at 13:48
Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51
Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51
But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54
But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
accepted
The "naive" way:
You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.
If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.
Finally, it's a quite known problem. You should read this for more detailed information:
C++ OpenMP Parallel For Loop - Alternatives to std::vector
add a comment |
up vote
3
down vote
Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back
in a critical region:
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
#pragma omp critical
results.push_back (result_value);
}
}
If threads are mostly busy with other things than the push_back
, there will be little overhead from the critical region.
Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:
There is a good duplicate for this here and here.
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
The "naive" way:
You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.
If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.
Finally, it's a quite known problem. You should read this for more detailed information:
C++ OpenMP Parallel For Loop - Alternatives to std::vector
add a comment |
up vote
1
down vote
accepted
The "naive" way:
You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.
If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.
Finally, it's a quite known problem. You should read this for more detailed information:
C++ OpenMP Parallel For Loop - Alternatives to std::vector
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
The "naive" way:
You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.
If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.
Finally, it's a quite known problem. You should read this for more detailed information:
C++ OpenMP Parallel For Loop - Alternatives to std::vector
The "naive" way:
You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.
If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.
Finally, it's a quite known problem. You should read this for more detailed information:
C++ OpenMP Parallel For Loop - Alternatives to std::vector
answered Nov 7 at 14:43
L.C.
1789
1789
add a comment |
add a comment |
up vote
3
down vote
Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back
in a critical region:
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
#pragma omp critical
results.push_back (result_value);
}
}
If threads are mostly busy with other things than the push_back
, there will be little overhead from the critical region.
Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:
There is a good duplicate for this here and here.
add a comment |
up vote
3
down vote
Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back
in a critical region:
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
#pragma omp critical
results.push_back (result_value);
}
}
If threads are mostly busy with other things than the push_back
, there will be little overhead from the critical region.
Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:
There is a good duplicate for this here and here.
add a comment |
up vote
3
down vote
up vote
3
down vote
Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back
in a critical region:
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
#pragma omp critical
results.push_back (result_value);
}
}
If threads are mostly busy with other things than the push_back
, there will be little overhead from the critical region.
Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:
There is a good duplicate for this here and here.
Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back
in a critical region:
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
#pragma omp critical
results.push_back (result_value);
}
}
If threads are mostly busy with other things than the push_back
, there will be little overhead from the critical region.
Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:
There is a good duplicate for this here and here.
edited Nov 7 at 13:56
answered Nov 7 at 13:51
Max Langhof
7,1521133
7,1521133
add a comment |
add a comment |
1
#pragma omp critical
?– LogicStuff
Nov 7 at 13:48
Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51
But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54