OpenMP - store results in vector [duplicate]











up vote
0
down vote

favorite













This question already has an answer here:




  • OpenMP multiple threads update same array

    2 answers




I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.



for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}


This the does not work properly with the #pragma omp parallel for.



So what's the best practice to achieve that?

Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.



Something like that is not practical because it consumes to much space



int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}

// remove all unused slots in results array









share|improve this question















marked as duplicate by Zulan, Community Nov 7 at 14:20


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.











  • 1




    #pragma omp critical?
    – LogicStuff
    Nov 7 at 13:48










  • Yeah but this will slow down the whole process a lot, right?
    – flappix
    Nov 7 at 13:51










  • But it will also correct it a lot.
    – LogicStuff
    Nov 7 at 13:54















up vote
0
down vote

favorite













This question already has an answer here:




  • OpenMP multiple threads update same array

    2 answers




I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.



for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}


This the does not work properly with the #pragma omp parallel for.



So what's the best practice to achieve that?

Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.



Something like that is not practical because it consumes to much space



int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}

// remove all unused slots in results array









share|improve this question















marked as duplicate by Zulan, Community Nov 7 at 14:20


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.











  • 1




    #pragma omp critical?
    – LogicStuff
    Nov 7 at 13:48










  • Yeah but this will slow down the whole process a lot, right?
    – flappix
    Nov 7 at 13:51










  • But it will also correct it a lot.
    – LogicStuff
    Nov 7 at 13:54













up vote
0
down vote

favorite









up vote
0
down vote

favorite












This question already has an answer here:




  • OpenMP multiple threads update same array

    2 answers




I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.



for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}


This the does not work properly with the #pragma omp parallel for.



So what's the best practice to achieve that?

Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.



Something like that is not practical because it consumes to much space



int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}

// remove all unused slots in results array









share|improve this question
















This question already has an answer here:




  • OpenMP multiple threads update same array

    2 answers




I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.



for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results.push_back (result_value);
}
}


This the does not work properly with the #pragma omp parallel for.



So what's the best practice to achieve that?

Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.



Something like that is not practical because it consumes to much space



int *results = new int[n];
for (int i=0; i<n; i++)
{
// not every iteration produces a result
if (condition)
{
results[i] = result_value;
}
}

// remove all unused slots in results array




This question already has an answer here:




  • OpenMP multiple threads update same array

    2 answers








c++ openmp






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 7 at 13:49

























asked Nov 7 at 13:44









flappix

997




997




marked as duplicate by Zulan, Community Nov 7 at 14:20


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






marked as duplicate by Zulan, Community Nov 7 at 14:20


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.










  • 1




    #pragma omp critical?
    – LogicStuff
    Nov 7 at 13:48










  • Yeah but this will slow down the whole process a lot, right?
    – flappix
    Nov 7 at 13:51










  • But it will also correct it a lot.
    – LogicStuff
    Nov 7 at 13:54














  • 1




    #pragma omp critical?
    – LogicStuff
    Nov 7 at 13:48










  • Yeah but this will slow down the whole process a lot, right?
    – flappix
    Nov 7 at 13:51










  • But it will also correct it a lot.
    – LogicStuff
    Nov 7 at 13:54








1




1




#pragma omp critical?
– LogicStuff
Nov 7 at 13:48




#pragma omp critical?
– LogicStuff
Nov 7 at 13:48












Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51




Yeah but this will slow down the whole process a lot, right?
– flappix
Nov 7 at 13:51












But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54




But it will also correct it a lot.
– LogicStuff
Nov 7 at 13:54












2 Answers
2






active

oldest

votes

















up vote
1
down vote



accepted










The "naive" way:
You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.



If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.



Finally, it's a quite known problem. You should read this for more detailed information:
C++ OpenMP Parallel For Loop - Alternatives to std::vector






share|improve this answer




























    up vote
    3
    down vote













    Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back in a critical region:



    for (int i=0; i<n; i++)
    {
    // not every iteration produces a result
    if (condition)
    {
    #pragma omp critical
    results.push_back (result_value);
    }
    }


    If threads are mostly busy with other things than the push_back, there will be little overhead from the critical region.



    Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:



    There is a good duplicate for this here and here.






    share|improve this answer






























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      1
      down vote



      accepted










      The "naive" way:
      You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
      Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.



      If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.



      Finally, it's a quite known problem. You should read this for more detailed information:
      C++ OpenMP Parallel For Loop - Alternatives to std::vector






      share|improve this answer

























        up vote
        1
        down vote



        accepted










        The "naive" way:
        You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
        Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.



        If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.



        Finally, it's a quite known problem. You should read this for more detailed information:
        C++ OpenMP Parallel For Loop - Alternatives to std::vector






        share|improve this answer























          up vote
          1
          down vote



          accepted







          up vote
          1
          down vote



          accepted






          The "naive" way:
          You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
          Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.



          If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.



          Finally, it's a quite known problem. You should read this for more detailed information:
          C++ OpenMP Parallel For Loop - Alternatives to std::vector






          share|improve this answer












          The "naive" way:
          You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector.
          Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.



          If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.



          Finally, it's a quite known problem. You should read this for more detailed information:
          C++ OpenMP Parallel For Loop - Alternatives to std::vector







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 7 at 14:43









          L.C.

          1789




          1789
























              up vote
              3
              down vote













              Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back in a critical region:



              for (int i=0; i<n; i++)
              {
              // not every iteration produces a result
              if (condition)
              {
              #pragma omp critical
              results.push_back (result_value);
              }
              }


              If threads are mostly busy with other things than the push_back, there will be little overhead from the critical region.



              Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:



              There is a good duplicate for this here and here.






              share|improve this answer



























                up vote
                3
                down vote













                Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back in a critical region:



                for (int i=0; i<n; i++)
                {
                // not every iteration produces a result
                if (condition)
                {
                #pragma omp critical
                results.push_back (result_value);
                }
                }


                If threads are mostly busy with other things than the push_back, there will be little overhead from the critical region.



                Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:



                There is a good duplicate for this here and here.






                share|improve this answer

























                  up vote
                  3
                  down vote










                  up vote
                  3
                  down vote









                  Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back in a critical region:



                  for (int i=0; i<n; i++)
                  {
                  // not every iteration produces a result
                  if (condition)
                  {
                  #pragma omp critical
                  results.push_back (result_value);
                  }
                  }


                  If threads are mostly busy with other things than the push_back, there will be little overhead from the critical region.



                  Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:



                  There is a good duplicate for this here and here.






                  share|improve this answer














                  Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back in a critical region:



                  for (int i=0; i<n; i++)
                  {
                  // not every iteration produces a result
                  if (condition)
                  {
                  #pragma omp critical
                  results.push_back (result_value);
                  }
                  }


                  If threads are mostly busy with other things than the push_back, there will be little overhead from the critical region.



                  Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:



                  There is a good duplicate for this here and here.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 7 at 13:56

























                  answered Nov 7 at 13:51









                  Max Langhof

                  7,1521133




                  7,1521133















                      這個網誌中的熱門文章

                      Xamarin.form Move up view when keyboard appear

                      Post-Redirect-Get with Spring WebFlux and Thymeleaf

                      Anylogic : not able to use stopDelay()