How could I parallelize this function with OpenMP without race conditions or false sharing?

I need to parallelize one function without race conditions or false sharing. I have tried many ways but I have not achieved that yet. The function is:



__inline static
void calculateClusterCentroIDs(int numCoords, int numObjs, int numClusters, float *dataSetMatrix, int *clusterAssignmentCurrent, float *clustersCentroID) {
    int *clusterMemberCount = (int *) calloc(numClusters, sizeof(int));

    // sum all points
    // for every point
    for (int i = 0; i < numObjs; ++i) {
        // which cluster is it in?
        int activeCluster = clusterAssignmentCurrent[i];

        // update count of members in that cluster
        ++clusterMemberCount[activeCluster];

        // sum point coordinates for finding centroid
        for (int j = 0; j < numCoords; ++j)
            clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
    }

    // now divide each coordinate sum by number of members to find mean/centroid
    // for each cluster
    for (int i = 0; i < numClusters; ++i) {
        if (clusterMemberCount[i] != 0)
            // for each coordinate
            for (int j = 0; j < numCoords; ++j)
                clustersCentroID[i*numCoords + j] /= clusterMemberCount[i]; /// XXXX will divide by zero here for any empty clusters!
    }

    free(clusterMemberCount);
}


Any idea how I could achieve that?



Thank you.

  • What have you tried and how did it not work? What is a "career condition"?

    – Zulan
    Nov 15 '18 at 14:56

  • That means: some threads write to the same position at the same time. It can be solved with atomic or critical, or by changing the sequential code to avoid it.

    – JuMoGar
    Nov 15 '18 at 15:01

  • And I have tried many ways; I can list all of them. Basically I have tried changing the sequential code by adding one extra dimension to each variable and then reducing it. I mean: (seq:) int *myVar; (omp:) int **myVar --> (seq:) myVar[i]; (omp:) myVar[omp_get_thread_num()][i]

    – JuMoGar
    Nov 15 '18 at 15:03

  • So you mean race condition, or data race, sometimes also called a hazard. But I have never heard it referred to as a career condition. Have you tried the easiest option: adding omp parallel for to the first inner loop and to the second outer loop?

    – Zulan
    Nov 15 '18 at 15:19

  • Yes, I have tried that as well. But I did not get the expected result (it should be 300, but I get other numbers). Oh, ok, I will update the question to say 'race conditions', thank you.

    – JuMoGar
    Nov 15 '18 at 15:31

c performance parallel-processing openmp

3 Answers

This is pretty straightforward:



// sum all points
// for every point
for (int i = 0; i < numObjs; ++i) {
    // which cluster is it in?
    int activeCluster = clusterAssignmentCurrent[i];

    // update count of members in that cluster
    ++clusterMemberCount[activeCluster];

    // sum point coordinates for finding centroid
    #pragma omp parallel for
    for (int j = 0; j < numCoords; ++j)
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
}


The inner loop is perfectly fine to parallelize, as all writes happen to different elements of clustersCentroID. You can safely assume that the default schedule will not exhibit significant false sharing; it typically uses large enough chunks. Just don't try something like schedule(static,1).



The outer loop is not as easy to parallelize. You could either use a reduction over clusterMemberCount and clustersCentroID (a sketch follows below), or do something like:



#pragma omp parallel // note NO for
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    // ensure that exactly one thread works on each cluster
    if (activeCluster % omp_get_num_threads() != omp_get_thread_num()) continue;


Only do this if the simple solution does not yield sufficient performance.
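As an illustration of the reduction route mentioned above (this sketch is an editorial addition, not part of the original answer), assuming OpenMP 4.5 or newer for array-section reductions and that a private copy of both arrays per thread is affordable, it could look roughly like this:

// Minimal sketch (assumes OpenMP >= 4.5): each thread accumulates into private
// copies of both arrays, which OpenMP combines at the end, so there are no races
// and no concurrently written cache lines.
#pragma omp parallel for \
        reduction(+ : clusterMemberCount[:numClusters]) \
        reduction(+ : clustersCentroID[:numClusters * numCoords])
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    ++clusterMemberCount[activeCluster];
    for (int j = 0; j < numCoords; ++j)
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
}

The trade-off is the extra memory and the combine step, both proportional to numClusters * numCoords per thread.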



The other loop is simple as well



#pragma omp parallel for
for (int i = 0; i < numClusters; ++i) {
    if (clusterMemberCount[i] != 0)
        // for each coordinate
        for (int j = 0; j < numCoords; ++j)
            clustersCentroID[i*numCoords + j] /= clusterMemberCount[i];
}


Again, data access is perfectly isolated both in terms of correctness and, except for edge cases, false sharing.

  • Parallelizing the first outer loop with if (activeCluster % omp_get_num_threads() != omp_get_thread_num()) continue; will distribute consecutive values of activeCluster to different threads. This will result in some false sharing when writing to clusterMemberCount, and potentially also when writing to clustersCentroID if there are not that many dimensions (the edge effect becomes significant if numCoords is small, i.e. exactly the case in which parallelizing the inner loop is not efficient).

    – Brice
    Nov 19 '18 at 9:51

  • You are right, but I did not want to make assumptions about numCoords. The simple approach is fine until you know more about the values and/or measurements show that it is too slow. I was only trying to sketch a direction for parallelizing the outer loop - a reduction is usually advisable. You can use any sophisticated distribution of clusters to threads.

    – Zulan
    Nov 19 '18 at 12:31

You should give an order of magnitude for the expected values of numCoords, numObjs and numClusters, as the optimal way to parallelize depends on that. In particular, numCoords matters for deciding whether parallelizing/vectorizing the inner loop over the coordinates makes sense; are you talking about 3D coordinates or 1000 dimensions?



Here is another attempt, with the drawbacks of an if statement in the first loop (bad for performance) and a static schedule (possible load imbalance), but with each thread incrementing contiguous parts of clusterMemberCount and clustersCentroID, which limits the risk of false sharing.



#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_num_threads() 1
#define omp_get_thread_num() 0
#endif

__inline static
void calculateClusterCentroIDs(int numCoords, int numObjs, int numClusters, float *dataSetMatrix, int *clusterAssignmentCurrent, float *clustersCentroID) {
    int *clusterMemberCount = (int *) calloc(numClusters, sizeof(int));
    // sum all points
    // for every point

    #pragma omp parallel
    {
        int nbOfThreads = omp_get_num_threads();
        int thisThread = omp_get_thread_num();
        // Schedule for the first step: process only clusters with an ID in the [from, to[ range
        int clustFrom = (thisThread*numClusters)/nbOfThreads;
        int clustTo = (thisThread+1 == nbOfThreads) ? numClusters : ((thisThread+1)*numClusters)/nbOfThreads;

        // Each thread loops through all values of numObjs but only processes them depending on activeCluster
        // The loop is skipped only if the thread was assigned no cluster
        if (clustTo > clustFrom) {
            for (int i = 0; i < numObjs; ++i) {
                // which cluster is it in?
                int activeCluster = clusterAssignmentCurrent[i];

                if (activeCluster >= clustFrom && activeCluster < clustTo) {
                    // update count of members in that cluster
                    ++clusterMemberCount[activeCluster];

                    // sum point coordinates for finding centroid
                    for (int j = 0; j < numCoords; ++j)
                        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
                }
            }
        }

        #pragma omp barrier

        // now divide each coordinate sum by number of members to find mean/centroid
        // for each cluster
        #pragma omp for // straightforward
        for (int i = 0; i < numClusters; ++i) {
            if (clusterMemberCount[i] != 0)
                // for each coordinate
                for (int j = 0; j < numCoords; ++j)
                    clustersCentroID[i*numCoords + j] /= clusterMemberCount[i]; /// XXXX will divide by zero here for any empty clusters!
        }
    }
    free(clusterMemberCount);
}

  • This is a good approach. The code snippet is mixing clustFrom/To and clusterLimits. You might also want to consider the case numClusters < nbOfThreads.

    – Zulan
    Nov 19 '18 at 12:35

  • You are right about the cluster limits & from/to, I edited the answer. The code should run fine even if numClusters < nbOfThreads. Of course, in that case, all threads but the last one will have clustFrom==0 and clustTo==0 and would loop for no reason...

    – Brice
    Nov 19 '18 at 13:59

  • I moved the parentheses a bit in the clustFrom & clustTo definitions for better load balancing, and added an if to skip the first loop if there is nothing to process, rather than looping and hitting an if(condition_always_false).

    – Brice
    Nov 19 '18 at 14:25

Adding to my comment: ++clusterMemberCount[activeCluster] forms a histogram and is problematic when two threads try to update the same item (or adjacent items sharing a cache line). This needs to be taken out of the loop, either as a sequential part, or it must be parallelized by giving each thread a separate copy of the histogram and combining them afterwards.



You can easily separate this part from the first parallel loop.



// Make the histogram
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    ++clusterMemberCount[activeCluster];
}
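A minimal sketch of the per-thread-copy variant mentioned above (an editorial illustration, not code from the answer; it assumes the thread count reported by omp_get_max_threads() is an upper bound on the threads actually used, and it needs <omp.h> and <stdlib.h>):

// Per-thread histograms, combined after the parallel region.
int nThreads = omp_get_max_threads();
int *privateCounts = (int *) calloc((size_t)nThreads * numClusters, sizeof(int));

#pragma omp parallel
{
    int *myCount = privateCounts + (size_t)omp_get_thread_num() * numClusters; // this thread's copy

    #pragma omp for
    for (int i = 0; i < numObjs; ++i)
        ++myCount[clusterAssignmentCurrent[i]];
}

// combine the per-thread histograms into the shared one
for (int t = 0; t < nThreads; ++t)
    for (int c = 0; c < numClusters; ++c)
        clusterMemberCount[c] += privateCounts[(size_t)t * numClusters + c];

free(privateCounts);

If numClusters is small, padding each per-thread slice up to a cache line would also avoid false sharing between the copies themselves.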


Then process everything exploiting parallelism



// parallel processing
#pragma omp parallel for
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    for (int j = 0; j < numCoords; ++j)
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
}


The second place where there is potential false sharing is when numCoords * sizeof(clustersCentroID[0]) % 64 != 0 (assuming a 64-byte cache line). This can be mitigated by over-allocating clustersCentroID so that each cluster's row takes a full multiple of 64 bytes.



// Loop for numCoords, but index by numCoordsX
for (int j = 0; j < numCoords; ++j)
    clustersCentroID[activeCluster*numCoordsX + j] += dataSetMatrix[i*numCoords + j];
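For illustration, the over-allocation could be set up as follows; this is an assumed realization of the suggestion rather than code from the answer (the name numCoordsX comes from the snippet above), and it needs <stdlib.h> for C11 aligned_alloc and <string.h> for memset:

// Round the per-cluster stride up to a whole number of 64-byte cache lines
// (16 floats per line), so rows of different clusters never share a line.
size_t lineFloats = 64 / sizeof(float); // 16
size_t numCoordsX = ((size_t)numCoords + lineFloats - 1) / lineFloats * lineFloats;
// the total size is a multiple of 64, as aligned_alloc requires
float *clustersCentroID = aligned_alloc(64, (size_t)numClusters * numCoordsX * sizeof(float));
memset(clustersCentroID, 0, (size_t)numClusters * numCoordsX * sizeof(float));

The second loop (the division by clusterMemberCount[i]) then has to index with numCoordsX as well.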





share|improve this answer
























  • Then, assuming a 64-byte cache line, numCoordsX = numCoords*64?

    – JuMoGar
    Nov 18 '18 at 17:27

  • float *clustersCentroID means that each element takes 4 bytes, thus numCoordsX must be a multiple of 16. That doesn't mean it has to be any particular multiple of numCoords. But again, just as someone else commented earlier, you should tell us the expected magnitude of numCoords and all the relevant variables, e.g. what is the maximum clusterMemberCount, what is the maximum activeCluster, and so on.

    – Aki Suihkonen
    Nov 18 '18 at 18:17

  • There is a race condition in this code when writing to clustersCentroID. It should be protected somehow, the same way the increment of clusterMemberCount has been protected against race conditions. As long as two threads can be processing points belonging to the same activeCluster, writing to an array indexed by activeCluster is not safe.

    – Brice
    Nov 19 '18 at 9:57
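For completeness, one way to protect those updates (an editorial sketch, not taken from the answer) is to make each accumulation atomic; this removes the race at the cost of potentially heavy serialization when many points fall into the same cluster:

// Racy accumulation made safe with per-element atomic updates.
#pragma omp parallel for
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    for (int j = 0; j < numCoords; ++j) {
        #pragma omp atomic
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
    }
}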
