gcc auto vectorization control flow in loop
In the code below, why can the second loop be auto-vectorized but the first cannot? How can I modify the code so that it does auto-vectorize? gcc says:
note: not vectorized: control flow in loop.
I am using gcc 8.2, flags are -O3 -fopt-info-vec-all. I am compiling for x86-64 avx2.
#include <stdlib.h>
#include <math.h>

void foo(const float * x, const float * y, const int * v, float * vec, float * novec, size_t size) {
    size_t i;
    float bar;

    for (i=0 ; i<size ; ++i){
        bar = x[i] - y[i];
        novec[i] = v[i] ? bar : NAN;
    }

    for (i=0 ; i<size ; ++i){
        bar = x[i];
        vec[i] = v[i] ? bar : NAN;
    }
}
Update:
This does autovectorize:
    for (i=0 ; i<size ; ++i){
        bar = x[i];
        novec[i] = v[i] ? bar : NAN;
        novec[i] -= y[i];
    }
I would still like to know why gcc says control flow for the first loop.
c gcc avx2 auto-vectorization
Vectorizes for me.
– EOF
Nov 8 at 18:31
@EOF: clang vectorizes it the way you'd expect, but gcc8.2 doesn't. (Even with restrict added to all the pointers, and with -march=haswell.) godbolt.org/z/cnlwuO
– Peter Cordes
Nov 9 at 0:48
edited Nov 8 at 16:48
Mysticial
asked Nov 8 at 14:06
user2133814
1 Answer
clang auto-vectorizes even the first loop, but gcc8.2 doesn't. (https://godbolt.org/z/cnlwuO)
gcc vectorizes with -ffast-math. Perhaps it's worried about preserving FP exception flag status from the subtraction?
-fno-trapping-math is sufficient for gcc to auto-vectorize (without the rest of what -ffast-math sets), so apparently it's worried about FP exceptions. (https://godbolt.org/z/804ykV) I think it's being over-cautious, because the C source does compute bar every time, whether or not it's used.
gcc will auto-vectorize simple FP a[i] = b[i]+c[i] loops without any FP math options.
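To try this outside Godbolt, here is a minimal sketch of the first loop pulled into its own function, plus a tiny self-check. The names sub_or_nan and sub_or_nan_selftest are made up for illustration and are not from the question. The build line in the comment uses the flags discussed above. Whether the loop actually vectorizes depends on your gcc version, so read the -fopt-info-vec-all notes instead of assuming.

```c
/* Build (flags from the question plus the one from this answer):
 *   gcc -O3 -march=haswell -fno-trapping-math -fopt-info-vec-all -c file.c
 * With gcc 8.2 the loop below is the one reported as
 * "not vectorized: control flow in loop" when -fno-trapping-math is absent.
 */
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical name: the first loop from the question, isolated.
 * Same pointer types as foo(); no restrict is added here. */
void sub_or_nan(const float *x, const float *y, const int *v,
                float *out, size_t size)
{
    for (size_t i = 0; i < size; ++i) {
        float bar = x[i] - y[i];     /* computed every iteration */
        out[i] = v[i] ? bar : NAN;   /* select on the mask, no store skipped */
    }
}

/* Hypothetical helper: checks the results on a small input so that a
 * change of compiler flags can be sanity-tested. Returns 1 on success. */
int sub_or_nan_selftest(void)
{
    const float x[4] = {5.0f, 1.5f, 2.0f, 8.0f};
    const float y[4] = {1.0f, 0.5f, 2.0f, 3.0f};
    const int   v[4] = {1, 0, 7, 0};
    float out[4];

    sub_or_nan(x, y, v, out, 4);

    assert(out[0] == 4.0f);   /* v != 0: x - y */
    assert(isnan(out[1]));    /* v == 0: NaN   */
    assert(out[2] == 0.0f);   /* any non-zero v selects x - y */
    assert(isnan(out[3]));
    return 1;
}
```

Note that the self-check relies on isnan, so it is meant for -fno-trapping-math only. Full -ffast-math implies -ffinite-math-only, under which gcc is allowed to assume there are no NaNs, and both the check and the NAN results themselves stop being reliable.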
answered Nov 9 at 0:54
Peter Cordes