Performance cost: No loop fusion across function barriers
For style and performance considerations, I found myself comparing the following two functions. Is it possible to get equivalent performance between the following two ways to add 1 to every element in an array?
function inplaceadd1!(ar)
    ar .= ar .+ 1.
end
function add1(ar)
    return(ar .+ 1.)
end
function inplace!(ar)
    ar .= add1(ar)
end
ar1 = rand(10000)
ar2 = ar1[:]
@time inplaceadd1!(ar2)
# 0.000010 seconds (4 allocations: 160 bytes)
@time inplace!(ar1)
# 0.000026 seconds (6 allocations: 78.359 KiB)
Not knowing too much about compiler optimizations, to me it seems that add1 could be inlined into inplace! and the loop could be fused to achieve identical performance without extra allocations. Does this not occur?
Appreciate the insight and any recommendations.
julia-lang
asked Nov 5 at 3:33
arch1190
1 Answer
No, this does not happen in your case. Dot-call fusion is a syntactic transformation: it only merges dotted calls that appear together in a single expression, so it cannot see inside add1. As written, add1 returns a new array, and the compiler cannot work out that this temporary array is unnecessary. Note that the ! suffix is purely a naming convention and currently means nothing special to the compiler.
Instead, write add1 as a scalar (element-wise) function and broadcast it at the call site, so that loop fusion can do its work. This is the more idiomatic Julia approach when defining element-wise operations.
function inplaceadd1!(ar)
    ar .= ar .+ 1.
end
function add1(a)
    a + 1. # no `.+` here
end
function inplace!(ar)
    ar .= add1.(ar)
end
Since add1 is a small function, the compiler should inline it automatically. You can also give the compiler a hint by annotating the function definition with the @inline macro.
using BenchmarkTools  # provides @btime
@btime inplaceadd1!($ar2)
# 1.198 μs (0 allocations: 0 bytes)
@btime inplace!($ar1)
# 1.155 μs (0 allocations: 0 bytes)
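If you want to check the allocation difference without installing BenchmarkTools, something like the following should work using only `@allocated` from Base. This is a minimal sketch: the names `add1_array`, `add1_scalar`, `unfused!` and `fused!` are hypothetical and chosen only to keep both variants side by side, and the exact byte counts may differ between Julia versions.

```julia
# Array-level helper (hypothetical name): the `.+` fuses only inside this
# function, so calling it returns a freshly allocated array.
add1_array(ar) = ar .+ 1.0

# Scalar helper (hypothetical name): broadcasting happens at the call site.
# The @inline annotation is only a hint; a function this small is normally
# inlined anyway.
@inline add1_scalar(a) = a + 1.0

function unfused!(ar)
    ar .= add1_array(ar)    # builds a temporary array, then copies it into `ar`
    return ar
end

function fused!(ar)
    ar .= add1_scalar.(ar)  # one fused loop that writes directly into `ar`
    return ar
end

x = rand(10_000)
y = copy(x)

# Call each function once first so that compilation is not measured.
unfused!(x)
fused!(y)

unfused_bytes = @allocated unfused!(x)
fused_bytes = @allocated fused!(y)

println("unfused: ", unfused_bytes, " bytes, fused: ", fused_bytes, " bytes")
```

The unfused variant should report roughly the size of one temporary array (10,000 Float64 values), while the fused variant should report little or nothing.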
edited Nov 5 at 4:52
answered Nov 5 at 4:37
hckr
Ah, smart. Thanks, just what I was looking for.
– arch1190
Nov 5 at 4:52