Performance cost: No loop fusion across function barriers











For style and performance reasons, I found myself comparing the following two ways to add 1 to every element of an array. Is it possible to get equivalent performance from both?



function inplaceadd1!(ar)
    ar .= ar .+ 1.
end

function add1(ar)
    return ar .+ 1.
end

function inplace!(ar)
    ar .= add1(ar)
end

ar1 = rand(10000)
ar2 = ar1[:]

@time inplaceadd1!(ar2)
# 0.000010 seconds (4 allocations: 160 bytes)
@time inplace!(ar1)
# 0.000026 seconds (6 allocations: 78.359 KiB)


Not knowing too much about compiler optimizations, to me it seems that add1 could be inlined into inplace! and the loop could be fused to achieve identical performance without extra allocations. Does this not occur?



Appreciate the insight and any recommendations.










      julia-lang
















      asked Nov 5 at 3:33









      arch1190

























          1 Answer













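It does not occur in your case. add1 returns a new array, and the compiler is not able to figure out that the new array is unnecessary. Loop fusion in Julia is a syntactic guarantee that applies only to dotted calls nested within a single expression; it is not an optimization performed after inlining. So add1(ar) is evaluated as an ordinary call that allocates its result, and only then does ar .= copy that result into ar. Note that ! is only a naming convention and does not mean anything special to the compiler at the moment.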
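To make the difference concrete, here is a minimal sketch of roughly what the two forms mean. It uses `broadcast!` as a readable stand-in; the actual lowering in current Julia versions goes through the `Base.Broadcast` machinery, so treat this as an approximation. The names `fused!` and `unfused!` are made up for illustration.

```julia
# Dotted calls in ONE expression fuse into a single in-place broadcast.
function fused!(ar)
    # Roughly what `ar .= ar .+ 1.0` means: one loop, no temporary array.
    broadcast!(x -> x + 1.0, ar, ar)
    return ar
end

# An undotted call is evaluated first, so its result is materialized.
function unfused!(ar)
    tmp = ar .+ 1.0                # what an array-level add1(ar) does: allocates a new array
    broadcast!(identity, ar, tmp)  # roughly what `ar .= tmp` means: a second loop that copies
    return ar
end

a = rand(10_000)
b = copy(a)
c = copy(a)
fused!(b)
unfused!(c)
@assert b == c   # same result, but unfused! needed a temporary array
```

Both functions produce the same values; the second one just pays for an extra array and an extra pass over the data.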



You should instead write your function to operate on a single element and let loop fusion do its work when you broadcast it with a dot at the call site. This is the more idiomatic Julia approach when you are defining element-wise operations.



function inplaceadd1!(ar)
    ar .= ar .+ 1.
end

function add1(a)
    a + 1. # no `.+` here
end

function inplace!(ar)
    ar .= add1.(ar)
end


Since add1 is a small function, the compiler should inline it automatically. You can also give the compiler a hint by annotating the function definition with the @inline macro.



using BenchmarkTools  # provides @btime

@btime inplaceadd1!($ar2)
# 1.198 μs (0 allocations: 0 bytes)
@btime inplace!($ar1)
# 1.155 μs (0 allocations: 0 bytes)
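If you want to check this without BenchmarkTools, the following is a small sketch that uses only Base. The helper `bytes_allocated` and the function `inplace_macro!` are hypothetical names introduced here; `@inline` is shown only as the optional hint mentioned above and is probably unnecessary for a function this small. The sketch also shows the `@.` macro, which adds a dot to every call and assignment in an expression.

```julia
# Scalar kernel; @inline is only a hint to the compiler.
@inline add1(a) = a + 1.0

# Explicit dots: the assignment and the call fuse into one in-place loop.
function inplace!(ar)
    ar .= add1.(ar)
    return ar
end

# Same operation written with `@.`, which dots every call and the assignment.
function inplace_macro!(ar)
    @. ar = add1(ar)
    return ar
end

# Hypothetical helper: bytes allocated by f(x), measured after a warm-up call
# so that compilation is not counted.
function bytes_allocated(f, x)
    f(x)
    return @allocated f(x)
end

x = rand(10_000)
y = copy(x)
inplace!(x)
inplace_macro!(y)
@assert x == y   # both spellings compute the same result

println("inplace!:       ", bytes_allocated(inplace!, copy(x)), " bytes")
println("inplace_macro!: ", bytes_allocated(inplace_macro!, copy(x)), " bytes")
```

A fused version should report far fewer bytes than the `8 * length(x)` that a temporary `Float64` array would need.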






          • Ah, smart. Thanks, just what I was looking for.
            – arch1190
            Nov 5 at 4:52











          edited Nov 5 at 4:52

























          answered Nov 5 at 4:37









          hckr












