Are cache misses correlated to whether we use heap memory?





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







1















To achieve best performance, try to minimize cache misses. I guess we can all agree on that.



What I suggest and would like to ask about is the following. I say that this:



template <typename T>
struct {
T* a;
T* b;
T* c;
};


is more vulnerable to cache misses than this:



template <typename T>
struct {
T a;
T b;
T c;
};


Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?



Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.



I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?










share|improve this question























  • If it is good to keep data together can't be decided without knowing if it will be used together.

    – Swordfish
    Nov 24 '18 at 12:35


















1















To achieve best performance, try to minimize cache misses. I guess we can all agree on that.



What I suggest and would like to ask about is the following. I say that this:



template <typename T>
struct {
T* a;
T* b;
T* c;
};


is more vulnerable to cache misses than this:



template <typename T>
struct {
T a;
T b;
T c;
};


Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?



Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.



I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?










share|improve this question























  • If it is good to keep data together can't be decided without knowing if it will be used together.

    – Swordfish
    Nov 24 '18 at 12:35














1












1








1








To achieve best performance, try to minimize cache misses. I guess we can all agree on that.



What I suggest and would like to ask about is the following. I say that this:



template <typename T>
struct {
T* a;
T* b;
T* c;
};


is more vulnerable to cache misses than this:



template <typename T>
struct {
T a;
T b;
T c;
};


Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?



Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.



I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?










share|improve this question














To achieve best performance, try to minimize cache misses. I guess we can all agree on that.



What I suggest and would like to ask about is the following. I say that this:



template <typename T>
struct {
T* a;
T* b;
T* c;
};


is more vulnerable to cache misses than this:



template <typename T>
struct {
T a;
T b;
T c;
};


Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?



Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.



I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?







c++ c performance caching memory






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 24 '18 at 12:28









The Quantum PhysicistThe Quantum Physicist

12.1k750105




12.1k750105













  • If it is good to keep data together can't be decided without knowing if it will be used together.

    – Swordfish
    Nov 24 '18 at 12:35



















  • If it is good to keep data together can't be decided without knowing if it will be used together.

    – Swordfish
    Nov 24 '18 at 12:35

















If it is good to keep data together can't be decided without knowing if it will be used together.

– Swordfish
Nov 24 '18 at 12:35





If it is good to keep data together can't be decided without knowing if it will be used together.

– Swordfish
Nov 24 '18 at 12:35












1 Answer
1






active

oldest

votes


















2














On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.



The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).



So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.



The 3 ground rules are locality, locality, locality.



Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).






share|improve this answer


























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53458138%2fare-cache-misses-correlated-to-whether-we-use-heap-memory%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.



    The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).



    So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.



    The 3 ground rules are locality, locality, locality.



    Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).






    share|improve this answer






























      2














      On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.



      The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).



      So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.



      The 3 ground rules are locality, locality, locality.



      Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).






      share|improve this answer




























        2












        2








        2







        On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.



        The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).



        So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.



        The 3 ground rules are locality, locality, locality.



        Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).






        share|improve this answer















        On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.



        The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).



        So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.



        The 3 ground rules are locality, locality, locality.



        Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 24 '18 at 13:25

























        answered Nov 24 '18 at 12:35









        Matthieu BrucherMatthieu Brucher

        17.7k52445




        17.7k52445
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53458138%2fare-cache-misses-correlated-to-whether-we-use-heap-memory%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            Academy of Television Arts & Sciences

            L'Équipe

            1995 France bombings