Are cache misses correlated to whether we use heap memory?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
To achieve best performance, try to minimize cache misses. I guess we can all agree on that.
What I suggest and would like to ask about is the following. I say that this:
template <typename T>
struct {
T* a;
T* b;
T* c;
};
is more vulnerable to cache misses than this:
template <typename T>
struct {
T a;
T b;
T c;
};
Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?
Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.
I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?
c++ c performance caching memory
add a comment |
To achieve best performance, try to minimize cache misses. I guess we can all agree on that.
What I suggest and would like to ask about is the following. I say that this:
template <typename T>
struct {
T* a;
T* b;
T* c;
};
is more vulnerable to cache misses than this:
template <typename T>
struct {
T a;
T b;
T c;
};
Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?
Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.
I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?
c++ c performance caching memory
If it is good to keep data together can't be decided without knowing if it will be used together.
– Swordfish
Nov 24 '18 at 12:35
add a comment |
To achieve best performance, try to minimize cache misses. I guess we can all agree on that.
What I suggest and would like to ask about is the following. I say that this:
template <typename T>
struct {
T* a;
T* b;
T* c;
};
is more vulnerable to cache misses than this:
template <typename T>
struct {
T a;
T b;
T c;
};
Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?
Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.
I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?
c++ c performance caching memory
To achieve best performance, try to minimize cache misses. I guess we can all agree on that.
What I suggest and would like to ask about is the following. I say that this:
template <typename T>
struct {
T* a;
T* b;
T* c;
};
is more vulnerable to cache misses than this:
template <typename T>
struct {
T a;
T b;
T c;
};
Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?
Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.
I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?
c++ c performance caching memory
c++ c performance caching memory
asked Nov 24 '18 at 12:28
The Quantum PhysicistThe Quantum Physicist
12.1k750105
12.1k750105
If it is good to keep data together can't be decided without knowing if it will be used together.
– Swordfish
Nov 24 '18 at 12:35
add a comment |
If it is good to keep data together can't be decided without knowing if it will be used together.
– Swordfish
Nov 24 '18 at 12:35
If it is good to keep data together can't be decided without knowing if it will be used together.
– Swordfish
Nov 24 '18 at 12:35
If it is good to keep data together can't be decided without knowing if it will be used together.
– Swordfish
Nov 24 '18 at 12:35
add a comment |
1 Answer
1
active
oldest
votes
On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.
The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).
So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.
The 3 ground rules are locality, locality, locality.
Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53458138%2fare-cache-misses-correlated-to-whether-we-use-heap-memory%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.
The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).
So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.
The 3 ground rules are locality, locality, locality.
Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).
add a comment |
On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.
The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).
So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.
The 3 ground rules are locality, locality, locality.
Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).
add a comment |
On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.
The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).
So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.
The 3 ground rules are locality, locality, locality.
Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).
On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.
The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).
So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.
The 3 ground rules are locality, locality, locality.
Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).
edited Nov 24 '18 at 13:25
answered Nov 24 '18 at 12:35
Matthieu BrucherMatthieu Brucher
17.7k52445
17.7k52445
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53458138%2fare-cache-misses-correlated-to-whether-we-use-heap-memory%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
If it is good to keep data together can't be decided without knowing if it will be used together.
– Swordfish
Nov 24 '18 at 12:35