Lock-free queue with GCC builtins
I am developing a single-writer/single-reader lock-free queue placed in shared memory that is opened by two different Linux processes. Both processes map the shm with MAP_SHARED.
The queue looks like this:
    typedef struct {
        unsigned char Data[ 256 ];
    } Element_Type;

    typedef struct {
        unsigned int Snd_Cnt;
        unsigned int Rcv_Cnt;
        Element_Type Elems[ 100 ];
    } Queue_Type;
OBS: Both Linux processes map the same shared memory and access it through a pointer to Queue_Type. Let us say that the pointer is Shm_p.
The writer process does this:
    Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100;
    __atomic_load(&Shm_p->Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
    if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
        /* size_of_real_data is at most 256 */
        memcpy(&Shm_p->Elems[ Tmp_Snd_Cnt ].Data[ 0 ], pointer_to_real_data, size_of_real_data);
        __atomic_store(&Shm_p->Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
    }
The reader process does this:
    Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
    __atomic_load(&Shm_p->Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
    if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
        pointer_to_real_data = &Shm_p->Elems[ Tmp_Rcv_Cnt ].Data[ 0 ];
        Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100;
        __atomic_store(&Shm_p->Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
    }
My question is: do you see any problems with this approach, such as races?
What I have observed is the following:
Sometimes the reader gets pointer_to_real_data pointing to data that has not been updated yet. This suggests that GCC placed the memcpy after the atomic store (i.e. compile-time instruction reordering). To mitigate this, I placed asm volatile("" : : : "memory") right before the atomic store in the writer code. After that, the reader got correct data.
To make things even stranger, I removed the asm volatile and compiled again.
The code started to work. (I cleared the shm before testing again.)
So apparently the compiler did something strange in the first case, without the asm volatile, when the reader got un-updated data. What could it have been?
For now, to be safe, I keep the asm volatile there to instruct the compiler not to reorder the code.
Thanks in advance.
LATER EDIT (to mitigate the problem of the data behind pointer_to_real_data being overwritten when the reader is slow after the store of Rcv_Cnt):
    typedef struct {
        unsigned char Data[ 256u ];
    } Base_Element_Type;

    typedef struct {
        unsigned int Idx;
        Base_Element_Type Base_Elems[ 2u ];
    } Element_Type;

    typedef struct {
        unsigned int Snd_Cnt;
        unsigned int Rcv_Cnt;
        Element_Type Elems[ 100u ];
    } Queue_Type;
Writer code:
    Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100u;
    __atomic_load(&Shm_p->Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
    if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
        Elem_p = &Shm_p->Elems[ Tmp_Snd_Cnt ];
        /* size_of_real_data is at most 256 */
        memcpy(&Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ], pointer_to_real_data, size_of_real_data);
        __atomic_store(&Shm_p->Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
    }
Reader code:
    Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
    __atomic_load(&Shm_p->Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
    if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
        Elem_p = &Shm_p->Elems[ Tmp_Rcv_Cnt ];
        pointer_to_real_data = &Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ];
        Elem_p->Idx = (Elem_p->Idx + 1u) % 2u;
        Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100u;
        __atomic_store(&Shm_p->Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
    }
This should work ... right?!
c gcc queue atomic lock-free
Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p->Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.
– Peter Cordes
Nov 24 '18 at 1:47
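To illustrate the two suggestions in this comment, here is a minimal sketch, assuming a hypothetical power-of-two capacity of 128 slots (the question uses 100). The names QUEUE_SLOTS, QUEUE_MASK, next_index, load_acquire and store_release are made up for illustration; only __atomic_load_n and __atomic_store_n are actual GCC builtins.

```c
#include <assert.h>

/* Hypothetical capacity: any power of two works with the mask below. */
#define QUEUE_SLOTS 128u
#define QUEUE_MASK  (QUEUE_SLOTS - 1u)

static_assert((QUEUE_SLOTS & QUEUE_MASK) == 0u,
              "QUEUE_SLOTS must be a power of two");

/* Advance a ring index by one slot, wrapping with a mask instead of '%'. */
static unsigned int next_index(unsigned int idx)
{
    return (idx + 1u) & QUEUE_MASK;
}

/* Value-returning acquire load; same effect as the pointer-based
 * __atomic_load(&x, &tmp, __ATOMIC_ACQUIRE) form used in the question. */
static unsigned int load_acquire(unsigned int *counter)
{
    return __atomic_load_n(counter, __ATOMIC_ACQUIRE);
}

/* Value-taking release store, the counterpart of __atomic_store(). */
static void store_release(unsigned int *counter, unsigned int value)
{
    __atomic_store_n(counter, value, __ATOMIC_RELEASE);
}
```

With these helpers the writer's first two lines would read roughly as Tmp_Snd_Cnt = next_index(Shm_p->Snd_Cnt); Tmp_Rcv_Cnt = load_acquire(&Shm_p->Rcv_Cnt);.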
I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.
– Peter Cordes
Nov 24 '18 at 1:58
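A stripped-down sketch of the ordering this comment relies on, assuming a single hypothetical buffer and flag (payload, ready, publish and try_consume are illustrative names, not from the question). The release store may not be moved above the preceding memcpy by the compiler, and on the consumer side the acquire load keeps the copy after the flag check. By contrast, asm volatile("" : : : "memory") is only a compile-time barrier. One way to check what was actually generated is to compile with gcc -O2 -S and confirm that the copy comes before the store to the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_BYTES 256u

static unsigned char payload[PAYLOAD_BYTES]; /* hypothetical shared buffer */
static unsigned int  ready;                  /* hypothetical flag, 0 = empty */

/* Producer: fill the buffer, then publish it with a release store. */
static void publish(const void *src, size_t len)
{
    assert(len <= PAYLOAD_BYTES);
    memcpy(payload, src, len);
    /* Writes made by the memcpy above are ordered before this store. */
    __atomic_store_n(&ready, 1u, __ATOMIC_RELEASE);
}

/* Consumer: copy the buffer only after an acquire load saw the flag. */
static bool try_consume(void *dst, size_t len)
{
    assert(len <= PAYLOAD_BYTES);
    if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0u)
        return false;                /* nothing published yet */
    memcpy(dst, payload, len);
    return true;
}
```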
The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.
– Peter Cordes
Nov 24 '18 at 2:00
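A minimal sketch of the copy-out approach this comment points to, assuming the reader can afford to copy each element into its own buffer before releasing the slot. queue_push and queue_pop are hypothetical names, and SLOT_BYTES and QUEUE_SLOTS stand in for the literal 256 and 100 from the question. As far as I can tell from the posted code, the writer fills slot (Snd_Cnt + 1) % 100 while the reader reads slot Rcv_Cnt, so the two sides look off by one; the sketch uses the same convention on both sides (use the slot at the current counter, then advance it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_BYTES  256u
#define QUEUE_SLOTS 100u   /* same capacity as in the question */

static_assert(QUEUE_SLOTS > 1u, "one slot always stays empty");

typedef struct {
    unsigned char Data[SLOT_BYTES];
} Element_Type;

typedef struct {
    unsigned int Snd_Cnt;   /* next slot to write; stored only by the writer */
    unsigned int Rcv_Cnt;   /* next slot to read; stored only by the reader */
    Element_Type Elems[QUEUE_SLOTS];
} Queue_Type;

/* Writer side. Returns false if the queue is full or len is too large. */
static bool queue_push(Queue_Type *q, const void *src, size_t len)
{
    if (len > SLOT_BYTES)
        return false;
    /* Only the writer stores Snd_Cnt, so a relaxed load is enough here. */
    unsigned int snd  = __atomic_load_n(&q->Snd_Cnt, __ATOMIC_RELAXED);
    unsigned int next = (snd + 1u) % QUEUE_SLOTS;
    /* Acquire pairs with the reader's release store of Rcv_Cnt. */
    if (next == __atomic_load_n(&q->Rcv_Cnt, __ATOMIC_ACQUIRE))
        return false;                        /* full */
    memcpy(q->Elems[snd].Data, src, len);
    /* Publish the slot only after its contents are written. */
    __atomic_store_n(&q->Snd_Cnt, next, __ATOMIC_RELEASE);
    return true;
}

/* Reader side. Copies the element out BEFORE releasing the slot. */
static bool queue_pop(Queue_Type *q, void *dst, size_t len)
{
    if (len > SLOT_BYTES)
        return false;
    unsigned int rcv = __atomic_load_n(&q->Rcv_Cnt, __ATOMIC_RELAXED);
    /* Acquire pairs with the writer's release store of Snd_Cnt. */
    if (rcv == __atomic_load_n(&q->Snd_Cnt, __ATOMIC_ACQUIRE))
        return false;                        /* empty */
    memcpy(dst, q->Elems[rcv].Data, len);
    /* Only now may the writer reuse this slot. */
    __atomic_store_n(&q->Rcv_Cnt, (rcv + 1u) % QUEUE_SLOTS, __ATOMIC_RELEASE);
    return true;
}
```

Because the copy finishes before Rcv_Cnt is advanced, the writer cannot overwrite a slot the reader is still using, at the cost of one extra memcpy per element. With this full/empty test the queue holds at most QUEUE_SLOTS - 1 elements.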
@PeterCordes what do you mean with my lock free logic doesn’t work?
– user3523954
Nov 24 '18 at 6:44
@PeterCordes agree! Shall fix that! Thanks!
– user3523954
Nov 24 '18 at 6:44
asked Nov 23 '18 at 11:46 by user3523954
edited Nov 24 '18 at 10:22 by user3523954