LockFree Queue with Gcc builtins












0















I am working on developing a single write/reader lock_free queue that is placed on a shared memory which is going to be opened by two different Linux processes. Both processes open the shm with MAP_SHARED.



The queue looks like this:



typedef stuct {
unsigned char Data[ 256 ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100 ];
} Queue_Type;


OBS: Both Linux processes open the same shared memory and view it as a pointer to the Queue_Type. Let us say that the pointer is Shm_p.



The Linux process which is the writer does like this:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
memcpy(&Shm_p->Elems[ Tmp_Snd_Cnt ].Data[ 0 ], pointer_to_real_data, size_of_real_data <= 256);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


The Linux process which is the reader does like this:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
pointer_to_real_data = &Shm_p->Elems[ Tmp_Rcv_Cnt ].Data[ 0 ];
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


My question is do you see any problems with such an approach like races & stuff?



That what I have seen is the following:
Sometimes the reader gets pointer_to_real_data to point to un-updated data. This suggests that GCC has placed the memcpy after the atomic_store (i.e. compile-time instruction reordering). To mitigate this, I have placed asm volatile("" : : : "memory") right before the atomic store in the writer code. The reader got correct data.



To make things even stranger I took away the asm volatile and compiled again.
The code started to work. (I have cleared the shm before testing again).



So obviously the compiler did something strange in the first case without the asm volatile when the reader got un-updated data. What could have been?



Now, in order to be safe, I have the asm volatile there in order to instruct the compiler to not reorder the code.



Thanks in advance.



LATER EDIT (in order to mitigate the problem with over-writing the pointer_to_real_data in the case when the reader gets slow after the store of Rcv_Cnt):



typedef stuct {
unsigned char Data[ 256u ];
} Base_Element_Type;
typedef stuct {
unsigned int Idx;
Base_Element_Type Base_Elems[ 2u ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100u ];
} Queue_Type;


Writer code:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100u;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Snd_Cnt ];
memcpy(&Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ], pointer_to_real_data, size_of_real_data <= 256u);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


Reader code:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Rcv_Cnt ];
pointer_to_real_data = &Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ];
Elem_p->Idx = (Elem_p->Idx + 1u) % 2u;
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100u;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


This should work ... right?!










share|improve this question

























  • Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p>Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.

    – Peter Cordes
    Nov 24 '18 at 1:47













  • I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.

    – Peter Cordes
    Nov 24 '18 at 1:58











  • The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.

    – Peter Cordes
    Nov 24 '18 at 2:00











  • @PeterCordes what do you mean with my lock free logic doesn’t work?

    – user3523954
    Nov 24 '18 at 6:44











  • @PeterCordes agree! Shall fix that! Thanks!

    – user3523954
    Nov 24 '18 at 6:44
















0















I am working on developing a single write/reader lock_free queue that is placed on a shared memory which is going to be opened by two different Linux processes. Both processes open the shm with MAP_SHARED.



The queue looks like this:



typedef stuct {
unsigned char Data[ 256 ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100 ];
} Queue_Type;


OBS: Both Linux processes open the same shared memory and view it as a pointer to the Queue_Type. Let us say that the pointer is Shm_p.



The Linux process which is the writer does like this:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
memcpy(&Shm_p->Elems[ Tmp_Snd_Cnt ].Data[ 0 ], pointer_to_real_data, size_of_real_data <= 256);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


The Linux process which is the reader does like this:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
pointer_to_real_data = &Shm_p->Elems[ Tmp_Rcv_Cnt ].Data[ 0 ];
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


My question is do you see any problems with such an approach like races & stuff?



That what I have seen is the following:
Sometimes the reader gets pointer_to_real_data to point to un-updated data. This suggests that GCC has placed the memcpy after the atomic_store (i.e. compile-time instruction reordering). To mitigate this, I have placed asm volatile("" : : : "memory") right before the atomic store in the writer code. The reader got correct data.



To make things even stranger I took away the asm volatile and compiled again.
The code started to work. (I have cleared the shm before testing again).



So obviously the compiler did something strange in the first case without the asm volatile when the reader got un-updated data. What could have been?



Now, in order to be safe, I have the asm volatile there in order to instruct the compiler to not reorder the code.



Thanks in advance.



LATER EDIT (in order to mitigate the problem with over-writing the pointer_to_real_data in the case when the reader gets slow after the store of Rcv_Cnt):



typedef stuct {
unsigned char Data[ 256u ];
} Base_Element_Type;
typedef stuct {
unsigned int Idx;
Base_Element_Type Base_Elems[ 2u ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100u ];
} Queue_Type;


Writer code:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100u;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Snd_Cnt ];
memcpy(&Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ], pointer_to_real_data, size_of_real_data <= 256u);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


Reader code:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Rcv_Cnt ];
pointer_to_real_data = &Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ];
Elem_p->Idx = (Elem_p->Idx + 1u) % 2u;
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100u;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


This should work ... right?!










share|improve this question

























  • Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p>Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.

    – Peter Cordes
    Nov 24 '18 at 1:47













  • I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.

    – Peter Cordes
    Nov 24 '18 at 1:58











  • The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.

    – Peter Cordes
    Nov 24 '18 at 2:00











  • @PeterCordes what do you mean with my lock free logic doesn’t work?

    – user3523954
    Nov 24 '18 at 6:44











  • @PeterCordes agree! Shall fix that! Thanks!

    – user3523954
    Nov 24 '18 at 6:44














0












0








0








I am working on developing a single write/reader lock_free queue that is placed on a shared memory which is going to be opened by two different Linux processes. Both processes open the shm with MAP_SHARED.



The queue looks like this:



typedef stuct {
unsigned char Data[ 256 ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100 ];
} Queue_Type;


OBS: Both Linux processes open the same shared memory and view it as a pointer to the Queue_Type. Let us say that the pointer is Shm_p.



The Linux process which is the writer does like this:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
memcpy(&Shm_p->Elems[ Tmp_Snd_Cnt ].Data[ 0 ], pointer_to_real_data, size_of_real_data <= 256);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


The Linux process which is the reader does like this:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
pointer_to_real_data = &Shm_p->Elems[ Tmp_Rcv_Cnt ].Data[ 0 ];
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


My question is do you see any problems with such an approach like races & stuff?



That what I have seen is the following:
Sometimes the reader gets pointer_to_real_data to point to un-updated data. This suggests that GCC has placed the memcpy after the atomic_store (i.e. compile-time instruction reordering). To mitigate this, I have placed asm volatile("" : : : "memory") right before the atomic store in the writer code. The reader got correct data.



To make things even stranger I took away the asm volatile and compiled again.
The code started to work. (I have cleared the shm before testing again).



So obviously the compiler did something strange in the first case without the asm volatile when the reader got un-updated data. What could have been?



Now, in order to be safe, I have the asm volatile there in order to instruct the compiler to not reorder the code.



Thanks in advance.



LATER EDIT (in order to mitigate the problem with over-writing the pointer_to_real_data in the case when the reader gets slow after the store of Rcv_Cnt):



typedef stuct {
unsigned char Data[ 256u ];
} Base_Element_Type;
typedef stuct {
unsigned int Idx;
Base_Element_Type Base_Elems[ 2u ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100u ];
} Queue_Type;


Writer code:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100u;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Snd_Cnt ];
memcpy(&Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ], pointer_to_real_data, size_of_real_data <= 256u);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


Reader code:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Rcv_Cnt ];
pointer_to_real_data = &Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ];
Elem_p->Idx = (Elem_p->Idx + 1u) % 2u;
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100u;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


This should work ... right?!










share|improve this question
















I am working on developing a single write/reader lock_free queue that is placed on a shared memory which is going to be opened by two different Linux processes. Both processes open the shm with MAP_SHARED.



The queue looks like this:



typedef stuct {
unsigned char Data[ 256 ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100 ];
} Queue_Type;


OBS: Both Linux processes open the same shared memory and view it as a pointer to the Queue_Type. Let us say that the pointer is Shm_p.



The Linux process which is the writer does like this:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
memcpy(&Shm_p->Elems[ Tmp_Snd_Cnt ].Data[ 0 ], pointer_to_real_data, size_of_real_data <= 256);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


The Linux process which is the reader does like this:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
pointer_to_real_data = &Shm_p->Elems[ Tmp_Rcv_Cnt ].Data[ 0 ];
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


My question is do you see any problems with such an approach like races & stuff?



That what I have seen is the following:
Sometimes the reader gets pointer_to_real_data to point to un-updated data. This suggests that GCC has placed the memcpy after the atomic_store (i.e. compile-time instruction reordering). To mitigate this, I have placed asm volatile("" : : : "memory") right before the atomic store in the writer code. The reader got correct data.



To make things even stranger I took away the asm volatile and compiled again.
The code started to work. (I have cleared the shm before testing again).



So obviously the compiler did something strange in the first case without the asm volatile when the reader got un-updated data. What could have been?



Now, in order to be safe, I have the asm volatile there in order to instruct the compiler to not reorder the code.



Thanks in advance.



LATER EDIT (in order to mitigate the problem with over-writing the pointer_to_real_data in the case when the reader gets slow after the store of Rcv_Cnt):



typedef stuct {
unsigned char Data[ 256u ];
} Base_Element_Type;
typedef stuct {
unsigned int Idx;
Base_Element_Type Base_Elems[ 2u ];
} Element_Type;
typedef struct {
unsigned int Snd_Cnt;
unsigned int Rcv_Cnt;
Element_Type Elems[ 100u ];
} Queue_Type;


Writer code:



Tmp_Snd_Cnt = (Shm_p->Snd_Cnt + 1u) % 100u;
__atomic_load(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Snd_Cnt ];
memcpy(&Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ], pointer_to_real_data, size_of_real_data <= 256u);
__atomic_store(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_RELEASE);
}


Reader code:



Tmp_Rcv_Cnt = Shm_p->Rcv_Cnt;
__atomic_load(&Shm_p>Snd_Cnt, &Tmp_Snd_Cnt, __ATOMIC_ACQUIRE);
if (Tmp_Snd_Cnt != Tmp_Rcv_Cnt) {
Elem_p = &Shm_p->Elems[ Tmp_Rcv_Cnt ];
pointer_to_real_data = &Elem_p->Base_Elems[ Elem_p->Idx ].Data[ 0u ];
Elem_p->Idx = (Elem_p->Idx + 1u) % 2u;
Tmp_Rcv_Cnt = (Tmp_Rcv_Cnt + 1u) % 100u;
__atomic_store(&Shm_p>Rcv_Cnt, &Tmp_Rcv_Cnt, __ATOMIC_RELEASE);
}


This should work ... right?!







c gcc queue atomic lock-free






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 24 '18 at 10:22







user3523954

















asked Nov 23 '18 at 11:46









user3523954user3523954

295




295













  • Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p>Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.

    – Peter Cordes
    Nov 24 '18 at 1:47













  • I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.

    – Peter Cordes
    Nov 24 '18 at 1:58











  • The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.

    – Peter Cordes
    Nov 24 '18 at 2:00











  • @PeterCordes what do you mean with my lock free logic doesn’t work?

    – user3523954
    Nov 24 '18 at 6:44











  • @PeterCordes agree! Shall fix that! Thanks!

    – user3523954
    Nov 24 '18 at 6:44



















  • Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p>Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.

    – Peter Cordes
    Nov 24 '18 at 1:47













  • I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.

    – Peter Cordes
    Nov 24 '18 at 1:58











  • The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.

    – Peter Cordes
    Nov 24 '18 at 2:00











  • @PeterCordes what do you mean with my lock free logic doesn’t work?

    – user3523954
    Nov 24 '18 at 6:44











  • @PeterCordes agree! Shall fix that! Thanks!

    – user3523954
    Nov 24 '18 at 6:44

















Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p>Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.

– Peter Cordes
Nov 24 '18 at 1:47







Usually you'd use a power-of-2 size to make the modulo operation even cheaper. At least your size is a compile-time constant so it can use a multiplicative inverse. And BTW, your code would be more readable if you used Tmp_Rcv_Cnt = __atomic_load_n(&Shm_p>Snd_Cnt, __ATOMIC_ACQUIRE), instead of the void version that takes a destination pointer.

– Peter Cordes
Nov 24 '18 at 1:47















I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.

– Peter Cordes
Nov 24 '18 at 1:58





I'm not confident your lockfree logic works. The compiler shouldn't be able to reorder the memcpy to after a release-store, but you could have checked that by looking at the asm (or posting it here), if you still have the executable that didn't seem to work. I assume you compiled with optimization enabled? Otherwise everything is effectively volatile anyway.

– Peter Cordes
Nov 24 '18 at 1:58













The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.

– Peter Cordes
Nov 24 '18 at 2:00





The reader doesn't copy the data out of the queue before marking the entry as read, so you have possible re-use of the slot, don't you? If the reader blocks and the writer wraps all the way around and overwrites the data between pointer_to_real_data = ... and actually dereferencing that pointer.

– Peter Cordes
Nov 24 '18 at 2:00













@PeterCordes what do you mean with my lock free logic doesn’t work?

– user3523954
Nov 24 '18 at 6:44





@PeterCordes what do you mean with my lock free logic doesn’t work?

– user3523954
Nov 24 '18 at 6:44













@PeterCordes agree! Shall fix that! Thanks!

– user3523954
Nov 24 '18 at 6:44





@PeterCordes agree! Shall fix that! Thanks!

– user3523954
Nov 24 '18 at 6:44












0






active

oldest

votes












Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53446127%2flockfree-queue-with-gcc-builtins%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53446127%2flockfree-queue-with-gcc-builtins%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







這個網誌中的熱門文章

Xamarin.form Move up view when keyboard appear

Post-Redirect-Get with Spring WebFlux and Thymeleaf

Anylogic : not able to use stopDelay()