Suggestions on a data structure
I've been struggling with finding a suitable data structure that meets these requirements:
- The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.
- The number of insertions are not frequent.
- Searches are by far more frequent.
- Retrieving the index of the element within the data structure is critical.
- Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.
With this in mind, I have these considerations:
- A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.
- A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.
Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.
A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.
What do you think? Any suggestion would be more than appreciated.
Many thanks
c data-structures
|
show 1 more comment
I've been struggling with finding a suitable data structure that meets these requirements:
- The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.
- The number of insertions are not frequent.
- Searches are by far more frequent.
- Retrieving the index of the element within the data structure is critical.
- Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.
With this in mind, I have these considerations:
- A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.
- A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.
Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.
A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.
What do you think? Any suggestion would be more than appreciated.
Many thanks
c data-structures
I forgot to add that the insertions can be at the end, middle or beginning.
– mgfernan
Nov 13 '18 at 13:55
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)
– Jean-Marc Zimmer
Nov 13 '18 at 13:59
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.
– joop
Nov 13 '18 at 14:25
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?
– Dukeling
Nov 13 '18 at 15:42
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).
– mgfernan
Nov 14 '18 at 6:28
|
show 1 more comment
I've been struggling with finding a suitable data structure that meets these requirements:
- The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.
- The number of insertions are not frequent.
- Searches are by far more frequent.
- Retrieving the index of the element within the data structure is critical.
- Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.
With this in mind, I have these considerations:
- A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.
- A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.
Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.
A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.
What do you think? Any suggestion would be more than appreciated.
Many thanks
c data-structures
I've been struggling with finding a suitable data structure that meets these requirements:
- The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.
- The number of insertions are not frequent.
- Searches are by far more frequent.
- Retrieving the index of the element within the data structure is critical.
- Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.
With this in mind, I have these considerations:
- A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.
- A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.
Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.
A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.
What do you think? Any suggestion would be more than appreciated.
Many thanks
c data-structures
c data-structures
asked Nov 13 '18 at 13:54
mgfernanmgfernan
4761617
4761617
I forgot to add that the insertions can be at the end, middle or beginning.
– mgfernan
Nov 13 '18 at 13:55
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)
– Jean-Marc Zimmer
Nov 13 '18 at 13:59
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.
– joop
Nov 13 '18 at 14:25
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?
– Dukeling
Nov 13 '18 at 15:42
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).
– mgfernan
Nov 14 '18 at 6:28
|
show 1 more comment
I forgot to add that the insertions can be at the end, middle or beginning.
– mgfernan
Nov 13 '18 at 13:55
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)
– Jean-Marc Zimmer
Nov 13 '18 at 13:59
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.
– joop
Nov 13 '18 at 14:25
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?
– Dukeling
Nov 13 '18 at 15:42
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).
– mgfernan
Nov 14 '18 at 6:28
I forgot to add that the insertions can be at the end, middle or beginning.
– mgfernan
Nov 13 '18 at 13:55
I forgot to add that the insertions can be at the end, middle or beginning.
– mgfernan
Nov 13 '18 at 13:55
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)
– Jean-Marc Zimmer
Nov 13 '18 at 13:59
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)
– Jean-Marc Zimmer
Nov 13 '18 at 13:59
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.
– joop
Nov 13 '18 at 14:25
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.
– joop
Nov 13 '18 at 14:25
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?
– Dukeling
Nov 13 '18 at 15:42
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?
– Dukeling
Nov 13 '18 at 15:42
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).
– mgfernan
Nov 14 '18 at 6:28
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).
– mgfernan
Nov 14 '18 at 6:28
|
show 1 more comment
3 Answers
3
active
oldest
votes
An array would probably be the best data structure in this case.
Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove
. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1)
retrieval by index and O(log n)
searches.
Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.
So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.
add a comment |
What about a search tree? It's a good fit for all of your requirements except 4.
To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.
Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.
add a comment |
Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.
This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.
Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53282597%2fsuggestions-on-a-data-structure%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
An array would probably be the best data structure in this case.
Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove
. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1)
retrieval by index and O(log n)
searches.
Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.
So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.
add a comment |
An array would probably be the best data structure in this case.
Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove
. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1)
retrieval by index and O(log n)
searches.
Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.
So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.
add a comment |
An array would probably be the best data structure in this case.
Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove
. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1)
retrieval by index and O(log n)
searches.
Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.
So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.
An array would probably be the best data structure in this case.
Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove
. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1)
retrieval by index and O(log n)
searches.
Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.
So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.
answered Nov 13 '18 at 14:27
dbushdbush
93.9k12101134
93.9k12101134
add a comment |
add a comment |
What about a search tree? It's a good fit for all of your requirements except 4.
To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.
Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.
add a comment |
What about a search tree? It's a good fit for all of your requirements except 4.
To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.
Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.
add a comment |
What about a search tree? It's a good fit for all of your requirements except 4.
To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.
Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.
What about a search tree? It's a good fit for all of your requirements except 4.
To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.
Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.
answered Nov 13 '18 at 14:12
kfxkfx
5,02821535
5,02821535
add a comment |
add a comment |
Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.
This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.
Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.
add a comment |
Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.
This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.
Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.
add a comment |
Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.
This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.
Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.
Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.
This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.
Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.
answered Nov 13 '18 at 14:32
LundinLundin
107k17158262
107k17158262
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53282597%2fsuggestions-on-a-data-structure%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I forgot to add that the insertions can be at the end, middle or beginning.
– mgfernan
Nov 13 '18 at 13:55
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)
– Jean-Marc Zimmer
Nov 13 '18 at 13:59
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.
– joop
Nov 13 '18 at 14:25
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?
– Dukeling
Nov 13 '18 at 15:42
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).
– mgfernan
Nov 14 '18 at 6:28