Suggestions on a data structure

I've been struggling with finding a suitable data structure that meets these requirements:

The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

The number of insertions are not frequent.

Searches are by far more frequent.

Retrieving the index of the element within the data structure is critical.

Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.

With this in mind, I have these considerations:

A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.

Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.

A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.

What do you think? Any suggestion would be more than appreciated.

Many thanks

asked Nov 13 '18 at 13:54

mgfernan

4761617

I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55

One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59

You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25

Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42

@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28

|
show 1 more comment

I've been struggling with finding a suitable data structure that meets these requirements:

The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

The number of insertions are not frequent.

Searches are by far more frequent.

Retrieving the index of the element within the data structure is critical.

Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.

With this in mind, I have these considerations:

A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.

Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.

What do you think? Any suggestion would be more than appreciated.

Many thanks

asked Nov 13 '18 at 13:54

mgfernan

4761617

I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55

One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59

You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25

Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42

@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28

|
show 1 more comment

I've been struggling with finding a suitable data structure that meets these requirements:

The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

The number of insertions are not frequent.

Searches are by far more frequent.

Retrieving the index of the element within the data structure is critical.

Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.

With this in mind, I have these considerations:

A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.

Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.

What do you think? Any suggestion would be more than appreciated.

Many thanks

asked Nov 13 '18 at 13:54

mgfernan

4761617

I've been struggling with finding a suitable data structure that meets these requirements:

The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

The number of insertions are not frequent.

Searches are by far more frequent.

Retrieving the index of the element within the data structure is critical.

Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.

With this in mind, I have these considerations:

A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.

Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.

What do you think? Any suggestion would be more than appreciated.

Many thanks

c data-structures

asked Nov 13 '18 at 13:54

mgfernan

4761617

asked Nov 13 '18 at 13:54

mgfernan

4761617

asked Nov 13 '18 at 13:54

mgfernan

4761617

asked Nov 13 '18 at 13:54

mgfernan

4761617

asked Nov 13 '18 at 13:54

mgfernan

4761617

I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55

One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59

You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25

Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42

@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28

|
show 1 more comment

I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55

One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59

You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25

Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42

@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28

I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55

One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59

You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25

Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42

@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28

|
show 1 more comment

3 Answers
3

active

oldest

votes

An array would probably be the best data structure in this case.

Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

add a comment |

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.

answered Nov 13 '18 at 14:12

kfx

5,02821535

add a comment |

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.

Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.

answered Nov 13 '18 at 14:32

Lundin

107k17158262

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53282597%2fsuggestions-on-a-data-structure%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

An array would probably be the best data structure in this case.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

add a comment |

An array would probably be the best data structure in this case.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

add a comment |

An array would probably be the best data structure in this case.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

An array would probably be the best data structure in this case.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

answered Nov 13 '18 at 14:27

dbush

93.9k12101134

add a comment |

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

answered Nov 13 '18 at 14:12

kfx

5,02821535

add a comment |

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

answered Nov 13 '18 at 14:12

kfx

5,02821535

add a comment |

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

answered Nov 13 '18 at 14:12

kfx

5,02821535

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

answered Nov 13 '18 at 14:12

kfx

5,02821535

answered Nov 13 '18 at 14:12

kfx

5,02821535

answered Nov 13 '18 at 14:12

kfx

5,02821535

answered Nov 13 '18 at 14:12

kfx

5,02821535

add a comment |

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

answered Nov 13 '18 at 14:32

Lundin

107k17158262

add a comment |

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

answered Nov 13 '18 at 14:32

Lundin

107k17158262

add a comment |

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

answered Nov 13 '18 at 14:32

Lundin

107k17158262

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

answered Nov 13 '18 at 14:32

Lundin

107k17158262

answered Nov 13 '18 at 14:32

Lundin

107k17158262

answered Nov 13 '18 at 14:32

Lundin

107k17158262

answered Nov 13 '18 at 14:32

Lundin

107k17158262

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Wsrtjtyk