Suggestions on a data structure












4















I've been struggling with finding a suitable data structure that meets these requirements:




  1. The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

  2. The number of insertions are not frequent.

  3. Searches are by far more frequent.

  4. Retrieving the index of the element within the data structure is critical.

  5. Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.


With this in mind, I have these considerations:




  • A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

  • A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.


Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.



A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.



What do you think? Any suggestion would be more than appreciated.



Many thanks










share|improve this question























  • I forgot to add that the insertions can be at the end, middle or beginning.

    – mgfernan
    Nov 13 '18 at 13:55











  • One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

    – Jean-Marc Zimmer
    Nov 13 '18 at 13:59











  • You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

    – joop
    Nov 13 '18 at 14:25











  • Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

    – Dukeling
    Nov 13 '18 at 15:42













  • @Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

    – mgfernan
    Nov 14 '18 at 6:28
















4















I've been struggling with finding a suitable data structure that meets these requirements:




  1. The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

  2. The number of insertions are not frequent.

  3. Searches are by far more frequent.

  4. Retrieving the index of the element within the data structure is critical.

  5. Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.


With this in mind, I have these considerations:




  • A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

  • A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.


Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.



A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.



What do you think? Any suggestion would be more than appreciated.



Many thanks










share|improve this question























  • I forgot to add that the insertions can be at the end, middle or beginning.

    – mgfernan
    Nov 13 '18 at 13:55











  • One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

    – Jean-Marc Zimmer
    Nov 13 '18 at 13:59











  • You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

    – joop
    Nov 13 '18 at 14:25











  • Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

    – Dukeling
    Nov 13 '18 at 15:42













  • @Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

    – mgfernan
    Nov 14 '18 at 6:28














4












4








4


2






I've been struggling with finding a suitable data structure that meets these requirements:




  1. The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

  2. The number of insertions are not frequent.

  3. Searches are by far more frequent.

  4. Retrieving the index of the element within the data structure is critical.

  5. Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.


With this in mind, I have these considerations:




  • A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

  • A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.


Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.



A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.



What do you think? Any suggestion would be more than appreciated.



Many thanks










share|improve this question














I've been struggling with finding a suitable data structure that meets these requirements:




  1. The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.

  2. The number of insertions are not frequent.

  3. Searches are by far more frequent.

  4. Retrieving the index of the element within the data structure is critical.

  5. Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.


With this in mind, I have these considerations:




  • A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.

  • A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.


Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.



A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.



What do you think? Any suggestion would be more than appreciated.



Many thanks







c data-structures






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 13 '18 at 13:54









mgfernanmgfernan

4761617




4761617













  • I forgot to add that the insertions can be at the end, middle or beginning.

    – mgfernan
    Nov 13 '18 at 13:55











  • One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

    – Jean-Marc Zimmer
    Nov 13 '18 at 13:59











  • You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

    – joop
    Nov 13 '18 at 14:25











  • Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

    – Dukeling
    Nov 13 '18 at 15:42













  • @Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

    – mgfernan
    Nov 14 '18 at 6:28



















  • I forgot to add that the insertions can be at the end, middle or beginning.

    – mgfernan
    Nov 13 '18 at 13:55











  • One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

    – Jean-Marc Zimmer
    Nov 13 '18 at 13:59











  • You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

    – joop
    Nov 13 '18 at 14:25











  • Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

    – Dukeling
    Nov 13 '18 at 15:42













  • @Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

    – mgfernan
    Nov 14 '18 at 6:28

















I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55





I forgot to add that the insertions can be at the end, middle or beginning.

– mgfernan
Nov 13 '18 at 13:55













One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59





One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list)

– Jean-Marc Zimmer
Nov 13 '18 at 13:59













You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25





You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index.

– joop
Nov 13 '18 at 14:25













Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42







Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there?

– Dukeling
Nov 13 '18 at 15:42















@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28





@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5).

– mgfernan
Nov 14 '18 at 6:28












3 Answers
3






active

oldest

votes


















2














An array would probably be the best data structure in this case.



Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.



Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.



So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.






share|improve this answer































    0














    What about a search tree? It's a good fit for all of your requirements except 4.



    To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.



    Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.






    share|improve this answer































      0














      Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.



      This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.



      Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.






      share|improve this answer























        Your Answer






        StackExchange.ifUsing("editor", function () {
        StackExchange.using("externalEditor", function () {
        StackExchange.using("snippets", function () {
        StackExchange.snippets.init();
        });
        });
        }, "code-snippets");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "1"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53282597%2fsuggestions-on-a-data-structure%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        2














        An array would probably be the best data structure in this case.



        Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.



        Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.



        So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.






        share|improve this answer




























          2














          An array would probably be the best data structure in this case.



          Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.



          Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.



          So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.






          share|improve this answer


























            2












            2








            2







            An array would probably be the best data structure in this case.



            Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.



            Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.



            So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.






            share|improve this answer













            An array would probably be the best data structure in this case.



            Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.



            Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.



            So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 13 '18 at 14:27









            dbushdbush

            93.9k12101134




            93.9k12101134

























                0














                What about a search tree? It's a good fit for all of your requirements except 4.



                To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.



                Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.






                share|improve this answer




























                  0














                  What about a search tree? It's a good fit for all of your requirements except 4.



                  To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.



                  Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.






                  share|improve this answer


























                    0












                    0








                    0







                    What about a search tree? It's a good fit for all of your requirements except 4.



                    To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.



                    Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.






                    share|improve this answer













                    What about a search tree? It's a good fit for all of your requirements except 4.



                    To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.



                    Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 13 '18 at 14:12









                    kfxkfx

                    5,02821535




                    5,02821535























                        0














                        Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.



                        This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.



                        Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.






                        share|improve this answer




























                          0














                          Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.



                          This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.



                          Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.






                          share|improve this answer


























                            0












                            0








                            0







                            Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.



                            This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.



                            Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.






                            share|improve this answer













                            Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.



                            This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.



                            Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.







                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Nov 13 '18 at 14:32









                            LundinLundin

                            107k17158262




                            107k17158262






























                                draft saved

                                draft discarded




















































                                Thanks for contributing an answer to Stack Overflow!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid



                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.


                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function () {
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53282597%2fsuggestions-on-a-data-structure%23new-answer', 'question_page');
                                }
                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                這個網誌中的熱門文章

                                Tangent Lines Diagram Along Smooth Curve

                                Yusuf al-Mu'taman ibn Hud

                                Zucchini