Count the number of duplicates for each row of a 2D NumPy array























Is there a simple way in Python to count the duplicates between different rows? For example:



Row1: 12 13 20 25 45 46
Row2: 14 24 30 38 39 47
Row3: 1 9 15 21 29 39
Row4: 2 6 14 19 26 45
Row5: 5 23 25 27 32 40
Row6: 6 8 25 26 27 45


I want to compare Row6 with each of the previous n rows.
If n=5, the output should be something like this: [2, 0, 0, 3, 2], i.e. the number of values that each of Row1 to Row5 shares with Row6.



Of course, I could compare each value in Row6 with each value in the other rows in a loop and increment a counter for each row.
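A minimal sketch of the loop described above, in plain Python. The helper name count_shared_loop is hypothetical and the rows are copied from the example. It counts every matching pair, so a value repeated within a row would be counted once per match.

```python
# Rows from the example; the last row is the one being compared.
rows = [
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [1, 9, 15, 21, 29, 39],
    [2, 6, 14, 19, 26, 45],
    [5, 23, 25, 27, 32, 40],
    [6, 8, 25, 26, 27, 45],
]


def count_shared_loop(rows, n):
    """For each of the n rows before the last one, count how many of its
    values also appear in the last row (hypothetical helper)."""
    last = rows[-1]
    counts = []
    for row in rows[-(n + 1):-1]:  # the n rows just before the last one
        counter = 0
        for value in row:
            for target in last:
                if value == target:
                    counter += 1
        counts.append(counter)
    return counts


print(count_shared_loop(rows, 5))  # [2, 0, 0, 3, 2]
```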



But is there an existing function in Python that already does this?

















python pandas numpy dataframe














edited Nov 7 at 12:32 by coldspeed
asked Nov 7 at 11:13 by Oleksandr Puchko












  • I think it should be [2, 0, 0, 3, 2]
    – coldspeed
    Nov 7 at 11:26










  • True, sorry for typo
    – Oleksandr Puchko
    Nov 7 at 11:48






























2 Answers

















Accepted answer










If you're working with NumPy arrays, use a broadcasted comparison:



>>> n = 5
>>> v = df.values  # df is the DataFrame holding Row1 to Row6
>>> v
array([[12, 13, 20, 25, 45, 46],
       [14, 24, 30, 38, 39, 47],
       [ 1,  9, 15, 21, 29, 39],
       [ 2,  6, 14, 19, 26, 45],
       [ 5, 23, 25, 27, 32, 40],
       [ 6,  8, 25, 26, 27, 45]])
>>> (v[None, -(n+1):-1, None] == v[-1, :, None]).sum(-1).sum(-1).squeeze()
array([2, 0, 0, 3, 2])
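To unpack what the one-liner does, here is a sketch that builds v directly with np.array instead of taking it from a DataFrame (an assumption, so it runs standalone) and prints the intermediate shapes. The last expression is an equivalent form without the extra leading axis; it is a rewrite for illustration, not part of the original answer.

```python
import numpy as np

v = np.array([
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [1, 9, 15, 21, 29, 39],
    [2, 6, 14, 19, 26, 45],
    [5, 23, 25, 27, 32, 40],
    [6, 8, 25, 26, 27, 45],
])
n = 5

prev = v[None, -(n + 1):-1, None]  # shape (1, 5, 1, 6): the n previous rows
last = v[-1, :, None]              # shape (6, 1): the last row as a column
eq = prev == last                  # broadcasts to (1, 5, 6, 6)
# eq[0, i, j, k] is True when value k of previous row i equals value j of the last row
print(prev.shape, last.shape, eq.shape)  # (1, 5, 1, 6) (6, 1) (1, 5, 6, 6)

counts = eq.sum(-1).sum(-1).squeeze()    # sum over k, then over j, drop the size-1 axis
print(counts)                            # [2 0 0 3 2]

# Equivalent: shape (5, 6, 1) against (6,) broadcasts to (5, 6, 6)
counts2 = (v[-(n + 1):-1, :, None] == v[-1]).sum(axis=(1, 2))
print(counts2)                           # [2 0 0 3 2]
```

Because every pair of values is compared, a value repeated within a row is counted once per match, and the intermediate boolean array has n × m × m entries for rows of length m, which is small for rows like these.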





answered Nov 7 at 11:29 by coldspeed





















  • Thanks a lot, it works! I didn't fully understand how, but it looks much better than my code.
    – Oleksandr Puchko
    Nov 7 at 11:50

































You could use np.unique from NumPy:



>>> import numpy as np 
>>> np.unique([1, 1, 2, 2, 3, 3])
array([1, 2, 3])
>>> a = np.array([[1, 1], [2, 3]])
>>> np.unique(a)
array([1, 2, 3])


You would still need to loop over the n rows (for example, applying np.unique to each previous row concatenated with the last row) and then check the length of the resulting array.
Maybe you can find something more suitable elsewhere in NumPy.
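A minimal sketch of the loop this answer describes, assuming the "resulting array" is np.unique applied to each previous row concatenated with the last row, and assuming no value repeats within a single row (true for the example data). np.intersect1d, a different NumPy function, is shown alongside as a more direct count under the same assumption.

```python
import numpy as np

v = np.array([
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [1, 9, 15, 21, 29, 39],
    [2, 6, 14, 19, 26, 45],
    [5, 23, 25, 27, 32, 40],
    [6, 8, 25, 26, 27, 45],
])
n = 5
last = v[-1]

counts = []
for row in v[-(n + 1):-1]:
    # Values present in both rows collapse into one entry in np.unique,
    # so the merged size shrinks by exactly the number of shared values
    # (assuming no value repeats within a single row).
    merged = np.unique(np.concatenate([row, last]))
    counts.append(row.size + last.size - merged.size)
print(counts)  # [2, 0, 0, 3, 2]

# np.intersect1d returns the sorted distinct values common to both rows.
counts_alt = [np.intersect1d(row, last).size for row in v[-(n + 1):-1]]
print(counts_alt)  # [2, 0, 0, 3, 2]
```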






answered Nov 7 at 11:22 by mk18




















