count number of duplicates for each row of a 2D numpy array

Is there a simple way in Python to count how many values one row shares with each of several other rows? For example:

Row1: 12  13  20  25  45  46  

Row2: 14  24  30  38  39  47  

Row3:  1   9  15  21  29  39  

Row4:  2   6  14  19  26  45  

Row5:  5  23  25  27  32  40  

Row6:  6   8  25  26  27  45

I want to compare Row6 to the previous n rows.
If n=5, the output should be something like this: [2, 0, 0, 3, 2]

Of course, I can loop over the rows, compare each value in Row6 to each value in the other row, and increment a counter for each row.
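The loop-based approach described above could look roughly like the following. This is only a sketch of the baseline: the array name `rows` and the function name `count_shared_loop` are made up for illustration and do not come from the original post.

```python
import numpy as np

# The six example rows from the question.
rows = np.array([
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [ 1,  9, 15, 21, 29, 39],
    [ 2,  6, 14, 19, 26, 45],
    [ 5, 23, 25, 27, 32, 40],
    [ 6,  8, 25, 26, 27, 45],
])

def count_shared_loop(rows, n):
    """For each of the n rows before the last, count values shared with the last row."""
    last = rows[-1]
    counts = []
    for row in rows[-(n + 1):-1]:   # the n rows preceding the last one
        counter = 0
        for a in last:
            for b in row:
                if a == b:          # one matching pair found
                    counter += 1
        counts.append(counter)
    return counts

print(count_shared_loop(rows, 5))  # [2, 0, 0, 3, 2]
```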

But is there an existing function in Python that does this?

edited Nov 7 at 12:32

coldspeed

asked Nov 7 at 11:13

Oleksandr Puchko

I think it should be [2, 0, 0, 3, 2]
– coldspeed
Nov 7 at 11:26

True, sorry for typo
– Oleksandr Puchko
Nov 7 at 11:48

python pandas numpy dataframe

2 Answers

accepted

If you're working with NumPy arrays, use a broadcasted comparison:

>>> n = 5
>>> v = df.values  # df: the DataFrame holding the six rows from the question
>>> v
array([[12, 13, 20, 25, 45, 46],
       [14, 24, 30, 38, 39, 47],
       [ 1,  9, 15, 21, 29, 39],
       [ 2,  6, 14, 19, 26, 45],
       [ 5, 23, 25, 27, 32, 40],
       [ 6,  8, 25, 26, 27, 45]])
>>> (v[None, -(n+1):-1, None] == v[-1, :, None]).sum(-1).sum(-1).squeeze()
array([2, 0, 0, 3, 2])
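For readers who find the one-liner hard to follow, the same computation can be unpacked into named steps. This is a sketch rather than the answerer's own code: the names `prev`, `last`, and `matches` are introduced here for illustration, and the array is built directly with `np.array` instead of being taken from `df.values`.

```python
import numpy as np

v = np.array([
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [ 1,  9, 15, 21, 29, 39],
    [ 2,  6, 14, 19, 26, 45],
    [ 5, 23, 25, 27, 32, 40],
    [ 6,  8, 25, 26, 27, 45],
])
n = 5

prev = v[-(n + 1):-1]   # shape (n, 6): the n rows before the last one
last = v[-1]            # shape (6,): the row being compared

# prev[:, :, None] has shape (n, 6, 1); last broadcasts against it, giving an
# (n, 6, 6) boolean array where matches[i, j, k] is True when
# prev[i, j] == last[k].
matches = prev[:, :, None] == last

# Summing over the last two axes counts the matching pairs for each row.
counts = matches.sum(axis=(1, 2))
print(counts)  # [2 0 0 3 2]
```

Note that the intermediate boolean array holds one entry per pair of values for every row, so its size grows with the number of rows times the square of the row length.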

answered Nov 7 at 11:29

coldspeed

Thanks a lot, it works. I didn't fully understand how, but it still looks much better than my code.
– Oleksandr Puchko
Nov 7 at 11:50

You could use unique from numpy:

>>> import numpy as np
>>> np.unique([1, 1, 2, 2, 3, 3])
array([1, 2, 3])
>>> a = np.array([[1, 1], [2, 3]])
>>> np.unique(a)
array([1, 2, 3])

You would still need to loop over the n rows and check the length of the resulting array each time.
There may be something more suitable elsewhere in numpy.
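One possible reading of this suggestion is sketched below: concatenate each earlier row with the last row and see how much shorter the unique array is. The function name `count_shared_unique` is hypothetical, and the sketch assumes the values within each individual row are distinct, as they are in the question's example.

```python
import numpy as np

def count_shared_unique(rows, n):
    """For each of the n rows before the last, count values shared with the last row.

    Assumes the values within each individual row are distinct.
    """
    last = rows[-1]
    counts = []
    for row in rows[-(n + 1):-1]:
        combined = np.concatenate([row, last])
        # A shared value appears twice in `combined` but only once in the
        # unique array, so the drop in length equals the number of shared values.
        counts.append(len(combined) - len(np.unique(combined)))
    return counts

rows = np.array([
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [ 1,  9, 15, 21, 29, 39],
    [ 2,  6, 14, 19, 26, 45],
    [ 5, 23, 25, 27, 32, 40],
    [ 6,  8, 25, 26, 27, 45],
])
print(count_shared_unique(rows, 5))  # [2, 0, 0, 3, 2]
```

Under the same distinct-values assumption, `np.intersect1d(row, last).size` should give the same count for each row.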

answered Nov 7 at 11:22

mk18
