count number of duplicates for each row of a 2D numpy array
Is there a simple way in Python to count how many values each row shares with another row? For example:
Row1: 12 13 20 25 45 46
Row2: 14 24 30 38 39 47
Row3: 1 9 15 21 29 39
Row4: 2 6 14 19 26 45
Row5: 5 23 25 27 32 40
Row6: 6 8 25 26 27 45
I want to compare Row6 with the previous "n" rows.
If n=5, the output should be something like this: [2 0 0 3 2]
Of course, I can loop over the rows, compare each value in Row6 with each value of the other row, and increment a counter for each row.
But is there an existing function in Python that does this?
python pandas numpy dataframe
edited Nov 7 at 12:32 by coldspeed
asked Nov 7 at 11:13 by Oleksandr Puchko
I think it should be [2, 0, 0, 3, 2]
– coldspeed
Nov 7 at 11:26
True, sorry for the typo
– Oleksandr Puchko
Nov 7 at 11:48
2 Answers
If you're working with NumPy arrays, use a broadcasted comparison:
>>> n = 5
>>> v = df.values  # df: the DataFrame holding the rows above
>>> v
array([[12, 13, 20, 25, 45, 46],
       [14, 24, 30, 38, 39, 47],
       [ 1,  9, 15, 21, 29, 39],
       [ 2,  6, 14, 19, 26, 45],
       [ 5, 23, 25, 27, 32, 40],
       [ 6,  8, 25, 26, 27, 45]])
>>> (v[None, -(n+1):-1, None] == v[-1, :, None]).sum(-1).sum(-1).squeeze()
array([2, 0, 0, 3, 2])
answered Nov 7 at 11:29 by coldspeed (accepted answer)
Thanks a lot, it works. I didn't fully understand how, but it still looks much better than my code.
– Oleksandr Puchko
Nov 7 at 11:50
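For readers who, like the asker, are unsure how the broadcasted comparison works, here is a minimal sketch that splits the same idea into named steps. The array is built directly with np.array instead of taken from df, and the names prev, last, matches, counts and counts_isin are illustrative only, not part of the original answer. The np.isin line is a swapped-in alternative for cross-checking; it gives the same counts as long as the values within each row are distinct, as they are in the example.

```python
import numpy as np

# Example data from the question, built directly as an array (assumption:
# in the answer this array comes from df.values).
v = np.array([
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [ 1,  9, 15, 21, 29, 39],
    [ 2,  6, 14, 19, 26, 45],
    [ 5, 23, 25, 27, 32, 40],
    [ 6,  8, 25, 26, 27, 45],
])
n = 5

prev = v[-(n + 1):-1]  # the n rows before the last one, shape (n, 6)
last = v[-1]           # the row to compare against, shape (6,)

# Add axes so that every value of every previous row is compared with
# every value of the last row: (n, 6, 1) == (1, 1, 6) -> (n, 6, 6).
matches = prev[:, :, None] == last[None, None, :]

# Count the matching pairs for each previous row.
counts = matches.sum(axis=(1, 2))

# Cross-check: for each previous row, count the values that also occur
# in the last row.
counts_isin = np.isin(prev, last).sum(axis=1)

print(counts)       # [2 0 0 3 2]
print(counts_isin)  # [2 0 0 3 2]
```

The broadcasted version counts matching pairs, so if a row could contain repeated values the two results may differ; which one is wanted depends on how duplicates should be counted.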
You could use np.unique from NumPy:
>>> import numpy as np
>>> np.unique([1, 1, 2, 2, 3, 3])
array([1, 2, 3])
>>> a = np.array([[1, 1], [2, 3]])
>>> np.unique(a)
array([1, 2, 3])
You would still need to loop over the n rows, combine each one with the last row, and check the length of the resulting array.
Maybe you can find something more suitable that still uses NumPy.
answered Nov 7 at 11:22 by mk18
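The loop that this answer leaves to the reader could look roughly like the sketch below. The helper name count_shared is hypothetical, and the approach assumes that values do not repeat within a single row: under that assumption, every value shared by two rows shortens the np.unique result of their concatenation by exactly one.

```python
import numpy as np

def count_shared(rows, n):
    """Hypothetical helper: for each of the n rows before the last one,
    count the values it shares with the last row.

    Assumes the values within each row are distinct.
    """
    rows = np.asarray(rows)
    last = rows[-1]
    counts = []
    for row in rows[-(n + 1):-1]:
        # Shared values collapse into a single entry in the union.
        union = np.unique(np.concatenate([row, last]))
        counts.append(len(row) + len(last) - len(union))
    return counts

# Example data from the question.
v = [
    [12, 13, 20, 25, 45, 46],
    [14, 24, 30, 38, 39, 47],
    [1, 9, 15, 21, 29, 39],
    [2, 6, 14, 19, 26, 45],
    [5, 23, 25, 27, 32, 40],
    [6, 8, 25, 26, 27, 45],
]

print(count_shared(v, 5))  # [2, 0, 0, 3, 2]
```

This keeps a Python-level loop over the n rows, so for large inputs a vectorized comparison is likely to be faster.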