How do I match a column entry from one df to a different df; and if they're the same, append another column's...























CONTEXT:



I have two dataframes that have the following set up:



df1 looks like this...and goes on for about 3500 rows:



| id1 | id2   | 
|:----|------:|
| a | name1 |
| b | name2 |
| c | name3 |
| d | name4 |
| e | name5 |
| f | name6 |


df2 looks like this...and goes on for about 4000 rows and about 8 columns



| id1 | ranktrial1   | ranktrial2   | ...
|:----|-------------:|-------------:| ...
| a | rank1 |rank1 | ...
| b | rank2 |rank2 | ...
| c | rank3 |rank3 | ...
| d | rank4 |rank4 | ...
| e | rank5 |rank5 | ...
| f | rank6 |rank6 | ...


NOTE1: Some of the id1s do not have id2s, meaning they'll be NaN once they're mapped; I'll just drop them when I get to that step. I don't know if this is relevant, but I wanted to mention it in case it is.



QUESTION:



I need to append/join/place (I don't know the correct jargon here) the corresponding id2 names to the second dataframe, iff the id1 entry of df1 == the id1 entry of df2. How do I do this?



The desired dataframe would look like this:



| id1 | id2   | ranktrial1   | ranktrial2   | ...
|:----|------:|-------------:|-------------:| ...
| a | name1 | rank1 | rank1 | ...
| b | name2 | rank2 | rank2 | ...
| c | name3 | rank3 | rank3 | ...
| d | name4 | rank4 | rank4 | ...
| e | name5 | rank5 | rank5 | ...
| f | name6 | rank6 | rank6 | ...


I feel as if this is probably really simple and I'm being a bit of a doofus, as I am a novice Pythoner. However, I have not been able to use the responses to similar questions to achieve my goal. It is quite likely my fault though :p



Thanks in advance for your help!



Edit: changed "4000 entries" to "4000 rows"; likewise for the 3500 entries.






























  • Did you read about merge?
    – Vaishali
    Nov 7 at 18:31










  • Yes! I have tried numerous approaches that I thought would give the results I had hoped for. Either I'd create an empty dataframe, or I'd make a dataframe that was more of a concatenation. Thanks for the response!
    – cross12tamu
    Nov 7 at 18:40















python pandas dataframe merging-data














edited Nov 7 at 18:46

























asked Nov 7 at 18:29









cross12tamu













1 Answer



accepted










Given that you are dropping the missing rows afterwards, this is an inner join and can be accomplished with merge. By default, merge joins on all commonly named columns; in this case, the only commonly named column is id1. how='inner' is also the default.



df1.merge(df2)

  id1    id2 ranktrial1 ranktrial2
0   a  name1      rank1      rank1
1   b  name2      rank2      rank2
2   c  name3      rank3      rank3
3   d  name4      rank4      rank4
4   e  name5      rank5      rank5
5   f  name6      rank6      rank6
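To try this on something runnable, here is a minimal sketch that rebuilds the question's sample tables by hand and merges them. The frame contents are just the question's placeholder values, not real data.

```python
import pandas as pd

# Sample frames mirroring the question's tables (placeholder values only).
df1 = pd.DataFrame({
    "id1": list("abcdef"),
    "id2": [f"name{i}" for i in range(1, 7)],
})
df2 = pd.DataFrame({
    "id1": list("abcdef"),
    "ranktrial1": [f"rank{i}" for i in range(1, 7)],
    "ranktrial2": [f"rank{i}" for i in range(1, 7)],
})

# Inner join on the only shared column name, "id1".
merged = df1.merge(df2)
print(merged)
```

Calling merge on df1 (rather than df2) puts df1's columns first, so id2 ends up right next to id1 as in the desired layout.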


You could be more explicit with



df1.merge(df2, how='inner', on='id1')
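If you would rather see which id1s have no id2 before dropping them, a left join from df2 keeps every row and leaves NaN where there was no match. The sketch below assumes a made-up id "z" that is missing from df1, standing in for the NOTE1 case; indicator=True adds a _merge column that says where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"id1": ["a", "b", "c"], "id2": ["name1", "name2", "name3"]})
# "z" is a hypothetical id with no entry in df1.
df2 = pd.DataFrame({"id1": ["a", "b", "z"], "ranktrial1": ["rank1", "rank2", "rank9"]})

# Keep every row of df2; ids without a match get NaN in id2.
left = df2.merge(df1, how="left", on="id1", indicator=True)
print(left)

# Dropping the unmatched rows afterwards leaves the rows an inner join would keep.
kept = left.dropna(subset=["id2"]).drop(columns="_merge").reset_index(drop=True)
print(kept)
```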


























  • Thanks for the prompt reply! I feel as if I had tried merge (and some merge combinations) already, but let me give it a spin!
    – cross12tamu
    Nov 7 at 18:38










  • Yeah, this doesn't work as intended. It creates an empty dataframe that has only the column headers, something like `|id1|id2|ranktrial1|ranktrial2|...etc...`, with no rows. I don't know why this occurs.
    – cross12tamu
    Nov 7 at 18:44












  • Then that means something else is going on. Likely, you are importing a file and you end up with one column that is a big string per row. You don't know what to expect, so you assume that it is a proper dataframe. If I'm right, and this is a file, show us what the file looks like and we'll show you how to parse it. Then this suggestion should work. If I'm wrong... then idk what to do.
    – piRSquared
    Nov 7 at 18:50










  • The first file, with the id1/id2 columns, is an RData object that I brought in and converted with rpy2 to a pandas dataframe. After I converted it, I cut out some unnecessary data so it would only have the id1 and id2 columns. The other file was a tab-delimited .txt file that I brought in with pd.read_csv, with sep set to '\t'.
    – cross12tamu
    Nov 7 at 18:58










  • gyazo.com/7a879545491365af5ef80864750fde70 gyazo.com/544867157d206909c9c697f3cf36b073 Screenshots from the Spyder IDE, showing the two dataframes and what they look like (obviously, not with the generic names I have used in this example)
    – cross12tamu
    Nov 7 at 19:02
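An empty result from an inner merge means no key values compared equal. The thread does not establish the cause here, so the following is only a sketch of one way to check: look at the raw key values on both sides, then normalise them before merging. The helpers describe_key and normalise_key are hypothetical names, and the trailing whitespace in the sample data is an assumed cause for illustration, not a diagnosis of the asker's files.

```python
import pandas as pd

def describe_key(df, key="id1"):
    """Print the key's dtype, the Python types it holds, and a few raw values."""
    col = df[key]
    print(col.dtype, col.map(type).value_counts().to_dict())
    print([repr(v) for v in col.head(3)])

def normalise_key(df, key="id1"):
    """Return a copy with the key cast to str and surrounding whitespace stripped."""
    out = df.copy()
    out[key] = out[key].astype(str).str.strip()
    return out

# Hypothetical frames: same ids, but one side carries trailing whitespace.
df1 = pd.DataFrame({"id1": ["a ", "b "], "id2": ["name1", "name2"]})
df2 = pd.DataFrame({"id1": ["a", "b"], "ranktrial1": ["rank1", "rank2"]})

describe_key(df1)
describe_key(df2)

# As-is, "a " and "a" are different keys, so nothing matches.
print(len(df1.merge(df2, on="id1")))

# After normalising both sides, the keys line up.
fixed = normalise_key(df1).merge(normalise_key(df2), on="id1")
print(len(fixed))
```

Printing repr() of the values is the useful part: it exposes stray spaces or unexpected types that look identical in a dataframe viewer.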











answered Nov 7 at 18:32









piRSquared
