Joining dataframes with same last letter as condition
I have 2 dataframes that I want joined.
product_no code
12 aj
12 mn
13 aj
p_no cde
12 *j
12 mn
13 *j
Result
product_no code p_no cde
12 aj 12 *j
12 mn 12 mn
13 aj 13 *j
I want to match all codes that end with j with *j. How do I do this? I know I have to join where product_no === p_no, but how do I add the condition that, if the last letter of a code is j, the row joins on *j?
EDIT
We are currently joining by product_no, and need to join the codes in the first dataframe to the codes in the second dataframe in an appropriate way.
The code column (cde) of the second dataframe only contains 3 kinds of value: a 2-letter code, *j, or **.
The conditions for the join are as follows:
- If the actual code (mn, for example) exists in the second dataframe, then we join on it.
- If the actual code is not in the second dataframe, then we check whether the code in the first dataframe ends with j; if it does, we join where cde equals *j.
- If the actual code does not end with j, or if we can't find *j in the second dataframe, then we join by **.
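The three conditions above can be read as a priority order. As a minimal sketch in plain Scala (no Spark), assuming the cde values available for one product_no have already been collected into a set, the rule might be modelled as below. The names JoinRule and pickCde are hypothetical and only illustrate the rule; they are not part of any Spark API.

```scala
object JoinRule {
  // `available` is assumed to hold the cde values that the second
  // dataframe contains for a single product_no.
  def pickCde(code: String, available: Set[String]): Option[String] =
    if (available.contains(code)) Some(code)                            // 1: exact code
    else if (code.endsWith("j") && available.contains("*j")) Some("*j") // 2: *j wildcard
    else if (available.contains("**")) Some("**")                       // 3: ** catch-all
    else None                                                           // nothing to join to
}
```

With the sample data, JoinRule.pickCde("aj", Set("*j", "mn")) picks *j and JoinRule.pickCde("mn", Set("*j", "mn")) picks mn.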
scala apache-spark
edited Nov 8 at 15:48
asked Nov 8 at 4:52
user2896120
1 Answer
It's not clear what exactly you are trying to do. But if you want to join the data frames on the condition "*" + [last character of code] = cde, you can use the substring function as follows (a start position of -1 with length 1 takes the last character):
import org.apache.spark.sql.functions.{concat, lit, substring}
df1.join(df2, concat(lit("*"), substring(df1.col("code"), -1, 1)) === df2.col("cde"))
answered Nov 8 at 6:49
Grisha Weintraub
Hmm, sorry for the confusion. The second dataframe can contain either the actual 2-letter code, *j, or **. I'd like to join all the 2-letter codes present in the first dataframe with the 2-letter codes in the second dataframe. If the 2-letter code is not present in the second dataframe, then we check whether the 2-letter code in the first dataframe ends with "j"; if it does, we join it with *j. Lastly, if the code in the first dataframe is not present in the second dataframe, we join it with **, just like the result. Please let me know if this makes sense.
– user2896120 Nov 8 at 7:10
Well, it's still a bit messy. I suggest you update the question with clear requirements and an example that reflects them. For example, it's not clear whether you want to join on product_no in all cases as well, or whether j is a constant and you want the j = *j behavior only for j, or it's just an example and it should work for any character. Also, what happens if you have a match with *j and also with **: should both be taken? Etc.
– Grisha Weintraub Nov 8 at 7:28
Only for j, and if we have a match with *j, only *j should be taken. I updated the question with more details.
– user2896120 Nov 8 at 15:49
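For the clarified requirements (exact code first, then *j for codes ending in j, then **), one possible approach, which is not taken from the answer above, is to join on product_no === p_no, rank every candidate match, and keep only the best-ranked row per (product_no, code). The following is a minimal sketch under assumptions: the names FallbackJoin and fallbackJoin and the temporary columns priority and rn are hypothetical, each (product_no, code) pair is assumed to occur once in the first dataframe, and the column names of the two dataframes are assumed not to collide.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lit, row_number, when}

object FallbackJoin {
  def fallbackJoin(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Rank each candidate match: 1 = exact code, 2 = "*j" wildcard,
    // 3 = "**" catch-all. Rows matching none of these get null.
    val priority =
      when(col("cde") === col("code"), lit(1))
        .when(col("code").endsWith("j") && col("cde") === lit("*j"), lit(2))
        .when(col("cde") === lit("**"), lit(3))

    // All pairs with the same product number, minus the non-matching ones.
    val candidates = df1
      .join(df2, col("product_no") === col("p_no"))
      .withColumn("priority", priority)
      .filter(col("priority").isNotNull)

    // Keep only the best-ranked match for every row of the first dataframe.
    val best = Window.partitionBy("product_no", "code").orderBy("priority")
    candidates
      .withColumn("rn", row_number().over(best))
      .filter(col("rn") === 1)
      .drop("priority", "rn")
  }

  def main(args: Array[String]): Unit = {
    // Local session and sample data from the question, for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("fallback-join").getOrCreate()
    import spark.implicits._
    val df1 = Seq((12, "aj"), (12, "mn"), (13, "aj")).toDF("product_no", "code")
    val df2 = Seq((12, "*j"), (12, "mn"), (13, "*j")).toDF("p_no", "cde")
    fallbackJoin(df1, df2).show()
    spark.stop()
  }
}
```

Ranking and filtering after a single join keeps the three rules in one place; if a product has several rows with the same cde in the second dataframe, row_number picks one of them arbitrarily.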