Joining dataframes with same last letter as condition











I have 2 dataframes that I want joined.



product_no  code
12          aj
12          mn
13          aj

p_no  cde
12    *j
12    mn
13    *j

Result
product_no  code  p_no  cde
12          aj    12    *j
12          mn    12    mn
13          aj    13    *j


I want to match every code that ends with j to *j. How do I do this? I know I have to join where product_no === p_no, but how do I add the condition that a code whose last letter is j joins to *j?



EDIT



We are currently joining by product_no, and need to join the codes in the first dataframe to the codes in the second dataframe in an appropriate way.



The cde column of the second dataframe contains only three kinds of values: an actual two-letter code, *j, or **. The conditions for the join are as follows:


  1. If the actual code (mn, for example) exists in the second dataframe, then we join on it.

  2. If the actual code is not in the second dataframe, then we check whether the code in the first dataframe ends with j; if it does, we join where cde equals *j.

  3. If the actual code does not end with j, or if we can't find *j in the second dataframe, then we join on **.
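
Read as a priority order, the three rules could be sketched in plain Scala as below. This is only an illustration of the intended logic for a single product number, not Spark code. `JoinRules` and `pickCde` are hypothetical names, and `available` is assumed to hold the cde values that the second dataframe has for the same product_no.

```scala
object JoinRules {
  // Hypothetical helper: returns the cde value a code should join to,
  // or None when none of the three rules finds a candidate.
  def pickCde(code: String, available: Set[String]): Option[String] =
    if (available.contains(code)) Some(code)            // rule 1: exact code exists
    else if (code.endsWith("j") && available.contains("*j"))
      Some("*j")                                        // rule 2: ends with j and *j exists
    else if (available.contains("**")) Some("**")       // rule 3: fall back to **
    else None                                           // nothing to join to

  def main(args: Array[String]): Unit = {
    val forProduct12 = Set("*j", "mn")                  // cde values for p_no 12 in the sample
    println(pickCde("aj", forProduct12))                // Some(*j)
    println(pickCde("mn", forProduct12))                // Some(mn)
    println(pickCde("xy", Set("*j", "**")))             // Some(**)
  }
}
```

The order of the branches is what encodes "only *j should be taken" when both *j and ** are present.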










      scala apache-spark






      edited Nov 8 at 15:48

























      asked Nov 8 at 4:52









      user2896120

          1 Answer

          It's not clear exactly what you are trying to do. But if you want to join the data frames on the condition *[last character of code] = cde, you can use the substring function as follows:



          import org.apache.spark.sql.functions.{concat, lit, substring}
          df1.join(df2, concat(lit("*"), substring(df1.col("code"), -1, 1)) === df2.col("cde"))





          • Hmm, sorry for the confusion. The second dataframe can contain either the actual two-letter code, *j, or **. I'd like to join all the two-letter codes present in the first dataframe with the two-letter codes in the second dataframe. If the two-letter code is not present in the second dataframe, then we check whether the code in the first dataframe ends with "j"; if it does, we join it with *j. Lastly, if the code in the first dataframe is not present in the second dataframe, then we join it with **, just like the result. Please let me know if this makes sense.
            – user2896120
            Nov 8 at 7:10










          • Well, it's still a bit messy. I suggest you update the question with clear requirements and an example that reflects them. For example, it's not clear whether you want to join on product_no in all cases as well, or whether j is a constant (so that only j gets the j = *j behaviour) or just an example that should work for any character. Also, what happens if there is a match with *j and also with **: should both be taken? Etc.
            – Grisha Weintraub
            Nov 8 at 7:28












          • Only for j, and if we have a match with *j, only *j should be taken. I updated the question with more details.
            – user2896120
            Nov 8 at 15:49
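
With the requirements as clarified in the comments above (always join on product_no, prefer an exact code match, then *j for codes ending in j, then **), one possible approach is to join on product_no only, rank each candidate pair, and keep the best-ranked pair per row. The following is a minimal sketch under those assumptions, not a confirmed solution from the thread. `PriorityJoin` and `priorityJoin` are hypothetical names, the sample data is taken from the question, and each (product_no, code) pair is assumed to be unique in the first dataframe.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

object PriorityJoin {
  // Hypothetical helper: joins on product_no, then keeps the best-ranked cde per row.
  // Rows of df1 with no candidate at all are dropped (inner join).
  def priorityJoin(df1: DataFrame, df2: DataFrame): DataFrame = {
    // 1 = exact code, 2 = *j for codes ending in j, 3 = **; null = not a candidate.
    val priority =
      when(col("cde") === col("code"), 1)
        .when(col("cde") === "*j" && col("code").endsWith("j"), 2)
        .when(col("cde") === "**", 3)

    val candidates = df1
      .join(df2, col("product_no") === col("p_no"))
      .withColumn("priority", priority)
      .filter(col("priority").isNotNull)

    // Assumes (product_no, code) identifies a row of df1.
    val w = Window.partitionBy("product_no", "code").orderBy("priority")

    candidates
      .withColumn("rn", row_number().over(w))
      .filter(col("rn") === 1)
      .drop("priority", "rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("priority-join")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((12, "aj"), (12, "mn"), (13, "aj")).toDF("product_no", "code")
    val df2 = Seq((12, "*j"), (12, "mn"), (13, "*j")).toDF("p_no", "cde")

    priorityJoin(df1, df2).show()
    spark.stop()
  }
}
```

For the sample data this should produce the rows (12, aj, 12, *j), (12, mn, 12, mn) and (13, aj, 13, *j), in no guaranteed order. If the first dataframe can contain duplicate (product_no, code) rows, a unique row id would be needed in the window's partition instead.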













          answered Nov 8 at 6:49









          Grisha Weintraub
