Hive: How to deal with files that contain a varying number of fields?

I'm dealing with a file on HDFS whose lines contain a varying number of fields separated by ','. For instance:

uid1, eid01, para1, para2, para3,para4,para5,timestamp
uid1, eid12, para56, para57, timestamp
uid3, eid42, para102,timestamp

The number of fields is not fixed.

Now I want to load these data into a Hive table with 4 columns, putting all the 'para..' fields into a single column, like this:

uid    eid     para                               datatime
uid1   eid01   para1, para2, para3,para4,para5    timestamp
uid1   eid12   para56, para57                     timestamp
uid3   eid42   para102                            timestamp

The data volume is so large that I cannot process it with tools like AWK. Is there any other solution?

Any help is appreciated.
      hadoop hive
asked Nov 22 '18 at 8:38 by user2894829, edited Nov 22 '18 at 8:44
1 Answer
1. Create a temporary Hive table such as t_data_tmp(line STRING); it has only one column. Load the HDFS file into t_data_tmp, so that each line becomes one row.


2. Create a Hive table t_data with your target schema, and populate it with INSERT OVERWRITE TABLE t_data SELECT ... FROM t_data_tmp.



In the SELECT from t_data_tmp, use Hive string functions (such as locate() and substr()) to carve out the value for each column, treating the second comma and the last comma as the split points.
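Below is a minimal sketch of that recipe. The table and column names follow the question and the answer, the input path is hypothetical, and I've used regexp_extract() instead of manual locate()/substr() arithmetic; the patterns assume commas occur only as field separators:

CREATE TABLE t_data_tmp (line STRING);

-- hypothetical path; point it at the actual HDFS file
LOAD DATA INPATH '/path/to/input' INTO TABLE t_data_tmp;

CREATE TABLE t_data (uid STRING, eid STRING, para STRING, datatime STRING);

INSERT OVERWRITE TABLE t_data
SELECT
  trim(regexp_extract(line, '^([^,]*),', 1))                AS uid,      -- before the 1st comma
  trim(regexp_extract(line, '^[^,]*,([^,]*),', 1))          AS eid,      -- between the 1st and 2nd commas
  trim(regexp_extract(line, '^[^,]*,[^,]*,(.*),[^,]*$', 1)) AS para,     -- between the 2nd and last commas
  trim(regexp_extract(line, ',([^,]*)$', 1))                AS datatime  -- after the last comma
FROM t_data_tmp;

Because this is a single INSERT ... SELECT, it runs as an ordinary distributed Hive job, so the data volume that rules out AWK is not an issue here.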






answered Nov 22 '18 at 8:51 by Tom


• Well, thanks for your reply. I think split() and reverse() can achieve my target. – user2894829, Nov 22 '18 at 9:17
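For reference, the split()/reverse() idea from the comment does pick out the fixed fields directly; here is a sketch under the same assumptions (para would still need the substring between the second and last commas, e.g. via regexp_extract() as above):

SELECT
  trim(split(line, ',')[0])                   AS uid,      -- 1st field
  trim(split(line, ',')[1])                   AS eid,      -- 2nd field
  trim(reverse(split(reverse(line), ',')[0])) AS datatime  -- last field, found by reversing the line
FROM t_data_tmp;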