Can we load a text file separated by :: into a Hive table?

Is there a way to load a simple text file whose fields are separated by "::" into a Hive table, other than replacing each "::" with "," and then loading it?
Replacing "::" with "," is quick when the text file is small, but what if it contains millions of records?

      hive






      asked Nov 7 at 18:53 by VIN

          1 Answer (accepted)

          Try creating the Hive table with the Regex SerDe (RegexSerDe).



          Example:



          I had a file with the following text in it:



          i::90
          w::99


          Create Hive table:



          hive> create external table default.i
          (Id STRING,
          Name STRING
          )
          ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
          WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*)')
          STORED AS TEXTFILE;


          Select from Hive table:



          hive> select * from i;
          +-------+---------+--+
          | i.id  | i.name  |
          +-------+---------+--+
          | i     | 90      |
          | w     | 99      |
          +-------+---------+--+
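
To see why the first group is written as the lazy `(.*?)`, the pattern can be tried outside Hive. The following is only a sketch: it uses Python's `re` module rather than the Java regex engine that RegexSerDe runs on (the two are assumed to agree for a pattern this simple), and `split_row` is a hypothetical helper name, not part of Hive.

```python
import re

# Pattern copied from the table definition above. RegexSerDe uses Java regexes;
# for a pattern this simple, Python's `re` is assumed to behave the same way.
PATTERN = re.compile(r"(.*?)::(.*)")


def split_row(line):
    """Return the captured columns for one line, or None if the whole line does not match."""
    match = PATTERN.fullmatch(line)
    return match.groups() if match else None


if __name__ == "__main__":
    # "a::b::c" and "no delimiter" are made-up rows used only for illustration.
    for line in ["i::90", "w::99", "a::b::c", "no delimiter"]:
        print(repr(line), "->", split_row(line))
```

Because the first group is lazy, it stops at the first "::", so any further "::" in the line ends up in the last column. A line with no "::" at all does not match.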


          If you want to skip a header line, add the skip.header.line.count table property:



          hive> create external table default.i
          (Id STRING,
          Name STRING
          )
          ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
          WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*)')
          STORED AS TEXTFILE
          tblproperties ('skip.header.line.count'='1');



          UPDATE:




          Check whether there are any older files in your table location. If there are and you don't need them, delete them.



          1. Create the Hive table as:



          create external table <db_name>.<table_name>
          (col1 STRING,
          col2 STRING,
          col3 string,
          col4 string
          )
          ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
          WITH SERDEPROPERTIES ('input.regex' = '(.*?)::(.*?)::(.*?)::(.*)')
          STORED AS TEXTFILE;


          2. Then run:



          load data local inpath '<source_path>' overwrite into table <db_name>.<table_name>;
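
Before loading millions of records, it can help to confirm that every line matches the four-column pattern. As far as I understand, RegexSerDe returns NULL for every column of a row that the pattern does not match in full, but treat that as an assumption to verify on your Hive version. The sketch below uses Python's `re` as a stand-in for the Java regex engine. `find_unmatched` is a hypothetical helper, and the third sample row is a made-up bad row; the first two rows come from the sample data posted in the comments.

```python
import re

# Pattern copied from the four-column table definition above.
FOUR_COLUMNS = re.compile(r"(.*?)::(.*?)::(.*?)::(.*)")


def find_unmatched(lines, pattern=FOUR_COLUMNS):
    """Return (line_number, text) for every line the pattern does not fully match."""
    unmatched = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")  # drop the line terminator before matching
        if pattern.fullmatch(text) is None:
            unmatched.append((number, text))
    return unmatched


if __name__ == "__main__":
    sample = [
        "1::914::3::978301968",
        "1::3408::4::978300275",
        "1,2355,5,978824291",  # hypothetical bad row, not from the thread
    ]
    for number, text in find_unmatched(sample):
        print(f"line {number} would not match: {text!r}")
```

To check a real file, pass an open file object instead of the list, for example `find_unmatched(open(path, encoding="utf-8"))`, where `path` is whatever local file you intend to load.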





          • It gives me this error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.regexserde
            – VIN
            Nov 7 at 20:07












          • @vicky, the SerDe class name is case sensitive: use RegexSerDe, not regexserde. Try creating the Hive table again.
            – Shu
            Nov 7 at 20:17










          • OK, I did that. This time the table was created, but when I load the data and query the table, it returns NULL for all fields, which is the same result I was getting earlier.
            – VIN
            Nov 7 at 20:37










          • @vicky, what do you mean by "load the data"? Create an external table and copy the file into the table location. Make sure your regex is correct and captures the groups as expected. If you are still having issues, update the question with some sample data.
            – Shu
            Nov 7 at 20:56












          • By loading the data I mean running (load data local inpath 'Source path' overwrite into table 'Destination table'). The sample data looks like this: 1::914::3::978301968 1::3408::4::978300275 1::2355::5::978824291 1::1197::3::978302268 1::1287::5::978302039 1::2804::5::978300719 1::594::4::978302268 1::919::4::978301368 1::595::5::978824268
            – VIN
            Nov 7 at 21:20













          answered Nov 7 at 19:12 by Shu (edited Nov 7 at 22:09)











