Hive: how to deal with files that have a varying number of fields?
I am dealing with a file on HDFS whose comma-separated lines contain a varying number of fields. For instance:

uid1, eid01, para1, para2, para3,para4,para5,timestamp
uid1, eid12, para56, para57, timestamp
uid3, eid42, para102,timestamp

The number of fields is not fixed. I want to load this data into a Hive table with 4 columns, collecting all of the 'para..' fields into one column, like:

uid   eid    para                              datatime
uid1  eid01  para1, para2, para3,para4,para5   timestamp
uid1  eid12  para56, para57                    timestamp
uid3  eid42  para102                           timestamp

The data volume is too large to preprocess with tools like AWK. Is there any other solution?
Any help is appreciated.
hadoop hive
edited Nov 22 '18 at 8:44
asked Nov 22 '18 at 8:38
user2894829
1 Answer
Create a temporary Hive table such as t_data_tmp(line STRING), which has only one column. Load the HDFS file into t_data_tmp, so that each line becomes one row.
Create a Hive table t_data with your target schema, and populate it with INSERT OVERWRITE t_data AS SELECT ... FROM t_data_tmp.
In the SELECT from t_data_tmp, use Hive string functions (such as locate, instr, and substr) to work out the value of each column, using the second comma and the last comma as the split points.
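A minimal HiveQL sketch of those steps (the path and table names are illustrative; the position arithmetic assumes every line has at least three commas):

```sql
-- 1) Stage each raw line as a single-column row.
CREATE TABLE t_data_tmp (line STRING);
LOAD DATA INPATH '/path/to/raw_file' INTO TABLE t_data_tmp;

-- 2) Target table with the four fixed columns.
CREATE TABLE t_data (uid STRING, eid STRING, para STRING, datatime STRING);

-- 3) Cut each line at the 1st, 2nd, and last comma.
INSERT OVERWRITE TABLE t_data
SELECT
  trim(substr(line, 1, c1 - 1))            AS uid,
  trim(substr(line, c1 + 1, c2 - c1 - 1))  AS eid,
  trim(substr(line, c2 + 1, c3 - c2 - 1))  AS para,
  trim(substr(line, c3 + 1))               AS datatime
FROM (
  SELECT line,
         locate(',', line)                            AS c1,  -- 1st comma
         locate(',', line, locate(',', line) + 1)     AS c2,  -- 2nd comma
         length(line) - instr(reverse(line), ',') + 1 AS c3   -- last comma
  FROM t_data_tmp
) pos;
```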
Well, thanks for your reply. I think split() and reverse() can achieve my target.
– user2894829
Nov 22 '18 at 9:17
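The split()-based idea from the comment can be sketched roughly as follows (illustrative; it assumes the timestamp is always the last comma-separated field):

```sql
-- uid and eid are the first two fields, datatime is the last field,
-- and para is whatever sits between the 2nd and the last comma.
SELECT
  trim(split(line, ',')[0])                                  AS uid,
  trim(split(line, ',')[1])                                  AS eid,
  trim(regexp_extract(line, '^[^,]*,[^,]*,(.*),[^,]*$', 1))  AS para,
  trim(split(line, ',')[size(split(line, ',')) - 1])         AS datatime
FROM t_data_tmp;
```

regexp_extract is used for the middle portion because Hive has no built-in array-slicing function to rejoin a sub-range of the split() result.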
answered Nov 22 '18 at 8:51
Tom