How to replace escaped newline in spark
I have a CSV that is not quoted; I have added an example below.
Newlines inside a field are escaped, as shown in the 2nd row. Is there a way to replace them with some other character using Apache Spark?
Input CSV
Banana,23,Male,5,11,2017
Cat,32,Fe
male,2,11,2017
Dragon,28,Male,1,11,2017
Expected Output
Banana,23,Male,5,11,2017
Cat,32,Fe-male,2,11,2017
Dragon,28,Male,1,11,2017
Note: the original file is huge (around 40GB)
Edit 1
I just found an answer suggesting "sc.wholeTextFiles" instead of "sc.textFile", but given the size of the file I'm not sure whether it is memory efficient. Please advise.
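To make the desired transformation concrete, here is a minimal sketch of the line-merging logic in plain Python. It is not a confirmed solution: it assumes every complete record has exactly 6 comma-separated fields (as in the sample) and that a broken line never falls in the last field, and the names `merge_broken_lines`, `EXPECTED_FIELDS` and `JOINER` are hypothetical. The `-` replacement character is taken from the expected output above.

```python
from typing import Iterable, Iterator, Optional

EXPECTED_FIELDS = 6  # assumption: every complete record has 6 comma-separated fields
JOINER = "-"         # replacement character, taken from the expected output


def merge_broken_lines(lines: Iterable[str],
                       expected_fields: int = EXPECTED_FIELDS,
                       joiner: str = JOINER) -> Iterator[str]:
    """Yield complete records, gluing continuation lines onto the pending one."""
    pending: Optional[str] = None
    for line in lines:
        line = line.rstrip("\r\n")
        # Start a new record, or append this line to the incomplete one.
        pending = line if pending is None else pending + joiner + line
        # A record counts as complete once it has enough fields.
        if pending.count(",") + 1 >= expected_fields:
            yield pending
            pending = None
    if pending is not None:
        # Trailing incomplete record: emitted unchanged rather than dropped.
        yield pending


sample = [
    "Banana,23,Male,5,11,2017",
    "Cat,32,Fe",
    "male,2,11,2017",
    "Dragon,28,Male,1,11,2017",
]
merged = list(merge_broken_lines(sample))
for record in merged:
    print(record)
```

Because the function consumes and produces an iterator, it could in principle be passed to `mapPartitions` on an RDD created with `sc.textFile`, which avoids holding the whole file in memory. One caveat with that approach: a record split across a partition boundary would not be merged, so the boundaries would need separate handling.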
apache-spark pyspark
edited Nov 5 at 2:03
asked Nov 5 at 1:42
Geethanadh