Using Spark how to add column at the end

-1

Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided

`sal >= 1000 && sal <=2000  = Level 1

sal > 2000 && sal <= 3000  = Level 2 

sal >3000 && sal <=4000 = Level 3



+-----+-------+----+----+

|empid|empName| sal|dept|

+-----+-------+----+----+

|  100|  EMP1 |1000|IT  |

|  101|  EMP2 |2500|ITES|

|  102|  EMP3 |3000|BPO |

|  104|  EMP4 |4000|ENGG|

+-----+-------+----+----+`

Output

+-----+-------+----+----+-----+

|empid|empName| sal|dept|Level|

+-----+-------+----+----+-----+

|  100|  EMP1 |1000|IT  |Level 1|

|  101|  EMP2 |2500|ITES|Level 2|

|  102|  EMP3 |3000|BPO |Level 3|

|  104|  EMP4 |4000|ENGG|Level 3|

+-----+-------+----+----+-----+

I have written below code -

case class mySchema(empid: Int, empName: String, sal: Int, post: String) import spark.implicits._ val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post")) val df1 = rdd1.toDF() val dfTods = df1.as[mySchema] dfTods.createTempView("Employee") val resDS = spark.sql("""select * case when (sal === 1000) then 'ASE' when (sal === 2000) then 'SE' else 'SSE' end as level from Employee""")

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)

== SQL ==

select  *   case when (sal === 1000) then 'ASE' 

----------------------------------------------------------------------^^^

                                                                 when (sal === 2000) then 'SE' 

                                                                 else 'SSE'  

                                                                 end as  level  from Employee

edited Nov 11 at 9:33

asked Nov 11 at 6:27

Rahul Wagh

287

What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31

1

Missing , between * and case.
– user10465355
Nov 11 at 21:44

@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51

Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43

add a comment |

-1

Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided

`sal >= 1000 && sal <=2000  = Level 1

sal > 2000 && sal <= 3000  = Level 2 

sal >3000 && sal <=4000 = Level 3



+-----+-------+----+----+

|empid|empName| sal|dept|

+-----+-------+----+----+

|  100|  EMP1 |1000|IT  |

|  101|  EMP2 |2500|ITES|

|  102|  EMP3 |3000|BPO |

|  104|  EMP4 |4000|ENGG|

+-----+-------+----+----+`

Output

+-----+-------+----+----+-----+

|empid|empName| sal|dept|Level|

+-----+-------+----+----+-----+

|  100|  EMP1 |1000|IT  |Level 1|

|  101|  EMP2 |2500|ITES|Level 2|

|  102|  EMP3 |3000|BPO |Level 3|

|  104|  EMP4 |4000|ENGG|Level 3|

+-----+-------+----+----+-----+

I have written below code -

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)

== SQL ==

select  *   case when (sal === 1000) then 'ASE' 

----------------------------------------------------------------------^^^

                                                                 when (sal === 2000) then 'SE' 

                                                                 else 'SSE'  

                                                                 end as  level  from Employee

edited Nov 11 at 9:33

asked Nov 11 at 6:27

Rahul Wagh

287

What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31

1

Missing , between * and case.
– user10465355
Nov 11 at 21:44

@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51

Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43

add a comment |

-1

Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided

`sal >= 1000 && sal <=2000  = Level 1

sal > 2000 && sal <= 3000  = Level 2 

sal >3000 && sal <=4000 = Level 3



+-----+-------+----+----+

|empid|empName| sal|dept|

+-----+-------+----+----+

|  100|  EMP1 |1000|IT  |

|  101|  EMP2 |2500|ITES|

|  102|  EMP3 |3000|BPO |

|  104|  EMP4 |4000|ENGG|

+-----+-------+----+----+`

Output

+-----+-------+----+----+-----+

|empid|empName| sal|dept|Level|

+-----+-------+----+----+-----+

|  100|  EMP1 |1000|IT  |Level 1|

|  101|  EMP2 |2500|ITES|Level 2|

|  102|  EMP3 |3000|BPO |Level 3|

|  104|  EMP4 |4000|ENGG|Level 3|

+-----+-------+----+----+-----+

I have written below code -

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)

== SQL ==

select  *   case when (sal === 1000) then 'ASE' 

----------------------------------------------------------------------^^^

                                                                 when (sal === 2000) then 'SE' 

                                                                 else 'SSE'  

                                                                 end as  level  from Employee

edited Nov 11 at 9:33

asked Nov 11 at 6:27

Rahul Wagh

287

Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided

`sal >= 1000 && sal <=2000  = Level 1

sal > 2000 && sal <= 3000  = Level 2 

sal >3000 && sal <=4000 = Level 3



+-----+-------+----+----+

|empid|empName| sal|dept|

+-----+-------+----+----+

|  100|  EMP1 |1000|IT  |

|  101|  EMP2 |2500|ITES|

|  102|  EMP3 |3000|BPO |

|  104|  EMP4 |4000|ENGG|

+-----+-------+----+----+`

Output

+-----+-------+----+----+-----+

|empid|empName| sal|dept|Level|

+-----+-------+----+----+-----+

|  100|  EMP1 |1000|IT  |Level 1|

|  101|  EMP2 |2500|ITES|Level 2|

|  102|  EMP3 |3000|BPO |Level 3|

|  104|  EMP4 |4000|ENGG|Level 3|

+-----+-------+----+----+-----+

I have written below code -

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)

== SQL ==

select  *   case when (sal === 1000) then 'ASE' 

----------------------------------------------------------------------^^^

                                                                 when (sal === 2000) then 'SE' 

                                                                 else 'SSE'  

                                                                 end as  level  from Employee

apache-spark apache-spark-sql

edited Nov 11 at 9:33

asked Nov 11 at 6:27

Rahul Wagh

287

edited Nov 11 at 9:33

asked Nov 11 at 6:27

Rahul Wagh

287

edited Nov 11 at 9:33

asked Nov 11 at 6:27

Rahul Wagh

287

asked Nov 11 at 6:27

Rahul Wagh

287

asked Nov 11 at 6:27

Rahul Wagh

287

What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31

1

Missing , between * and case.
– user10465355
Nov 11 at 21:44

@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51

Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43

add a comment |

What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31

1

Missing , between * and case.
– user10465355
Nov 11 at 21:44

@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51

Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43

What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31

Missing , between * and case.
– user10465355
Nov 11 at 21:44

@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51

Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43

add a comment |

1 Answer
1

active

oldest

votes

select  

*,  

case   

when (sal >=1000 and sal <= 2000) then 'Level 1'  

when (sal > 2000 and sal <= 3000) then 'Level 2'  

when (sal > 3000 and sal <= 4000) then 'Level 3'

end   

as level  

from Employee

edited Nov 16 at 17:01

answered Nov 11 at 10:05

mangusta

1,58921326

instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13

Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19

if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21

mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24

1

@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46

|
show 5 more comments

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246387%2fusing-spark-how-to-add-column-at-the-end%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

select  

*,  

case   

when (sal >=1000 and sal <= 2000) then 'Level 1'  

when (sal > 2000 and sal <= 3000) then 'Level 2'  

when (sal > 3000 and sal <= 4000) then 'Level 3'

end   

as level  

from Employee

edited Nov 16 at 17:01

answered Nov 11 at 10:05

mangusta

1,58921326

instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13

Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19

if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21

mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24

1

@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46

|
show 5 more comments

select  

*,  

case   

when (sal >=1000 and sal <= 2000) then 'Level 1'  

when (sal > 2000 and sal <= 3000) then 'Level 2'  

when (sal > 3000 and sal <= 4000) then 'Level 3'

end   

as level  

from Employee

edited Nov 16 at 17:01

answered Nov 11 at 10:05

mangusta

1,58921326

instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13

Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19

if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21

mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24

1

@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46

|
show 5 more comments

select  

*,  

case   

when (sal >=1000 and sal <= 2000) then 'Level 1'  

when (sal > 2000 and sal <= 3000) then 'Level 2'  

when (sal > 3000 and sal <= 4000) then 'Level 3'

end   

as level  

from Employee

edited Nov 16 at 17:01

answered Nov 11 at 10:05

mangusta

1,58921326

select  

*,  

case   

when (sal >=1000 and sal <= 2000) then 'Level 1'  

when (sal > 2000 and sal <= 3000) then 'Level 2'  

when (sal > 3000 and sal <= 4000) then 'Level 3'

end   

as level  

from Employee

edited Nov 16 at 17:01

answered Nov 11 at 10:05

mangusta

1,58921326

edited Nov 16 at 17:01

answered Nov 11 at 10:05

mangusta

1,58921326

answered Nov 11 at 10:05

mangusta

1,58921326

answered Nov 11 at 10:05

mangusta

1,58921326

instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13

Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19

if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21

mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24

1

@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46

|
show 5 more comments

instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13

Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19

if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21

mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24

1

@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46

instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13

Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19

if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21

mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24

@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46

|
show 5 more comments

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Wsrtjtyk