Using Spark how to add column at the end












-1














Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided



`sal >= 1000 && sal <=2000  = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3

+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`


Output



+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+


I have written below code -




case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *

case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'

end as level from Employee""")



Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)



== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee









share|improve this question
























  • What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
    – quant
    Nov 11 at 6:31






  • 1




    Missing , between * and case.
    – user10465355
    Nov 11 at 21:44










  • @RahulWagh could you unaccept my answer? I want to delete it because it's wrong
    – mangusta
    Nov 12 at 0:51










  • Unaccepted the answer
    – Rahul Wagh
    Nov 16 at 9:43
















-1














Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided



`sal >= 1000 && sal <=2000  = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3

+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`


Output



+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+


I have written below code -




case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *

case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'

end as level from Employee""")



Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)



== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee









share|improve this question
























  • What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
    – quant
    Nov 11 at 6:31






  • 1




    Missing , between * and case.
    – user10465355
    Nov 11 at 21:44










  • @RahulWagh could you unaccept my answer? I want to delete it because it's wrong
    – mangusta
    Nov 12 at 0:51










  • Unaccepted the answer
    – Rahul Wagh
    Nov 16 at 9:43














-1












-1








-1







Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided



`sal >= 1000 && sal <=2000  = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3

+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`


Output



+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+


I have written below code -




case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *

case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'

end as level from Employee""")



Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)



== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee









share|improve this question















Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided



`sal >= 1000 && sal <=2000  = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3

+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`


Output



+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+


I have written below code -




case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *

case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'

end as level from Employee""")



Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)



== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee






apache-spark apache-spark-sql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 11 at 9:33

























asked Nov 11 at 6:27









Rahul Wagh

287




287












  • What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
    – quant
    Nov 11 at 6:31






  • 1




    Missing , between * and case.
    – user10465355
    Nov 11 at 21:44










  • @RahulWagh could you unaccept my answer? I want to delete it because it's wrong
    – mangusta
    Nov 12 at 0:51










  • Unaccepted the answer
    – Rahul Wagh
    Nov 16 at 9:43


















  • What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
    – quant
    Nov 11 at 6:31






  • 1




    Missing , between * and case.
    – user10465355
    Nov 11 at 21:44










  • @RahulWagh could you unaccept my answer? I want to delete it because it's wrong
    – mangusta
    Nov 12 at 0:51










  • Unaccepted the answer
    – Rahul Wagh
    Nov 16 at 9:43
















What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31




What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31




1




1




Missing , between * and case.
– user10465355
Nov 11 at 21:44




Missing , between * and case.
– user10465355
Nov 11 at 21:44












@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51




@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51












Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43




Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43












1 Answer
1






active

oldest

votes


















0














select  
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee





share|improve this answer























  • instead of putting all column names in select can't I use * for all columns rather than typing all the columns
    – Rahul Wagh
    Nov 11 at 10:13










  • Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
    – Rahul Wagh
    Nov 11 at 10:19










  • if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
    – mangusta
    Nov 11 at 10:21










  • mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
    – Rahul Wagh
    Nov 11 at 10:24








  • 1




    @mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
    – cheseaux
    Nov 11 at 20:46











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246387%2fusing-spark-how-to-add-column-at-the-end%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














select  
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee





share|improve this answer























  • instead of putting all column names in select can't I use * for all columns rather than typing all the columns
    – Rahul Wagh
    Nov 11 at 10:13










  • Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
    – Rahul Wagh
    Nov 11 at 10:19










  • if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
    – mangusta
    Nov 11 at 10:21










  • mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
    – Rahul Wagh
    Nov 11 at 10:24








  • 1




    @mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
    – cheseaux
    Nov 11 at 20:46
















0














select  
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee





share|improve this answer























  • instead of putting all column names in select can't I use * for all columns rather than typing all the columns
    – Rahul Wagh
    Nov 11 at 10:13










  • Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
    – Rahul Wagh
    Nov 11 at 10:19










  • if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
    – mangusta
    Nov 11 at 10:21










  • mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
    – Rahul Wagh
    Nov 11 at 10:24








  • 1




    @mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
    – cheseaux
    Nov 11 at 20:46














0












0








0






select  
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee





share|improve this answer














select  
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 16 at 17:01

























answered Nov 11 at 10:05









mangusta

1,58921326




1,58921326












  • instead of putting all column names in select can't I use * for all columns rather than typing all the columns
    – Rahul Wagh
    Nov 11 at 10:13










  • Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
    – Rahul Wagh
    Nov 11 at 10:19










  • if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
    – mangusta
    Nov 11 at 10:21










  • mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
    – Rahul Wagh
    Nov 11 at 10:24








  • 1




    @mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
    – cheseaux
    Nov 11 at 20:46


















  • instead of putting all column names in select can't I use * for all columns rather than typing all the columns
    – Rahul Wagh
    Nov 11 at 10:13










  • Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
    – Rahul Wagh
    Nov 11 at 10:19










  • if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
    – mangusta
    Nov 11 at 10:21










  • mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
    – Rahul Wagh
    Nov 11 at 10:24








  • 1




    @mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
    – cheseaux
    Nov 11 at 20:46
















instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13




instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13












Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19




Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19












if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21




if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21












mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24






mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24






1




1




@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46




@mangusta Saying that you cannot use * along with a case/when statement is simply wrong. He simply forgot to put a comma , after the * !
– cheseaux
Nov 11 at 20:46


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246387%2fusing-spark-how-to-add-column-at-the-end%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







這個網誌中的熱門文章

Hercules Kyvelos

Tangent Lines Diagram Along Smooth Curve

Yusuf al-Mu'taman ibn Hud