Dynamically generate a df.select statement from a JSON schema in Spark
I am selecting columns from a wide string using the offsets provided, like below:
val df2 = df.select(substring(col("a"), 4, 6).cast(IntegerType).as("c"))
But I have to extract 1000 columns from the string. How can I generate the select statement from a JSON Spark struct schema, if I can provide details such as column name, data type, width, and start and end position?
I also have to cast a few columns to IntegerType or LongType, but I observed that these fields get truncated by the cast. For example,
111111111 is converted to 1 when cast to IntegerType.
scala apache-spark hadoop bigdata
Nested or just a string?
– thebluephantom
Oct 1 '18 at 19:34
I want to create a dynamic statement; nested or string are both fine. I want to read the column names and offsets from the JSON schema.
– user10438333
Oct 1 '18 at 19:37
So you would need to do some exploding. Not sure how the counting approach works without exploding. Will try.
– thebluephantom
Oct 1 '18 at 19:39
asked Oct 1 '18 at 19:14
user10438333
1 Answer
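If you can load your JSON into a string using Typesafe Config's ConfigFactory, it is just a three-step process:
import java.io.File
import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.functions.col
// Parse the config file and read the comma-separated column names
val config = ConfigFactory.parseFile(new File(configFile))
val jsonColumns = config.getString("name.location")
val jsonColumnsArr = jsonColumns.split(",")
// Map each name to a Column, then select them all
val mappedColNames = jsonColumnsArr.map(name => col(name))
df.select(mappedColNames: _*)
NOTE:
1: configFile can be a path string taken from the program arguments.
2: name and location are the JSON keys that point to your column names.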
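The snippet above selects existing columns by name. To cover the fixed-width extraction the question asks about, the same idea can be extended to generate one SQL expression per field. The following is only a minimal sketch under stated assumptions: FieldSpec, FixedWidthSelect, and the field names are hypothetical, and the field list is assumed to have already been parsed from the JSON schema. Note that Spark's substring takes a 1-based start position and a length, not an end position, so passing an end position where a length is expected may be one cause of values that look truncated.

```scala
// Hypothetical descriptor for one fixed-width field (not from the question).
// dataType is a Spark SQL type name such as "INT", "BIGINT" or "STRING".
final case class FieldSpec(name: String, dataType: String, start: Int, width: Int)

object FixedWidthSelect {
  // Build one Spark SQL expression for a field.
  // SUBSTRING(str, pos, len): pos is 1-based and len is a length, not an end position.
  def toExpr(source: String, f: FieldSpec): String = {
    val slice = s"TRIM(SUBSTRING($source, ${f.start}, ${f.width}))"
    s"CAST($slice AS ${f.dataType}) AS ${f.name}"
  }

  // Build the expressions for every field, in the order given.
  def selectExprs(source: String, fields: Seq[FieldSpec]): Seq[String] =
    fields.map(f => toExpr(source, f))
}
```

With a DataFrame df whose wide string column is named a, the generated expressions could then be passed to df.selectExpr(FixedWidthSelect.selectExprs("a", fields): _*). If the schema stores start and end positions instead of a width, the width would be end - start + 1.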
answered Nov 12 '18 at 21:58
Sri Govind