Convert all the values of PySpark DF into Int
I have a DataFrame in PySpark that looks like the one below:

The data type of every column except 'bureau_cibil_account_segment' is 'string'. I want to convert them to 'int'.
Below is the code snippet I tried in PySpark.
from pyspark.sql.functions import col
ph_num = ph.select([col(c).cast("int").alias(c) for c in ph.columns])
Although the data type changes to int, the resulting DataFrame is not exactly the same as the original.

Can someone help me with this?
NOTE: I do not want to cast each column explicitly, because the number of columns is dynamic.
python-3.x pyspark apache-spark-2.0
You mean you want to convert the columns' type to int and still keep all the leading zeros in the columns?
– Ali AzG
Nov 8 at 9:07
@AliAzG: Yes that's what I'm looking for.
– Sumit
Nov 8 at 9:30
I don't know exactly what the best solution for this is, but I googled the problem and it seems impossible to do such a thing.
– Ali AzG
Nov 8 at 9:32
An integer will convey a numeric value with a unique representation (0 is the same as 00 or 000, so why display more numbers?) that cannot encode what you want. If you need the integers to do, for example, some processing, do the casting, then the processing, then cast back to string and add some padding (see this: stackoverflow.com/a/45401902/2628463).
– martinarroyo
Nov 8 at 10:53
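The round trip described in the comment above (cast, process, cast back to string, pad) could be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution for the asker's data: it assumes the string columns hold non-negative digits only and that the width of each column is known beforehand. `roundtrip_value` and `pad_back` are hypothetical helper names, and the `widths` mapping is an assumption that does not appear in the question. The DataFrame version relies on `cast` and on `lpad` from `pyspark.sql.functions`.

```python
def roundtrip_value(text, process):
    """Cast a zero-padded digit string to int, process it, and pad it back."""
    width = len(text)                 # remember the original width
    number = int(text)                # leading zeros are lost here: "007" -> 7
    result = process(number)          # do the numeric work on the integer
    return str(result).zfill(width)   # back to string, padding restored


def pad_back(df, widths):
    """Hypothetical helper: cast columns to int, then back to padded strings.

    `widths` maps column name -> desired string width (an assumption; the
    question does not say how wide each column is). Columns that are not in
    `widths` are passed through unchanged.
    """
    # Imported here so the plain-Python helper above runs without Spark.
    from pyspark.sql import functions as F

    return df.select([
        # Any numeric processing would go between the two casts.
        F.lpad(F.col(c).cast("int").cast("string"), widths[c], "0").alias(c)
        if c in widths else F.col(c)
        for c in df.columns
    ])


print(roundtrip_value("007", lambda n: n + 1))  # prints 008
```

Because `widths` is built from whatever columns are present, nothing has to be cast column by column by hand. One caveat worth checking against the Spark documentation: `lpad` shortens values that are longer than the requested width, whereas `str.zfill` never truncates.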
asked Nov 8 at 8:42
Sumit