pandas simpleimputer preserve datatypes
I am running into an error with the code below.
My objective is to use SimpleImputer to fill missing values across columns of different data types in one shot.
When I try to do that, fit_transform does not seem to work as expected.
When the dtype argument is not used, the code works fine, but the resulting DataFrame loses its data type information. When I pass the dtypes in the dtype argument, I get the error below. You should be able to reproduce the error by copying and pasting the code locally.
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
import sklearn
print(sklearn.__version__)
0.21.dev0
data = [['Alex','NJ',21,5.10],['Mary','NY',20,np.nan],['Sam',np.nan,np.nan,6.3]]
df = pd.DataFrame(data,columns=['Name','State','Age','Height'])
df.dtypes
Name object
State object
Age float64
Height float64
dtype: object
imp = SimpleImputer(strategy="most_frequent")
#df = pd.DataFrame(imp.fit_transform(df),columns=df.columns) <<<<----- This works just fine
#df
#Name State Age Height
#0 Alex NJ 21 5.1
#1 Mary NY 20 5.1
#2 Sam NJ 20 6.3
#df.dtypes
#Name object
#State object
#Age object
#Height object
#dtype: object
The statement below fails with the error shown (I am trying to preserve dtypes during the imputing process):
df = pd.DataFrame(imp.fit_transform(df),columns=df.columns,dtype=df.dtypes)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-e9780979921f> in <module>()
7
8 imp = SimpleImputer(strategy="most_frequent")
----> 9 df = pd.DataFrame(imp.fit_transform(df),columns=df.columns,dtype=df.dtypes)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
337 data = {}
338 if dtype is not None:
--> 339 dtype = self._validate_dtype(dtype)
340
341 if isinstance(data, DataFrame):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _validate_dtype(self, dtype)
166
167 if dtype is not None:
--> 168 dtype = pandas_dtype(dtype)
169
170 # a compound dtype
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\common.py in pandas_dtype(dtype)
2020 # which we safeguard against by catching them earlier and returning
2021 # np.dtype(valid_dtype) before this condition is evaluated.
-> 2022 if dtype in [object, np.object_, 'object', 'O']:
2023 return npdtype
2024 elif npdtype.kind == 'O':
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1574 raise ValueError("The truth value of a {0} is ambiguous. "
1575 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576 .format(self.__class__.__name__))
1577
1578 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
pandas
edited Nov 12 at 0:21
asked Nov 11 at 23:28
Dsfs Dsfds
1 Answer
If you want to preserve the dtype, I recommend using pandas to find the mode and then calling fillna:
df = df.fillna(df.agg(lambda x: pd.Series.mode(x)[0], axis=0))
print(df)
Name State Age Height
0 Alex NJ 21.0 5.1
1 Mary NY 20.0 5.1
2 Sam NJ 20.0 6.3
print(df.dtypes)
Name object
State object
Age float64
Height float64
dtype: object
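The same idea can also be written with DataFrame.mode, which may read a little more directly. This is a minimal sketch rather than part of the original answer: it rebuilds the sample data from the question, and when a column has several equally frequent values it simply takes the first row that mode returns. The names modes, filled and original_dtypes are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
data = [['Alex', 'NJ', 21, 5.10], ['Mary', 'NY', 20, np.nan], ['Sam', np.nan, np.nan, 6.3]]
df = pd.DataFrame(data, columns=['Name', 'State', 'Age', 'Height'])
original_dtypes = df.dtypes

# df.mode() returns one row per tied mode; row 0 holds the first mode of each column
modes = df.mode().iloc[0]

# fillna with a Series fills each column with the value stored under the matching label
filled = df.fillna(modes)

print(filled)
print(filled.dtypes)
```

Because fillna only touches the missing cells, the columns are never routed through a single object array, so there is nothing to cast back afterwards.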
Alternatively, use astype and pass a dictionary:
df = pd.DataFrame(
imp.fit_transform(df), columns=df.columns
).astype(df.dtypes.to_dict())
print(df)
Name State Age Height
0 Alex NJ 21.0 5.1
1 Mary NY 20.0 5.1
2 Sam NJ 20.0 6.3
print(df.dtypes)
Name object
State object
Age float64
Height float64
dtype: object
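If this pattern is needed in more than one place, it could be wrapped in a small function. The sketch below is an assumption-laden illustration, not an API from pandas or scikit-learn: impute_keep_dtypes is a hypothetical name, and it assumes the imputer returns the same number of columns it was given and that every imputed column can be cast back to its original dtype.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_keep_dtypes(frame, imputer):
    """Fit and apply an imputer, then restore the original column dtypes.

    Hypothetical helper for illustration; not part of pandas or scikit-learn.
    """
    # fit_transform returns a NumPy array, so column labels and dtypes are lost
    values = imputer.fit_transform(frame)
    # Rebuild the frame with the original labels, then cast column by column
    result = pd.DataFrame(values, columns=frame.columns, index=frame.index)
    return result.astype(frame.dtypes.to_dict())


# Sample data from the question
data = [['Alex', 'NJ', 21, 5.10], ['Mary', 'NY', 20, np.nan], ['Sam', np.nan, np.nan, 6.3]]
df = pd.DataFrame(data, columns=['Name', 'State', 'Age', 'Height'])

imputed = impute_keep_dtypes(df, SimpleImputer(strategy="most_frequent"))
print(imputed.dtypes)
```

The original frame is left untouched here, which makes it easy to compare imputed.dtypes with df.dtypes afterwards.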
An explicit astype call is needed because, as per the documentation, only a single dtype can be passed to the pd.DataFrame constructor.
?pd.DataFrame
...
dtype : dtype, default None
| Data type to force. Only a single dtype is allowed.
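To make the difference concrete, here is a small sketch contrasting the two calls. The array and variable names are made up for illustration; the object array only stands in for the kind of output an imputer produces for mixed data.

```python
import numpy as np
import pandas as pd

# An object array, similar in spirit to what an imputer returns for mixed data
values = np.array([['Alex', 21.0], ['Mary', 20.0]], dtype=object)

# Constructor: one dtype is applied to the whole frame
all_object = pd.DataFrame(values, columns=['Name', 'Age'], dtype=object)

# astype: a mapping of column label to dtype, one entry per column
per_column = pd.DataFrame(values, columns=['Name', 'Age']).astype(
    {'Name': object, 'Age': 'float64'}
)

print(all_object.dtypes)
print(per_column.dtypes)
```

Passing df.dtypes.to_dict() to astype, as in the answer, builds exactly this kind of label-to-dtype mapping from the original frame.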
Added version info to the original question. I want to use the imputer function which is much more readable and has the ability to take in dtypes parameter.
– Dsfs Dsfds
Nov 12 at 0:24
@DsfsDsfds See edit?
– coldspeed
Nov 12 at 0:48
the astype worked like charm, thanks!!
– Dsfs Dsfds
Nov 12 at 1:05
edited Nov 12 at 0:50
answered Nov 12 at 0:04
coldspeed