Converting Series of str to dates
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I'm having a nightmare converting a series of strings to dates. This is a sample of my data:
net_due_date from_date clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
These are the dtypes:
net_due_date object
from_date object
clearing_date object
dtype: object
I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.
The following code works fine on net_due_date:
df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')
And a similar code works fine on from_date:
df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')
However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps
But then I'm confronted with the following error:
TypeError: unsupported operand type(s) for //: 'str' and 'int'
To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!
python pandas datetime
add a comment |
I'm having a nightmare converting a series of strings to dates. This is a sample of my data:
net_due_date from_date clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
These are the dtypes:
net_due_date object
from_date object
clearing_date object
dtype: object
I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.
The following code works fine on net_due_date:
df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')
And a similar code works fine on from_date:
df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')
However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps
But then I'm confronted with the following error:
TypeError: unsupported operand type(s) for //: 'str' and 'int'
To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!
python pandas datetime
"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.
– Yilun Zhang
Nov 23 '18 at 18:44
add a comment |
I'm having a nightmare converting a series of strings to dates. This is a sample of my data:
net_due_date from_date clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
These are the dtypes:
net_due_date object
from_date object
clearing_date object
dtype: object
I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.
The following code works fine on net_due_date:
df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')
And a similar code works fine on from_date:
df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')
However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps
But then I'm confronted with the following error:
TypeError: unsupported operand type(s) for //: 'str' and 'int'
To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!
python pandas datetime
I'm having a nightmare converting a series of strings to dates. This is a sample of my data:
net_due_date from_date clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
These are the dtypes:
net_due_date object
from_date object
clearing_date object
dtype: object
I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.
The following code works fine on net_due_date:
df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')
And a similar code works fine on from_date:
df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')
However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps
But then I'm confronted with the following error:
TypeError: unsupported operand type(s) for //: 'str' and 'int'
To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!
python pandas datetime
python pandas datetime
edited Nov 23 '18 at 18:47
Zoe
13.5k85386
13.5k85386
asked Nov 23 '18 at 18:38
roastbeeefroastbeeef
1197
1197
"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.
– Yilun Zhang
Nov 23 '18 at 18:44
add a comment |
"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.
– Yilun Zhang
Nov 23 '18 at 18:44
"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.
– Yilun Zhang
Nov 23 '18 at 18:44
"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.
– Yilun Zhang
Nov 23 '18 at 18:44
add a comment |
2 Answers
2
active
oldest
votes
You can use the errors argument of the same method:
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception
If ‘coerce’, then invalid parsing will be set as NaT
If ‘ignore’, then invalid parsing will return the input
This one line should do the trick for you:
df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
# results:
# net_due_date from_date clearing_date
# ...
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 NaT
# 12 2018-10-24 2018-01-09 2018-10-11
# ...
If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:
df.fillna(pd.datetime.now().date(), inplace=True)
# net_due_date from_date clearing_date
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
# 12 2018-10-24 2018-01-09 2018-10-11
And just to confirm the dtypes:
net_due_date datetime64[ns]
from_date datetime64[ns]
clearing_date datetime64[ns]
dtype: object
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
lambdais just an anonymous function, basically wrapping thepd.to_datetimeto convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.
– Idlehands
Nov 25 '18 at 4:57
add a comment |
I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:
df = df.applymap(pd.to_datetime)
Run the below script converting a part of your source data:
import pandas as pd
dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
'2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
'2018-01-30 00:00:00.000' ],
'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
'2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
df = pd.DataFrame(data=dd)
df = df.applymap(pd.to_datetime)
When you execute df.info(), you will get (a part of the printout):
Data columns (total 3 columns):
net_due_date 5 non-null datetime64[ns]
from_date 5 non-null datetime64[ns]
clearing_date 5 non-null datetime64[ns]
For demonstration purpose, you may add print(df) both before and after
the conversion.
As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451575%2fconverting-series-of-str-to-dates%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can use the errors argument of the same method:
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception
If ‘coerce’, then invalid parsing will be set as NaT
If ‘ignore’, then invalid parsing will return the input
This one line should do the trick for you:
df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
# results:
# net_due_date from_date clearing_date
# ...
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 NaT
# 12 2018-10-24 2018-01-09 2018-10-11
# ...
If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:
df.fillna(pd.datetime.now().date(), inplace=True)
# net_due_date from_date clearing_date
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
# 12 2018-10-24 2018-01-09 2018-10-11
And just to confirm the dtypes:
net_due_date datetime64[ns]
from_date datetime64[ns]
clearing_date datetime64[ns]
dtype: object
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
lambdais just an anonymous function, basically wrapping thepd.to_datetimeto convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.
– Idlehands
Nov 25 '18 at 4:57
add a comment |
You can use the errors argument of the same method:
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception
If ‘coerce’, then invalid parsing will be set as NaT
If ‘ignore’, then invalid parsing will return the input
This one line should do the trick for you:
df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
# results:
# net_due_date from_date clearing_date
# ...
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 NaT
# 12 2018-10-24 2018-01-09 2018-10-11
# ...
If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:
df.fillna(pd.datetime.now().date(), inplace=True)
# net_due_date from_date clearing_date
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
# 12 2018-10-24 2018-01-09 2018-10-11
And just to confirm the dtypes:
net_due_date datetime64[ns]
from_date datetime64[ns]
clearing_date datetime64[ns]
dtype: object
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
lambdais just an anonymous function, basically wrapping thepd.to_datetimeto convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.
– Idlehands
Nov 25 '18 at 4:57
add a comment |
You can use the errors argument of the same method:
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception
If ‘coerce’, then invalid parsing will be set as NaT
If ‘ignore’, then invalid parsing will return the input
This one line should do the trick for you:
df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
# results:
# net_due_date from_date clearing_date
# ...
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 NaT
# 12 2018-10-24 2018-01-09 2018-10-11
# ...
If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:
df.fillna(pd.datetime.now().date(), inplace=True)
# net_due_date from_date clearing_date
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
# 12 2018-10-24 2018-01-09 2018-10-11
And just to confirm the dtypes:
net_due_date datetime64[ns]
from_date datetime64[ns]
clearing_date datetime64[ns]
dtype: object
You can use the errors argument of the same method:
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception
If ‘coerce’, then invalid parsing will be set as NaT
If ‘ignore’, then invalid parsing will return the input
This one line should do the trick for you:
df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
# results:
# net_due_date from_date clearing_date
# ...
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 NaT
# 12 2018-10-24 2018-01-09 2018-10-11
# ...
If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:
df.fillna(pd.datetime.now().date(), inplace=True)
# net_due_date from_date clearing_date
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
# 12 2018-10-24 2018-01-09 2018-10-11
And just to confirm the dtypes:
net_due_date datetime64[ns]
from_date datetime64[ns]
clearing_date datetime64[ns]
dtype: object
edited Nov 23 '18 at 18:57
answered Nov 23 '18 at 18:49
IdlehandsIdlehands
6,1631923
6,1631923
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
lambdais just an anonymous function, basically wrapping thepd.to_datetimeto convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.
– Idlehands
Nov 25 '18 at 4:57
add a comment |
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
lambdais just an anonymous function, basically wrapping thepd.to_datetimeto convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.
– Idlehands
Nov 25 '18 at 4:57
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows
– roastbeeef
Nov 24 '18 at 12:37
lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.– Idlehands
Nov 25 '18 at 4:57
lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.– Idlehands
Nov 25 '18 at 4:57
add a comment |
I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:
df = df.applymap(pd.to_datetime)
Run the below script converting a part of your source data:
import pandas as pd
dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
'2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
'2018-01-30 00:00:00.000' ],
'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
'2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
df = pd.DataFrame(data=dd)
df = df.applymap(pd.to_datetime)
When you execute df.info(), you will get (a part of the printout):
Data columns (total 3 columns):
net_due_date 5 non-null datetime64[ns]
from_date 5 non-null datetime64[ns]
clearing_date 5 non-null datetime64[ns]
For demonstration purpose, you may add print(df) both before and after
the conversion.
As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.
add a comment |
I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:
df = df.applymap(pd.to_datetime)
Run the below script converting a part of your source data:
import pandas as pd
dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
'2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
'2018-01-30 00:00:00.000' ],
'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
'2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
df = pd.DataFrame(data=dd)
df = df.applymap(pd.to_datetime)
When you execute df.info(), you will get (a part of the printout):
Data columns (total 3 columns):
net_due_date 5 non-null datetime64[ns]
from_date 5 non-null datetime64[ns]
clearing_date 5 non-null datetime64[ns]
For demonstration purpose, you may add print(df) both before and after
the conversion.
As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.
add a comment |
I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:
df = df.applymap(pd.to_datetime)
Run the below script converting a part of your source data:
import pandas as pd
dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
'2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
'2018-01-30 00:00:00.000' ],
'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
'2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
df = pd.DataFrame(data=dd)
df = df.applymap(pd.to_datetime)
When you execute df.info(), you will get (a part of the printout):
Data columns (total 3 columns):
net_due_date 5 non-null datetime64[ns]
from_date 5 non-null datetime64[ns]
clearing_date 5 non-null datetime64[ns]
For demonstration purpose, you may add print(df) both before and after
the conversion.
As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.
I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:
df = df.applymap(pd.to_datetime)
Run the below script converting a part of your source data:
import pandas as pd
dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
'2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
'2018-01-30 00:00:00.000' ],
'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
'2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
df = pd.DataFrame(data=dd)
df = df.applymap(pd.to_datetime)
When you execute df.info(), you will get (a part of the printout):
Data columns (total 3 columns):
net_due_date 5 non-null datetime64[ns]
from_date 5 non-null datetime64[ns]
clearing_date 5 non-null datetime64[ns]
For demonstration purpose, you may add print(df) both before and after
the conversion.
As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.
edited Nov 23 '18 at 19:38
answered Nov 23 '18 at 19:24
Valdi_BoValdi_Bo
5,6152916
5,6152916
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451575%2fconverting-series-of-str-to-dates%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.
– Yilun Zhang
Nov 23 '18 at 18:44