Converting Series of str to dates

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I'm having a nightmare converting a series of strings to dates. This is a sample of my data:

net_due_date                from_date   clearing_date

0   2018-10-25 00:00:00.000 2017-06-06  2018-10-13 00:00:00.000

1   2018-09-27 00:00:00.000 2017-06-06  2018-09-30 00:00:00.000

2   2018-05-31 00:00:00.000 2017-06-06  2018-05-18 00:00:00.000

3   2017-12-22 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

4   2018-01-30 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

5   2018-07-31 00:00:00.000 2017-06-06  2018-07-31 00:00:00.000

6   2018-05-29 00:00:00.000 2017-06-06  2018-05-17 00:00:00.000

7   2017-12-14 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

8   2017-11-24 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

9   2018-09-27 00:00:00.000 2017-06-06  2018-09-13 00:00:00.000

10  2018-01-25 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

11  2017-11-24 00:00:00.000 2017-06-06  2017-11-30 00:00:00.000

12  2018-10-24 00:00:00.000 2018-01-09  2018-10-11 00:00:00.000

13  2018-01-22 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

14  2018-09-06 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

15  2018-10-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

16  2018-06-15 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

17  2018-04-10 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

18  2018-01-12 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

19  2018-01-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

These are the dtypes:

net_due_date     object

from_date        object

clearing_date    object

dtype: object

I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.

The following code works fine on net_due_date:

df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')

And a similar code works fine on from_date:

df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')

However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps

But then I'm confronted with the following error:

TypeError: unsupported operand type(s) for //: 'str' and 'int'

To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

asked Nov 23 '18 at 18:38

roastbeeef

1197

"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44

add a comment |

I'm having a nightmare converting a series of strings to dates. This is a sample of my data:

net_due_date                from_date   clearing_date

0   2018-10-25 00:00:00.000 2017-06-06  2018-10-13 00:00:00.000

1   2018-09-27 00:00:00.000 2017-06-06  2018-09-30 00:00:00.000

2   2018-05-31 00:00:00.000 2017-06-06  2018-05-18 00:00:00.000

3   2017-12-22 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

4   2018-01-30 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

5   2018-07-31 00:00:00.000 2017-06-06  2018-07-31 00:00:00.000

6   2018-05-29 00:00:00.000 2017-06-06  2018-05-17 00:00:00.000

7   2017-12-14 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

8   2017-11-24 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

9   2018-09-27 00:00:00.000 2017-06-06  2018-09-13 00:00:00.000

10  2018-01-25 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

11  2017-11-24 00:00:00.000 2017-06-06  2017-11-30 00:00:00.000

12  2018-10-24 00:00:00.000 2018-01-09  2018-10-11 00:00:00.000

13  2018-01-22 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

14  2018-09-06 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

15  2018-10-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

16  2018-06-15 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

17  2018-04-10 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

18  2018-01-12 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

19  2018-01-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

These are the dtypes:

net_due_date     object

from_date        object

clearing_date    object

dtype: object

I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.

The following code works fine on net_due_date:

df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')

And a similar code works fine on from_date:

df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')

However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps

But then I'm confronted with the following error:

TypeError: unsupported operand type(s) for //: 'str' and 'int'

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

asked Nov 23 '18 at 18:38

roastbeeef

1197

"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44

add a comment |

I'm having a nightmare converting a series of strings to dates. This is a sample of my data:

net_due_date                from_date   clearing_date

0   2018-10-25 00:00:00.000 2017-06-06  2018-10-13 00:00:00.000

1   2018-09-27 00:00:00.000 2017-06-06  2018-09-30 00:00:00.000

2   2018-05-31 00:00:00.000 2017-06-06  2018-05-18 00:00:00.000

3   2017-12-22 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

4   2018-01-30 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

5   2018-07-31 00:00:00.000 2017-06-06  2018-07-31 00:00:00.000

6   2018-05-29 00:00:00.000 2017-06-06  2018-05-17 00:00:00.000

7   2017-12-14 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

8   2017-11-24 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

9   2018-09-27 00:00:00.000 2017-06-06  2018-09-13 00:00:00.000

10  2018-01-25 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

11  2017-11-24 00:00:00.000 2017-06-06  2017-11-30 00:00:00.000

12  2018-10-24 00:00:00.000 2018-01-09  2018-10-11 00:00:00.000

13  2018-01-22 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

14  2018-09-06 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

15  2018-10-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

16  2018-06-15 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

17  2018-04-10 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

18  2018-01-12 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

19  2018-01-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

These are the dtypes:

net_due_date     object

from_date        object

clearing_date    object

dtype: object

I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.

The following code works fine on net_due_date:

df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')

And a similar code works fine on from_date:

df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')

However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps

But then I'm confronted with the following error:

TypeError: unsupported operand type(s) for //: 'str' and 'int'

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

asked Nov 23 '18 at 18:38

roastbeeef

1197

I'm having a nightmare converting a series of strings to dates. This is a sample of my data:

net_due_date                from_date   clearing_date

0   2018-10-25 00:00:00.000 2017-06-06  2018-10-13 00:00:00.000

1   2018-09-27 00:00:00.000 2017-06-06  2018-09-30 00:00:00.000

2   2018-05-31 00:00:00.000 2017-06-06  2018-05-18 00:00:00.000

3   2017-12-22 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

4   2018-01-30 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

5   2018-07-31 00:00:00.000 2017-06-06  2018-07-31 00:00:00.000

6   2018-05-29 00:00:00.000 2017-06-06  2018-05-17 00:00:00.000

7   2017-12-14 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

8   2017-11-24 00:00:00.000 2017-06-06  2017-12-08 00:00:00.000

9   2018-09-27 00:00:00.000 2017-06-06  2018-09-13 00:00:00.000

10  2018-01-25 00:00:00.000 2017-06-06  2018-01-16 00:00:00.000

11  2017-11-24 00:00:00.000 2017-06-06  2017-11-30 00:00:00.000

12  2018-10-24 00:00:00.000 2018-01-09  2018-10-11 00:00:00.000

13  2018-01-22 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

14  2018-09-06 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

15  2018-10-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

16  2018-06-15 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

17  2018-04-10 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

18  2018-01-12 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

19  2018-01-24 00:00:00.000 2018-01-09  2018-10-10 00:00:00.000

These are the dtypes:

net_due_date     object

from_date        object

clearing_date    object

dtype: object

I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.

The following code works fine on net_due_date:

df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')

And a similar code works fine on from_date:

df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')

However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps

But then I'm confronted with the following error:

TypeError: unsupported operand type(s) for //: 'str' and 'int'

python pandas datetime

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

asked Nov 23 '18 at 18:38

roastbeeef

1197

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

asked Nov 23 '18 at 18:38

roastbeeef

1197

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

edited Nov 23 '18 at 18:47

Zoe

13.5k85386

asked Nov 23 '18 at 18:38

roastbeeef

1197

asked Nov 23 '18 at 18:38

roastbeeef

1197

asked Nov 23 '18 at 18:38

roastbeeef

1197

"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44

add a comment |

"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44

"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44

add a comment |

2 Answers
2

active

oldest

votes

You can use the errors argument of the same method:

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception

If ‘coerce’, then invalid parsing will be set as NaT

If ‘ignore’, then invalid parsing will return the input

This one line should do the trick for you:

df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))



# results:

#    net_due_date  from_date  clearing_date

# ...

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06            NaT

# 12   2018-10-24 2018-01-09     2018-10-11

# ...

If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:

df.fillna(pd.datetime.now().date(), inplace=True)



#    net_due_date  from_date  clearing_date    

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06     2018-11-23 # <-- changed to today

# 12   2018-10-24 2018-01-09     2018-10-11

And just to confirm the dtypes:

net_due_date      datetime64[ns]

 from_date        datetime64[ns]

 clearing_date    datetime64[ns]

dtype: object

edited Nov 23 '18 at 18:57

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

– roastbeeef
Nov 24 '18 at 12:37

lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

– Idlehands
Nov 25 '18 at 4:57

add a comment |

I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:

df = df.applymap(pd.to_datetime)

Run the below script converting a part of your source data:

import pandas as pd



dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',

        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',

        '2018-01-30 00:00:00.000' ],

    'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],

    'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',

        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }

df = pd.DataFrame(data=dd)

df = df.applymap(pd.to_datetime)

When you execute df.info(), you will get (a part of the printout):

Data columns (total 3 columns):

net_due_date     5 non-null datetime64[ns]

from_date        5 non-null datetime64[ns]

clearing_date    5 non-null datetime64[ns]

For demonstration purpose, you may add print(df) both before and after
the conversion.

As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.

edited Nov 23 '18 at 19:38

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451575%2fconverting-series-of-str-to-dates%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You can use the errors argument of the same method:

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception

If ‘coerce’, then invalid parsing will be set as NaT

If ‘ignore’, then invalid parsing will return the input

This one line should do the trick for you:

df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))



# results:

#    net_due_date  from_date  clearing_date

# ...

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06            NaT

# 12   2018-10-24 2018-01-09     2018-10-11

# ...

If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:

df.fillna(pd.datetime.now().date(), inplace=True)



#    net_due_date  from_date  clearing_date    

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06     2018-11-23 # <-- changed to today

# 12   2018-10-24 2018-01-09     2018-10-11

And just to confirm the dtypes:

net_due_date      datetime64[ns]

 from_date        datetime64[ns]

 clearing_date    datetime64[ns]

dtype: object

edited Nov 23 '18 at 18:57

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

– roastbeeef
Nov 24 '18 at 12:37

lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

– Idlehands
Nov 25 '18 at 4:57

add a comment |

You can use the errors argument of the same method:

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception

If ‘coerce’, then invalid parsing will be set as NaT

If ‘ignore’, then invalid parsing will return the input

This one line should do the trick for you:

df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))



# results:

#    net_due_date  from_date  clearing_date

# ...

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06            NaT

# 12   2018-10-24 2018-01-09     2018-10-11

# ...

If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:

df.fillna(pd.datetime.now().date(), inplace=True)



#    net_due_date  from_date  clearing_date    

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06     2018-11-23 # <-- changed to today

# 12   2018-10-24 2018-01-09     2018-10-11

And just to confirm the dtypes:

net_due_date      datetime64[ns]

 from_date        datetime64[ns]

 clearing_date    datetime64[ns]

dtype: object

edited Nov 23 '18 at 18:57

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

– roastbeeef
Nov 24 '18 at 12:37

lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

– Idlehands
Nov 25 '18 at 4:57

add a comment |

You can use the errors argument of the same method:

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception

If ‘coerce’, then invalid parsing will be set as NaT

If ‘ignore’, then invalid parsing will return the input

This one line should do the trick for you:

df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))



# results:

#    net_due_date  from_date  clearing_date

# ...

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06            NaT

# 12   2018-10-24 2018-01-09     2018-10-11

# ...

If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:

df.fillna(pd.datetime.now().date(), inplace=True)



#    net_due_date  from_date  clearing_date    

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06     2018-11-23 # <-- changed to today

# 12   2018-10-24 2018-01-09     2018-10-11

And just to confirm the dtypes:

net_due_date      datetime64[ns]

 from_date        datetime64[ns]

 clearing_date    datetime64[ns]

dtype: object

edited Nov 23 '18 at 18:57

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

You can use the errors argument of the same method:

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception

If ‘coerce’, then invalid parsing will be set as NaT

If ‘ignore’, then invalid parsing will return the input

This one line should do the trick for you:

df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))



# results:

#    net_due_date  from_date  clearing_date

# ...

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06            NaT

# 12   2018-10-24 2018-01-09     2018-10-11

# ...

If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:

df.fillna(pd.datetime.now().date(), inplace=True)



#    net_due_date  from_date  clearing_date    

# 10   2018-01-25 2017-06-06     2018-01-16

# 11   2017-11-24 2017-06-06     2018-11-23 # <-- changed to today

# 12   2018-10-24 2018-01-09     2018-10-11

And just to confirm the dtypes:

net_due_date      datetime64[ns]

 from_date        datetime64[ns]

 clearing_date    datetime64[ns]

dtype: object

edited Nov 23 '18 at 18:57

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

edited Nov 23 '18 at 18:57

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

answered Nov 23 '18 at 18:49

Idlehands

6,1631923

giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

– roastbeeef
Nov 24 '18 at 12:37

lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

– Idlehands
Nov 25 '18 at 4:57

add a comment |

giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

– roastbeeef
Nov 24 '18 at 12:37

lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

– Idlehands
Nov 25 '18 at 4:57

giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

– roastbeeef
Nov 24 '18 at 12:37

lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

– Idlehands
Nov 25 '18 at 4:57

add a comment |

I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:

df = df.applymap(pd.to_datetime)

Run the below script converting a part of your source data:

import pandas as pd



dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',

        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',

        '2018-01-30 00:00:00.000' ],

    'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],

    'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',

        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }

df = pd.DataFrame(data=dd)

df = df.applymap(pd.to_datetime)

When you execute df.info(), you will get (a part of the printout):

Data columns (total 3 columns):

net_due_date     5 non-null datetime64[ns]

from_date        5 non-null datetime64[ns]

clearing_date    5 non-null datetime64[ns]

For demonstration purpose, you may add print(df) both before and after
the conversion.

As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.

edited Nov 23 '18 at 19:38

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

add a comment |

I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:

df = df.applymap(pd.to_datetime)

Run the below script converting a part of your source data:

import pandas as pd



dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',

        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',

        '2018-01-30 00:00:00.000' ],

    'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],

    'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',

        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }

df = pd.DataFrame(data=dd)

df = df.applymap(pd.to_datetime)

When you execute df.info(), you will get (a part of the printout):

Data columns (total 3 columns):

net_due_date     5 non-null datetime64[ns]

from_date        5 non-null datetime64[ns]

clearing_date    5 non-null datetime64[ns]

For demonstration purpose, you may add print(df) both before and after
the conversion.

As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.

edited Nov 23 '18 at 19:38

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

add a comment |

I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:

df = df.applymap(pd.to_datetime)

Run the below script converting a part of your source data:

import pandas as pd



dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',

        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',

        '2018-01-30 00:00:00.000' ],

    'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],

    'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',

        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }

df = pd.DataFrame(data=dd)

df = df.applymap(pd.to_datetime)

When you execute df.info(), you will get (a part of the printout):

Data columns (total 3 columns):

net_due_date     5 non-null datetime64[ns]

from_date        5 non-null datetime64[ns]

clearing_date    5 non-null datetime64[ns]

For demonstration purpose, you may add print(df) both before and after
the conversion.

As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.

edited Nov 23 '18 at 19:38

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:

df = df.applymap(pd.to_datetime)

Run the below script converting a part of your source data:

import pandas as pd



dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',

        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',

        '2018-01-30 00:00:00.000' ],

    'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],

    'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',

        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }

df = pd.DataFrame(data=dd)

df = df.applymap(pd.to_datetime)

When you execute df.info(), you will get (a part of the printout):

Data columns (total 3 columns):

net_due_date     5 non-null datetime64[ns]

from_date        5 non-null datetime64[ns]

clearing_date    5 non-null datetime64[ns]

For demonstration purpose, you may add print(df) both before and after
the conversion.

As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.

edited Nov 23 '18 at 19:38

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

edited Nov 23 '18 at 19:38

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

answered Nov 23 '18 at 19:24

Valdi_Bo

5,6152916

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Wsrtjtyk