Converting Series of str to dates





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















I'm having a nightmare converting a series of strings to dates. This is a sample of my data:



net_due_date                from_date   clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000


These are the dtypes:



net_due_date     object
from_date object
clearing_date object
dtype: object


I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.



The following code works fine on net_due_date:



df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')


And a similar code works fine on from_date:



df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')


However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:



OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00


I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps



But then I'm confronted with the following error:



TypeError: unsupported operand type(s) for //: 'str' and 'int'


To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!










share|improve this question

























  • "9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

    – Yilun Zhang
    Nov 23 '18 at 18:44


















0















I'm having a nightmare converting a series of strings to dates. This is a sample of my data:



net_due_date                from_date   clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000


These are the dtypes:



net_due_date     object
from_date object
clearing_date object
dtype: object


I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.



The following code works fine on net_due_date:



df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')


And a similar code works fine on from_date:



df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')


However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:



OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00


I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps



But then I'm confronted with the following error:



TypeError: unsupported operand type(s) for //: 'str' and 'int'


To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!










share|improve this question

























  • "9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

    – Yilun Zhang
    Nov 23 '18 at 18:44














0












0








0








I'm having a nightmare converting a series of strings to dates. This is a sample of my data:



net_due_date                from_date   clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000


These are the dtypes:



net_due_date     object
from_date object
clearing_date object
dtype: object


I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.



The following code works fine on net_due_date:



df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')


And a similar code works fine on from_date:



df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')


However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:



OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00


I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps



But then I'm confronted with the following error:



TypeError: unsupported operand type(s) for //: 'str' and 'int'


To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!










share|improve this question
















I'm having a nightmare converting a series of strings to dates. This is a sample of my data:



net_due_date                from_date   clearing_date
0 2018-10-25 00:00:00.000 2017-06-06 2018-10-13 00:00:00.000
1 2018-09-27 00:00:00.000 2017-06-06 2018-09-30 00:00:00.000
2 2018-05-31 00:00:00.000 2017-06-06 2018-05-18 00:00:00.000
3 2017-12-22 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
4 2018-01-30 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
5 2018-07-31 00:00:00.000 2017-06-06 2018-07-31 00:00:00.000
6 2018-05-29 00:00:00.000 2017-06-06 2018-05-17 00:00:00.000
7 2017-12-14 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
8 2017-11-24 00:00:00.000 2017-06-06 2017-12-08 00:00:00.000
9 2018-09-27 00:00:00.000 2017-06-06 2018-09-13 00:00:00.000
10 2018-01-25 00:00:00.000 2017-06-06 2018-01-16 00:00:00.000
11 2017-11-24 00:00:00.000 2017-06-06 2017-11-30 00:00:00.000
12 2018-10-24 00:00:00.000 2018-01-09 2018-10-11 00:00:00.000
13 2018-01-22 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
14 2018-09-06 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
15 2018-10-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
16 2018-06-15 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
17 2018-04-10 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
18 2018-01-12 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000
19 2018-01-24 00:00:00.000 2018-01-09 2018-10-10 00:00:00.000


These are the dtypes:



net_due_date     object
from_date object
clearing_date object
dtype: object


I'm trying to convert all of this data to the same format so I can run functions on it to calculate variance between dates.



The following code works fine on net_due_date:



df['net_due_date'] = pd.to_datetime(df['net_due_date'], format='%Y-%m-%d')


And a similar code works fine on from_date:



df['from_date'] = pd.to_datetime(df['from_date'], format='%Y-%m-%d')


However, clearing_date has some values in there that are 9999-12-31 and when I run the same code, I get the following error:



OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00


I have literally spent hours trying to fix this, the closest I have got being this link:
Time Series, OOB Timestamps



But then I'm confronted with the following error:



TypeError: unsupported operand type(s) for //: 'str' and 'int'


To get around this, I have tried converting the column to an int first using to_numeric, but get another series of errors. I'm hoping someone has been in this problem before and can help because I cannot find anything online to fix this!







python pandas datetime






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 23 '18 at 18:47









Zoe

13.5k85386




13.5k85386










asked Nov 23 '18 at 18:38









roastbeeefroastbeeef

1197




1197













  • "9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

    – Yilun Zhang
    Nov 23 '18 at 18:44



















  • "9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

    – Yilun Zhang
    Nov 23 '18 at 18:44

















"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44





"9999-12-31" is most likely a way to indicate "as of" or "most recent", if this date is something you care about, you can change it to today, and if not, you can change them to Missing values, and the conversion should work.

– Yilun Zhang
Nov 23 '18 at 18:44












2 Answers
2






active

oldest

votes


















1














You can use the errors argument of the same method:




errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’




  • If ‘raise’, then invalid parsing will raise an exception


  • If ‘coerce’, then invalid parsing will be set as NaT


  • If ‘ignore’, then invalid parsing will return the input





This one line should do the trick for you:



df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))

# results:
# net_due_date from_date clearing_date
# ...
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 NaT
# 12 2018-10-24 2018-01-09 2018-10-11
# ...


If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:



df.fillna(pd.datetime.now().date(), inplace=True)

# net_due_date from_date clearing_date
# 10 2018-01-25 2017-06-06 2018-01-16
# 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
# 12 2018-10-24 2018-01-09 2018-10-11


And just to confirm the dtypes:



net_due_date      datetime64[ns]
from_date datetime64[ns]
clearing_date datetime64[ns]
dtype: object





share|improve this answer


























  • giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

    – roastbeeef
    Nov 24 '18 at 12:37











  • lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

    – Idlehands
    Nov 25 '18 at 4:57



















1














I assume that in your case all columns are of String type and contain dates
+ sometime time part. So the simplest choice is:



df = df.applymap(pd.to_datetime)


Run the below script converting a part of your source data:



import pandas as pd

dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
'2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
'2018-01-30 00:00:00.000' ],
'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
'2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
df = pd.DataFrame(data=dd)
df = df.applymap(pd.to_datetime)


When you execute df.info(), you will get (a part of the printout):



Data columns (total 3 columns):
net_due_date 5 non-null datetime64[ns]
from_date 5 non-null datetime64[ns]
clearing_date 5 non-null datetime64[ns]


For demonstration purpose, you may add print(df) both before and after
the conversion.



As far as "very big" years are concerned, Pandas converts dates with
year in the range between 1677 and 2262. So maybe as the first step
you should change such out of range dates to e.g. 2250.






share|improve this answer


























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451575%2fconverting-series-of-str-to-dates%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    You can use the errors argument of the same method:




    errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’




    • If ‘raise’, then invalid parsing will raise an exception


    • If ‘coerce’, then invalid parsing will be set as NaT


    • If ‘ignore’, then invalid parsing will return the input





    This one line should do the trick for you:



    df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))

    # results:
    # net_due_date from_date clearing_date
    # ...
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 NaT
    # 12 2018-10-24 2018-01-09 2018-10-11
    # ...


    If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:



    df.fillna(pd.datetime.now().date(), inplace=True)

    # net_due_date from_date clearing_date
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
    # 12 2018-10-24 2018-01-09 2018-10-11


    And just to confirm the dtypes:



    net_due_date      datetime64[ns]
    from_date datetime64[ns]
    clearing_date datetime64[ns]
    dtype: object





    share|improve this answer


























    • giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

      – roastbeeef
      Nov 24 '18 at 12:37











    • lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

      – Idlehands
      Nov 25 '18 at 4:57
















    1














    You can use the errors argument of the same method:




    errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’




    • If ‘raise’, then invalid parsing will raise an exception


    • If ‘coerce’, then invalid parsing will be set as NaT


    • If ‘ignore’, then invalid parsing will return the input





    This one line should do the trick for you:



    df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))

    # results:
    # net_due_date from_date clearing_date
    # ...
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 NaT
    # 12 2018-10-24 2018-01-09 2018-10-11
    # ...


    If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:



    df.fillna(pd.datetime.now().date(), inplace=True)

    # net_due_date from_date clearing_date
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
    # 12 2018-10-24 2018-01-09 2018-10-11


    And just to confirm the dtypes:



    net_due_date      datetime64[ns]
    from_date datetime64[ns]
    clearing_date datetime64[ns]
    dtype: object





    share|improve this answer


























    • giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

      – roastbeeef
      Nov 24 '18 at 12:37











    • lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

      – Idlehands
      Nov 25 '18 at 4:57














    1












    1








    1







    You can use the errors argument of the same method:




    errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’




    • If ‘raise’, then invalid parsing will raise an exception


    • If ‘coerce’, then invalid parsing will be set as NaT


    • If ‘ignore’, then invalid parsing will return the input





    This one line should do the trick for you:



    df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))

    # results:
    # net_due_date from_date clearing_date
    # ...
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 NaT
    # 12 2018-10-24 2018-01-09 2018-10-11
    # ...


    If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:



    df.fillna(pd.datetime.now().date(), inplace=True)

    # net_due_date from_date clearing_date
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
    # 12 2018-10-24 2018-01-09 2018-10-11


    And just to confirm the dtypes:



    net_due_date      datetime64[ns]
    from_date datetime64[ns]
    clearing_date datetime64[ns]
    dtype: object





    share|improve this answer















    You can use the errors argument of the same method:




    errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’




    • If ‘raise’, then invalid parsing will raise an exception


    • If ‘coerce’, then invalid parsing will be set as NaT


    • If ‘ignore’, then invalid parsing will return the input





    This one line should do the trick for you:



    df = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))

    # results:
    # net_due_date from_date clearing_date
    # ...
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 NaT
    # 12 2018-10-24 2018-01-09 2018-10-11
    # ...


    If necessary you can massage the DataFrame afterwards for the NaT values and transform to something meaningful to you, e.g.:



    df.fillna(pd.datetime.now().date(), inplace=True)

    # net_due_date from_date clearing_date
    # 10 2018-01-25 2017-06-06 2018-01-16
    # 11 2017-11-24 2017-06-06 2018-11-23 # <-- changed to today
    # 12 2018-10-24 2018-01-09 2018-10-11


    And just to confirm the dtypes:



    net_due_date      datetime64[ns]
    from_date datetime64[ns]
    clearing_date datetime64[ns]
    dtype: object






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 23 '18 at 18:57

























    answered Nov 23 '18 at 18:49









    IdlehandsIdlehands

    6,1631923




    6,1631923













    • giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

      – roastbeeef
      Nov 24 '18 at 12:37











    • lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

      – Idlehands
      Nov 25 '18 at 4:57



















    • giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

      – roastbeeef
      Nov 24 '18 at 12:37











    • lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

      – Idlehands
      Nov 25 '18 at 4:57

















    giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

    – roastbeeef
    Nov 24 '18 at 12:37





    giving this a go now. is the lambda function the most efficient way to run this function? one thing i didnt mention is that i'm working with 55m rows

    – roastbeeef
    Nov 24 '18 at 12:37













    lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

    – Idlehands
    Nov 25 '18 at 4:57





    lambda is just an anonymous function, basically wrapping the pd.to_datetime to convert each columns being passed. As far as efficiency goes I think this is probably decent as the functions being applied are rather minimal. Give it a test and see the run time on your machine though.

    – Idlehands
    Nov 25 '18 at 4:57













    1














    I assume that in your case all columns are of String type and contain dates
    + sometime time part. So the simplest choice is:



    df = df.applymap(pd.to_datetime)


    Run the below script converting a part of your source data:



    import pandas as pd

    dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
    '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
    '2018-01-30 00:00:00.000' ],
    'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
    'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
    '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
    df = pd.DataFrame(data=dd)
    df = df.applymap(pd.to_datetime)


    When you execute df.info(), you will get (a part of the printout):



    Data columns (total 3 columns):
    net_due_date 5 non-null datetime64[ns]
    from_date 5 non-null datetime64[ns]
    clearing_date 5 non-null datetime64[ns]


    For demonstration purpose, you may add print(df) both before and after
    the conversion.



    As far as "very big" years are concerned, Pandas converts dates with
    year in the range between 1677 and 2262. So maybe as the first step
    you should change such out of range dates to e.g. 2250.






    share|improve this answer






























      1














      I assume that in your case all columns are of String type and contain dates
      + sometime time part. So the simplest choice is:



      df = df.applymap(pd.to_datetime)


      Run the below script converting a part of your source data:



      import pandas as pd

      dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
      '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
      '2018-01-30 00:00:00.000' ],
      'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
      'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
      '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
      df = pd.DataFrame(data=dd)
      df = df.applymap(pd.to_datetime)


      When you execute df.info(), you will get (a part of the printout):



      Data columns (total 3 columns):
      net_due_date 5 non-null datetime64[ns]
      from_date 5 non-null datetime64[ns]
      clearing_date 5 non-null datetime64[ns]


      For demonstration purpose, you may add print(df) both before and after
      the conversion.



      As far as "very big" years are concerned, Pandas converts dates with
      year in the range between 1677 and 2262. So maybe as the first step
      you should change such out of range dates to e.g. 2250.






      share|improve this answer




























        1












        1








        1







        I assume that in your case all columns are of String type and contain dates
        + sometime time part. So the simplest choice is:



        df = df.applymap(pd.to_datetime)


        Run the below script converting a part of your source data:



        import pandas as pd

        dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
        '2018-01-30 00:00:00.000' ],
        'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
        'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
        df = pd.DataFrame(data=dd)
        df = df.applymap(pd.to_datetime)


        When you execute df.info(), you will get (a part of the printout):



        Data columns (total 3 columns):
        net_due_date 5 non-null datetime64[ns]
        from_date 5 non-null datetime64[ns]
        clearing_date 5 non-null datetime64[ns]


        For demonstration purpose, you may add print(df) both before and after
        the conversion.



        As far as "very big" years are concerned, Pandas converts dates with
        year in the range between 1677 and 2262. So maybe as the first step
        you should change such out of range dates to e.g. 2250.






        share|improve this answer















        I assume that in your case all columns are of String type and contain dates
        + sometime time part. So the simplest choice is:



        df = df.applymap(pd.to_datetime)


        Run the below script converting a part of your source data:



        import pandas as pd

        dd = { 'net_due_date': [ '2018-10-25 00:00:00.000', '2018-09-27 00:00:00.000',
        '2018-05-31 00:00:00.000', '2017-12-22 00:00:00.000',
        '2018-01-30 00:00:00.000' ],
        'from_date': [ '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06', '2017-06-06' ],
        'clearing_date': [ '2018-10-13 00:00:00.000', '2018-09-30 00:00:00.000',
        '2018-05-18 00:00:00.000', '2017-12-08 00:00:00.000', '2018-01-16 00:00:00.000' ] }
        df = pd.DataFrame(data=dd)
        df = df.applymap(pd.to_datetime)


        When you execute df.info(), you will get (a part of the printout):



        Data columns (total 3 columns):
        net_due_date 5 non-null datetime64[ns]
        from_date 5 non-null datetime64[ns]
        clearing_date 5 non-null datetime64[ns]


        For demonstration purpose, you may add print(df) both before and after
        the conversion.



        As far as "very big" years are concerned, Pandas converts dates with
        year in the range between 1677 and 2262. So maybe as the first step
        you should change such out of range dates to e.g. 2250.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 23 '18 at 19:38

























        answered Nov 23 '18 at 19:24









        Valdi_BoValdi_Bo

        5,6152916




        5,6152916






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451575%2fconverting-series-of-str-to-dates%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            Academy of Television Arts & Sciences

            L'Équipe

            1995 France bombings