sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float32')























I am not familiar with Python and am trying to run a decision tree classifier using the scikit-learn (sklearn) library. When I run the code, I encounter the error:




ValueError: Input contains NaN, infinity or a value too large for dtype('float32')




I have tried using a smaller subset of my Excel data sheet, and with it the code executes and gives the results I want, so I suspect the problem is that my data set is too big. Here is the code that causes the crash:



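from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# data_train is a DataFrame loaded earlier (loading code not shown)
df_X = data_train[['DayOfWeek', 'Promo', 'StateHoliday']]
df_Y = data_train[['Sales_band']]

X_train, X_test, y_train, y_test = train_test_split(df_X, df_Y, random_state=1)
model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)  # Line that causes the crash
y_predict = model.predict(X_test)

print('The accuracy of the Decision Tree is', accuracy_score(y_test, y_predict))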





























  • The error message suggests that your dataset is not too big; rather, one of the values in it is NaN (not a number), infinity, or a number too large to fit into a floating-point number of type float32. I would suggest checking your data for missing values/NaNs as a first step.
    – Pallie
    Nov 7 at 10:44

  • Oh, you are right. Thank you
    – Jia Hao Lim
    Nov 7 at 10:48
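The check suggested in the comment above could look roughly like the sketch below. The small data_train frame is made up purely for illustration (only the column names come from the question), and nan_counts, inf_counts and too_large are hypothetical helper names, so treat this as one possible way to look for the offending values rather than the definitive one.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data_train; only the column names are from the question.
data_train = pd.DataFrame({
    "DayOfWeek": [1, 2, 3, 4],
    "Promo": [0, 1, np.nan, 1],
    "StateHoliday": [0, 0, 0, np.inf],
    "Sales_band": [1, 2, 1, 3],
})

features = data_train[["DayOfWeek", "Promo", "StateHoliday"]]

# Missing values (NaN / None) per column.
nan_counts = features.isna().sum()

# Infinite values per numeric column (non-numeric columns are skipped).
numeric = features.select_dtypes(include=[np.number])
values = numeric.to_numpy(dtype="float64")
inf_counts = pd.Series(np.isinf(values).sum(axis=0), index=numeric.columns)

# Finite values whose magnitude exceeds what float32 can hold.
float32_max = np.finfo(np.float32).max
too_large = pd.Series(
    (np.isfinite(values) & (np.abs(values) > float32_max)).sum(axis=0),
    index=numeric.columns,
)

print(nan_counts)
print(inf_counts)
print(too_large)
```

Any column with a non-zero count in one of these three summaries is a candidate for the value that model.fit rejects.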















python pandas numpy scikit-learn sklearn-pandas














edited Nov 7 at 10:38

























asked Nov 7 at 10:22









Jia Hao Lim





















1 Answer






























You may have missing values in your dataset. You can use dropna() to remove all rows containing missing values, provided that dropping them won't hurt the quality or accuracy of your predictions.
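As a minimal sketch of that suggestion, assuming the same column names as in the question and a made-up data_train used only for illustration: infinities are first converted to NaN (an extra step not mentioned above, added because the error message also covers infinity), and then dropna() removes every affected row before the features and target are selected.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data_train; only the column names are from the question.
data_train = pd.DataFrame({
    "DayOfWeek": [1, 2, 3, 4],
    "Promo": [0, 1, np.nan, 1],
    "StateHoliday": [0, 0, 0, np.inf],
    "Sales_band": [1, 2, 1, 3],
})

cols = ["DayOfWeek", "Promo", "StateHoliday", "Sales_band"]

# Treat +/-infinity as missing, then drop every row that has a missing value
# in any of the columns used for training.
clean = data_train[cols].replace([np.inf, -np.inf], np.nan).dropna()

# Select features and target from the cleaned frame so they stay aligned.
df_X = clean[["DayOfWeek", "Promo", "StateHoliday"]]
df_Y = clean["Sales_band"]

print(len(data_train), "rows before,", len(clean), "rows after cleaning")
```

Cleaning the frame before splitting it into df_X and df_Y keeps the rows of both aligned. If many rows are dropped this way, it may be worth considering imputation instead of removal.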





































        answered Nov 10 at 21:51









        isaac-moore
