Adding categorical columns into the prediction model












I have a dataframe of customers and information about their activity, and I've built a model that predicts whether they buy the product or not. My label is the column 'did_buy', which is 1 if a customer bought and 0 if not. My model currently uses only the numeric columns, but I'd also like to add categorical columns to it, and I'm not sure how to convert them and use them in my X train. Here is a glimpse of my dataframe columns:



Company_Sector         Company_size  DMU_Final  Joining_Date  Country
Finance and Insurance  10            End User   2010-04-13    France
Public Administration  1             End User   2004-09-22    France


Some more columns:

linkedin_shared_connections  online_activity  did_buy  Sale_Date
11                           65               1        2016-05-23
13                           100              1        2016-01-12









  • Are you not able to use the categorical variables for the model? What error are you getting? Scikit-learn would automatically apply one-hot encoding to the categorical variables.

    – Ashok KS
    Nov 21 '18 at 12:15











  • Did you have a look at pd.get_dummies?

    – DeanLa
    Nov 21 '18 at 12:18
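
For reference, a minimal sketch of what the pd.get_dummies suggestion could look like. The toy frame below is invented for illustration; it only borrows column names and a few values from the question, and the choice of which columns to treat as categorical is an assumption.

```python
import pandas as pd

# Toy data shaped like the excerpt in the question (values are illustrative only).
df = pd.DataFrame({
    "Company_Sector": ["Finance and Insurance", "Public Administration"],
    "Country": ["France", "France"],
    "online_activity": [65, 100],
    "did_buy": [1, 1],
})

# Assumption: these are the columns to one-hot encode.
categorical_cols = ["Company_Sector", "Country"]

# Separate the label, then expand each categorical column into 0/1 indicator columns.
X = pd.get_dummies(df.drop(columns=["did_buy"]), columns=categorical_cols)
y = df["did_buy"]

print(X.columns.tolist())
```

Each distinct value becomes its own indicator column, so X contains only numeric or boolean data and can be passed to a scikit-learn estimator. The same dummy columns must be present at prediction time, which is easy to get wrong when new data contains unseen categories.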











  • I used the numeric variables such as 'online_activity' and 'linkedin_shared_connections' to predict 'did_buy', and the results were pretty good. But when I add a categorical column such as 'Company_Sector', I get the error message 'can't convert string to float'.

    – Dataminer1
    Nov 21 '18 at 12:23






  • Another problem is converting the date column 'Joining_Date'. I used this code: data['joining_date'] = pd.to_datetime(data['joining_date']); data['joining_date'] = data['joining_date'].map(dt.datetime.toordinal), but it prints all the dates as being in 1970.

    – Dataminer1
    Nov 21 '18 at 12:27
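
The thread does not say why the dates show up as 1970. One possible explanation, offered here only as an assumption, is that the ordinal integers were later parsed as datetimes again, in which case pandas reads them as nanoseconds since 1970-01-01. A sketch that keeps the original datetime column and writes numeric features to new columns (the new column names are hypothetical):

```python
import pandas as pd

# Toy frame using the two Joining_Date values shown in the question.
data = pd.DataFrame({"Joining_Date": ["2010-04-13", "2004-09-22"]})
data["Joining_Date"] = pd.to_datetime(data["Joining_Date"])

# Keep the datetime column intact and store numeric features separately.
data["joining_ordinal"] = data["Joining_Date"].map(lambda d: d.toordinal())
data["joining_year"] = data["Joining_Date"].dt.year
data["joining_month"] = data["Joining_Date"].dt.month

# Possible cause of the 1970 dates: plain integers passed to pd.to_datetime
# are interpreted as nanoseconds since the Unix epoch (1970-01-01).
reparsed = pd.to_datetime(data["joining_ordinal"])
print(reparsed.dt.year.tolist())
```

The ordinal, year and month columns are ordinary integers, so they can go into the feature matrix alongside the other numeric columns.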













  • @AshokKS No, it won't. Scikit-learn will complain about not being able to convert strings to float. The user needs to do the encoding themselves.

    – Vivek Kumar
    Nov 21 '18 at 13:18
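
One way to do that encoding inside scikit-learn itself is to put a OneHotEncoder in front of the estimator. The sketch below is not taken from the thread: the toy rows, the LogisticRegression estimator and the split into categorical and numeric columns are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented toy rows; only the column names come from the question.
df = pd.DataFrame({
    "Company_Sector": ["Finance and Insurance", "Public Administration",
                       "Finance and Insurance", "Public Administration"],
    "Country": ["France", "France", "China", "China"],
    "online_activity": [65, 100, 5, 10],
    "linkedin_shared_connections": [11, 13, 0, 1],
    "did_buy": [1, 1, 0, 0],
})

categorical_cols = ["Company_Sector", "Country"]
numeric_cols = ["online_activity", "linkedin_shared_connections"]

# One-hot encode the string columns and pass the numeric columns through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",
)

# Chain preprocessing and the classifier so both are fitted together.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X = df[categorical_cols + numeric_cols]
y = df["did_buy"]
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Because the encoder lives inside the pipeline, the categories learned during fit are reused at predict time, and handle_unknown="ignore" encodes categories not seen during training as all zeros instead of raising an error.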
















python pandas numpy scikit-learn data-science






asked Nov 21 '18 at 11:51
Dataminer1













1 Answer
You have several options for converting categorical variables to numerical or binary variables.
For example, the Country column in your data frame has different values (e.g. France, China, ...). One solution is to map each value to an integer, for example:
{France: 1, China: 2, ...}

scikit-learn's LabelEncoder builds such a mapping for you. Note that it numbers the classes from 0, in sorted order of their names:

# Import libraries
import pandas as pd
from sklearn import preprocessing

# Create a label encoder object and fit it to the Country column
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df['Country'])

# View the learned classes, sorted alphabetically (e.g. ['China', 'France', ...])
list(label_encoder.classes_)

# Transform the Country column into integer codes (0, 1, 2, ...)
df['Country_encoded'] = label_encoder.transform(df['Country'])

# Convert the first few integer codes back into their category names
list(label_encoder.inverse_transform(df['Country_encoded'][:3]))
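
If you want to control the integer codes yourself, the explicit mapping mentioned above ({France: 1, China: 2}) can also be applied with plain pandas. This is a small sketch on invented rows; the column name Country_code is hypothetical.

```python
import pandas as pd

# Invented toy rows for illustration.
df = pd.DataFrame({"Country": ["France", "China", "China", "France"]})

# Explicit category-to-integer mapping, as in the example {France: 1, China: 2}.
country_to_code = {"France": 1, "China": 2}
df["Country_code"] = df["Country"].map(country_to_code)

# Reverse mapping to recover the category names from the codes.
code_to_country = {code: name for name, code in country_to_code.items()}
decoded = df["Country_code"].map(code_to_country)

print(df["Country_code"].tolist())
print(decoded.tolist())
```

Keep in mind that integer codes imply an ordering (China > France here) that the categories do not really have. Some models, linear ones in particular, will treat the code as a magnitude, so one-hot encoding is often the safer choice for nominal columns such as Country.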





answered Nov 29 '18 at 17:21
Mohammad Hoseini































