Adding categorical columns into the prediction model

I have a dataframe of customers with information about their activity, and I've built a model that predicts whether they buy the product. My label is the column 'did_buy', which is 1 if a customer bought and 0 if not. The model currently uses only the numeric columns, but I'd also like to add categorical columns, and I'm not sure how to convert them so I can use them in my X train. Here is a glimpse of my dataframe columns:

Company_Sector         Company_size  DMU_Final  Joining_Date  Country
Finance and Insurance  10            End User   2010-04-13    France
Public Administration  1             End User   2004-09-22    France

Some more columns:

linkedin_shared_connections  online_activity  did_buy  Sale_Date
11                           65               1        2016-05-23
13                           100              1        2016-01-12

asked Nov 21 '18 at 11:51

Dataminer1

284

Are you not able to use the categorical variables for the model? What error are you getting? Scikit-learn would automatically apply one-hot encoding to the categorical variables.

– Ashok KS
Nov 21 '18 at 12:15

Did you have a look at pd.get_dummies?

– DeanLa
Nov 21 '18 at 12:18
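
A minimal sketch of the pd.get_dummies suggestion, assuming a toy frame built from the two sample rows in the question. The names categorical_cols, X and y are illustrative, not from the original post. Each categorical column is expanded into one 0/1 indicator column per distinct value, so the resulting X holds only numbers:

```python
import pandas as pd

# Toy frame built from the two sample rows shown in the question.
df = pd.DataFrame({
    "Company_Sector": ["Finance and Insurance", "Public Administration"],
    "Country": ["France", "France"],
    "online_activity": [65, 100],
    "linkedin_shared_connections": [11, 13],
    "did_buy": [1, 1],
})

# Columns to expand (illustrative choice; extend with DMU_Final etc. as needed).
categorical_cols = ["Company_Sector", "Country"]

# One indicator column per category value; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="did_buy"), columns=categorical_cols, dtype=int)
y = df["did_buy"]

print(X.columns.tolist())
```

If training and prediction data are encoded separately this way, their columns can differ when a category appears in only one of them, so the two frames may need to be aligned on the same column set.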

I used numeric variables such as 'online_activity' and 'linkedin_shared_connections' to predict 'did_buy', and the results were pretty good. But when I add a categorical column such as 'Company_Sector', I get the error 'can't convert string to float'.

– Dataminer1
Nov 21 '18 at 12:23


Another problem is converting the date column 'Joining_Date'. I used this code: data['joining_date'] = pd.to_datetime(data['joining_date']); data['joining_date'] = data['joining_date'].map(dt.datetime.toordinal), but it prints all the dates as 1970.

– Dataminer1
Nov 21 '18 at 12:27
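
A possible way to turn the date column into numeric features, sketched on the two sample dates from the question. The column names joining_ordinal, joining_year and joining_month are made up for illustration. The last lines show one likely cause of the 1970 dates; this is an assumption, since the full code is not shown: if the integer ordinals are later passed back to pd.to_datetime, pandas reads them as nanoseconds since 1970-01-01.

```python
import datetime as dt
import pandas as pd

# The two sample dates from the question.
data = pd.DataFrame({"Joining_Date": ["2010-04-13", "2004-09-22"]})
joined = pd.to_datetime(data["Joining_Date"])

# Store numeric features in new columns instead of overwriting the original.
data["joining_ordinal"] = joined.map(lambda d: d.toordinal())  # days since 0001-01-01
data["joining_year"] = joined.dt.year
data["joining_month"] = joined.dt.month

# Pitfall (assumed cause): integers given to pd.to_datetime are treated as
# nanoseconds since the Unix epoch, so an ordinal lands in 1970.
first_ordinal = int(data["joining_ordinal"].iloc[0])
round_trip = pd.to_datetime(first_ordinal)
print(round_trip.year)

# To recover the real date from an ordinal, use date.fromordinal instead.
recovered = dt.date.fromordinal(first_ordinal)
print(recovered)
```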

@AshokKS No, it won't. Scikit-learn will complain that it cannot convert strings to float. The user needs to do the encoding themselves.

– Vivek Kumar
Nov 21 '18 at 13:18

python pandas numpy scikit-learn data-science

1 Answer

There are several ways to convert categorical variables into numerical or binary variables.
For example, the Country column in your dataframe holds different values (e.g. France, China, ...). One option is to map each value to an integer. LabelEncoder assigns the integers in sorted order of the class names, starting at 0. For instance, if the only values were China and France:
{China: 0, France: 1}

# Import libraries
from sklearn import preprocessing
import pandas as pd

# Create a label encoder object and fit it to the Country column
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df['Country'])

# View the learned labels, e.g. ['China', 'France', ...]
list(label_encoder.classes_)

# Transform the Country column into integer codes and keep the result
# ('Country_code' is an illustrative column name)
df['Country_code'] = label_encoder.transform(df['Country'])

# Convert the first few integer codes back into their category names
list(label_encoder.inverse_transform(df['Country_code'].head(3)))
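
Integer codes like these imply an ordering (for example, that one country is "greater" than another) which some models will treat as meaningful. An alternative is one-hot encoding inside a scikit-learn pipeline. The following is a minimal sketch, not a definitive implementation: the rows are made-up toy data, and the choice of LogisticRegression and the names preprocess, model and new_customer are assumptions for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up toy rows using the column names from the question.
df = pd.DataFrame({
    "Country": ["France", "France", "China", "China", "France", "China"],
    "Company_Sector": ["Finance and Insurance", "Public Administration",
                       "Finance and Insurance", "Public Administration",
                       "Finance and Insurance", "Public Administration"],
    "online_activity": [65, 100, 5, 10, 80, 3],
    "linkedin_shared_connections": [11, 13, 0, 1, 9, 2],
    "did_buy": [1, 1, 0, 0, 1, 0],
})

categorical_cols = ["Country", "Company_Sector"]
numeric_cols = ["online_activity", "linkedin_shared_connections"]

# One-hot encode the categorical columns; pass numeric columns through as-is.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df[categorical_cols + numeric_cols]
y = df["did_buy"]
model.fit(X, y)

# The fitted pipeline applies the same encoding to new rows automatically.
new_customer = pd.DataFrame({
    "Country": ["Germany"],  # not seen during training
    "Company_Sector": ["Finance and Insurance"],
    "online_activity": [70],
    "linkedin_shared_connections": [8],
})
prediction = model.predict(new_customer)
print(prediction)
```

Keeping the encoder inside the pipeline means it is fitted only on the training data and reused unchanged at prediction time.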

answered Nov 29 '18 at 17:21

Mohammad Hoseini

214
