K-fold Cross-Validation on table in Matlab
I have a MATLAB table that contains information on students (numerical and categorical variables). A sample is given here:
School = {'GB'; 'UR'; 'GB'; 'GB'; 'UR'};
School = categorical(School);
Age = [14;14;12;16;19];
Relationship = {'yes'; 'yes'; 'no'; 'no'; 'yes'};
Relationship = categorical(Relationship);
Status = {'ft'; 'pt'; 'ft'; 'ft'; 'ft'};
Status = categorical(Status);
Father_Job = {'pol'; 'ser'; 'oth'; 'ele'; 'cle'};
Father_Job = categorical(Father_Job);
Health = [1;2;3;3;5];
Exam = {'pass'; 'pass'; 'fail'; 'fail'; 'pass'};
Exam = categorical(Exam);
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam)
T =
  School    Age    Relationship    Status    Father_Job    Health    Exam
  ______    ___    ____________    ______    __________    ______    ____
  GB        14     yes             ft        pol           1         pass
  UR        14     yes             pt        ser           2         pass
  GB        12     no              ft        oth           3         fail
  GB        16     no              ft        ele           3         fail
  UR        19     yes             ft        cle           5         pass
I want to use this data to predict the exam outcome (pass/fail). I plan to use fitglm for a logistic regression and fitcnb for a Naive Bayes classifier. I know that both functions handle categorical variables in MATLAB, so there should be no problem using my table as it is, categorical variables included.
However, I run into a problem when I use cvpartition and crossvalind to perform 10-fold cross-validation (the code below uses 5 folds). When I try to create the fold indices, I get the following error:
Error using statslib.internal.grp2idx
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not supported. Use a row subscript and a variable subscript.
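A likely cause of this error is how the table is indexed. In MATLAB, parentheses indexing on a table returns another table. That means Y = T(:, 7) is a one-variable table rather than a vector, and cvpartition and crossvalind pass it on to grp2idx, which is where the error is raised.

The minimal sketch below rebuilds the five-row sample and compares the two indexing styles. The single table(...) call with 'VariableNames' is an assumption, because the question only shows the displayed table. Passing T.Exam instead of T(:, 7) should fix the crossvalind call in the same way.

```matlab
% Rebuild the five-row sample table from the question (assumed layout).
T = table(categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'}), [14; 14; 12; 16; 19], ...
    categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'}), ...
    categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'}), ...
    categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'}), [1; 2; 3; 3; 5], ...
    categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'}), ...
    'VariableNames', {'School', 'Age', 'Relationship', 'Status', ...
                      'Father_Job', 'Health', 'Exam'});

% Parentheses keep the container type: T(:, 7) is a 5x1 table, and a
% table is what grp2idx receives in the question's code.
disp(class(T(:, 7)))        % prints: table

% Dot (or brace) indexing extracts the variable itself.
Y = T.Exam;                 % same contents as T{:, 7}
disp(class(Y))              % prints: categorical

% With a plain grouping vector, cvpartition builds a partition that is
% stratified on Exam. Only 2 folds here because the sample has 5 rows.
cvpart = cvpartition(Y, 'KFold', 2);
disp(cvpart.NumTestSets)
```

On the real data, cvpartition(T.Exam, 'KFold', 10) would follow the same pattern.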
My goal is to perform the following operations:
% Column 7 (Exam) is the response variable
X = T(:, 1:6);
Y = T(:, 7);
% Create indices of 5-fold cross-validation (here I get errors)
cvpart = cvpartition(Y,'KFold',5);
indices = crossvalind('Kfold',Y,5);
% Create my test and training sets
for i = 1:5
    test = (indices == i);
    train = ~test;
    Xtrain = X(train,:);
    Xtest = X(test,:);
    Ytrain = Y(train,:);
    Ytest = Y(test,:);
end
% Fit logistic model
mdl = fitglm(Xtrain,Ytrain,'Distribution','binomial')
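There is a second issue in the loop above. Xtrain, Xtest, Ytrain and Ytest are overwritten on every iteration, so after the loop only the last fold is left, and fitglm is fitted on that fold alone.

The sketch below keeps every fold instead. It assumes the question's layout: predictors in variables 1 to 6 and a categorical Exam response. It uses cvpartition's training and test methods, so one partition object supplies both row masks and crossvalind is not needed. The function name makeExamFolds and the struct layout of its output are hypothetical.

```matlab
function folds = makeExamFolds(T, k)
%MAKEEXAMFOLDS Split the question's table into k stratified folds (sketch).
%   Assumes, as in the question, that variables 1-6 of T are predictors and
%   that T.Exam is the categorical response. The function name and the
%   struct layout of FOLDS are illustrative, not a MATLAB API.
Y = T.Exam;                           % grouping vector, not a 1-column table
cvpart = cvpartition(Y, 'KFold', k);  % stratified on Exam
folds = struct('Xtrain', {}, 'Ytrain', {}, 'Xtest', {}, 'Ytest', {});
for i = 1:cvpart.NumTestSets
    trainIdx = training(cvpart, i);   % logical mask of training rows
    testIdx = test(cvpart, i);        % logical mask of held-out rows
    % Row + variable subscripts return tables, so categoricals survive.
    folds(i).Xtrain = T(trainIdx, 1:6);
    folds(i).Ytrain = Y(trainIdx);
    folds(i).Xtest = T(testIdx, 1:6);
    folds(i).Ytest = Y(testIdx);
end
end
```

Saved as makeExamFolds.m, a call such as folds = makeExamFolds(T, 10) gives one entry per fold. Any model fitting then belongs inside a loop over folds, not after it.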
Would anyone have a take on this, please? I know I could convert the categorical variables to numerical ones, but I would rather not. Is there any way around this? Thank you.
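Converting the categorical variables should not be necessary if the models are fitted on table rows. As far as I know, the fitglm(X, y) form expects a numeric matrix X. The table form, fitglm(tbl, ...) with 'ResponseVar', expands categorical variables into dummy variables automatically.

The sketch below fits both models inside the cross-validation loop and returns per-fold accuracy. It makes a few assumptions of its own:

- the table layout is the question's, with predictors in variables 1 to 6 and a categorical Exam with levels 'fail' and 'pass';
- a helper logical variable Pass is added so that fitglm gets a 0/1 response;
- the 0.5 classification cut-off is an illustrative choice;
- the function name cvExamAccuracy is hypothetical.

```matlab
function [accGlm, accNb] = cvExamAccuracy(T, k)
%CVEXAMACCURACY Per-fold accuracy of logistic regression and naive Bayes.
%   Sketch only. Assumes T is laid out like the question's table: predictors
%   in variables 1-6 and a categorical response Exam with levels 'fail' and
%   'pass'. The function name and the 0.5 cut-off are illustrative choices.
predictors = T.Properties.VariableNames(1:6);
T.Pass = (T.Exam == 'pass');            % logical response for fitglm
cvpart = cvpartition(T.Exam, 'KFold', k);
accGlm = zeros(cvpart.NumTestSets, 1);
accNb = zeros(cvpart.NumTestSets, 1);
for i = 1:cvpart.NumTestSets
    Ttrain = T(training(cvpart, i), :);
    Ttest = T(test(cvpart, i), :);

    % Logistic regression on this fold's training rows only; categorical
    % table variables are expanded into dummy variables by fitglm.
    glm = fitglm(Ttrain, 'linear', 'Distribution', 'binomial', ...
        'ResponseVar', 'Pass', 'PredictorVars', predictors);
    pPass = predict(glm, Ttest);        % estimated P(pass) per test row
    accGlm(i) = mean((pPass >= 0.5) == Ttest.Pass);

    % Naive Bayes straight from the table: categorical predictors are
    % detected from the variable types, Exam is the class label.
    nb = fitcnb(Ttrain(:, [predictors, {'Exam'}]), 'Exam');
    accNb(i) = mean(predict(nb, Ttest(:, predictors)) == Ttest.Exam);
end
end
```

On the full table, [accGlm, accNb] = cvExamAccuracy(T, 10) followed by mean(accGlm) and mean(accNb) gives the cross-validated accuracies. The function is only meaningful on the full data: on the five-row sample, every Father_Job value is unique, so each test fold contains a level never seen in training.

For the naive Bayes model alone, fitcnb also accepts a 'CVPartition' name-value argument, and kfoldLoss on the result is a built-in alternative to the manual loop.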
matlab logistic-regression cross-validation categorical-data naivebayes
asked Nov 7 at 18:03 by Notna (edited Nov 7 at 18:16)
1 Answer
I think your main problem is that your data set is just too small. You have n = 5, which isn't even enough to fit a model without cross-validation.
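To make the size problem concrete: with five rows and five folds, each model is trained on four rows. The logistic model in the question has 10 coefficients to estimate:

- an intercept;
- Age and Health;
- one dummy variable each for School, Relationship and Status;
- four dummy variables for the five Father_Job levels.

The small sketch below uses only the Exam labels from the sample and prints how much data each fold actually trains on. It uses an unstratified partition, an assumption made so that every fold holds out exactly one row.

```matlab
% Only the response matters for the fold layout: the five Exam labels
% from the question's sample (3 pass, 2 fail).
Exam = categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'});

% Unstratified 5-fold partition of 5 rows: one held-out row per fold.
cvpart = cvpartition(numel(Exam), 'KFold', 5);

for i = 1:cvpart.NumTestSets
    % countcats returns counts in categories(Exam) order: fail, pass.
    counts = countcats(Exam(training(cvpart, i)));
    fprintf('Fold %d: %d training rows (fail = %d, pass = %d)\n', ...
        i, cvpart.TrainSize(i), counts(1), counts(2));
end
```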
answered Nov 7 at 23:12 by Raha
Hello. The dataset I posted here is just a sample; the one I am working with has 600+ rows and 33 columns, so that shouldn't be the problem.
– Notna
Nov 8 at 10:00