K-fold Cross-Validation on table in Matlab























I have a Matlab table that contains information on students (numerical and categorical). A sample is given here:



School = {'GB'; 'UR'; 'GB'; 'GB'; 'UR'};
School = categorical(School);
Age = [14;14;12;16;19];
Relationship = {'yes'; 'yes'; 'no'; 'no'; 'yes'};
Relationship = categorical(Relationship);
Status = {'ft'; 'pt'; 'ft'; 'ft'; 'ft'};
Status = categorical(Status);
Father_Job = {'pol'; 'ser'; 'oth'; 'ele'; 'cle'};
Father_Job = categorical(Father_Job);
Health = [1;2;3;3;5];
Exam = {'pass'; 'pass'; 'fail'; 'fail'; 'pass'};
Exam = categorical(Exam);

T = table(School, Age, Relationship, Status, Father_Job, Health, Exam)

T =
    School    Age    Relationship    Status    Father_Job    Health    Exam
    ______    ___    ____________    ______    __________    ______    ____

    GB        14     yes             ft        pol           1         pass
    UR        14     yes             pt        ser           2         pass
    GB        12     no              ft        oth           3         fail
    GB        16     no              ft        ele           3         fail
    UR        19     yes             ft        cle           5         pass


I want to use this data to predict the pass/fail outcome of the exam. I am planning to use fitglm to fit a logistic regression and fitcnb to fit a Naive Bayes classifier. I know that both functions handle categorical variables well in Matlab, so there should be no problem using my table as it is, with its categorical variables.



However, I have a problem when I want to use cvpartition and crossvalind to perform a 10-fold cross-validation. When I try to create indices of my folds, I get the following error:

Error using statslib.internal.grp2idx
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not supported. Use a row subscript and a variable subscript.



My goal is to perform the following operations:



% Column 7 (Exam) is the response variable
X = T(:, 1:6);
Y = T(:, 7);

% Create indices of 5-fold cross-validation (here I get errors)
cvpart = cvpartition(Y,'KFold',5);
indices = crossvalind('Kfold',Y,5);

% Create my test and training sets
for i = 1:5
    test = (indices == i);
    train = ~test;
    Xtrain = X(train,:);
    Xtest = X(test,:);
    Ytrain = Y(train,:);
    Ytest = Y(test,:);
end

% Fit logistic model
mdl = fitglm(Xtrain,Ytrain,'Distribution','binomial')


Would anyone have a take on this, please? I know that I could convert the categorical variables to numerical ones, but I would rather not. Is there any way around this? Thank you.










      matlab logistic-regression cross-validation categorical-data naivebayes














      edited Nov 7 at 18:16

























      asked Nov 7 at 18:03









      Notna

























          1 Answer













          I think your main problem is that your data set is just too small. You have n = 5, which isn't enough to fit even a non-validated model.



























          • Hello. The dataset I posted here is just a sample; the one I am working with has 600+ rows and 33 columns, so the size shouldn't be the problem.
            – Notna
            Nov 8 at 10:00











          answered Nov 7 at 23:12









          Raha
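
Since the full dataset is large enough, the error message itself seems to point at a different cause: T(:, 7) uses parentheses, which returns a one-column table rather than a categorical vector, and the grouping step inside cvpartition then fails when it tries to index that table with a single subscript. A minimal sketch of one way around it, assuming the sample table from the question, is to extract the response with dot indexing and to keep the predictors in table form. The names k, trainIdx, testIdx, Ttrain, Ttest and nTest are illustrative only, and the model-fitting calls are left as comments because the five-row sample is too small to fit anything meaningful.

```matlab
% Rebuild the sample table from the question.
School       = categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'});
Age          = [14; 14; 12; 16; 19];
Relationship = categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'});
Status       = categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'});
Father_Job   = categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'});
Health       = [1; 2; 3; 3; 5];
Exam         = categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'});
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam);

% Parentheses return a table; dot (or brace) indexing returns the column itself.
Ytable = T(:, 7);    % 5x1 table, which is what triggered the error
Y      = T.Exam;     % 5x1 categorical vector, same as T{:, 7}

% Partition on the number of rows. On the real data, cvpartition(Y, 'KFold', k)
% would additionally stratify the folds by class.
k = 5;
cvpart = cvpartition(height(T), 'KFold', k);

nTest = zeros(k, 1);
for i = 1:k
    trainIdx = training(cvpart, i);   % logical index of training rows
    testIdx  = test(cvpart, i);       % logical index of held-out rows

    Ttrain = T(trainIdx, :);          % still a table, categoricals intact
    Ttest  = T(testIdx, :);
    nTest(i) = height(Ttest);

    % With enough rows, the fit would go here, inside the loop, for example:
    %   mdl  = fitcnb(Ttrain, 'Exam');
    %   pred = predict(mdl, Ttest);
    % For fitglm, the response is the last table variable by default (or the
    % one named with 'ResponseVar'); a logical or 0/1 response such as
    % (T.Exam == 'pass') is the safe choice with 'Distribution','binomial'.
end
```

Passing whole table rows to the fitting function avoids splitting X and Y by hand, and fitting inside the loop means every fold gets its own model rather than only the last one.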












