How can I look at a specific train and test set generated in a for loop?























My program splits my dataset into a train and a test set, builds a decision tree based on them, and calculates the accuracy, sensitivity and specificity from the confusion matrix.



I added a for loop to rerun my program 100 times. This means I get 100 train and test sets. The output of the for loop is a result_df with columns for accuracy, sensitivity and specificity.



This is the for loop:



result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")

for (g in 1:100)
{
  # Divide into train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]

  # REST OF MY CODE

}


My result_df (first 20 rows) looks like this:



> result_df[1:20,]
   Acc Sens Spec id
1   26   22   29  1
2   10   49   11  2
3   37   43   36  3
4    4   79    4  4
5   21   21   20  5
6   31   17   34  6
7   57    4   63  7
8   33    3   39  8
9   56   42   59  9
10  65   88   63 10
11   6   31    7 11
12  57   44   62 12
13  25   10   27 13
14  32   24   32 14
15  19    8   19 15
16  27   27   29 16
17  38   89   33 17
18  54   32   56 18
19  35   62   33 19
20  37    6   40 20


I use ggplot() to plot the specificity and the sensitivity as a scatterplot:

[scatterplot image not included]



What I want to do:



I want to see, for example, the train and test set behind data point 17, i.e. row 17 of result_df.



I think I can do this by using the set.seed function, but I am very unfamiliar with this function.










      r random






      asked Nov 7 at 8:58









      pineapple

          1 Answer
          First, if your code stored the estimated models, e.g., in a list, then you could recover your data from those models. However, it doesn't look like that's the case.



          With your current code, all you can do is look at the last train and test sets (number 100). That is because the test, train and train_ind variables are overwritten in every iteration. The cheapest way (in terms of memory) to achieve what you want is to store train_ind from each iteration. For instance, you could use



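          # One list slot per iteration to hold that iteration's training row indices
          train_inds <- vector("list", 100)
          for (g in 1:100)
          {
            smp_size <- floor(0.8 * nrow(mydata1))
            # Store the sampled indices instead of overwriting a single train_ind
            train_inds[[g]] <- sample(seq_len(nrow(mydata1)), size = smp_size)
            train <- mydata1[train_inds[[g]], ]
            test <- mydata1[-train_inds[[g]], ]
            # The rest
          }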


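          and in this way you would always know which observations were in which set: mydata1[train_inds[[17]], ] is the train set of iteration 17 and mydata1[-train_inds[[17]], ] is its test set. If you are only interested in one specific iteration, you could save only that one.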
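As a rough end-to-end sketch of the idea above, the snippet below stores the indices of every iteration and then rebuilds the split for iteration 17. The built-in iris data frame is only a stand-in for mydata1, and the names n_runs, train_17 and test_17 are made up for this illustration; the model fitting itself is left out.

```r
# Stand-in for the asker's data (assumption: any data frame behaves the same way)
mydata1 <- iris
n_runs <- 100                                # hypothetical name for the number of reruns
smp_size <- floor(0.8 * nrow(mydata1))

# Keep the training row indices of every iteration
train_inds <- vector("list", n_runs)
for (g in seq_len(n_runs)) {
  train_inds[[g]] <- sample(seq_len(nrow(mydata1)), size = smp_size)
  # ... fit the tree on mydata1[train_inds[[g]], ] and fill result_df[g, ] here ...
}

# Rebuild the split that produced row 17 of result_df
train_17 <- mydata1[train_inds[[17]], ]
test_17  <- mydata1[-train_inds[[17]], ]
```

Storing only the integer indices keeps the memory cost at 100 vectors of row numbers rather than 100 copies of the data.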



          Lastly, set.seed isn't really going to help here. If all you were doing was running rnorm(1) a hundred times, then yes, by using set.seed you could quickly recover the n-th generated value later. In your case, however, sample for train_ind is not the only consumer of random numbers; the model estimation functions are also very likely generating random values, so the state of the random number generator at iteration 17 depends on everything that was drawn before it.
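A small illustration of the point about set.seed, under the assumption that runif(1) can stand in for whatever random draws a model-fitting function might make (all variable names here are invented for the example):

```r
# Simple case: after resetting the seed, the n-th value can be regenerated
set.seed(1)
a <- replicate(5, rnorm(1))   # five separate calls to rnorm(1)
set.seed(1)
b <- rnorm(5)                 # the same five values in a single call
identical(a, b)               # TRUE

# Loop-like case: extra draws between two calls to sample()
set.seed(1)
first_split  <- sample(10, 3)
extra_draw   <- runif(1)      # stand-in for randomness inside model fitting
second_split <- sample(10, 3)

# Replaying with the same seed reproduces the first split, but the second
# split is only reproduced if the extra draw is replayed as well
set.seed(1)
first_again <- sample(10, 3)
identical(first_split, first_again)   # TRUE
```

To get second_split back from the seed alone, every intermediate draw would have to be repeated in exactly the same order, which is why storing the indices is the simpler route.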






          answered Nov 7 at 10:42









          Julius Vainora

          • @pineapple, does it answer your question?
            – Julius Vainora
            Nov 7 at 15:31

















