How can I look at a specific train and test set generated in a for loop?


My program divides my dataset into a train and a test set, builds a decision tree on the train set, evaluates it on the test set, and calculates the accuracy, sensitivity and specificity from the confusion matrix.

I added a for loop to rerun my program 100 times, so I get 100 different train and test sets. The output of the for loop is result_df, with columns for accuracy, sensitivity and specificity.

This is the for loop:

result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")

for (g in 1:100) {
  # Divide into train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]

  # REST OF MY CODE
}

My result_df (first 20 rows) looks like this:

> result_df[1:20,]
   Acc Sens Spec id
1   26   22   29  1
2   10   49   11  2
3   37   43   36  3
4    4   79    4  4
5   21   21   20  5
6   31   17   34  6
7   57    4   63  7
8   33    3   39  8
9   56   42   59  9
10  65   88   63 10
11   6   31    7 11
12  57   44   62 12
13  25   10   27 13
14  32   24   32 14
15  19    8   19 15
16  27   27   29 16
17  38   89   33 17
18  54   32   56 18
19  35   62   33 19
20  37    6   40 20

I use ggplot() to plot the specificity and the sensitivity as a scatterplot:

[Image: scatterplot of specificity and sensitivity]

What I want to do:

For example, I want to see the train and test set that produced data point 17.

I think I can do this with the set.seed function, but I am not at all familiar with it.

asked Nov 7 at 8:58

pineapple

486

r random


1 Answer

First, if your code stored the estimated models (e.g., in a list), you could recover your data from those models. However, it doesn't look like that's the case.

With your current code, all you can see is the last train and test set (iteration 100), because train_ind, train and test are overwritten in every iteration. The cheapest way (in terms of memory) to achieve what you want is to store train_ind from each iteration: the training indices alone determine both sets, since the test set is just the remaining rows. For instance, you could use

# One list slot per iteration to hold that iteration's training indices
train_inds <- vector("list", 100)
smp_size <- floor(0.8 * nrow(mydata1))

for (g in 1:100) {
  train_inds[[g]] <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_inds[[g]], ]
  test <- mydata1[-train_inds[[g]], ]
  # The rest
}

This way you always know which observations were in which set. If you are only interested in one specific iteration, you could save just that one.
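Once train_inds has been filled, rebuilding the split behind a particular point in the scatterplot (for example row 17 of result_df) is a single indexing step. This is a minimal self-contained sketch: the 50-row simulated data frame is a hypothetical stand-in for mydata1, and the model-fitting part is left out.

```r
# Hypothetical stand-in for mydata1: 50 rows of simulated data
set.seed(1)
mydata1 <- data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5))

n_iter <- 100
smp_size <- floor(0.8 * nrow(mydata1))
train_inds <- vector("list", n_iter)  # one slot per iteration

for (g in seq_len(n_iter)) {
  # Store the training indices of iteration g
  train_inds[[g]] <- sample(seq_len(nrow(mydata1)), size = smp_size)
  # ... fit the tree and fill result_df[g, ] here ...
}

# Rebuild the train and test set that produced row 17 of result_df
train_17 <- mydata1[train_inds[[17]], ]
test_17  <- mydata1[-train_inds[[17]], ]

nrow(train_17)  # 40
nrow(test_17)   # 10
```

The stored indices can be passed straight back to your own fitting code, so you could also refit the tree for that one split if needed.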

Lastly, a single set.seed call isn't really going to help here. If all you were doing was running rnorm(1) a hundred times, then yes: with set.seed you could quickly recover the n-th generated value later. In your case, however, sample is not the only function drawing random numbers; the model estimation functions are also very likely generating random values, so the state of the random-number generator at iteration n depends on everything that ran in the earlier iterations.
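To illustrate the point, the first part of this sketch shows the rnorm case: with one seed, the n-th draw is recoverable, but only by replaying every draw before it. The second part shows a different approach, which is not the one suggested above: calling set.seed(g) at the start of each iteration, before sample. Because each iteration restarts the generator from a known seed, the g-th split can be regenerated on its own, no matter how many random numbers the model-fitting code consumed afterwards. The simulated data frame is a hypothetical stand-in for mydata1, and the runif() call is a hypothetical stand-in for randomness used by model fitting. Reproducibility assumes the same R version and RNG settings (the default algorithm behind sample() changed in R 3.6.0).

```r
# Part 1: one seed makes a whole sequence of draws reproducible
set.seed(123)
draws <- replicate(100, rnorm(1))
set.seed(123)
recovered <- rnorm(17)[17]  # replays draws 1 to 16 to reach the 17th
identical(draws[17], recovered)  # TRUE

# Part 2: re-seed inside the loop so each split depends only on g
set.seed(1)
mydata1 <- data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5))
smp_size <- floor(0.8 * nrow(mydata1))

for (g in 1:100) {
  set.seed(g)  # the split of iteration g now depends only on g
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  # Hypothetical stand-in for model fitting that consumes a varying
  # number of random values
  noise <- runif(sample.int(10, 1))
  if (g == 17) saved_17 <- train_ind  # kept only to check the result below
}

# Later: regenerate the split of iteration 17 without rerunning the loop
set.seed(17)
train_ind_17 <- sample(seq_len(nrow(mydata1)), size = smp_size)
identical(train_ind_17, saved_17)  # TRUE
```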

answered Nov 7 at 10:42

Julius Vainora

26.5k75877

@pineapple, does it answer your question?
– Julius Vainora
Nov 7 at 15:31
