Difference in a variable between one observation for a subject and the next (R)



























In my panel dataset, I don't have the duration of each activity, only the time at which each activity starts. I therefore need to sum the differences between the start time of an activity and the start time of the observation that follows it.

To do that, I want to create a new variable in my panel dataset that holds the difference in a variable between one observation and the next. An example dataset makes this clearer:



Example dataset:



game_data <- data.frame(
  player   = c(1, 1, 1, 1, 2, 2, 2, 2),
  level    = c(1, 1, 2, 2, 1, 1, 2, 2),
  activity = c("run", "run", "run", "swim", "swim", "run", "run", "swim"),
  datesec  = c(0, 150, 170, 240, 100, 110, 180, 330)
)
> game_data
  player level activity datesec
1      1     1      run       0
2      1     1      run     150
3      1     2      run     170
4      1     2     swim     240
5      2     1     swim     100
6      2     1      run     110
7      2     2      run     180
8      2     2     swim     330


I now want to add a new variable containing, for each "run" activity, the time in seconds until the next observation, summed per level. It doesn't matter what the next activity is ("swim" or "run"). If the next observation belongs to the next level, the difference should be taken to the first activity of that level. I also want only one observation per level per player.



It should look like this:



game_data_new <- data.frame(
  player       = c(1, 1, 2, 2),
  level        = c(1, 2, 1, 2),
  n_run        = c(2, 1, 1, 1),
  n_swim       = c(0, 1, 1, 1),
  timeafterrun = c(170, 70, 70, 150)
)
> game_data_new
  player level n_run n_swim timeafterrun
1      1     1     2      0          170
2      1     2     1      1           70
3      2     1     1      1           70
4      2     2     1      1          150


For example, the 170 in "timeafterrun" for player 1, level 1 is computed as (150-0) + (170-150).
For the second term, the code has to take the first observation of the next level (level 2), because there is no further activity in level 1.
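The arithmetic above can be checked with a small base R sketch for player 1 only. The vector names mirror the example columns; `gap` is a name introduced here for illustration, and the rows are assumed to be sorted by datesec.

```r
# Player 1 from the example dataset, already sorted by datesec
datesec  <- c(0, 150, 170, 240)
activity <- c("run", "run", "run", "swim")
level    <- c(1, 1, 2, 2)

# Time from each observation to the next one; the last row has no successor
gap <- c(diff(datesec), NA)

# Sum of the gaps that start at a "run", separately for each level
time_level1 <- sum(gap[activity == "run" & level == 1], na.rm = TRUE)  # (150-0) + (170-150)
time_level2 <- sum(gap[activity == "run" & level == 2], na.rm = TRUE)  # (240-170)
```

Here `time_level1` is 170 and `time_level2` is 70, matching the first two rows of the desired output.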



I've tried the following, but I don't know what to add to the code to tell R to take the difference in datesec between the observation that follows a "run" (even if it is in the next level) and the "run" itself.



game <- game %>%
  group_by(player, level) %>%
  summarize(
    n_run = sum(type == "run"),
    n_swim = sum(type == "swim"),
    # pseudocode, this is the part I don't know how to write:
    # timeafterrun = datesec(observation after each "run") - datesec(that "run")
    timeafterrun = NA_real_
  )









  • Look at the functions lag and lead, as well as spread and gather.

    – Aramis7d
    Nov 22 '18 at 13:29











  • Your question is unclear; you are trying to achieve too many things at once. Try to break your problem into n steps. I believe this might be a first starting point: game_data %>% group_by(player) %>% dplyr::mutate(diff_with_before_val = datesec - lag(datesec, default = 0))

    – Andre Elrico
    Nov 22 '18 at 13:33











  • Your code uses the column name type for what your example calls activity. Just pointing that out.

    – iod
    Nov 22 '18 at 14:01
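To illustrate the lag and lead hint from the comments, here is a minimal sketch on player 1's datesec values. lag compares each row with the previous one, while lead compares it with the next one, which is the direction the question asks for. The names `diff_prev` and `diff_next` are made up for this example.

```r
library(dplyr)

# Player 1 from the example dataset
datesec <- c(0, 150, 170, 240)

# Difference to the previous observation (the first row is compared with 0)
diff_prev <- datesec - dplyr::lag(datesec, default = 0)

# Difference to the next observation (the last row has no successor, so NA)
diff_next <- dplyr::lead(datesec) - datesec
```

`diff_prev` is 0, 150, 20, 70 and `diff_next` is 150, 20, 70, NA. In a full dataset these calls would sit inside a group_by(player) so that values from different players are never compared.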
















r dplyr transform panel difference






asked Nov 22 '18 at 12:58 by Scijens (edited Nov 22 '18 at 13:29)








1 Answer
library(dplyr)

game_data %>%
  group_by(player) %>%
  # next start time within each player, across level boundaries
  mutate(nextdat = lead(datesec)) %>%
  # add level to the grouping (.add replaces the older add argument in dplyr >= 1.0.0)
  group_by(level, .add = TRUE) %>%
  mutate(timeafterrun = max(nextdat, na.rm = TRUE) -
           min(datesec[activity == "run"], na.rm = TRUE)) %>%
  summarize(n_run = sum(activity == "run"),
            n_swim = sum(activity == "swim"),
            timeafterrun = max(timeafterrun))

# A tibble: 4 x 5
# Groups: player [?]
  player level n_run n_swim timeafterrun
   <dbl> <dbl> <int>  <int>        <dbl>
1      1     1     2      0          170
2      1     2     1      1           70
3      2     1     1      1           70
4      2     2     1      1          150


Here's what's going on: first, I create a helper column with the next (lead) datesec for each row (within an individual player).
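As a sketch of this first step on its own, the helper column can be inspected before any summarizing. The object name `with_next` is introduced here for illustration only.

```r
library(dplyr)

game_data <- data.frame(
  player   = c(1, 1, 1, 1, 2, 2, 2, 2),
  level    = c(1, 1, 2, 2, 1, 1, 2, 2),
  activity = c("run", "run", "run", "swim", "swim", "run", "run", "swim"),
  datesec  = c(0, 150, 170, 240, 100, 110, 180, 330)
)

# lead() is evaluated per player, so the last row of each player gets NA
# instead of borrowing the first datesec of the following player
with_next <- game_data %>%
  group_by(player) %>%
  mutate(nextdat = lead(datesec)) %>%
  ungroup()
```

`with_next$nextdat` is 150, 170, 240, NA, 110, 180, 330, NA. Because the grouping is by player only, the last row of level 1 already points at the first row of level 2.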



Next I group_by both player and level, and create a column that subtracts the smallest datesec among rows with activity=="run" from the largest nextdat in the group.



Then I summarize to create n_run and n_swim, and copy over timeafterrun, which is the same for every row in the group, so I arbitrarily picked max, but it doesn't really matter.
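A possible variant, not part of the answer as posted: the max/min shortcut measures the span from the first "run" of a level to the latest following observation. That matches the requested sum on this example, but as far as I can tell it would also count time spent after a "swim" if one sat between a run and a later observation of the same player. The sketch below sums the gap after each individual "run" instead. The names `gap` and `result` are introduced here, and rows are assumed to be sorted by datesec within each player.

```r
library(dplyr)

game_data <- data.frame(
  player   = c(1, 1, 1, 1, 2, 2, 2, 2),
  level    = c(1, 1, 2, 2, 1, 1, 2, 2),
  activity = c("run", "run", "run", "swim", "swim", "run", "run", "swim"),
  datesec  = c(0, 150, 170, 240, 100, 110, 180, 330)
)

result <- game_data %>%
  group_by(player) %>%
  # time until the player's next observation, whatever level it is in
  mutate(gap = lead(datesec) - datesec) %>%
  group_by(player, level) %>%
  summarise(
    n_run        = sum(activity == "run"),
    n_swim       = sum(activity == "swim"),
    # add up only the gaps that start at a "run"; a trailing NA is skipped
    timeafterrun = sum(gap[activity == "run"], na.rm = TRUE)
  ) %>%
  ungroup()
```

On the example data `result$timeafterrun` is 170, 70, 70, 150, the same as the output shown in the answer. A "run" that is a player's very last observation has no successor and contributes nothing to the sum.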






  • Thank you, this helped a lot!

    – Scijens
    Nov 26 '18 at 12:56











answered Nov 22 '18 at 13:55 by iod












