Difference in variable between one observation for a subject and the next (R)
In my panel dataset, I don't have the duration of a specific activity, only the time at which each activity starts. That's why I need to sum the differences between the observation following an activity and the activity's own observation.
I therefore want to create a new variable in my panel dataset that gives the difference in a variable between one observation and the next. It gets clearer with an example dataset:
Example dataset:
game_data <- data.frame(player = c(1,1,1,1,2,2,2,2), level = c(1,1,2,2,1,1,2,2), activity = c("run","run","run","swim","swim","run","run","swim"), datesec = c(0,150,170,240,100,110,180,330))
> game_data
  player level activity datesec
1      1     1      run       0
2      1     1      run     150
3      1     2      run     170
4      1     2     swim     240
5      2     1     swim     100
6      2     1      run     110
7      2     2      run     180
8      2     2     swim     330
I now want to add a new variable holding the sum of the time in seconds from each "run" activity to the next observation. It doesn't matter what the next activity is, whether "swim" or "run", and if the next activity is in the next level, the first activity of the next level should be used to build the difference. I also want only one observation per level for each player.
It should look like this:
game_data_new <- data.frame(player = c(1,1,2,2), level = c(1,2,1,2), n_run = c(2,1,1,1), n_swim = c(0,1,1,1), timeafterrun = c(170,70,70,150))
> game_data_new
  player level n_run n_swim timeafterrun
1      1     1     2      0          170
2      1     2     1      1           70
3      2     1     1      1           70
4      2     2     1      1          150
The 170 in the variable "timeafterrun" is, for example, computed as (150-0) + (170-150).
Here, the code has to take the first observation of the next level, level 2, because there is no further activity in level 1.
I've tried the following, but I don't know what to add to the code to tell R that it should take the difference in datesec between the next observation after "run" (even if it's in the next level) and the current "run".
game <- game %>%
  group_by(player,level) %>%
  summarize(
    n_run = sum(type == "run"),
    n_swim = sum(type == "swim"),
    timeafterrun = datesec(datesec of activity after_last_"run"-obervation) - datesec(actual_"run"-observation)
  )
r dplyr transform panel difference
Look at the functions lag and lead as well as spread and gather.
– Aramis7d Nov 22 '18 at 13:29
Well, your question is unclear; you are trying to achieve too many things at once. Try to break your problem into n steps. I believe this might be a first starting point:
game_data %>% group_by(player) %>% dplyr::mutate(diff_with_before_val = datesec - lag(datesec, default = 0))
– Andre Elrico Nov 22 '18 at 13:33
Your code uses the column name type for what your example says is called activity. Just pointing out.
– iod Nov 22 '18 at 14:01
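As a small illustration of the lead() suggestion from the comments, the sketch below adds the start time of the next observation within each player and the resulting gap. The column names next_datesec and gap are made up for this example, and it assumes the rows are already ordered by datesec within each player.

```r
library(dplyr)

# Example data from the question
game_data <- data.frame(
  player   = c(1, 1, 1, 1, 2, 2, 2, 2),
  level    = c(1, 1, 2, 2, 1, 1, 2, 2),
  activity = c("run", "run", "run", "swim", "swim", "run", "run", "swim"),
  datesec  = c(0, 150, 170, 240, 100, 110, 180, 330)
)

gaps <- game_data %>%
  group_by(player) %>%
  mutate(
    next_datesec = lead(datesec),           # start time of the player's next row (NA on the last row)
    gap          = next_datesec - datesec   # seconds until the next observation
  ) %>%
  ungroup()
```

Because the grouping is by player only, the gap of the last row in a level reaches into the first row of the next level, which is the behaviour the question asks for.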
edited Nov 22 '18 at 13:29
asked Nov 22 '18 at 12:58 by Scijens
1 Answer
library(dplyr)
game_data %>%
  group_by(player) %>%
  # next start time within the same player, regardless of level
  mutate(nextdat = lead(datesec)) %>%
  # add level to the grouping; dplyr versions before 1.0.0 spell this add = TRUE
  group_by(level, .add = TRUE) %>%
  mutate(timeafterrun = max(nextdat, na.rm = TRUE) - min(datesec[activity == "run"], na.rm = TRUE)) %>%
  summarize(n_run = sum(activity == "run"), n_swim = sum(activity == "swim"), timeafterrun = max(timeafterrun))
# A tibble: 4 x 5
# Groups:   player [?]
  player level n_run n_swim timeafterrun
   <dbl> <dbl> <int>  <int>        <dbl>
1      1     1     2      0          170
2      1     2     1      1           70
3      2     1     1      1           70
4      2     2     1      1          150
Here's what's going on: first, I create a helper column with the next (lead) datesec for each row (within an individual player).
Next, I group_by both player and level, and create a column that subtracts the smallest datesec for rows with activity=="run" from the largest nextdat in the group.
Then I summarize to create n_run and n_swim, and copy over timeafterrun, which is the same for every row in the group, so I arbitrarily picked max, but it doesn't really matter.
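One caveat: the grouped max/min shortcut above reproduces the expected output for this example, but it measures the span from the first "run" in a level to the last following observation, so it would also count time spent on a non-run activity that sits between two runs. A sketch of a per-row variant, which sums the gap after each individual "run", could look like the following. The helper column gap is an assumed name, the rows are assumed to be sorted by datesec within each player, and a trailing "run" with no following observation contributes nothing to the sum.

```r
library(dplyr)

# Example data from the question
game_data <- data.frame(
  player   = c(1, 1, 1, 1, 2, 2, 2, 2),
  level    = c(1, 1, 2, 2, 1, 1, 2, 2),
  activity = c("run", "run", "run", "swim", "swim", "run", "run", "swim"),
  datesec  = c(0, 150, 170, 240, 100, 110, 180, 330)
)

result <- game_data %>%
  group_by(player) %>%
  mutate(gap = lead(datesec) - datesec) %>%   # time until the player's next observation
  group_by(level, .add = TRUE) %>%            # .add requires dplyr >= 1.0.0
  summarize(
    n_run        = sum(activity == "run"),
    n_swim       = sum(activity == "swim"),
    # only the gaps that follow a "run" are summed; NA gaps are skipped
    timeafterrun = sum(gap[activity == "run"], na.rm = TRUE),
    .groups = "drop"
  )
```

On the example data both approaches agree; they would differ only when a level contains something like run, swim, run for the same player.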
Thank you, this helped a lot!
– Scijens
Nov 26 '18 at 12:56
answered Nov 22 '18 at 13:55 by iod