How to create a time-threshold-based ID column given a time gap?
I have a pandas DataFrame with several columns; for illustration, consider only the columns Id and timestamp. As you can see, the DataFrame is sorted by the Id column.
Id timestamp
11 2018-10-19 13:00:00
11 2018-10-19 13:05:00
11 2018-10-19 13:06:00
11 2018-10-19 13:07:00
11 2018-10-19 13:30:00
11 2018-10-19 13:31:00
11 2018-10-19 13:32:00
11 2018-10-19 13:55:00
11 2018-10-19 13:54:00
11 2018-10-21 20:47:09
11 2018-10-21 20:48:27
11 2018-10-21 20:48:45
11 2018-10-21 20:48:52
12 2018-10-09 20:30:46
12 2018-10-09 20:30:22
12 2018-10-09 20:30:05
12 2018-10-09 20:29:44
12 2018-10-09 20:29:31
13 2018-10-19 18:49:08
13 2018-10-19 18:49:13
13 2018-10-11 18:46:15
14 2018-10-11 10:46:40
14 2018-10-23 10:39:52
How can I create another ID column, ID_2, based on 10-minute time gaps? That is, every time the 10-minute threshold between timestamps is crossed, a new, different ID_2 should start:
Id timestamp ID_2
11 2018-10-19 13:00:00 01
11 2018-10-19 13:05:00 01
11 2018-10-19 13:06:00 01
11 2018-10-19 13:07:00 01
11 2018-10-19 13:30:00 02
11 2018-10-19 13:31:00 02
11 2018-10-19 13:32:00 02
11 2018-10-19 13:55:00 03
11 2018-10-19 13:54:00 03
11 2018-10-21 20:47:09 04
11 2018-10-21 20:48:27 04
11 2018-10-21 20:48:45 04
11 2018-10-21 20:48:52 04
12 2018-10-09 20:30:46 04
12 2018-10-09 20:30:22 04
12 2018-10-09 20:30:05 04
12 2018-10-09 20:29:44 05
12 2018-10-09 20:29:31 05
13 2018-10-19 18:49:08 06
13 2018-10-19 18:49:13 06
13 2018-10-11 18:46:15 07
14 2018-10-11 10:46:40 07
I tried to detect the time gaps as follows:
df['col_new'] = (df['timestamp'].diff()).dt.seconds > 600
However, I don't understand how to apply a backward fill to turn these flags into IDs. How can I detect the time gaps and assign each resulting segment a new ID?
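One way to go from a boolean gap flag to IDs, without any backward fill, is a cumulative sum: each True starts a new group, and cumsum carries the running count forward over the rows that follow. This is a minimal sketch under a few assumptions: a "gap" means two consecutive rows more than 10 minutes apart, the absolute difference is used because some sample rows (such as 13:55 followed by 13:54) are not in ascending order, a change of Id also starts a new group, and the column name gap_id is hypothetical. It compares against pd.Timedelta directly because .dt.seconds returns only the seconds component of each timedelta (0 up to one day) and ignores the days part, so multi-day gaps would be misread.

```python
import pandas as pd

# Rows for Id 11 from the question's sample data.
df = pd.DataFrame({
    'Id': [11] * 11,
    'timestamp': pd.to_datetime([
        '2018-10-19 13:00:00', '2018-10-19 13:05:00', '2018-10-19 13:06:00',
        '2018-10-19 13:07:00', '2018-10-19 13:30:00', '2018-10-19 13:31:00',
        '2018-10-19 13:32:00', '2018-10-19 13:55:00', '2018-10-19 13:54:00',
        '2018-10-21 20:47:09', '2018-10-21 20:48:27',
    ]),
})

# Absolute time since the previous row; the first row gets NaT.
gap = df['timestamp'].diff().abs()

# A row starts a new group when the gap exceeds 10 minutes or the Id changes.
# NaT > Timedelta evaluates to False, so the first row is caught by the Id check.
new_group = (gap > pd.Timedelta(minutes=10)) | df['Id'].ne(df['Id'].shift())

# cumsum turns start-of-group flags into 1, 1, ..., 2, 2, ...; zfill pads to '01'.
df['gap_id'] = new_group.cumsum().astype(str).str.zfill(2)
print(df)
```

For this slice the result matches the 01, 02, 03, 04 pattern in the expected output above. Unlike fixed clock windows, this definition only starts a new ID when the distance between neighbouring rows exceeds the threshold.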
python python-3.x pandas datetime
asked Nov 7 at 9:17 by anon
1 Answer
Accepted answer (score 3):
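I believe you need Series.dt.floor with factorize, and finally zfill for the zero padding:
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
# floor each timestamp to its 10-minute window, then number the windows in order of first appearance
a = pd.factorize(df['timestamp'].dt.floor('10Min'))[0] + 1
# zero-pad the 1-based codes to two characters, e.g. 1 -> '01'
df['col_new'] = pd.Series(a, index=df.index).astype(str).str.zfill(2)
print(df)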
Id timestamp ID_2 col_new
0 11 2018-10-19 13:00:00 01 01
1 11 2018-10-19 13:05:00 01 01
2 11 2018-10-19 13:06:00 01 01
3 11 2018-10-19 13:07:00 01 01
4 11 2018-10-19 13:30:00 02 02
5 11 2018-10-19 13:31:00 02 02
6 11 2018-10-19 13:32:00 02 02
7 11 2018-10-19 13:55:00 03 03
8 11 2018-10-19 13:54:00 03 03
9 11 2018-10-21 20:47:09 04 04
10 11 2018-10-21 20:48:27 04 04
11 11 2018-10-21 20:48:45 04 04
12 11 2018-10-21 20:48:52 04 04
13 12 2018-10-09 20:30:46 04 05
14 12 2018-10-09 20:30:22 04 05
15 12 2018-10-09 20:30:05 04 05
16 12 2018-10-09 20:29:44 05 06
17 12 2018-10-09 20:29:31 05 06
18 13 2018-10-19 18:49:08 06 07
19 13 2018-10-19 18:49:13 06 07
20 13 2018-10-11 18:46:15 07 08
21 14 2018-10-11 18:46:40 07 08
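The code above assumes df already exists. A self-contained version, run on a handful of rows taken from the question (trimmed only to keep the example short), could look like this. pd.factorize returns a pair (codes, uniques) where the codes are 0-based and assigned in order of first appearance, which is why the answer adds 1 before padding.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [11, 11, 11, 11, 11, 12, 12],
    'timestamp': [
        '2018-10-19 13:00:00', '2018-10-19 13:07:00', '2018-10-19 13:30:00',
        '2018-10-19 13:55:00', '2018-10-19 13:54:00',
        '2018-10-09 20:30:05', '2018-10-09 20:29:44',
    ],
})

df['timestamp'] = pd.to_datetime(df['timestamp'])
# Map every timestamp to the start of its 10-minute clock window.
buckets = df['timestamp'].dt.floor('10min')
# Codes are 0-based integers, numbered by first appearance; uniques are not needed here.
codes, _ = pd.factorize(buckets)
# Shift to 1-based and zero-pad to two characters.
df['col_new'] = pd.Series(codes + 1, index=df.index).astype(str).str.zfill(2)
print(df)
```

Here 20:30:05 and 20:29:44 fall into the 20:30 and 20:20 windows respectively, so they receive different IDs even though they are only 21 seconds apart.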
Detail, the floored timestamps that factorize numbers:
print(df['timestamp'].dt.floor('10Min'))
0 2018-10-19 13:00:00
1 2018-10-19 13:00:00
2 2018-10-19 13:00:00
3 2018-10-19 13:00:00
4 2018-10-19 13:30:00
5 2018-10-19 13:30:00
6 2018-10-19 13:30:00
7 2018-10-19 13:50:00
8 2018-10-19 13:50:00
9 2018-10-21 20:40:00
10 2018-10-21 20:40:00
11 2018-10-21 20:40:00
12 2018-10-21 20:40:00
13 2018-10-09 20:30:00
14 2018-10-09 20:30:00
15 2018-10-09 20:30:00
16 2018-10-09 20:20:00
17 2018-10-09 20:20:00
18 2018-10-19 18:40:00
19 2018-10-19 18:40:00
20 2018-10-11 18:40:00
21 2018-10-11 18:40:00
Name: timestamp, dtype: datetime64[ns]
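Because floor snaps each timestamp to a fixed clock window (13:00, 13:10, 13:20, and so on), it groups by window rather than by the gap between consecutive rows. A small pair of hypothetical timestamps, chosen to sit on either side of a window boundary, illustrates the difference:

```python
import pandas as pd

# Hypothetical readings on either side of the 13:10 window boundary.
ts = pd.Series(pd.to_datetime(['2018-10-19 13:09:00', '2018-10-19 13:11:00']))

floored = ts.dt.floor('10min')
print(floored.tolist())  # two different windows: 13:00 and 13:10

# Yet the gap between the two rows is only 2 minutes, well under the threshold.
print(ts.diff().iloc[1])
```

If IDs should change only when the gap between neighbouring rows exceeds 10 minutes, a diff()-based flag accumulated with cumsum follows that definition more closely than fixed windows.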
answered Nov 7 at 9:36 by jezrael (edited Nov 7 at 9:49)
wow, this is superbly elegant – Pankaj Joshi, Nov 7 at 9:48
@PankajJoshi - Thank you. – jezrael, Nov 7 at 9:48