Counts and Cumulative Sums by Group

data.table time series

In a time series, how do I add a counter and cumulative sum that resets every 30 minutes?

Jeremy Allen https://jeremydata.com
03-13-2021

Problem:

In a time series, how do I add a counter and cumulative sum that resets every 30 minutes?

I keep forgetting about R’s seq_len() function. I’m writing about it here, so when I inevitably google for it I can find it here. Let’s see how it works.

We will be using data.table for this example.

library(data.table)
library(lubridate)


#---- fake data ----

times <- seq.POSIXt(
 from = as.POSIXct("2021-03-12 08:00:00"),
 to =   as.POSIXct("2021-03-12 12:00:00"),
 by = "sec"
)

accounts <- seq(1111, 9999, by = 1000)

amounts <- seq(1, 5000, by = 1)

set.seed(865)

dt <- data.table(
 datetime = sample(times, 1500, replace = TRUE),
 account = sample(accounts, 1500, replace = TRUE),
 amount = sample(amounts, 1500, replace = TRUE)
)

# order by account and datetime
setkeyv(dt, c("account", "datetime"))

n = seq_len(.N) By Group Row Number in data.table

After making our fake time series data, we can uses seq_len(.N) along with the by argument in data.table to add a counter that resets for each 30-minute group. Notice the transformation of a variable within the by = of the data.table.

Cumulative sum by 30-minute periods

#---- counts and sums ----

# count and sum by half hour using seq_len() to show 
# each observation's position within its 30-minute group
dt[,
   `:=`(n_by_30_mins = seq_len(.N), cumulative_amount = cumsum(amount)) ,
   by = .(account, floor_date(datetime, "30 minutes"))]

head(dt, 50)
               datetime account amount n_by_30_mins cumulative_amount
 1: 2021-03-12 08:00:00    1111   3798            1              3798
 2: 2021-03-12 08:02:06    1111    303            2              4101
 3: 2021-03-12 08:03:55    1111   1620            3              5721
 4: 2021-03-12 08:04:10    1111   4926            4             10647
 5: 2021-03-12 08:04:24    1111   3112            5             13759
 6: 2021-03-12 08:07:02    1111   3515            6             17274
 7: 2021-03-12 08:08:06    1111   1767            7             19041
 8: 2021-03-12 08:09:02    1111   1411            8             20452
 9: 2021-03-12 08:12:12    1111   1082            9             21534
10: 2021-03-12 08:14:26    1111   2164           10             23698
11: 2021-03-12 08:16:49    1111   3050           11             26748
12: 2021-03-12 08:19:03    1111   4329           12             31077
13: 2021-03-12 08:21:25    1111   4462           13             35539
14: 2021-03-12 08:23:53    1111    969           14             36508
15: 2021-03-12 08:25:41    1111   4287           15             40795
16: 2021-03-12 08:31:25    1111   2323            1              2323
17: 2021-03-12 08:33:29    1111   3629            2              5952
18: 2021-03-12 08:34:25    1111   2697            3              8649
19: 2021-03-12 08:34:57    1111   2878            4             11527
20: 2021-03-12 08:35:41    1111   2921            5             14448
21: 2021-03-12 08:36:12    1111   3286            6             17734
22: 2021-03-12 08:38:18    1111   2393            7             20127
23: 2021-03-12 08:40:44    1111   2245            8             22372
24: 2021-03-12 08:43:19    1111   3969            9             26341
25: 2021-03-12 08:44:26    1111   3517           10             29858
26: 2021-03-12 08:44:28    1111    291           11             30149
27: 2021-03-12 08:45:12    1111   1545           12             31694
28: 2021-03-12 08:45:30    1111   2679           13             34373
29: 2021-03-12 08:47:30    1111    114           14             34487
30: 2021-03-12 08:48:18    1111    958           15             35445
31: 2021-03-12 08:48:45    1111   4566           16             40011
32: 2021-03-12 08:50:03    1111   3496           17             43507
33: 2021-03-12 08:50:55    1111   1823           18             45330
34: 2021-03-12 08:51:06    1111   4881           19             50211
35: 2021-03-12 08:54:06    1111     35           20             50246
36: 2021-03-12 08:55:10    1111    484           21             50730
37: 2021-03-12 08:57:55    1111   2192           22             52922
38: 2021-03-12 09:01:45    1111    461            1               461
39: 2021-03-12 09:02:30    1111   4523            2              4984
40: 2021-03-12 09:04:10    1111   2560            3              7544
41: 2021-03-12 09:05:29    1111   1884            4              9428
42: 2021-03-12 09:06:18    1111   4933            5             14361
43: 2021-03-12 09:07:30    1111    100            6             14461
44: 2021-03-12 09:07:52    1111     99            7             14560
45: 2021-03-12 09:10:21    1111   2865            8             17425
46: 2021-03-12 09:13:05    1111   2584            9             20009
47: 2021-03-12 09:14:44    1111   3649           10             23658
48: 2021-03-12 09:18:51    1111   1139           11             24797
49: 2021-03-12 09:19:13    1111   2582           12             27379
50: 2021-03-12 09:23:20    1111   2875           13             30254
               datetime account amount n_by_30_mins cumulative_amount

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Allen (2021, March 13). jeremydata: Counts and Cumulative Sums by Group. Retrieved from https://jeremydata.com/posts/2021-03-13-counts-and-cumulative-sums-by-group/

BibTeX citation

@misc{allen2021counts,
  author = {Allen, Jeremy},
  title = {jeremydata: Counts and Cumulative Sums by Group},
  url = {https://jeremydata.com/posts/2021-03-13-counts-and-cumulative-sums-by-group/},
  year = {2021}
}