A data structure for managing time series data.
Tsibbles are data structures that are designed specifically for storing time series data. They are useful because they create a unified interface to various time series visualization and modeling tasks. This removes the friction of having to transform back and forth between data.frames, lists, and matrices, depending on the particular task of interest.
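The key difference between a tsibble and an ordinary data.frame is that a tsibble requires a temporal index variable. The spacing of the index determines the frequency (interval) at which observations are collected. For example, the code below generates a tsibble with yearly observations.
library(tsibble)
y <- tsibble(
  Year = 2015:2019,
  Observation = c(123, 39, 78, 52, 110),
  index = Year
)
y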
# A tsibble: 5 x 2 [1Y]
Year Observation
<int> <dbl>
1 2015 123
2 2016 39
3 2017 78
4 2018 52
5 2019 110
We can also convert an existing data.frame into a tsibble using the as_tsibble function. The only subtlety is that we have to specify an index.
x <- data.frame(
Year = 2015:2019,
Observation = c(123, 39, 78, 52, 110)
)
as_tsibble(x, index = Year)
# A tsibble: 5 x 2 [1Y]
Year Observation
<int> <dbl>
1 2015 123
2 2016 39
3 2017 78
4 2018 52
5 2019 110
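To see what tsibble inferred, we can query the index, keys, and interval directly. This is a minimal sketch that assumes the tsibble package is attached. The monthly series `m` and its values are hypothetical and exist only for illustration. The sketch also shows that a monthly index is usually built with `yearmonth()`, which gives the same `<mth>` column type as the PBS example later on.

```r
library(tsibble)

# Yearly series from the example above
x <- as_tsibble(
  data.frame(Year = 2015:2019, Observation = c(123, 39, 78, 52, 110)),
  index = Year
)
index_var(x)  # name of the index column
key_vars(x)   # no keys here, so this is empty
interval(x)   # inferred spacing between observations (one year)

# Hypothetical monthly series: yearmonth() values form a monthly index
m <- tsibble(
  Month = yearmonth(as.Date("2021-01-01")) + 0:11,
  value = 1:12,
  index = Month
)
m  # the header reports the interval as [1M]
```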
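When some time points are missing, fill_gaps() makes the gaps explicit by inserting rows for the missing times, with NA in the measurement columns. In the example below, we drop January 5 from a month of daily observations.
library(lubridate) # for as_date()
days <- seq(as_date("2021-01-01"), as_date("2021-01-31"), by = "day")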
days <- days[-5] # Skip January 5
x <- tsibble(day = days, value = rnorm(30), index = day)
fill_gaps(x)
# A tsibble: 31 x 2 [1D]
day value
<date> <dbl>
1 2021-01-01 -0.754
2 2021-01-02 -1.42
3 2021-01-03 -0.575
4 2021-01-04 -0.373
5 2021-01-05 NA
6 2021-01-06 -0.935
7 2021-01-07 -1.06
8 2021-01-08 1.07
9 2021-01-09 -0.527
10 2021-01-10 0.344
# … with 21 more rows
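Before filling, it can help to check whether a series has gaps and where they are. This is a minimal sketch that assumes the tsibble and lubridate packages are attached. The `set.seed()` call is added only to make the random values reproducible. The fill value 0 is an arbitrary choice for illustration.

```r
library(tsibble)
library(lubridate)  # as_date()

set.seed(1)  # only for reproducibility of rnorm()
days <- seq(as_date("2021-01-01"), as_date("2021-01-31"), by = "day")
days <- days[-5]  # drop January 5
x <- tsibble(day = days, value = rnorm(30), index = day)

has_gaps(x)    # one row per series; .gaps is TRUE if any time point is missing
scan_gaps(x)   # lists the missing time points (here, 2021-01-05)
count_gaps(x)  # summarises each run of missing time points (.from, .to, .n)

# Fill implicit gaps with a constant instead of NA
x_zero <- fill_gaps(x, value = 0)
x_zero[x_zero$day == as_date("2021-01-05"), ]
```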
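A single tsibble can also hold several time series at once. Each series is identified by the values of one or more key variables. For example, the olympic_running dataset from the tsibbledata package stores Olympic running times for several events.
library(tsibbledata)
olympic_running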
# A tsibble: 312 x 4 [4Y]
# Key: Length, Sex [14]
Year Length Sex Time
<int> <int> <chr> <dbl>
1 1896 100 men 12
2 1900 100 men 11
3 1904 100 men 11
4 1908 100 men 10.8
5 1912 100 men 10.8
6 1916 100 men NA
7 1920 100 men 10.8
8 1924 100 men 10.6
9 1928 100 men 10.8
10 1932 100 men 10.3
# … with 302 more rows
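In this dataset, the keys are running distance (Length) and sex (Sex), which together identify 14 separate series. If we were creating a tsibble from a data.frame containing these multiple time series, we would need to specify the keys. This protects against accidentally having duplicate observations at a given time, because each combination of key and index may appear only once.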
olympic_df <- as.data.frame(olympic_running)
as_tsibble(olympic_df, index = Year, key = c("Sex", "Length")) # what happens if we remove key?
# A tsibble: 312 x 4 [4Y]
# Key: Sex, Length [14]
Year Length Sex Time
<int> <int> <chr> <dbl>
1 1896 100 men 12
2 1900 100 men 11
3 1904 100 men 11
4 1908 100 men 10.8
5 1912 100 men 10.8
6 1916 100 men NA
7 1920 100 men 10.8
8 1924 100 men 10.6
9 1928 100 men 10.8
10 1932 100 men 10.3
# … with 302 more rows
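This sketch answers the question in the code comment above. Without the key, several rows share the same Year (one per event), so `as_tsibble()` refuses to build the object. It assumes the tsibble and tsibbledata packages are attached. `tryCatch()` is used only so that the error message can be printed instead of stopping the script. `duplicates()` lists the rows that collide.

```r
library(tsibble)
library(tsibbledata)  # olympic_running

olympic_df <- as.data.frame(olympic_running)

# Without a key, Year alone does not identify a row, so this errors
result <- tryCatch(
  as_tsibble(olympic_df, index = Year),
  error = function(e) conditionMessage(e)
)
result

# Rows that clash when only the index is used
head(duplicates(olympic_df, index = Year))

# With the key, no rows clash
nrow(duplicates(olympic_df, key = c(Sex, Length), index = Year))
```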
The usual data tidying functions from dplyr are implemented for tsibbles. Filtering rows with filter, selecting columns with select, deriving variables using mutate, and summarizing groups using group_by and summarise all work as expected. One distinction to be careful about is that summarise always groups by the index implicitly. The result therefore has one row per time point (and per group), rather than a single overall summary.
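For example, this computes the total monthly cost of Australian Pharmaceutical Benefits Scheme prescriptions (the PBS dataset from tsibbledata) for one class of drugs, ATC2 code A10 (drugs used in diabetes). We simply filter to that code and take the sum of costs. Because summarise groups by the Month index, the result contains one total per month.
library(dplyr)
PBS %>%
  filter(ATC2 == "A10") %>%
  summarise(TotalC = sum(Cost))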
# A tsibble: 204 x 2 [1M]
Month TotalC
<mth> <dbl>
1 1991 Jul 3526591
2 1991 Aug 3180891
3 1991 Sep 3252221
4 1991 Oct 3611003
5 1991 Nov 3565869
6 1991 Dec 4306371
7 1992 Jan 5088335
8 1992 Feb 2814520
9 1992 Mar 2985811
10 1992 Apr 3204780
# … with 194 more rows
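The implicit grouping by the index also applies when we group by some other variable. This sketch assumes dplyr, tsibble, and tsibbledata are attached and reuses olympic_running from earlier. The `Events` count is only an illustrative summary. Grouping by Sex on the tsibble still yields one row per Year and Sex. The same pipeline on a plain tibble collapses over years.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)  # olympic_running

# Count events with a recorded time, separately for each sex
by_sex <- olympic_running %>%
  group_by(Sex) %>%
  summarise(Events = sum(!is.na(Time)))
by_sex  # still a tsibble indexed by Year: one row per Year and Sex

# After as_tibble() there is no implicit Year grouping,
# so the same summary collapses to one row per sex
by_sex_tbl <- olympic_running %>%
  as_tibble() %>%
  group_by(Sex) %>%
  summarise(Events = sum(!is.na(Time)))
by_sex_tbl
```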
If we had wanted the total cost by year, the most direct route with the functions shown so far is to add a Year variable and convert to an ordinary tibble with as_tibble. We cannot simply make Year the index of a tsibble. There would be multiple measurements per year, and this would violate the tsibble requirement that each combination of key and index appears only once.
PBS %>%
  filter(ATC2 == "A10") %>%
  mutate(Year = year(Month)) %>%  # year() comes from lubridate
  as_tibble() %>%                 # drop the tsibble structure
  group_by(Year) %>%
  summarise(TotalC = sum(Cost))
# A tibble: 18 x 2
Year TotalC
* <dbl> <dbl>
1 1991 21442946
2 1992 45686946.
3 1993 55532688.
4 1994 60816080.
5 1995 67326599.
6 1996 77397927.
7 1997 85131672.
8 1998 93310626.
9 1999 105959043.
10 2000 122496586.
11 2001 136467442.
12 2002 149066136.
13 2003 156464261.
14 2004 183798935.
15 2005 199655595
16 2006 220354676
17 2007 265718966.
18 2008 135036513
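Alternatively, tsibble provides `index_by()`, which groups by a coarser version of the index. This lets the yearly totals be computed while keeping a tsibble. The sketch below assumes dplyr, lubridate, tsibble, and tsibbledata are attached. The result is a tsibble indexed by Year rather than a plain tibble.

```r
library(dplyr)
library(lubridate)    # year()
library(tsibble)
library(tsibbledata)  # PBS

yearly <- PBS %>%
  filter(ATC2 == "A10") %>%
  index_by(Year = year(Month)) %>%  # new yearly index derived from Month
  summarise(TotalC = sum(Cost))

yearly
```

In both versions, 1991 and 2008 are partial years. The monthly series runs from July 1991 to June 2008, which explains their smaller totals.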