Cross and Auto-Correlation
Summaries of relationships between and within time series.
Kris Sankaran (UW Madison)
01-10-2023
Reading, Recording, Rmarkdown
- There is often interest in seeing how two time series relate to one another.
A scatterplot can be useful in gauging this relationship. For example, the plot
below suggests that there is a relationship between electricity demand and
temperature, but it’s hard to make out exactly the nature of the relationship.
A scatterplot clarifies that, while electricity demand generally goes up in the
cooler months, the very highest demand happens during high heat days.
- Note that the timepoints are far from independent. Points tend to drift
gradually across the scatterplot, rather than jumping to completely different
regions in short time intervals. This is just the 2D consequence of the time
series varying smoothly.
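library(tidyverse) # provides %>%, str_c(), bind_cols(), and ggplot()

# Pair each row of vic_2014 (assumed to be defined earlier) with the next
# timepoint. The last row is paired with itself, so that no segment wraps
# around to the start of the series.
lagged <- vic_2014[c(2:nrow(vic_2014), nrow(vic_2014)), ] %>%
  setNames(str_c("lagged_", colnames(vic_2014)))

ggplot(bind_cols(vic_2014, lagged), aes(x = Temperature, y = Demand)) +
  geom_point(alpha = 0.6, size = 0.7) +
  geom_segment(
    aes(xend = lagged_Temperature, yend = lagged_Demand),
    linewidth = .4, alpha = 0.5 # linewidth replaces size for lines in ggplot2 >= 3.4.0
  )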
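- To formally measure the linear relationship between two time series, we can use the cross-correlation,
\[\begin{align}
\hat{\rho}_{XY} = \frac{\frac{1}{T - 1}\sum_{t = 1}^{T}\left(x_{t} - \hat{\mu}_{X}\right)\left(y_{t} - \hat{\mu}_{Y}\right)}{\hat{\sigma}_{X}\hat{\sigma}_{Y}}
\end{align}\]
where \(\hat{\mu}\) and \(\hat{\sigma}\) denote sample means and standard deviations and \(T\) is the number of timepoints. For the data above, this is,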
cor(vic_2014$Temperature, vic_2014$Demand)
[1] 0.2797854
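To make the definition concrete, here is a minimal sketch that computes the same quantity by hand, dividing the sample covariance by the product of the sample standard deviations, and compares it with `cor()`. The series `x` and `y` are simulated stand-ins chosen for illustration. They are not the `vic_2014` data.

```r
# Simulated stand-ins for two related time series (NOT the vic_2014 data).
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))            # a smoothly drifting series
y <- 0.5 * x + rnorm(n, sd = 2)  # a noisy linear function of x

# Sample covariance divided by the product of sample standard deviations.
manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1) / (sd(x) * sd(y))

# The hand computation should agree with the built-in Pearson correlation.
all.equal(manual, cor(x, y))
```

Because this is a measure of linear association, a value near zero does not rule out a strong nonlinear relationship, such as demand rising at both temperature extremes.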
- Cross-correlation can be extended to autocorrelation, the correlation
between a time series and a lagged version of itself. This measure is useful for
quantifying the strength of seasonality within a time series. A daily time
series with strong weekly seasonality will have high autocorrelation at lag 7,
for example. The example below shows the lag plots for Australian beer
production after 2000. Since the data are quarterly, the high autocorrelation
visible at lags 4 and 8 suggests a strong seasonal pattern that repeats every
four quarters, that is, once per year.
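# Keep quarters after 2000 (year() is from lubridate; aus_production is a
# quarterly tsibble from tsibbledata; gg_lag() is from feasts).
recent_production <- aus_production %>%
  filter(year(Quarter) > 2000)
gg_lag(recent_production, Beer, geom = "point")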
Indeed, we can confirm this by looking at the original data.
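Each panel of a lag plot is a scatterplot of the series against a copy of itself shifted by some number of steps, so the correlation within a panel can be computed directly from the shifted pairs. The sketch below does this for a synthetic series that repeats every 4 steps. The helper `lag_cor()` and the within-year `pattern` are hypothetical, introduced only for illustration, and the data are not the beer production series.

```r
# Synthetic quarterly-style series: a pattern repeating every 4 steps plus noise.
set.seed(1)
n <- 80
pattern <- c(10, -5, 0, -5)  # hypothetical within-year pattern
x <- rep(pattern, length.out = n) + rnorm(n, sd = 1)

# Pair each observation with the one k steps earlier and correlate the pairs.
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)])
}

lag_cors <- sapply(1:8, function(k) lag_cor(x, k))
names(lag_cors) <- paste0("lag", 1:8)
round(lag_cors, 2)
```

In this construction the correlations at lags 4 and 8 should be the largest, since those shifts align each observation with the same position in the repeating pattern.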
- These lag plots take up a bit of space. A more compact summary is to compute
the autocorrelation function (ACF), which reports the autocorrelation at each
lag. Regularly spaced peaks and valleys in an ACF suggest seasonality, with a
period equal to the spacing between the peaks.
acf_data <- ACF(recent_production, Beer)
autoplot(acf_data)
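For reference, the usual sample autocorrelation at lag \(k\) for a series \(x_{1}, \dots, x_{T}\) with sample mean \(\hat{\mu}_{X}\) is
\[\begin{align}
r_{k} = \frac{\sum_{t = k + 1}^{T}\left(x_{t} - \hat{\mu}_{X}\right)\left(x_{t - k} - \hat{\mu}_{X}\right)}{\sum_{t = 1}^{T}\left(x_{t} - \hat{\mu}_{X}\right)^{2}}
\end{align}\]
The sketch below evaluates this formula on a synthetic series with period 4 and checks it against base R's `acf()`. The helper `sample_acf()` is a hypothetical name used for illustration, and the check is against `stats::acf()` rather than the `ACF()` call used above.

```r
# Synthetic series with a repeating 4-step pattern (not the beer data).
set.seed(2)
n <- 80
x <- rep(c(10, -5, 0, -5), length.out = n) + rnorm(n)

# Sample autocorrelation at lag k: lagged cross-products of the centered
# series divided by the total sum of squares.
sample_acf <- function(x, k) {
  n <- length(x)
  xc <- x - mean(x)
  sum(xc[(k + 1):n] * xc[1:(n - k)]) / sum(xc^2)
}

by_hand <- sapply(1:8, function(k) sample_acf(x, k))

# stats::acf() returns lag 0 first, so drop it before comparing.
built_in <- as.numeric(acf(x, lag.max = 8, plot = FALSE)$acf)[-1]
all.equal(by_hand, built_in)

# Lags with large positive autocorrelation.
which(by_hand > 0.5)
```

Note that the denominator uses all \(T\) observations at every lag, so these values differ slightly from the plain correlation between the series and its shifted copy.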
- An ACF that decays gradually as the lag increases suggests a trend. This is
because if there is a trend, the current value tends to be highly correlated
with the recent past. It’s possible to have seasonality within a trend, in
which case the ACF decays gradually but has bumps at the lags where the
seasonal peaks align.
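Both patterns can be reproduced on simulated data. The sketch below builds a noisy linear trend and then adds a 4-step seasonal pattern to it. The slope, noise level, and seasonal values are arbitrary choices for illustration, and base R's `acf()` is used so that the example is self-contained.

```r
set.seed(3)
n <- 120
t_idx <- seq_len(n)

# A noisy linear trend, and the same trend with a 4-step seasonal pattern added.
trend <- 0.5 * t_idx + rnorm(n, sd = 2)
seasonal_trend <- trend + rep(c(10, -5, 0, -5), length.out = n)

# Autocorrelations at lags 1 through 12 (dropping lag 0).
acf_trend <- as.numeric(acf(trend, lag.max = 12, plot = FALSE)$acf)[-1]
acf_both <- as.numeric(acf(seasonal_trend, lag.max = 12, plot = FALSE)$acf)[-1]

round(rbind(trend = acf_trend, trend_plus_season = acf_both), 2)
```

The first row should fall off slowly with the lag, while the second should follow a similar decline with local bumps at multiples of 4.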
Citation
For attribution, please cite this work as
Sankaran (2023, Jan. 10). STAT 436 (Spring 2023): Cross and Auto-Correlation. Retrieved from https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week06-04/
BibTeX citation
@misc{sankaran2023cross,
author = {Sankaran, Kris},
title = {STAT 436 (Spring 2023): Cross and Auto-Correlation},
url = {https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week06-04/},
year = {2023}
}