Autocorrelation is common in time series. You can specify N-order autoregressive models using ar(N) in the segment formulas. The most common use case is probably just to add ar(1) to the first segment. This will be “carried over” to later segments if nothing is done to change it, just like all other intercepts in mcp. You can do regression on the autocorrelation parameters using ar(N, formula), and it behaves much like using sigma() to model variance.
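For example, a minimal sketch (not run) of an AR(1) term added in segment 1 and carried over to segment 2:

model = list(
  y ~ 1 + ar(1),  # segment 1: intercept plus AR(1) on the residuals
  ~ 0 + x         # segment 2: joined slope; ar1_1 carries over unchanged
)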
Let’s simulate some data using the mcp_example() function with the “ar” settings:

library(mcp)
options(mc.cores = 3)  # Speed up sampling
set.seed(42)           # Make the script deterministic
ex = mcp_example("ar")
## Generating residuals for AR(N) model since the response column/argument 'price' was not provided.
head(ex$data)
## # A tibble: 6 x 2
## time price
## <int> <dbl>
## 1 1 26.9
## 2 2 22.0
## 3 3 24.6
## 4 4 26.8
## 5 5 27.7
## 6 6 26.2
See how this was generated in ex$call. We model this as a plateau (1) with a second-order autoregressive residual (ar(2)), followed by a joined slope (0 + time) with a negative first-order autoregressive residual (ar(1)):

model = list(
  price ~ 1 + ar(2),   # int_1, ar1_1, ar2_1
  ~ 0 + time + ar(1)   # time_2, ar1_2
)
fit = mcp(model, ex$data)
Let’s plot it; we see that the autocorrelation was strong in the first segment and weaker but negative in the second:
plot(fit)
We can summarise the inferred coefficients:
summary(fit)
## Family: gaussian(link = 'identity')
## Iterations: 9000 from 3 chains.
## Segments:
## 1: price ~ 1 + ar(2)
## 2: price ~ 1 ~ 0 + time + ar(1)
##
## Population-level parameters:
## name match sim mean lower upper Rhat n.eff
## ar1_1 OK 0.7 0.73 0.5665 0.89 1.0 821
## ar1_2 OK -0.4 -0.47 -0.6988 -0.26 1.0 2021
## ar2_1 OK 0.2 0.16 0.0074 0.30 1.0 684
## cp_1 OK 120.0 118.14 113.0267 125.83 1.2 35
## int_1 OK 20.0 18.08 14.6039 22.44 1.1 73
## sigma_1 OK 5.0 4.85 4.3666 5.34 1.0 3401
## time_2 OK 0.5 0.52 0.4779 0.55 1.0 777
Note that the naming syntax for autoregressive intercepts is ar[order]_[segment]. For example, ar1_2 is the first-order autoregressive coefficient in segment 2. For slopes it will be ar[order]_[normal mcp name], e.g., ar1_x_3 for a slope on AR(1) in segment 3.
Comparing the columns mean and sim, we see that the AR coefficients are reasonably recovered. In fact, the posterior mean is almost always exactly the same as arima(data, order = c(N, 0, 0)) (see below), so the imperfect fits are due to randomness in the simulation, not in the fit.
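For instance, a rough check on the first (plateau) segment alone could look like this sketch; it cheats by using the simulated change point at time = 120 rather than the inferred one:

plateau = ex$data$price[ex$data$time < 120]  # plateau segment of the simulated data
arima(plateau, order = c(2, 0, 0))           # classical AR(2) fit: order = c(AR, differencing, MA)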
Notice that sigma in AR models describes the innovations, i.e., the part of the residuals that is not explained by the autoregressive coefficients. sd(ex$data$price) is therefore always higher; in this case, the SD of the raw data in the plateau is 9.2. As always, it is good to assess posteriors and convergence more directly:
plot_pars(fit)
Sometimes, the trace plot shows that the change point (cp_1) is not well identified with this model and data. As discussed in the article on tips, tricks, and debugging, you could combine a more informative prior with more samples (mcp(..., adapt = 10000, iter = 10000)) if this is a problem.
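For example, something like this (not run; the prior values are purely hypothetical knowledge about where the change point lies):

prior = list(cp_1 = "dnorm(110, 10)")  # hypothetical informative change point prior
mcp(model, ex$data, prior = prior, adapt = 10000, iter = 10000)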
You can do hypothesis testing (see hypothesis()) and model comparisons (see loo()) with autoregressive models, just as with any other model in mcp. Read more here or scroll down for an applied example.
The autoregressive modeling applies to the residuals from the predicted fit, as is common. These residuals are computed on the transformed scale for families with a non-identity link function (e.g., poisson(link = "log")). In time-series jargon, this is a dynamic regression model where the “normal” regression parameters make up the deterministic structure. See further comments in the section on priors below.
If you want to control the direction of the change, you simply put a positive-only prior on the corresponding ar slope coefficient, e.g., ar1_x_1 = "dnorm(0, 1) T(0, )". See recommendations in the section on priors for AR(N).
While the typical usage is AR(N), you can also specify how the autocorrelation itself changes with x using ar(N, formula). If you want to model a steady change in the AR(1) “strength”, you can do ar(1, 1 + time). Please note that the link function for this regression formula is identity, so you can easily exceed the stationarity boundary [-1, 1]. See the section on priors for AR(N) for how to counter such problems. The regression on the AR coefficients is applied to all orders; raise an issue on GitHub if you see a use case for per-order control.
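For example, a sketch (not run) of an AR(1) coefficient that drifts linearly with time; following the naming convention above, the slope would presumably be named something like ar1_time_1:

model = list(
  price ~ 1 + ar(1, 1 + time)  # AR(1) "strength" changes linearly with time
)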
You have the full suite of mcp formula syntax available inside ar, so you could easily do ar(3, rel(1) + I(x^2) + exp(x)). Make sure to use a positive-only prior (e.g., "dunif(0, 1)") if using log() or sqrt(), since they fail for negative values.
You can combine ar() with any regression model and with varying change points. Combining it with sigma() will be valid if they change intercepts together so that each segment has homoskedastic residuals; see the sketch below. I have not tested whether “uncoordinated” changes will yield correct estimates as well, i.e., with slopes on sigma()/ar(), or with changes in sigma() without a corresponding change in ar(). There is a good chance that they will be accurate because ar() and sigma() are conditional on each other and they are modeled jointly for each x, but this is just a hunch that needs scrutinizing.
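A sketch (not run) of such a “coordinated” change, where ar() and sigma() get new intercepts at the same change point:

model = list(
  y ~ 1 + ar(1),               # segment 1: default sigma_1 and ar1_1
  ~ 0 + x + ar(1) + sigma(1)   # segment 2: new AR(1) and new residual SD together
)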
Note that AR(N) as implemented in mcp (and most functions in R) applies to the order of the data in the data frame without taking the actual value of x into account. This has two important implications:

- Make sure that your data frame is sorted by x. Just do data = data[order(data$x), ].
- Because the actual x values are ignored, unevenly spaced observations are treated as if they were equidistant, so the autocorrelation is estimated per row rather than per unit of x.
One “side-effect” of the mcp implementation of autocorrelation using ar() in the formulas is that you can infer when autocorrelation parameters and structures change.

Let’s simulate a change point in autocorrelation and see if we can infer it later:
# The model
model = list(
  y ~ 1 + x + ar(1),  # intercept, slope, and AR(1)
  ~ 0 + x + ar(1)     # joined slope and a new AR(1)
)

# Get predictions
empty = mcp(model, sample = FALSE)

set.seed(42)
df = data.frame(x = seq(0, 100, length.out = 200))
df$y = empty$simulate(
  df$x,
  cp_1 = 60, int_1 = 20,
  x_1 = 1, x_2 = 1,  # same slope
  ar1_1 = 0.8, ar1_2 = 0.2,
  sigma_1 = 5)
… and we use a prior to equate the slopes of each segment (read more about using priors to equate parameters and define constants). Now let’s see if we can recover these parameters. We use sample = "both" because we will do a Savage-Dickey test later.

prior = list(x_2 = "x_1")  # Set the two slopes equal
fit = mcp(model, data = df, prior = prior, sample = "both")
First, let’s get a visual to see that the posterior is reasonably narrow and consistent:
plot(fit)
You could use plot(fit, geom_data = "line") for a more classical line plot of the time series data. You can omit plotting the AR effect entirely by setting plot(fit, arma = FALSE). You can also plot the ar1_ parameters directly using which_y, so that the y-axis is the value of ar1:
plot(fit, which_y = "ar1", lines = 100)
We recovered the parameters, including the change point (this is the output of summary(fit)):
## Family: gaussian(link = 'identity')
## Iterations: 9000 from 3 chains.
## Segments:
## 1: y ~ 1 + x + ar(1)
## 2: y ~ 1 ~ 0 + x + ar(1)
##
## Population-level parameters:
## name match sim mean lower upper Rhat n.eff
## ar1_1 OK 0.8 0.79 0.68 0.90 1 2193
## ar1_2 OK 0.2 0.11 -0.21 0.41 1 990
## cp_1 OK 60.0 62.61 51.07 74.10 1 299
## int_1 OK 20.0 20.80 15.77 26.15 1 165
## sigma_1 OK 5.0 4.89 4.38 5.36 1 3903
## x_1 OK 1.0 0.99 0.92 1.05 1 172
## x_2 OK 1.0 0.99 0.92 1.05 1 172
We can also plot some of the parameters. As usual, we see that the change point is not well defined by any known distribution. The fact that the posterior mean is around 60 does not (necessarily) mean that there is a high credence in this value. Usually, I find that any bi- or N-modality on the posterior matches well with what you would guess from looking at the raw data. As they say: Bayesian inference is common sense applied to data.
plot_pars(fit, regex_pars = "cp_1|ar_*")
As usual, we can test hypotheses and do model comparison (read more here). We can also ask how much more likely it is (relative to the prior) that the two autocorrelations are equal compared to them differing. Because we sampled both the prior and the posterior (mcp(..., sample = "both")), we can do a Savage-Dickey density ratio test:
hypothesis(fit, "ar1_1 = ar1_2")
## hypothesis mean lower upper p BF
## 1 ar1_1 - ar1_2 = 0 0.6869875 0.3799884 1.007476 0.0001565573 0.0001565818
In this case, the evidence for equality is vanishingly small: a Bayes factor of roughly 0.00016 means that the data shift the odds several thousand-fold away from the two autocorrelations being equal, and essentially no posterior mass sits near a zero difference.
We can assess the same hypothesis using cross-validation, only this time it is more about whether the change point increases the predictive accuracy. The change point is favored with an elpd_diff/se_diff factor of around 2.
fit_null = mcp(list(y ~ 1 + x + ar(1)), data = df)
fit$loo = loo(fit)
fit_null$loo = loo(fit_null)
loo::loo_compare(fit$loo, fit_null$loo)
## elpd_diff se_diff
## model1 0.0 0.0
## model2 -8.8 4.3
Of course, we can also do directional tests. For example, what is the evidence that ar1_1 is more than 0.3 greater than ar1_2? Answer: around 100 to one.
hypothesis(fit, "ar1_1 - 0.3 > ar1_2")
## hypothesis mean lower upper p BF
## 1 ar1_1 - 0.3 - ar1_2 > 0 0.3869875 0.07998841 0.7074764 0.9907778 107.4337
The default prior on autoregressive intercepts is a dunif(-1, 1) to ensure a stationary series while being otherwise agnostic about the magnitude and direction of the autocorrelation. For most time series, you would expect a positive first-order autocorrelation, e.g., ar1_1 = "dunif(0, 1)" or even something like ar1_1 = "dnorm(0.5, 0.5) T(0, 1)". Read more about priors. Similarly, you would expect a smaller-but-still-positive second-order autocorrelation, e.g., ar2_1 = "dunif(0, ar1_1)".
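Such priors are passed to mcp() as a named list of strings. A sketch (not run), assuming we want a positive AR(1) prior in segment 1 of the change-in-autocorrelation model above:

prior = list(
  x_2 = "x_1",                        # keep the two slopes equal, as above
  ar1_1 = "dnorm(0.5, 0.5) T(0, 1)"   # expect a positive AR(1) in segment 1
)
fit_informed = mcp(model, data = df, prior = prior)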
Here is a complete list of the (default) priors in the model above:
cbind(fit$prior)
## [,1]
## cp_1 "dunif(MINX, MAXX)"
## int_1 "dt(0, 3 * SDY, 3)"
## x_1 "dt(0, SDY / (MAXX - MINX), 3)"
## x_2 "x_1"
## sigma_1 "dnorm(0, SDY) T(0, )"
## ar1_1 "dunif(-1, 1)"
## ar1_2 "dunif(-1, 1)"
We can also visualize them because we sampled the prior. prior = TRUE works in most mcp functions, including plot() and summary().
plot_pars(fit, prior = TRUE)
Notice that the posteriors are smoothed at sharp cutoffs, slightly misrepresenting the true distribution.
Let’s inspect the priors for a more advanced AR model, since you would often have to inform these:

model = list(
  y ~ 1 + ar(2, 1 + x),
  ~ 0 + ar(1, rel(1) + I(x^2))
)
empty = mcp(model, sample = FALSE)
cbind(empty$prior)
## [,1]
## cp_1 "dunif(MINX, MAXX)"
## int_1 "dt(0, 3 * SDY, 3)"
## sigma_1 "dnorm(0, SDY) T(0, )"
## ar1_1 "dunif(-1, 1)"
## ar2_1 "dunif(-1, 1)"
## ar1_x_1 "dnorm(0, 1 / (MAXX - MINX))"
## ar2_x_1 "dnorm(0, 1 / (MAXX - MINX))"
## ar1_2 "dunif(-1, 1)"
## ar1_x_2_E2 "dnorm(0, 1 / (MAXX - MINX))"
As with sigma, the link function for the autoregressive coefficient itself is "identity", though the autoregressive coefficient is computed from residuals using the link function provided in mcp(..., family = func(link = "something")). Because stationarity is important, careful consideration of the allowed and probable values (the prior) is necessary when going beyond simple absolute intercepts, to avoid ar values outside [-1, 1].
Here are a few ways in which you may want to inform the ar parameters in the model above (a combined sketch follows the list):

- You would often expect the second-order coefficient to be positive but smaller than the first-order one, e.g., ar2_1 = "dunif(0, ar1_1)".
- Because of rel(1), the AR(1) intercept in segment 2 (ar1_2) is the change relative to segment 1, and with the default prior it can at most be -1. If ar1_1 was 0.8, this means that the autocorrelation can at most change to -0.2 in the next segment. Use rel() with care.
- The default prior on the ar slopes ("dnorm(0, 1 / (MAXX - MINX))") implies a 68% chance that the total change from the first to the last observation is within ±1, and you may want to, for example, suggest a shallow negative slope using "dnorm(0, 0.1 / (MAXX-MINX)) T( , 0)".
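Putting a couple of these suggestions together for the model above (a sketch, not run; the values are only illustrations):

prior = list(
  ar2_1 = "dunif(0, ar1_1)",                         # positive AR(2), smaller than AR(1)
  ar1_x_1 = "dnorm(0, 0.1 / (MAXX - MINX)) T( , 0)"  # shallow negative slope on the AR(1) coefficient
)
empty = mcp(model, prior = prior, sample = FALSE)
cbind(empty$prior)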
Here is the JAGS code for the second simulation example, i.e., the one with a shared slope and a change in the AR(1) coefficient. You can print fit$simulate and see that it runs much of the same code.
fit$jags_code
## model {
##
## # Priors for population-level effects
## cp_0 = MINX # mcp helper value.
## cp_2 = MAXX # mcp helper value.
##
## cp_1 ~ dunif(MINX, MAXX)
## int_1 ~ dt(0, 1/(3*SDY)^2, 3)
## x_1 ~ dt(0, 1/(SDY/(MAXX-MINX))^2, 3)
## x_2 = x_1 # Fixed
## sigma_1 ~ dnorm(0, 1/(SDY)^2) T(0, )
## ar1_1 ~ dunif(-1, 1)
## ar1_2 ~ dunif(-1, 1)
##
##
## # Apply autoregression to the residuals
## resid_arma_[1] = 0
## for (i_ in 2:length(x)) {
## resid_arma_[i_] = 0 +
## ar1_[i_] * resid_abs_[i_ - 1]
## }
##
## # Model and likelihood
## for (i_ in 1:length(x)) {
## X_1_[i_] = min(x[i_], cp_1)
## X_2_[i_] = min(x[i_], cp_2) - cp_1
##
## # Fitted value
## y_[i_] =
##
## # Segment 1: y ~ 1 + x + ar(1)
## (x[i_] >= cp_0) * int_1 +
## (x[i_] >= cp_0) * x_1 * X_1_[i_] +
##
## # Segment 2: y ~ 1 ~ 0 + x + ar(1)
## (x[i_] >= cp_1) * x_2 * X_2_[i_]
##
## # Fitted standard deviation
## sigma_[i_] = max(0,
## (x[i_] >= cp_0) * sigma_1 )
##
## # Autoregressive coefficient for all AR(1)
## ar1_[i_] =
## (x[i_] >= cp_0) * (x[i_] < cp_1) * ar1_1 +
## (x[i_] >= cp_1) * ar1_2
##
## # Likelihood and log-density for family = gaussian()
## y[i_] ~ dnorm((y_[i_] + resid_arma_[i_]), 1 / sigma_[i_]^2) # SD as precision
## loglik_[i_] = logdensity.norm(y[i_], (y_[i_] + resid_arma_[i_]), 1 / sigma_[i_]^2) # SD as precision
## resid_abs_[i_] = (y[i_]) - y_[i_] # Residuals represented by sigma_ after ARMA
## }
## }