mcp does regression with one or Multiple Change Points (MCP) between Generalized and hierarchical Linear Segments using Bayesian inference. mcp aims to provide maximum flexibility for analyses with a priori knowledge about the number of change points and the form of the segments in between.

Change points are also called switch points, break points, broken line regression, broken stick regression, bilinear regression, piecewise linear regression, local linear regression, segmented regression, and (performance) discontinuity models. mcp aims to be useful for all of them. See how mcp compares to other R packages.

Under the hood, mcp takes a formula-representation of linear segments and turns it into JAGS code. mcp leverages the power of tidybayes, bayesplot, coda, and loo to make change point analysis easy and powerful.

Install

  1. Install the latest version of JAGS. Linux users can fetch binaries here.

  2. Install from CRAN:

    install.packages("mcp")

    or install the development version from GitHub:

    if (!requireNamespace("remotes")) install.packages("remotes")
    remotes::install_github("lindeloev/mcp")

At a glance

Here are some example mcp models. mcp takes a list of formulas - one for each segment. The change point(s) are the x at which data changes from being better predicted by one formula to the next. The first formula is just response ~ predictors and the most common formula for segment 2+ would be ~ predictors (more details here).
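As a rough illustration of the list-of-formulas idea, the sketch below writes a few model specifications using only base R formulas. The names y and x are placeholders, not columns from any particular dataset.

```r
# Two plateaus: only the intercept changes at the change point
model_plateaus = list(
  y ~ 1,  # segment 1: response ~ predictors
  ~ 1     # segment 2: a new intercept
)

# A plateau followed by a joined slope
model_joined = list(
  y ~ 1,
  ~ 0 + x  # "0 +" leaves out the intercept, so segment 2 starts where segment 1 ended
)

# Two disjoined slopes: both intercept and slope change
model_disjoined = list(
  y ~ 1 + x,
  ~ 1 + x
)

# The number of change points is the number of segments minus one
length(model_joined) - 1
```

Any of these lists would then be passed to mcp() together with a data frame containing the named columns.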

Scroll down to see brief introductions to each of these, or browse the website articles for more thorough worked examples and discussions.

Brief worked example
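A minimal sketch of a fit that the plots and summaries below could be based on. It assumes the bundled ex_demo dataset with columns time and response; other versions of mcp may expose the example data differently.

```r
library(mcp)

# Three segments, hence two change points (cp_1 and cp_2)
model = list(
  response ~ 1,  # plateau (int_1)
  ~ 0 + time,    # joined slope (time_2) starting at cp_1
  ~ 1 + time     # disjoined slope (int_3, time_3) starting at cp_2
)

# Assumption: ex_demo is the example dataset shipped with mcp
fit = mcp(model, data = ex_demo)
```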

Plot and summary

The default plot includes data, fitted lines drawn randomly from the posterior, and change point(s) posterior density for each chain:

plot(fit)

Use summary() to summarise the posterior distribution as well as sampling diagnostics. The data were simulated with mcp, so the summary includes the “true” values in the column sim, and the column match shows whether each true value is within the interval:

summary(fit)

rhat is the Gelman-Rubin convergence diagnostic and eff is the effective sample size.

plot_pars(fit) can be used to inspect the posteriors and convergence of all parameters. See the documentation of plot_pars() for many other plotting options. Here, we plot just the (population-level) change points. They often have “strange” posterior distributions, highlighting the need for a computational approach:

plot_pars(fit, regex_pars = "cp_")

Tests and model comparison

We can test (joint) probabilities in the model using hypothesis() (see more here). For example, what is the evidence (given priors) that the first change point is later than 25 against it being less than 25?

hypothesis(fit, "cp_1 > 25")

For model comparisons, we can fit a null model and compare the predictive performance of the two models using (approximate) leave-one-out cross-validation (see more here). Our null model omits the first plateau and change point, essentially testing the credence of that change point:
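A sketch of what such a null model might look like, assuming the same ex_demo dataset (columns time and response) as the two-change-point model. It replaces the initial plateau with a slope from the start and keeps a single change point.

```r
library(mcp)

# One change point only: no initial plateau
model_null = list(
  response ~ 1 + time,  # intercept (int_1) and slope (time_1)
  ~ 1 + time            # disjoined slope (int_2, time_2)
)

# Assumption: ex_demo is the example dataset shipped with mcp
fit_null = mcp(model_null, data = ex_demo)
```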

Leveraging the power of loo::loo, we see that the two-change-points model is preferred (it is on top), but the elpd_diff / se_diff ratio indicates that this preference is not very strong.

fit$loo = loo(fit)
fit_null$loo = loo(fit_null)

loo::loo_compare(fit$loo, fit_null$loo)
       elpd_diff se_diff
model1  0.0       0.0
model2 -7.6       4.6
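The ratio can be computed directly from the table above. The reading of an absolute ratio below about 2 as a weak preference is a common rule of thumb, assumed here for illustration rather than prescribed by mcp or loo.

```r
# Values copied from the loo_compare() output for model2
elpd_diff = -7.6
se_diff = 4.6

# Difference in expected log predictive density, in units of its standard error
ratio = elpd_diff / se_diff
round(ratio, 2)

# Rule-of-thumb check (assumption): is the difference within two standard errors?
abs(ratio) < 2
```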

Highlights from in-depth guides

The articles on the mcp website go in-depth with the functionality of mcp. Here is an executive summary, to give you a quick sense of what mcp can do.

About mcp formulas and models:

  • Parameter names are int_i (intercepts), cp_i (change points), x_i (slopes), phi_i (autocorrelation), and sigma_* (variance).
  • The change point model is basically an ifelse model.
  • Use rel() to specify that parameters are relative to those corresponding in the previous segments.
  • Generate data using fit$simulate().
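To make the ifelse idea concrete, here is a base R sketch of the mean structure for a plateau followed by a joined slope, i.e., list(y ~ 1, ~ 0 + x). The parameter values are invented for illustration, and the code only mimics the model. It is not how mcp generates or fits data internally.

```r
set.seed(42)

# Hypothetical parameter values, named after the mcp conventions above
cp_1 = 40     # change point
int_1 = 10    # intercept of segment 1
x_2 = 0.5     # slope of segment 2
sigma_1 = 2   # residual standard deviation

x = seq(0, 100, length.out = 200)

# Before cp_1: plateau at int_1. After cp_1: slope starting from the plateau.
mu = ifelse(x < cp_1, int_1, int_1 + x_2 * (x - cp_1))

y = rnorm(length(x), mean = mu, sd = sigma_1)
df = data.frame(x, y)
```

Fitting list(y ~ 1, ~ 0 + x) to df should then roughly recover these values.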

Using priors:

  • See priors in fit$prior.
  • Set priors using mcp(..., prior = list(int_1 = "dnorm(0, 1)", cp_1 = "dunif(0, 45)")).
  • The default prior for change points is fast for estimation but is mathematically “messy”. The Dirichlet prior (cp_i = "dirichlet(1)") is slow but beautiful.
  • Fix parameters to specific values using cp_1 = 45.
  • Share parameters between segments using slope_1 = "slope_2".
  • Truncate priors using T(lower, upper), e.g., int_1 = "dnorm(0, 1) T(0, )". mcp applies this automatically to change point priors to enforce order restriction. This is true for varying change points too.
  • Do prior predictive checks using mcp(model, data, sample = "prior").
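The options above can be combined in a single named list. The sketch below is illustrative only: the model, the prior values, and the data frame df mentioned in the comment are hypothetical, and the mcp() call is left commented out because it needs real data.

```r
# Two disjoined slopes: parameters int_1, x_1, cp_1, int_2, x_2
model = list(
  y ~ 1 + x,
  ~ 1 + x
)

prior = list(
  int_1 = "dnorm(0, 1) T(0, )",  # normal prior truncated to positive values
  cp_1 = "dunif(0, 45)",         # change point somewhere between 0 and 45
  x_2 = "x_1",                   # share the slope between segments 1 and 2
  int_2 = 5                      # fixed to a constant, so not estimated
)

# With a data frame df containing columns x and y (hypothetical):
# fit_prior = mcp(model, data = df, prior = prior, sample = "prior")
```

Each parameter name may appear only once in the list. Parameters that are not listed keep their default priors.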

Varying change points:

mcp currently supports the following GLM:

Model comparison and hypothesis testing:

Modeling variance and autoregression:

  • ~ sigma(1) models an intercept change in variance. ~ sigma(0 + x) models increasing/decreasing variance.
  • ~ ar(N) models Nth order autoregression on residuals. ~ar(N, 0 + x) models increasing/decreasing autocorrelation.
  • You can model anything for sigma() and ar(). For example, ~ x + sigma(1 + x + I(x^2)) models polynomial change in variance with x on top of a slope on the mean.
  • Simulate effects and change points on sigma() and ar() using fit$simulate()

Tips, tricks, and debugging

Some examples

mcp aims to support a wide variety of models. Here are some example models for inspiration.

Means

Find the single change point between two plateaus (see how this data was simulated with mcp).
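A sketch of such a model, assuming a bundled dataset named ex_plateaus with columns x and y. Because neither formula mentions x, the x-axis variable is given explicitly through par_x.

```r
library(mcp)

model = list(
  y ~ 1,  # plateau (int_1)
  ~ 1     # plateau (int_2)
)

# Assumption: ex_plateaus ships with mcp and has columns x and y
fit = mcp(model, data = ex_plateaus, par_x = "x")
plot(fit)
```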

Varying change points

Here, we find the single change point between two joined slopes. While the slopes are shared by all participants, the change point varies by id. Read more about varying change points in mcp.
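A sketch of this model, assuming a bundled dataset named ex_varying with columns x, y, and id. The left-hand side of the second formula describes the change point itself: a population-level change point plus a varying offset per id.

```r
library(mcp)

model = list(
  y ~ 1 + x,           # intercept and slope
  1 + (1|id) ~ 0 + x   # joined slope; the change point varies by id
)

# Assumption: ex_varying ships with mcp and has columns x, y, and id
fit = mcp(model, data = ex_varying)
plot(fit, facet_by = "id")
```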

Summarise the varying change points using ranef() or plot them using plot_pars(fit, "varying"). Again, this data was simulated so the columns match and sim are added to show simulation values and whether they are inside the interval. Set the width wider for a more lenient criterion.

ranef(fit, width = 0.98)

Generalized linear models

mcp supports Generalized Linear Modeling. See extended examples using binomial() and poisson(). These data were simulated with mcp here.

Here is a binomial change point model with three segments. We plot the 95% HDI too:
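A sketch of a three-segment binomial model, assuming a bundled dataset named ex_binomial with columns x, y (number of successes), and N (number of trials).

```r
library(mcp)

model = list(
  y | trials(N) ~ 1,  # constant rate
  ~ 0 + x,            # joined slope (on the logit scale)
  ~ 1 + x             # disjoined slope
)

# Assumption: ex_binomial ships with mcp and has columns x, y, and N
fit = mcp(model, data = ex_binomial, family = binomial())
plot(fit, q_fit = TRUE)  # q_fit = TRUE adds the 95% interval of the fit
```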

Use plot(fit, rate = FALSE) if you want the points and fit lines on the original scale of y rather than divided by N.

Time series

mcp allows for flexible time series analysis with autoregressive residuals of arbitrary order. Below, we model a change from a plateau with strong positive AR(2) residuals to a slope with medium AR(1) residuals. These data were simulated with mcp here and the generating values are in the sim column. You can also do regression on the AR coefficients themselves using e.g., ar(1, 1 + x). Read more here.

The AR(N) parameters on intercepts are named ar[order]_[segment]. All parameters, including the change point, are well recovered:
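A sketch of the model described above, assuming a bundled dataset named ex_ar with columns time and price. The object name fit_ar matches the plotting call that follows.

```r
library(mcp)

model = list(
  price ~ 1 + ar(2),   # plateau with AR(2) residuals
  ~ 0 + time + ar(1)   # joined slope with AR(1) residuals
)

# Assumption: ex_ar ships with mcp and has columns time and price
fit_ar = mcp(model, data = ex_ar)
summary(fit_ar)
```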

The fit plot shows the inferred autocorrelated nature:

plot(fit_ar)

Variance change and prediction intervals

You can model variance by adding a sigma() term to the formula. The inside sigma() can take everything that the formulas outside do. Read more in the article on variance. The example below models two change points. The first is variance-only: variance abruptly increases and then declines linearly with x. The second change point is the stop of the variance-decline and the onset of a slope on the mean.

Effects on variance are best visualized using prediction intervals. See more in the documentation for plot.mcpfit().

model = list(
  y ~ 1,
  ~ 0 + sigma(1 + x),
  ~ 0 + x
)
fit = mcp(model, ex_variance, cores = 3, adapt = 5000, iter = 5000)
plot(fit, q_predict = TRUE)

Quadratic and other exponentiations

Write exponents as I(x^N). E.g., quadratic I(x^2), cubic I(x^3), or some other power function I(x^1.5). The example below detects the onset of linear + quadratic growth. This is often called the BLQ model (Broken Line Quadratic) in nutrition research.

model = list(
  y ~ 1,
  ~ 0 + x + I(x^2)
)
fit = mcp(model, ex_quadratic)
plot(fit)

Trigonometric and others

You can use sin(x), cos(x), and tan(x) to do trigonometry. This can be useful for seasonal trends and other periodic data. You can also do exp(x), abs(x), log(x), and sqrt(x), but beware that the latter two will currently fail in segment 2+. Raise an issue if you need this.

model = list(
  y ~ 1 + sin(x),
  ~ 1 + cos(x) + x
)

fit = mcp(model, ex_trig)
plot(fit)

Using rel() and priors

Read more about formula options and priors.

Here we find the two change points between three segments. The slope and intercept of segment 2 are parameterized relative to segment 1, i.e., modeling the change in intercept and slope since segment 1. So too with the second change point (cp_2) which is now the distance from cp_1.

Some of the default priors are overwritten. The first intercept (int_1) is forced to be 10, and the slopes in segments 1 and 3 are shared. It is easy to see these effects in the ex_rel_prior dataset because they violate it somewhat. The first change point has to be at x = 20 or later.
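A sketch of how this model and these priors might be written. The upper bound of 50 on cp_1 is an illustrative assumption (the text only fixes the lower bound at 20), and ex_rel_prior is assumed to have columns x and y.

```r
library(mcp)

model = list(
  y ~ 1 + x,
  ~ rel(1) + rel(x),  # intercept and slope relative to segment 1
  rel(1) ~ 0 + x      # cp_2 is the distance from cp_1
)

prior = list(
  int_1 = 10,              # fixed constant, not estimated
  x_3 = "x_1",             # slope shared between segments 1 and 3
  cp_1 = "dunif(20, 50)"   # at x = 20 or later; the upper bound is an assumption
)

fit = mcp(model, data = ex_rel_prior, prior = prior)
plot(fit)
```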

Comparing the summary to the fitted lines in the plot, we can see that int_2 and x_2 are relative values. We also see that the “wrong” priors made it harder to recover the parameters used to simulate this data (match and sim columns):

summary(fit)

Do much more

Don’t be constrained by these simple mcp functions. fit$samples is a regular mcmc.list object and all methods apply. You can work with the MCMC samples just as you would with brms, rstanarm, jags, or other samplers using the always excellent tidybayes:
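A short sketch of this workflow, assuming the ex_demo dataset (columns time and response) and the parameter names cp_1 and cp_2 that follow from a three-segment model. tidy_draws() and spread_draws() are regular tidybayes functions that accept an mcmc.list.

```r
library(mcp)
library(tidybayes)

model = list(
  response ~ 1,
  ~ 0 + time,
  ~ 1 + time
)
fit = mcp(model, data = ex_demo)

# All parameters, one row per posterior draw
draws_all = tidy_draws(fit$samples)

# Only the change points, ready for further tidybayes or dplyr steps
draws_cp = spread_draws(fit$samples, cp_1, cp_2)
head(draws_cp)
```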

Citation

This preprint formally introduces mcp. Find citation info at the link, call citation("mcp") or copy-paste this into your reference manager:

  @Article{,
    title = {mcp: An R Package for Regression With Multiple Change Points},
    author = {Jonas Kristoffer Lindeløv},
    journal = {OSF Preprints},
    year = {2020},
    doi = {10.31219/osf.io/fzqxv},
    encoding = {UTF-8},
  }