Advanced Sequential Change-Point Detection for Univariate Models
Overview
Sequential change-point detection is a dynamic field that deals with real-time monitoring of data sequences to detect points where the statistical properties change. This process is crucial in various domains such as finance, quality control, signal processing, and biostatistics. This document explores the theoretical background, advanced methodologies, practical implementations, and real-world applications of sequential change-point detection for univariate models.
Theoretical Foundations
Change-Point Model
Consider a sequence of observations \(\{X_t\}_{t=1}^n\) where a change-point \(\tau\) might occur such that:
For \(t \leq \tau\), \(X_t \sim F_0\)
For \(t > \tau\), \(X_t \sim F_1\)
Here, \(F_0\) and \(F_1\) are the distributions before and after the change, respectively. The change-point \(\tau\) is unknown and needs to be detected.
Hypothesis Testing Framework
The change-point detection problem can be framed as a hypothesis testing problem:
- Null hypothesis (\(H_0\)): No change, i.e., \(F_0 = F_1\)
- Alternative hypothesis (\(H_1\)): There is a change, i.e., \(F_0 \neq F_1\)
Advanced Methods for Sequential Change-Point Detection
Likelihood Ratio Test (LRT)
The likelihood ratio test is a powerful method for detecting changes in statistical properties. The likelihood ratio for a potential change-point \(k\) is given by:
\[\Lambda_k = \frac{\prod_{t=1}^k f_0(X_t) \prod_{t=k+1}^n f_1(X_t)}{\prod_{t=1}^n f_0(X_t)}\]The maximum likelihood ratio test statistic is:
\[\Lambda_n = \max_{1 \leq k < n} \Lambda_k\]If \(\Lambda_n\) exceeds a critical value, a change is detected.
Cumulative Sum (CUSUM) Method
The CUSUM method accumulates deviations from a target value and signals a change when the cumulative sum exceeds a threshold. The CUSUM statistic is defined as:
\[C_n = \max_{0 \leq k < n} \left| \sum_{t=k+1}^n (X_t - \mu) \right|\]where \(\mu\) is the target mean. A change is detected if \(C_n\) exceeds a predefined threshold \(h\).
Page’s Cumulative Sum (CUSUM) Test
Page’s test, a variation of the CUSUM method, is used to detect shifts in the mean of a sequence of observations. The test statistic is:
\[W_n = \max_{1 \leq k < n} \left( \sum_{t=k+1}^n (X_t - \mu_0) - \frac{(n-k)}{2}\delta \right)\]where \(\delta\) is the magnitude of the shift. A change-point is signaled if \(W_n\) exceeds a threshold.
Shiryaev-Roberts Procedure
The Shiryaev-Roberts procedure is a sequential method based on a Bayesian approach. The statistic is updated as follows:
\[R_t = \left( 1 + R_{t-1} \right) \frac{f_1(X_t)}{f_0(X_t)}\]A change is detected if \(R_t\) exceeds a threshold. This method balances the detection delay and false alarm rate effectively.
Practical Implementations
Algorithm Design
- Initialize Parameters: Set initial values for the parameters such as means (\(\mu_0, \mu_1\)), variances (\(\sigma_0^2, \sigma_1^2\)), and thresholds (\(h\)).
- Real-Time Data Collection: Collect data in a sequential manner.
- Compute Test Statistics: Update the chosen test statistic (LRT, CUSUM, Page’s, or Shiryaev-Roberts) with each new observation.
- Apply Decision Rule: Compare the test statistic to the predefined threshold. If it exceeds the threshold, signal a change-point.
- Adapt and Iterate: Continue collecting data and updating the test statistics until a change is detected or the sequence ends.
Example Implementation: CUSUM Method
To implement the CUSUM method, follow these steps:
- Initialize: \(C_0 = 0\)
- Update Statistic: For each new observation \(X_n\): \(C_n = \max(0, C_{n-1} + (X_n - \mu_0 - \delta))\)
- Decision Rule: Signal a change-point if \(C_n > h\).
Monitoring for Multiple Change-Points
In real-world applications, sequences may contain multiple change-points. To handle this scenario:
- Reset After Detection: After a change-point is detected, reset the detection process to look for additional changes in the subsequent data.
- Segment-Based Methods: Use segment-based methods to partition the sequence into homogeneous segments and detect multiple change-points in each segment.
- Sliding Window Approach: Apply a sliding window technique that continuously monitors overlapping subsets of the data, allowing for the detection of changes that might be missed by analyzing the full sequence.
Real-World Applications
Quality Control in Manufacturing
In manufacturing, sequential change-point detection is used to monitor the production process for deviations from the desired quality standards. For instance, changes in the mean or variance of a critical dimension can indicate a problem with the machinery or materials.
Example: Monitoring the diameter of produced parts:
Before change: \(X_t \sim N(\mu_0, \sigma^2)\)
After change: \(X_t \sim N(\mu_1, \sigma^2)\)
Using the CUSUM method, deviations from the target diameter are accumulated, and a change is signaled when the cumulative sum exceeds a threshold.
Financial Market Analysis
In finance, change-point detection helps identify shifts in market trends or volatility. Detecting these shifts early can inform trading strategies and risk management.
Example: Monitoring stock returns:
Before change: \(X_t \sim N(\mu_0, \sigma^2)\)
After change: \(X_t \sim N(\mu_1, \sigma^2)\)
The likelihood ratio test can be used to detect changes in the mean return or volatility.
Environmental Monitoring
Environmental scientists use change-point detection to identify changes in environmental parameters such as temperature, pollution levels, or water quality.
Example: Monitoring air pollution levels:
Before change: \(X_t \sim N(\mu_0, \sigma^2)\)
After change: \(X_t \sim N(\mu_1, \sigma^2)\)
Page’s test can detect shifts in pollution levels, indicating potential changes in emissions or meteorological conditions.
Biostatistics and Epidemiology
In biostatistics, change-point detection is used to monitor changes in clinical trial data, disease incidence rates, or other health-related metrics.
Example: Monitoring disease incidence rates:
Before change: \(X_t \sim Poisson(\lambda_0)\)
After change: \(X_t \sim Poisson(\lambda_1)\)
The Shiryaev-Roberts procedure can detect increases in incidence rates, signaling a potential outbreak.
Mathematical Details
Likelihood Ratio Test (LRT) Derivation
The likelihood ratio for a change-point at \(k\) is:
\[\Lambda_k = \frac{\prod_{t=1}^k f_0(X_t) \prod_{t=k+1}^n f_1(X_t)}{\prod_{t=1}^n f_0(X_t)}\]Taking the logarithm of the likelihood ratio:
\[\log \Lambda_k = \sum_{t=k+1}^n \log \left( \frac{f_1(X_t)}{f_0(X_t)} \right)\]The test statistic is:
\[\Lambda_n = \max_{1 \leq k < n} \log \Lambda_k\]CUSUM Statistic Derivation
The CUSUM statistic is defined as:
\[C_n = \max_{0 \leq k < n} \left| \sum_{t=k+1}^n (X_t - \mu) \right|\]This can be rewritten iteratively as:
\[C_n = \max(0, C_{n-1} + (X_n - \mu))\]Page’s Test Derivation
Page’s test statistic is:
\[W_n = \max_{1 \leq k < n} \left( \sum_{t=k+1}^n (X_t - \mu_0) - \frac{(n-k)}{2} \delta \right)\]This incorporates the cumulative sum of deviations from the mean and adjusts for the expected shift \(\delta\).
Sequential change-point detection is a vital tool in various fields for real-time monitoring and timely detection of changes in data sequences. Advanced methods such as Likelihood Ratio Tests, CUSUM, Page’s Test, and the Shiryaev-Roberts procedure offer robust solutions tailored to different types of changes and data characteristics. Understanding the theoretical foundations, implementing practical algorithms, and applying these methods in real-world scenarios enable effective monitoring and response to changes in diverse applications.
Appendix: Python Implementation of the CUSUM Method
The following Python code demonstrates how to implement the CUSUM method for sequential change-point detection. In this example, we simulate a univariate sequence with a mean shift, apply the CUSUM method, and detect when a change-point occurs.
import numpy as np
import matplotlib.pyplot as plt
# CUSUM function
def cusum(data, mu_0, delta, h):
"""
CUSUM implementation for sequential change-point detection.
Parameters:
data (np.array): The observed data sequence.
mu_0 (float): The target mean before the change.
delta (float): The magnitude of the shift after the change.
h (float): The decision threshold for signaling a change-point.
Returns:
C (np.array): CUSUM statistic over time.
t_change (int): The time index where the change-point is detected, or None if not detected.
"""
C = np.zeros(len(data)) # Initialize the CUSUM statistic array
t_change = None # Change-point time index
for t in range(1, len(data)):
# Update the CUSUM statistic
C[t] = max(0, C[t - 1] + (data[t] - mu_0 - delta))
# Check if the CUSUM statistic exceeds the threshold
if C[t] > h:
t_change = t
break # Stop when the first change-point is detected
return C, t_change
# Simulated data
np.random.seed(42)
n = 100
mu_0 = 0 # Mean before change
mu_1 = 2 # Mean after change
change_point = 60 # Actual change-point
# Data before and after the change
data = np.concatenate([np.random.normal(mu_0, 1,
Explanation of the Code
CUSUM Function:
The cusum
function calculates the CUSUM statistic iteratively. It checks for a change-point when the CUSUM statistic exceeds the decision threshold h
.
Inputs:
data
: The observed sequence of data points.mu_0
: The target mean before the change.delta
: The magnitude of the shift in the mean after the change.h
: The threshold for signaling a change.
Outputs:
C
: The CUSUM statistic over time.t_change
: The index at which a change-point is detected, orNone
if no change-point is detected.
Simulated Data:
A sequence of 100 observations is generated. The first 60 points follow a normal distribution with mean mu_0 = 0
, and the last 40 points follow a normal distribution with mean mu_1 = 2
, representing a shift in the data.
CUSUM Parameters:
We set delta = 1
, the expected magnitude of the mean shift, and a decision threshold of h = 5
.
CUSUM Detection:
The CUSUM method is applied to the data, and the CUSUM statistic is updated sequentially. The detected change-point is marked once the statistic exceeds the threshold.
Plotting the Results:
The data, the actual change-point, the detected change-point, and the CUSUM statistic are plotted for visualization.
Output
The output of the code is a plot that shows the following:
- The observed data sequence.
- The actual change-point (marked by a red dashed line).
- The detected change-point using the CUSUM method (marked by a green dashed line).
- The CUSUM statistic over time (shown in orange).
In addition, the index of the detected change-point is printed to the console.
This example demonstrates how the CUSUM method can be used to detect changes in the statistical properties of a univariate time series in a sequential manner.
References
- Basseville, M., & Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Prentice Hall.
- Shiryaev, A. N. (1963). “On optimum methods in quickest detection problems”. Theory of Probability and Its Applications.
- Page, E. S. (1954). “Continuous inspection schemes”. Biometrika.
- Pollak, M. (1985). “Optimal detection of a change in distribution”. The Annals of Statistics.
- Tartakovsky, A. G., & Veeravalli, V. V. (2005). “Asymptotically quickest change detection in distributed sensor systems”. Sequential Analysis.
- Tartakovsky, A. G., Rozovskii, B. L., & Blazek, R. E. (2014). Sequential Analysis: Hypothesis Testing and Changepoint Detection. CRC Press.
- Lai, T. L. (1995). “Sequential changepoint detection in quality control and dynamical systems”. Journal of the Royal Statistical Society: Series B.
- Lorden, G. (1971). “Procedures for reacting to a change in distribution”. The Annals of Mathematical Statistics.