Gaussian Processes (GPs) are a highly flexible Bayesian tool that can be employed in a variety of modeling tasks, including time-series analysis. While traditional methods like ARIMA explicitly model the data-generating process, Gaussian Processes approach the problem from a curve-fitting perspective, allowing the user to define how different temporal components—such as trend, seasonality, and noise—should behave.

This post delves deep into the mechanics of GPs, how they work in the context of time-series data, and practical ways to implement them using Python.

Simulating Time Series Data

We start by simulating some time-series data that reflects common real-world temporal dynamics—periodicity (seasonality) and a trend.

import numpy as np
import matplotlib.pyplot as plt

# Simulate time series data
np.random.seed(123)
x = np.linspace(0, 4 * np.pi, 100)
y = 3 * np.sin(2 * x) + np.random.rand(100) * 2
trend = 0.08 * np.arange(1, 101)
y += trend

# Plot the time series data
plt.plot(np.arange(1, 101), y, label='Data')
plt.xlabel('Timepoint')
plt.ylabel('Values')
plt.title('Simulated Time Series Data')
plt.show()

This code generates a noisy sine wave with an upward linear trend, simulating the periodic and trend components often seen in time-series data. Note that the sine completes four full cycles over the 100 timepoints, so the seasonal period is roughly 25 timepoints—a value we will reuse when we set up the periodic kernel.

Time Series Decomposition

Time-series analysis often revolves around decomposing a dataset into key components, such as:

  • Trend: Does the data exhibit a consistent increase or decrease over time?
  • Autocorrelation: How correlated are data points with past values?
  • Seasonality: Are there recurring patterns, such as daily or yearly cycles?

We can decompose a time series into these components and model each one separately. This decomposition process lays the foundation for modeling with Gaussian Processes, as we later combine different GP kernels to capture these dynamics.
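
As a quick illustration of this idea, a classical decomposition of the simulated series can be obtained with statsmodels' seasonal_decompose. This is just a sketch, assuming statsmodels is installed; the period of 25 timepoints follows from the four sine cycles in our 100-point series.

from statsmodels.tsa.seasonal import seasonal_decompose

# Classical additive decomposition into trend, seasonal, and residual components
# (period=25 because the sine completes four cycles over 100 timepoints)
decomposition = seasonal_decompose(y, model='additive', period=25)
decomposition.plot()
plt.show()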

Gaussian Processes Explained

A Gaussian Process is a generalization of the multivariate normal distribution to infinitely many dimensions. Rather than defining a distribution over a fixed-length vector, a GP defines a distribution over functions: any finite collection of function values has a joint multivariate normal distribution, fully specified by a mean and a covariance.

A GP is characterized by two key elements:

  • Mean function ($m(x)$): Typically set to zero in practice, as the covariance function drives most of the model’s behavior.
  • Covariance function ($k(x, x')$): Determines how different points in the input space are related. The choice of kernel is crucial for controlling the smoothness, periodicity, and other aspects of the function.

Mathematically, the GP is written as:

\[f(x) \sim \mathcal{GP}(m(x), k(x, x'))\]

The covariance function $k(x, x')$, or kernel, dictates the GP’s behavior. By selecting different kernels, we can model various patterns such as trends or periodic behaviors.
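
Concretely, for any finite set of inputs $x_1, \dots, x_n$, the corresponding function values follow an ordinary multivariate normal distribution whose covariance matrix is built from the kernel:

\[\begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix}, \, K \right), \quad K_{ij} = k(x_i, x_j)\]

This finite-dimensional view is exactly what we exploit below: every covariance matrix we construct is simply a kernel evaluated at our observation timepoints.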

Covariance Functions (Kernels)

Exponentiated Quadratic Kernel

The most common GP kernel, also called the Radial Basis Function (RBF) or squared exponential kernel, controls the “wiggliness” of the function. Its formula is:

\[k(x, x') = \sigma^2 \exp \left( -\frac{(x - x')^2}{2 l^2} \right)\]

Here $\sigma^2$ is the signal variance, controlling the amplitude of the function, and $l$ is the lengthscale, controlling how quickly the function varies.

def cov_exp_quad(xa, xb, sigma, l):
    """Exponentiated Quadratic Kernel"""
    sq_dist = np.subtract.outer(xa, xb) ** 2
    return sigma ** 2 * np.exp(-0.5 * sq_dist / l ** 2)
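
To get a feel for the lengthscale, we can evaluate the kernel on a few timepoints and compare a short and a long lengthscale. This is just an illustrative check, not part of the model itself.

# Kernel values decay with squared distance; a larger lengthscale decays more slowly
t_demo = np.array([0.0, 1.0, 5.0])
print(cov_exp_quad(t_demo, t_demo, sigma=1, l=1))   # near-zero covariance at distance 5
print(cov_exp_quad(t_demo, t_demo, sigma=1, l=10))  # still strongly correlated at distance 5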

Periodic Kernel

A periodic kernel models cyclic behavior, making it useful for capturing seasonality in time-series data:

\[k(x, x') = \sigma^2 \exp \left( -\frac{2 \sin^2 \left( \pi (x - x') / p \right)}{l^2} \right)\]

Here, $p$ controls the period of repetition.

def cov_periodic(xa, xb, sigma, l, p):
    """Periodic Kernel"""
    sin_dist = np.sin(np.pi * np.abs(np.subtract.outer(xa, xb)) / p)
    return sigma ** 2 * np.exp(-2 * (sin_dist ** 2) / l ** 2)
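
A quick sanity check of the periodicity: points separated by exactly one period are perfectly correlated, while points half a period apart are at their least correlated. Again, this is only an illustrative check.

# With period p=25, a lag of 25 returns the maximum covariance sigma**2
print(cov_periodic(np.array([0.0]), np.array([25.0]), sigma=1, l=1, p=25))   # ~1.0
print(cov_periodic(np.array([0.0]), np.array([12.5]), sigma=1, l=1, p=25))   # minimum, exp(-2)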

Combining Kernels

In practice, we combine different kernels to model the complex dynamics of a time series. For example, by summing an exponentiated quadratic kernel (to capture long-term trends) with a periodic kernel (for seasonality), and adding white noise to account for random fluctuations, we get a more comprehensive model.

# Combining kernels, evaluated over the observation timepoints
t = np.arange(1, 101)
Sigma_exp_quad = cov_exp_quad(t, t, 1, len(t))
Sigma_periodic = cov_periodic(t, t, 1, 1, 25)
Sigma_white_noise = np.eye(len(t)) * 0.01
Sigma_comb = Sigma_exp_quad + Sigma_periodic + Sigma_white_noise
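
It can also be instructive to look at the resulting covariance matrix directly: the periodic banding and the broad trend structure are both visible. This is a simple optional visualization.

# Visualize the combined covariance matrix
plt.imshow(Sigma_comb, cmap='viridis')
plt.colorbar(label='Covariance')
plt.title('Combined Covariance Matrix')
plt.show()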

Fitting Gaussian Processes

After defining our covariance structure, we can now fit the GP to our data, first by sampling from the prior to visualize what plausible functions might look like.

Sampling from the Prior

def sample_gp_prior(Sigma, n_samples=5):
    """Sample from the GP prior using the covariance matrix"""
    return np.random.multivariate_normal(mean=np.zeros(Sigma.shape[0]), cov=Sigma, size=n_samples).T

# Sample from the GP prior
samples_prior = sample_gp_prior(Sigma_comb)

# Plot prior samples
plt.plot(np.arange(1, 101), samples_prior)
plt.title('Samples from GP Prior')
plt.xlabel('Timepoint')
plt.ylabel('Values')
plt.show()

The prior reveals candidate functions that our GP believes are plausible fits before seeing any data.

Predicting from the Posterior

Once we incorporate the observed data, the GP updates its beliefs, and we can make predictions using the posterior distribution.
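
For observed inputs $X$ with targets $\mathbf{y}$ and prediction inputs $X_*$, the posterior mean and covariance have the standard closed-form expressions, which the function below implements directly:

\[\mu_* = K_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{y}, \qquad \Sigma_* = K_{**} - K_*^\top (K + \sigma_n^2 I)^{-1} K_*\]

where $K = k(X, X)$, $K_* = k(X, X_*)$, $K_{**} = k(X_*, X_*)$, and $\sigma_n^2$ is the observation noise variance. The code below additionally adds the noise term to $K_{**}$, so the predictive intervals cover noisy observations rather than the noise-free latent function.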

def gp_posterior(x_train, x_pred, y_train, kernel, noise=0.05, **kernel_params):
    """Calculate the GP posterior mean and covariance matrix"""
    # Kernel matrices: training vs training, training vs prediction, prediction vs prediction
    K = kernel(x_train, x_train, **kernel_params) + noise ** 2 * np.eye(len(x_train))
    K_s = kernel(x_train, x_pred, **kernel_params)
    K_ss = kernel(x_pred, x_pred, **kernel_params) + noise ** 2 * np.eye(len(x_pred))

    # Solve linear systems rather than forming the explicit inverse (numerically more stable)
    alpha = np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s)

    # Posterior mean
    mu_s = K_s.T.dot(alpha)

    # Posterior covariance
    cov_s = K_ss - K_s.T.dot(v)

    return mu_s, cov_s

# Define training data (timepoints 1 to 100) and new points to predict
x_train = np.arange(1, 101)
x_pred = np.linspace(1, 100, 50)
y_train = y

# Compute posterior mean and covariance
mu_s, cov_s = gp_posterior(x_train, x_pred, y_train, kernel=cov_exp_quad, sigma=1, l=75)

# Plot posterior mean
plt.plot(x_pred, mu_s, label='Posterior Mean')
plt.fill_between(x_pred, mu_s - 1.96 * np.sqrt(np.diag(cov_s)),
                 mu_s + 1.96 * np.sqrt(np.diag(cov_s)), alpha=0.1, label='95% CI')
plt.scatter(x_train, y_train, label='Training Data')
plt.title('GP Posterior')
plt.xlabel('Timepoint')
plt.ylabel('Value')
plt.legend()
plt.show()
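
Because gp_posterior accepts any kernel function, we can also forecast beyond the observed range by plugging in a combined kernel, so that both the trend and the seasonal pattern are extrapolated. The sketch below simply carries over the hyperparameters used earlier rather than tuning them.

def cov_trend_plus_seasonal(xa, xb, sigma_t, l_t, sigma_p, l_p, p):
    """Sum of a smooth trend kernel and a periodic kernel"""
    return cov_exp_quad(xa, xb, sigma_t, l_t) + cov_periodic(xa, xb, sigma_p, l_p, p)

# Forecast 30 timepoints beyond the training data
x_future = np.arange(1, 131)
mu_f, cov_f = gp_posterior(x_train, x_future, y_train, kernel=cov_trend_plus_seasonal,
                           sigma_t=1, l_t=75, sigma_p=1, l_p=1, p=25)

plt.plot(x_future, mu_f, label='Forecast Mean')
plt.fill_between(x_future, mu_f - 1.96 * np.sqrt(np.diag(cov_f)),
                 mu_f + 1.96 * np.sqrt(np.diag(cov_f)), alpha=0.1, label='95% CI')
plt.scatter(x_train, y_train, s=10, label='Training Data')
plt.title('GP Forecast with Combined Kernel')
plt.xlabel('Timepoint')
plt.ylabel('Value')
plt.legend()
plt.show()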

Gaussian Processes offer a flexible and interpretable approach to modeling time series. By carefully selecting and combining kernels, we can capture trends, seasonality, and noise, making them an invaluable tool in the machine learning and statistical toolkit.