gen_surv: Survival Data Simulation in Python

gen_surv is a Python package for simulating survival data under a variety of statistical models, inspired by the R package genSurv. It provides a unified interface for generating synthetic survival datasets, which are useful for:

  • Research: Testing new survival analysis methods

  • Education: Teaching survival analysis concepts

  • Benchmarking: Comparing different survival models

  • Validation: Testing statistical software implementations

Quick Start

Install with pip:

pip install gen-surv

Generate your first dataset:

from gen_surv import generate
df = generate(model="cphm", n=100, beta=0.5, covariate_range=2.0)

```{note}
The `to_sksurv` helper and related tests require the optional
dependency `scikit-survival`. Install it with `poetry install --with dev`
or `pip install scikit-survival` if you need this functionality.
```

Supported Models

| Model | Description | Use Case |
|-------|-------------|----------|
| CPHM | Cox Proportional Hazards | Standard survival regression |
| AFT | Accelerated Failure Time | Non-proportional hazards |
| CMM | Continuous-Time Markov | Multi-state processes |
| TDCM | Time-Dependent Covariates | Dynamic risk factors |
| THMM | Time-Homogeneous Markov | Hidden state processes |
| Competing Risks | Multiple event types | Cause-specific hazards |
| Mixture Cure | Long-term survivors | Logistic cure fraction |
| Piecewise Exponential | Piecewise constant hazard | Flexible baseline |
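To make the Piecewise Exponential entry concrete, here is a minimal standard-library sketch of drawing an event time from a piecewise-constant hazard by inverting the cumulative hazard. The function names `invert_cumulative_hazard` and `sample_piecewise_exponential`, and the `breakpoints`/`hazards` arguments, are hypothetical and chosen for illustration; this is not gen_surv's implementation or API.

```python
import random


def invert_cumulative_hazard(target, breakpoints, hazards):
    """Return the time at which the cumulative hazard reaches `target`.

    hazards[i] applies between consecutive breakpoints (starting at time 0),
    and the final hazard applies after the last breakpoint, so
    len(hazards) must equal len(breakpoints) + 1.
    """
    if len(hazards) != len(breakpoints) + 1:
        raise ValueError("need exactly one more hazard than breakpoints")
    if hazards[-1] <= 0:
        raise ValueError("the final hazard must be positive")
    start = 0.0
    for end, h in zip(breakpoints, hazards):
        capacity = h * (end - start)  # cumulative hazard available in this interval
        if target < capacity:
            return start + target / h
        target -= capacity
        start = end
    # Past the last breakpoint the final hazard applies indefinitely.
    return start + target / hazards[-1]


def sample_piecewise_exponential(breakpoints, hazards, rng=random):
    """Draw one event time: an Exp(1) variate is the cumulative hazard to reach."""
    return invert_cumulative_hazard(rng.expovariate(1.0), breakpoints, hazards)


# Hazard 0.5 on [0, 1), 1.0 on [1, 2), and 2.0 from time 2 onward.
print(sample_piecewise_exponential([1.0, 2.0], [0.5, 1.0, 2.0], random.Random(0)))
```

Separating the deterministic inversion from the random draw keeps the inversion easy to check by hand: with breakpoints `[1.0]` and hazards `[1.0, 2.0]`, a cumulative-hazard target of 1.5 uses up 1.0 in the first interval and needs 0.5 / 2.0 = 0.25 more time in the second.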

Algorithm Descriptions

For a brief summary of each statistical model, see Algorithm Overview. Mathematical details and notation are provided on the 📘 Mathematical Foundations of gen_surv page.

Quick Examples

Cox Proportional Hazards Model

import gen_surv as gs

# Basic CPHM with uniform censoring
df = gs.generate(
    model="cphm", 
    n=500, 
    beta=0.5, 
    covariate_range=2.0,
    model_cens="uniform", 
    cens_par=3.0
)
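For readers who want to see what such a simulation involves, the following standard-library sketch generates Cox-type data by inverse-transform sampling. It rests on several assumptions that the text above does not confirm: a constant baseline hazard (`baseline_rate`, a hypothetical parameter), a single covariate drawn uniformly from `[0, covariate_range]`, and censoring times drawn uniformly from `[0, cens_par]`. The function `simulate_cphm` is illustrative only and may differ from what `generate` does internally.

```python
import math
import random


def simulate_cphm(n, beta, covariate_range, cens_par, baseline_rate=1.0, seed=None):
    """Return a list of (time, status, x) tuples under a proportional-hazards model.

    Assumption: the baseline hazard is constant (exponential baseline).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, covariate_range)        # covariate (assumed uniform)
        hazard = baseline_rate * math.exp(beta * x)  # proportional-hazards scaling
        u = 1.0 - rng.random()                       # in (0, 1], so log(u) is finite
        event_time = -math.log(u) / hazard           # inverse-transform sampling
        cens_time = rng.uniform(0.0, cens_par)       # uniform censoring (assumed)
        time = min(event_time, cens_time)            # observed follow-up time
        status = 1 if event_time <= cens_time else 0  # 1 = event, 0 = censored
        rows.append((time, status, x))
    return rows


rows = simulate_cphm(n=500, beta=0.5, covariate_range=2.0, cens_par=3.0, seed=42)
# Fraction of subjects whose event was observed before censoring.
print(sum(status for _, status, _ in rows) / len(rows))
```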

Accelerated Failure Time Model

# AFT with log-normal distribution
df = gs.generate(
    model="aft_ln",
    n=200,
    beta=[0.5, -0.3, 0.2],
    sigma=1.0,
    model_cens="exponential",
    cens_par=2.0
)
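The log-normal AFT model can be written as log T = beta · x + sigma · epsilon, with epsilon standard normal. The sketch below implements that equation directly with the standard library. The standard-normal covariates and the interpretation of the censoring parameter as an exponential rate (`cens_rate`) are assumptions made for illustration, and `simulate_aft_lognormal` is a hypothetical helper rather than part of gen_surv.

```python
import math
import random


def simulate_aft_lognormal(n, beta, sigma, cens_rate, seed=None):
    """Return a list of (time, status, covariates) tuples under a log-normal AFT model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: one standard-normal covariate per coefficient.
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        linear_predictor = sum(b * xi for b, xi in zip(beta, x))
        # log T = beta . x + sigma * epsilon, with epsilon ~ N(0, 1)
        log_t = linear_predictor + sigma * rng.gauss(0.0, 1.0)
        event_time = math.exp(log_t)
        # Assumption: censoring times are exponential with rate `cens_rate`.
        cens_time = rng.expovariate(cens_rate)
        time = min(event_time, cens_time)
        status = 1 if event_time <= cens_time else 0  # 1 = event, 0 = censored
        rows.append((time, status, x))
    return rows


rows = simulate_aft_lognormal(n=200, beta=[0.5, -0.3, 0.2], sigma=1.0, cens_rate=2.0, seed=0)
print(len(rows), "rows; first observed time:", rows[0][0])
```

Because the covariates act on the log of the event time, a positive coefficient stretches survival times and a negative one shortens them, which is the defining contrast with the hazard-scale effects of a Cox model.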

Multi-State Markov Model

# Two-state Markov model (2x2 transition matrix)
df = gs.generate(
    model="cmm",
    n=300,
    qmat=[[0, 0.1], [0.05, 0]],
    p0=[1.0, 0.0],
    model_cens="uniform",
    cens_par=5.0
)
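As a rough illustration of how a continuous-time Markov path can be simulated from a matrix like `qmat`, the sketch below draws exponential sojourn times and picks the next state in proportion to the off-diagonal intensities. It assumes that the off-diagonal entries are transition intensities and that `p0` is the initial state distribution. The `horizon` argument and the function `simulate_ctmc_path` are hypothetical and not taken from gen_surv.

```python
import random


def simulate_ctmc_path(qmat, p0, horizon, seed=None):
    """Return [(entry_time, state), ...] for one path observed on [0, horizon)."""
    rng = random.Random(seed)
    states = list(range(len(qmat)))
    state = rng.choices(states, weights=p0)[0]  # draw the initial state from p0
    t = 0.0
    path = [(t, state)]
    while True:
        # Off-diagonal entries of the current row are the exit intensities.
        rates = [q if j != state else 0.0 for j, q in enumerate(qmat[state])]
        total = sum(rates)
        if total <= 0.0:
            break  # absorbing state: no further transitions
        t += rng.expovariate(total)  # sojourn time in the current state
        if t >= horizon:
            break  # observation window ended before the next jump
        state = rng.choices(states, weights=rates)[0]  # next state, proportional to rate
        path.append((t, state))
    return path


path = simulate_ctmc_path([[0, 0.1], [0.05, 0]], p0=[1.0, 0.0], horizon=50.0, seed=0)
print(path[:3])
```

With the 2x2 matrix used here, every path starts in state 0 and alternates between the two states. A three-state illness-death model would need a 3x3 matrix with an absorbing third state.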

Key Features

  • Unified Interface: Single generate() function for all models

  • Flexible Censoring: Support for uniform and exponential censoring

  • Rich Parameterization: Extensive customization options

  • Command-Line Interface: Generate datasets from terminal

  • Comprehensive Validation: Input parameter checking

  • Educational Focus: Clear mathematical documentation

Citation

If you use gen_surv in your research, please cite:

@software{ribeiro2025gensurvpy,
  title = {gen_surv: Survival Data Simulation in Python},
  author = {Diogo Ribeiro},
  year = {2025},
  url = {https://github.com/DiogoRibeiro7/genSurvPy},
  version = {1.0.9}
}

License

MIT License - see LICENSE for details.

For foundational papers related to these models, see the References. Information on building the docs is provided on the Read the Docs page.