gen_surv: Survival Data Simulation in Python
gen_surv is a Python package for simulating survival data under a range of statistical models, inspired by the R package genSurv. It provides a unified interface for generating synthetic survival datasets for:
- Research: Testing new survival analysis methods
- Education: Teaching survival analysis concepts
- Benchmarking: Comparing different survival models
- Validation: Testing statistical software implementations
Quick Start
Install with pip:
```bash
pip install gen-surv
```
Generate your first dataset:
```python
from gen_surv import generate

df = generate(model="cphm", n=100, beta=0.5, covariate_range=2.0)
```
```{note}
The `to_sksurv` helper and related tests require the optional
dependency `scikit-survival`. Install it with `poetry install --with dev`
or `pip install scikit-survival` if you need this functionality.
```
Supported Models

| Model | Description | Use Case |
|---|---|---|
| CPHM | Cox Proportional Hazards | Standard survival regression |
| AFT | Accelerated Failure Time | Non-proportional hazards |
| CMM | Continuous-Time Markov | Multi-state processes |
| TDCM | Time-Dependent Covariates | Dynamic risk factors |
| THMM | Time-Homogeneous Hidden Markov | Hidden state processes |
| Competing Risks | Multiple event types | Cause-specific hazards |
| Mixture Cure | Long-term survivors | Logistic cure fraction |
| Piecewise Exponential | Piecewise constant hazard | Flexible baseline |
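
To make the "piecewise constant hazard" row concrete, here is a minimal standard-library sketch of drawing event times by inverting the cumulative hazard. It is not gen_surv's implementation. The function name `sample_piecewise_exponential` and its `breakpoints` and `hazards` arguments are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_piecewise_exponential(breakpoints, hazards, rng):
    """Draw one event time from a piecewise constant hazard.

    breakpoints: increasing interior cut points (hypothetical argument name).
    hazards: one hazard per interval, so len(hazards) == len(breakpoints) + 1.
    """
    # If H is the cumulative hazard, H(T) follows a unit exponential, so we
    # draw a target value and walk the intervals until H reaches it.
    target = rng.expovariate(1.0)
    start = 0.0
    for end, hazard in zip(list(breakpoints) + [math.inf], hazards):
        if hazard > 0.0:
            available = hazard * (end - start)  # hazard mass in this interval
            if target <= available:
                return start + target / hazard
            target -= available
        start = end
    # Only reached when the final interval has zero hazard: no event occurs.
    return math.inf


rng = random.Random(1)
# Hazard 0.5 on [0, 1), then 2.0 from time 1 onward.
times = [sample_piecewise_exponential([1.0], [0.5, 2.0], rng) for _ in range(5)]
```

Under this construction the probability of an event before the first cut point is `1 - exp(-0.5 * 1.0)`, which gives a quick sanity check on simulated output.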
Algorithm Descriptions
For a brief summary of each statistical model see Algorithm Overview. Mathematical details and notation are provided on the π Mathematical Foundations of gen_surv page.
Documentation Contents
- Getting Started
- Tutorials
- API Reference
- π Mathematical Foundations of gen_surv
  - 1. Cox Proportional Hazards Model (CPHM)
  - 2. Time-Dependent Covariate Model (TDCM)
  - 3. Continuous-Time Multi-State Markov Model (CMM)
  - 4. Time-Homogeneous Hidden Markov Model (THMM)
  - 5. Accelerated Failure Time (AFT) Models
    - Notes
  - 6. Competing Risks Models
  - 7. Mixture Cure Models
  - 8. Piecewise Exponential Model
- Algorithm Overview
- Examples
- Troubleshooting
- Read the Docs
- Contributing
  - Contributing to gen_surv
- Changelog
  - CHANGELOG
- References
  - Cox (1972)
  - Farewell (1982)
  - Fine and Gray (1999)
  - Andersen et al. (1993)
  - Zucchini et al. (2017)
  - Klein and Moeschberger (2003)
  - Kalbfleisch and Prentice (2002)
  - Cook and Lawless (2007)
  - Kaplan and Meier (1958)
  - Therneau and Grambsch (2000)
  - Fleming and Harrington (1991)
  - Collett (2015)
  - Kleinbaum and Klein (2012)
Quick Examples
Cox Proportional Hazards Model
```python
import gen_surv as gs

# Basic CPHM with uniform censoring
df = gs.generate(
    model="cphm",
    n=500,
    beta=0.5,
    covariate_range=2.0,
    model_cens="uniform",
    cens_par=3.0,
)
```
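
For readers who want to see what such a call simulates, the following standard-library sketch draws right-censored data from a Cox model by hand. It is not gen_surv's code and rests on assumptions: an exponential baseline with hazard 1, a single covariate drawn uniformly from `[0, covariate_range]`, and censoring times drawn uniformly from `[0, cens_par]`. The function `simulate_cphm` and its `baseline_hazard` argument are hypothetical.

```python
import math
import random


def simulate_cphm(n, beta, covariate_range, cens_par, baseline_hazard=1.0, seed=None):
    """Return n tuples (time, status, x); status is 1 for an event, 0 if censored."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: one covariate, uniform on [0, covariate_range].
        x = rng.uniform(0.0, covariate_range)
        # Proportional hazards: the covariate scales the baseline hazard.
        rate = baseline_hazard * math.exp(beta * x)
        # With a constant baseline hazard the event time is exponential.
        event_time = rng.expovariate(rate)
        # Assumption: uniform censoring on [0, cens_par].
        cens_time = rng.uniform(0.0, cens_par)
        time = min(event_time, cens_time)
        status = 1 if event_time <= cens_time else 0
        rows.append((time, status, x))
    return rows


rows = simulate_cphm(n=500, beta=0.5, covariate_range=2.0, cens_par=3.0, seed=42)
censored_fraction = sum(1 for _, status, _ in rows if status == 0) / len(rows)
```

Checking `censored_fraction` is a simple way to see how the censoring parameter trades off against the event rate before fitting any model.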
Accelerated Failure Time Model
```python
# AFT with log-normal distribution
df = gs.generate(
    model="aft_ln",
    n=200,
    beta=[0.5, -0.3, 0.2],
    sigma=1.0,
    model_cens="exponential",
    cens_par=2.0,
)
```
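
The log-normal AFT model behind this example can be written as log T = x·β + σ·ε with ε standard normal. The sketch below simulates it with the standard library only. It is an illustration rather than the package's implementation: the standard normal covariates, the zero intercept, and the reading of the censoring parameter as the mean of the exponential censoring time (`cens_mean`) are all assumptions, and `simulate_aft_lognormal` is a hypothetical name.

```python
import math
import random


def simulate_aft_lognormal(n, beta, sigma, cens_mean, seed=None):
    """Return n tuples (time, status, x) from a log-normal AFT model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: independent standard normal covariates, one per coefficient.
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        linear_predictor = sum(b * xi for b, xi in zip(beta, x))
        # AFT: covariates shift log-time; sigma scales the normal error term.
        log_time = linear_predictor + sigma * rng.gauss(0.0, 1.0)
        event_time = math.exp(log_time)
        # Assumption: exponential censoring with mean cens_mean.
        cens_time = rng.expovariate(1.0 / cens_mean)
        time = min(event_time, cens_time)
        status = 1 if event_time <= cens_time else 0
        rows.append((time, status, x))
    return rows


rows = simulate_aft_lognormal(n=200, beta=[0.5, -0.3, 0.2], sigma=1.0, cens_mean=2.0, seed=7)
```

A positive coefficient lengthens expected survival time here, which is the opposite sign convention from a hazard-based model such as CPHM.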
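Multi-State Markov Model
```python
# Two-state model with transitions in both directions
# (qmat is 2x2, so this defines two states, not a three-state illness-death model)
df = gs.generate(
    model="cmm",
    n=300,
    qmat=[[0, 0.1], [0.05, 0]],
    p0=[1.0, 0.0],
    model_cens="uniform",
    cens_par=5.0,
)
```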
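
As a rough picture of how a continuous-time Markov path can be generated from a rate matrix like the one above, here is a standard-library sketch. It is not the package's algorithm. It assumes the off-diagonal entries of `qmat` are transition rates, that `p0` is the initial state distribution, and that each subject is observed until a uniform censoring time. The helper `simulate_ctmc_path` is hypothetical.

```python
import random


def simulate_ctmc_path(qmat, p0, horizon, rng):
    """Return [(entry_time, state), ...] for one path observed on [0, horizon)."""
    states = list(range(len(qmat)))
    state = rng.choices(states, weights=p0)[0]  # draw the initial state from p0
    t = 0.0
    path = [(t, state)]
    while True:
        rates = qmat[state]
        total_rate = sum(rates)
        if total_rate <= 0.0:
            break  # absorbing state: no further transitions
        # The holding time in a state is exponential with the total exit rate.
        t += rng.expovariate(total_rate)
        if t >= horizon:
            break  # the next jump would fall after censoring
        # The next state is chosen in proportion to the individual rates.
        state = rng.choices(states, weights=rates)[0]
        path.append((t, state))
    return path


rng = random.Random(0)
qmat = [[0, 0.1], [0.05, 0]]
p0 = [1.0, 0.0]
# Assumption: each subject is censored at a uniform time on [0, 5].
paths = [
    simulate_ctmc_path(qmat, p0, horizon=rng.uniform(0.0, 5.0), rng=rng)
    for _ in range(300)
]
```

With rates this small relative to the censoring window, most paths contain only the initial state, which is worth keeping in mind when choosing `cens_par`.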
Key Features
- Unified Interface: Single `generate()` function for all models
- Flexible Censoring: Support for uniform and exponential censoring
- Rich Parameterization: Extensive customization options
- Command-Line Interface: Generate datasets from the terminal
- Comprehensive Validation: Input parameter checking
- Educational Focus: Clear mathematical documentation
Citation
If you use gen_surv in your research, please cite:
```bibtex
@software{ribeiro2025gensurvpy,
  title = {gen_surv: Survival Data Simulation in Python},
  author = {Diogo Ribeiro},
  year = {2025},
  url = {https://github.com/DiogoRibeiro7/genSurvPy},
  version = {1.0.9}
}
```
License
MIT License - see LICENSE for details.
For foundational papers related to these models, see the References. Information on building the docs is provided on the Read the Docs page.