Basic Usage Tutorial
This tutorial covers the fundamentals of generating survival data with gen_surv.
Your First Dataset
Let’s start with the simplest case - generating data from a Cox proportional hazards model:
from gen_surv import generate
import pandas as pd
# Generate basic CPHM data
df = generate(
model="cphm",
n=200,
beta=0.7,
covariate_range=1.5,
model_cens="exponential",
cens_par=2.0,
seed=42 # For reproducibility
)
print(f"Dataset shape: {df.shape}")
print(f"Event rate: {df['status'].mean():.2%}")
print("\nFirst 5 rows:")
print(df.head())
Understanding Parameters
Common Parameters
All models share these parameters:
n
: Sample size (number of individuals)model_cens
: Censoring type (“uniform” or “exponential”)cens_par
: Censoring distribution parameterseed
: Random seed for reproducibility
Model-Specific Parameters
Each model has unique parameters. For CPHM:
beta
: Covariate effect (hazard ratio = exp(beta))covariate_range
: Range for uniform covariate generation [0, covariate_range]
Censoring Mechanisms
gen_surv supports two censoring types:
Uniform Censoring
# Censoring times uniformly distributed on [0, cens_par]
df_uniform = generate(
model="cphm",
n=100,
beta=0.5,
covariate_range=2.0,
model_cens="uniform",
cens_par=3.0
)
Exponential Censoring
# Censoring times exponentially distributed with mean cens_par
df_exponential = generate(
model="cphm",
n=100,
beta=0.5,
covariate_range=2.0,
model_cens="exponential",
cens_par=2.0
)
Exploring Your Data
Basic data exploration:
import matplotlib.pyplot as plt
import seaborn as sns
# Event rate by covariate level
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
# Histogram of survival times
ax1.hist(df['time'], bins=20, alpha=0.7, edgecolor='black')
ax1.set_xlabel('Time')
ax1.set_ylabel('Frequency')
ax1.set_title('Distribution of Observed Times')
# Event rate vs covariate
df['covariate_bin'] = pd.cut(df['covariate'], bins=5)
event_rate = df.groupby('covariate_bin')['status'].mean()
event_rate.plot(kind='bar', ax=ax2, rot=45)
ax2.set_ylabel('Event Rate')
ax2.set_title('Event Rate by Covariate Level')
plt.tight_layout()
plt.show()
Additional Example: Mixture Cure Model
The mixture cure model separates subjects into cured and susceptible groups. Here’s how to simulate data using this model:
from gen_surv import generate
df_mixture = generate(
model="mixture_cure",
n=200,
cure_fraction=0.3,
betas_survival=[0.8, -0.4],
betas_cure=[-0.6, 0.2],
seed=123,
)
print(df_mixture[["time", "status", "cured"]].head())
Next Steps
Try different models (model_comparison)
Learn advanced features (advanced_features)
See integration examples (integration_examples)