📘 Mathematical Foundations of gen_surv
This page presents the mathematical formulation behind the survival models implemented in the gen_surv
package.
1. Cox Proportional Hazards Model (CPHM)
This semi-parametric approach models the hazard as a baseline component multiplied by an exponential term involving the covariates. The hazard function conditioned on covariates is:
$$ h(t \mid X) = h_0(t) \exp(X \beta) $$
Where:
( h_0(t) ) is the baseline hazard
( X \beta ) is the linear predictor
Weibull baseline hazard:
$$ h_0(t) = \lambda \rho t^{\rho - 1} $$
The cumulative hazard is:
$$ \Lambda_0(t) = \lambda t^{\rho} $$
And the survival function becomes:
$$ S(t \mid X) = \exp\left(-\Lambda_0(t) \exp(X \beta)\right) $$
2. Time-Dependent Covariate Model (TDCM)
This extension of the Cox model allows covariate values to vary during follow-up, accommodating exposures or treatments that change over time:
$$ h(t \mid Z(t)) = h_0(t) \exp(Z(t) \beta) $$
In this package, piecewise covariate values are simulated with dependence across segments using correlated normal draws.
3. Continuous-Time Multi-State Markov Model (CMM)
This framework captures transitions between a finite set of states where waiting times are exponentially distributed. With generator matrix ( Q ), the transition probability matrix is given by:
$$ P(t) = \exp(Qt) $$
Where:
( Q ) is the rate matrix
( P(t)_{ij} ) gives the probability of being in state j at time t given starting in state i
5. Accelerated Failure Time (AFT) Models
These fully parametric models relate covariates to the logarithm of the survival time. They assume the effect of a covariate speeds up or slows down the event time directly, rather than acting on the hazard.
Log-Normal AFT
The model assumes:
$$ \log(T_i) = X_i \beta + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) $$
Thus:
$$ T_i \sim \text{Log-Normal}(X_i \beta, \sigma^2) $$
The survival function is:
$$ S(t \mid X) = 1 - \Phi\left(\frac{\log(t) - X \beta}{\sigma}\right) $$
Where:
( \Phi ) is the standard normal cumulative distribution function (CDF)
This model is especially useful when the proportional hazards assumption is not valid and provides interpretable effects in the time domain.
Notes
All models support censoring:
Uniform: ( C_i \sim U(0, \text{cens_par}) )
Exponential: ( C_i \sim \text{Exp}(\text{cens_par}) )
6. Competing Risks Models
These models handle scenarios where several distinct failure types can occur. Each cause has its own hazard function, and the observed status indicates which event occurred (1, 2, …). The package includes constant-hazard and Weibull-hazard versions.
7. Mixture Cure Models
These models posit that a subset of the population is cured and will never
experience the event of interest. The generator mixes a logistic cure component
with an exponential hazard for the uncured, returning a cured
indicator
column alongside the usual time and status.
8. Piecewise Exponential Model
Here the baseline hazard is assumed constant within each of several user-specified intervals. This allows flexible hazard shapes over time while remaining easy to simulate.