Generates synthetic longitudinal and time-to-event data under a joint modeling framework where the longitudinal biomarker trajectory follows a damped harmonic oscillator model (second-order ODE), and the survival process is associated with the biomarker dynamics through shared random effects and trajectory-dependent hazard functions. The biomarker dynamics are parameterized using physically interpretable parameters: damping ratio, natural period, and excitation amplitude.
Usage
simulate(
n_subjects = 500,
longitudinal = list(xi = c(mean = 0.4, sd = 0.1), period = c(mean = 6, sd = 0.1),
excitation = list(offset = 0, covariates = c(x1 = 0.5, x2 = -0.45)), initial =
list(offset = c(biomarker = -0.5, velocity = -0.1), covariates = list(biomarker =
c(x1 = 0.08, x2 = -0.06), velocity = c(x1 = 0.25, x2 = -0.2))), error_sd = 0.1,
n_measurements = 100),
survival = list(baseline = list(type = "weibull", shape = 2, scale = 15), value = 0.8,
slope = 2, gamma = 1, covariates = c(w1 = 0.6, w2 = -0.8)),
covariates = list(x1 = list(type = "normal", mean = 0, sd = 1), x2 = list(type =
"normal", mean = 0, sd = 1), w1 = list(type = "normal", mean = 0, sd = 1), w2 =
list(type = "binary", prob = 0.5)),
maxt = 10,
seed = 42
)Arguments
- n_subjects
Integer specifying the number of subjects to simulate (default: 500)
- longitudinal
List specifying the longitudinal sub-model parameters:
- xi
Damping ratio \(\xi\) controlling the oscillation decay. Specified as c(mean = ..., sd = ...) for population mean and standard deviation. Values: \(\xi < 1\) (underdamped), \(\xi = 1\) (critically damped), \(\xi > 1\) (overdamped). Default: c(mean = 0.4, sd = 0.05)
- period
Natural period \(T\) of oscillation in time units, related to natural frequency as \(\omega = 2\pi/T\). Specified as c(mean = ..., sd = ...) for population mean and standard deviation. Default: c(mean = 6, sd = 1.0)
- excitation
List specifying external forcing parameters:
- offset
Constant excitation term \(f_0\) (default: 0.0)
- covariates
Named vector of covariate effects \(\boldsymbol{\beta}_{exc}\) on excitation (default: c(x1 = 0.5, x2 = -0.45))
- initial
List specifying initial condition parameters:
- offset
Named vector with baseline initial values:
c(biomarker = ..., velocity = ...)(default: c(biomarker = -0.5, velocity = -0.05))- covariates
List with two named vectors for covariate effects:
- biomarker
Covariate effects on initial biomarker (default: c(x1 = 0.08, x2 = -0.06))
- velocity
Covariate effects on initial velocity (default: c(x1 = 0.25, x2 = -0.2))
- error_sd
Standard deviation \(\sigma_{\epsilon}\) of the measurement error process (default: 0.1)
- n_measurements
Number of longitudinal measurements per subject (default: 100)
- survival
List specifying the survival sub-model parameters:
- baseline
List defining the Weibull baseline hazard function:
- type
Character string specifying the baseline hazard type (currently only "weibull" is supported)
- shape
Weibull shape parameter \(\kappa > 0\) (default: 2.0)
- scale
Weibull scale parameter \(\lambda > 0\) (default: 100.0)
- value
Association parameter \(\alpha_1\) linking current biomarker value to hazard (default: 0.8)
- slope
Association parameter \(\alpha_2\) linking biomarker velocity (or its power) to hazard (default: 2.0)
- gamma
Power parameter for velocity contribution, where \(\gamma = 0\) excludes velocity effect, \(\gamma = 1\) uses linear velocity \(\alpha_2\dot{m}_i(t)\), and \(\gamma = 2\) uses quadratic velocity \(\alpha_2[\dot{m}_i(t)]^2\) (default: 1)
- covariates
Named vector of regression coefficients \(\boldsymbol{\phi}\) for survival covariates (default: c(w1 = 0.6, w2 = -0.8))
- covariates
List defining the distributions of baseline covariates:
- x1
List with
type = "normal",mean = 0,sd = 1for standardized continuous covariate (longitudinal)- x2
List with
type = "normal",mean = 0,sd = 1for standardized continuous covariate (longitudinal)- w1
List with
type = "normal",mean = 0,sd = 1for standardized continuous covariate (survival)- w2
List with
type = "binary"andprob = 0.5for binary covariate (survival)
- maxt
Positive scalar specifying the maximum follow-up time in the study (default: 10 time units)
- seed
Integer seed for random number generation to ensure reproducibility (default: 42)
Value
A list containing four components:
longitudinal_dataData frame comprising longitudinal observations with columns:
id: Subject identifier (integer)time: Observation time point (numeric)observed: Measured biomarker value including measurement error, \(y_{ij}\)biomarker: True underlying biomarker value, \(m_i(t_{ij})\)velocity: First derivative of the biomarker trajectory, \(\dot{m}_i(t_{ij})\)acceleration: Second derivative of the biomarker trajectory, \(\ddot{m}_i(t_{ij})\)x1,x2: Time-invariant covariates (if specified)
survival_dataData frame containing time-to-event data with columns:
id: Subject identifiertime: Observed event or censoring time, \(T_i\)status: Event indicator, \(\delta_i\) (1 = event observed, 0 = censored)w1,w2: Baseline survival covariates (if specified)
stateAn \(n \times 2\) data frame containing initial states \([m_i(0), \dot{m}_i(0)]\) for each subject
random_effectsAn \(n \times 2\) matrix containing subject-specific random effects (centered at zero) for:
dyn_value: Random effect for biomarker level termdyn_slope: Random effect for velocity term
The matrix has attributes
mu(population mean vector of length 2) andsigma(\(2 \times 2\) covariance matrix). These correspond to the random effects specified in the formula structure(biomarker + velocity | id). Fixed effects (dyn_constand covariate terms) are not included as they have no between-subject variability.
Details
The simulation framework implements a joint model comprising longitudinal and survival sub-models linked through shared random effects and trajectory-dependent associations.
Default Parameter Design
The default parameters are calibrated to achieve the following properties:
Approximately 60\
Initial biomarker difference between risk groups: ~0.2
Final biomarker difference between event and censored groups: ~0.7
Damping ratio centered at 0.4 (underdamped oscillations)
Significant velocity differences between patients with different risk profiles, enabling survival prediction based on trajectory dynamics
These settings produce realistic heterogeneity in biomarker trajectories while maintaining numerical stability for joint model estimation.
Longitudinal Sub-model
The biomarker trajectory \(m_i(t)\) for subject \(i\) follows a damped harmonic oscillator with external forcing: $$\ddot{m}_i(t) + 2\xi\omega\dot{m}_i(t) + \omega^2 m_i(t) = k \omega^2 [f_0 + \mathbf{X}_i^T\boldsymbol{\beta}_{exc}]$$ where:
\(\omega = 2\pi/T\) is the natural angular frequency
\(\xi\) is the damping ratio determining oscillation behavior
\(k\) scales the excitation amplitude
\(f_0\) is the constant excitation term
\(\mathbf{X}_i\) contains time-invariant covariates
\(\boldsymbol{\beta}_{exc}\) represents covariate effects on excitation
Initial conditions are specified as:
\(m_i(0) = \mu_{m,0} + \mathbf{X}_i^T\boldsymbol{\beta}_{m,init}\)
\(\dot{m}_i(0) = \mu_{v,0} + \mathbf{X}_i^T\boldsymbol{\beta}_{v,init}\)
The observed longitudinal measurements incorporate additive Gaussian noise: $$y_{ij} = m_i(t_{ij}) + b_i + \epsilon_{ij}$$ where \(\epsilon_{ij} \sim \mathcal{N}(0, \sigma_\epsilon^2)\) represents independent measurement error.
Survival Sub-model
The instantaneous hazard function incorporates both current biomarker value and velocity (or its power): $$\lambda_i(t) = \lambda_0(t)\exp(\alpha_1 m_i(t) + \alpha_2[\dot{m}_i(t)]^\gamma + \mathbf{W}_i^T\boldsymbol{\phi})$$ where:
\(\lambda_0(t)\) denotes the Weibull baseline hazard: \(\lambda_0(t) = (\kappa/\lambda)(t/\lambda)^{\kappa-1}\)
\(\alpha_1\) quantifies the association with current biomarker value
\(\alpha_2\) quantifies the association with biomarker velocity (or its power)
\(\gamma \in \{0, 1, 2\}\) determines the power of velocity: 0 (no velocity effect), 1 (linear), or 2 (quadratic)
\(\mathbf{W}_i\) contains baseline covariates
\(\boldsymbol{\phi}\) represents covariate effects on survival
Patient-Specific Dynamics
The function models patient heterogeneity through continuous distributions of dynamics parameters \(\xi_i\) (damping ratio) and \(T_i\) (period): $$\xi_i \sim \mathcal{N}(\mu_\xi, \sigma_\xi^2), \quad T_i \sim \mathcal{N}(\mu_T, \sigma_T^2)$$ These are transformed to ODE parameters via: $$\omega_i = 2\pi/T_i, \quad \beta_{1,i} = -\omega_i^2, \quad \beta_{2,i} = -2\xi_i\omega_i$$ The transformation uses the Delta method to preserve the correct covariance structure in the ODE parameter space.
Random Effects Structure: Only \(\beta_{1,i}\)
(dyn_value) and \(\beta_{2,i}\) (dyn_slope) have
subject-specific random effects. The constant excitation term
(dyn_const) and all covariate excitation coefficients
(dyn_x1, dyn_x2, etc.) are fixed effects that take
the same population mean value for all subjects, with no
between-subject variability. This structure corresponds to the
formula specification observed ~ biomarker + velocity + x1 +
x2 + (biomarker + velocity | id).
Examples
# Example 1: Simple simulation with default parameters
sim_basic <- simulate(n_subjects = 20, seed = 123)
# Explore the output structure
names(sim_basic)
#> [1] "longitudinal_data" "survival_data" "state"
#> [4] "initial_state_mean" "random_effects"
head(sim_basic$longitudinal_data)
#> id time biomarker velocity acceleration observed x1 x2
#> 1 1 0.0 -0.4807686 -0.02655417 0.7898386 -0.4237065 -0.5604756 -1.067824
#> 2 1 0.1 -0.4795949 0.04883328 0.7176135 -0.5187984 -0.5604756 -1.067824
#> 3 1 0.2 -0.4712454 0.11692671 0.6441097 -0.4164648 -0.5604756 -1.067824
#> 4 1 0.3 -0.4564552 0.17764388 0.5702384 -0.5333797 -0.5604756 -1.067824
#> 5 1 0.4 -0.4359621 0.23098982 0.4968312 -0.3048338 -0.5604756 -1.067824
#> 6 1 0.5 -0.4104998 0.27704976 0.4246403 -0.4184387 -0.5604756 -1.067824
head(sim_basic$survival_data)
#> id time status w1 w2
#> 1 1 3.677542 1 -0.6947070 1
#> 2 2 10.000000 0 -0.2079173 0
#> 3 3 10.000000 0 -1.2653964 0
#> 4 4 2.485635 1 2.1689560 0
#> 5 5 10.000000 0 1.2079620 0
#> 6 6 10.000000 0 -1.1231086 1
# Check patient-specific dynamics
# Each patient has unique dynamics drawn from population distribution
head(sim_basic$random_effects)
#> m0 v0 dyn_value dyn_slope
#> [1,] 0.004826037 0.02778845 -0.037462229 -0.1026366
#> [2,] -0.019741038 -0.05960677 -0.025788552 -0.4844648
#> [3,] 0.171851598 0.54922059 -0.006184456 0.2112125
#> [4,] 0.034968811 0.11774796 0.021120355 -0.1491376
#> [5,] 0.033440041 0.11167240 -0.051345273 -0.1438230
#> [6,] 0.224001464 0.72044753 0.024461609 0.2150100
# Population parameters (mean and covariance)
attr(sim_basic$random_effects, "mu") # Mean of dynamics distribution
#> dyn_value dyn_slope
#> -1.096623 -0.837758
attr(sim_basic$random_effects, "sigma") # Covariance matrix
#> [,1] [,2]
#> [1,] 0.0013362015 0.0005103914
#> [2,] 0.0005103914 0.0440598636
# Example 2: Heterogeneous patient dynamics
# Patients differ in damping ratio (xi) and oscillation period
sim_hetero <- simulate(
n_subjects = 20,
longitudinal = list(
xi = c(mean = 0.5, sd = 0.2), # Mean damping ratio with variation
period = c(mean = 8, sd = 2), # Mean period with variation
excitation = list(
offset = 4.0, # Constant external force
covariates = c(x1 = 0.8, x2 = -0.5) # Covariate effects
),
initial = list(
offset = c(biomarker = 3.8, velocity = -0.1),
covariates = list(
biomarker = c(x1 = 0.1, x2 = -0.1),
velocity = c(x1 = 0.1, x2 = -0.05)
)
),
error_sd = 0.1,
n_measurements = 10
),
covariates = list(
x1 = list(type = "normal", mean = 0, sd = 1),
x2 = list(type = "normal", mean = 0, sd = 1),
w1 = list(type = "normal", mean = 0, sd = 1),
w2 = list(type = "binary", prob = 0.5)
),
seed = 456
)
# Example 3: Nearly uniform dynamics (low variability)
# Useful for testing when patient heterogeneity should be minimal
sim_uniform <- simulate(
n_subjects = 20,
longitudinal = list(
xi = c(mean = 0.707, sd = 0.01), # Very small variation
period = c(mean = 5, sd = 0.1), # Very small variation
excitation = list(offset = 0, covariates = numeric(0)),
initial = list(
offset = c(biomarker = -2, velocity = 0),
covariates = list(biomarker = numeric(0), velocity = numeric(0))
),
error_sd = 0.15,
n_measurements = 20
),
seed = 789
)
# Verify low variability - all patients should have similar dynamics
apply(sim_uniform$random_effects, 2, sd)
#> m0 v0 dyn_value dyn_slope
#> 0.00000000 0.00000000 0.06639935 0.04269099
# \donttest{
# Example 4: Comprehensive simulation with survival outcomes
sim_full <- simulate(
n_subjects = 30,
longitudinal = list(
xi = c(mean = 0.4, sd = 0.05),
period = c(mean = 8, sd = 0.3),
excitation = list(
offset = 4.0,
covariates = c(x1 = 0.8, x2 = -0.5)
),
initial = list(
offset = c(biomarker = 3.8, velocity = -0.1),
covariates = list(
biomarker = c(x1 = 0.1, x2 = -0.1),
velocity = c(x1 = 0.1, x2 = -0.05)
)
),
error_sd = 0.1,
n_measurements = 20
),
survival = list(
baseline = list(type = "weibull", shape = 3.0, scale = 23),
value = 0.4, # Effect of biomarker value on hazard
slope = 1.5, # Effect of biomarker velocity on hazard
gamma = 1, # Linear velocity effect
covariates = c(w1 = 0.4, w2 = -0.6)
),
covariates = list(
x1 = list(type = "normal", mean = 0, sd = 1),
x2 = list(type = "normal", mean = 0, sd = 1),
w1 = list(type = "normal", mean = 0, sd = 1),
w2 = list(type = "binary", prob = 0.5)
),
maxt = 10,
seed = 42
)
# Visualize patient-specific dynamics distribution
library(ggplot2)
# Recover original xi and period from dynamics parameters
mu <- attr(sim_full$random_effects, "mu")
dynamics_df <- as.data.frame(sim_full$random_effects)
dynamics_df$omega <- sqrt(-dynamics_df$dyn_value - mu["dyn_value"])
dynamics_df$xi_implied <- -(dynamics_df$dyn_slope + mu["dyn_slope"]) /
(2 * dynamics_df$omega)
dynamics_df$period_implied <- 2 * pi / dynamics_df$omega
# Distribution of damping ratios
ggplot(dynamics_df, aes(x = xi_implied)) +
geom_histogram(bins = 30, fill = "steelblue", alpha = 0.7) +
geom_vline(xintercept = 0.4, color = "red", linetype = "dashed") +
labs(
title = "Distribution of Patient Damping Ratios",
x = "Damping Ratio (xi)", y = "Count"
) +
theme_minimal()
# Joint distribution of xi and period
ggplot(dynamics_df, aes(x = period_implied, y = xi_implied)) +
geom_point(alpha = 0.5) +
geom_density_2d(color = "steelblue") +
labs(
title = "Patient-Specific Dynamics",
x = "Period", y = "Damping Ratio (xi)"
) +
theme_minimal()
# Compare trajectories for patients with different dynamics
# Select patients at 10th, 50th, 90th percentiles of xi
quantiles <- quantile(dynamics_df$xi_implied, probs = c(0.1, 0.5, 0.9))
example_ids <- sapply(quantiles, function(q) {
which.min(abs(dynamics_df$xi_implied - q))
})
plot_data <- sim_full$longitudinal_data[
sim_full$longitudinal_data$id %in% example_ids,
]
plot_data$xi_group <- factor(
plot_data$id,
levels = example_ids,
labels = c("Low xi (10%)", "Median xi (50%)", "High xi (90%)")
)
ggplot(plot_data, aes(x = time, y = biomarker, color = xi_group)) +
geom_line(linewidth = 1) +
labs(
title = "Biomarker Trajectories by Damping Ratio",
x = "Time", y = "Biomarker Value",
color = "Patient Group"
) +
theme_minimal()
# Kaplan-Meier survival curve
library(survival)
km_fit <- survfit(Surv(time, status) ~ 1, data = sim_full$survival_data)
plot(km_fit,
xlab = "Time", ylab = "Survival Probability",
main = "Kaplan-Meier Survival Curve"
)
# }