Collaborative Inference for Cox Proportional Hazards Model in Distributed Data
Source: R/colsa.R
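Performs collaborative inference for the Cox proportional hazards model in distributed data settings, where the data are split into batches (for example, held at different sites) and an existing fit is updated as each new batch becomes available.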
Arguments
- formula
A formula object specifying the model. The response must be a survival object created with Surv() from the survival package.
- data
A data frame containing the variables in the model.
- n_basis
An integer specifying the number of basis functions to use.
- boundary
A numeric vector of length two giving the lower and upper bounds of the interval on which the basis functions are defined, for example c(0, max(time)).
- scale
A numeric value specifying the scaling factor for the number of pre-estimation basis functions. Default is 2.0.
- init
A character string specifying the initialization method. "zero" (the default) initializes the parameters at zero, which is fast. "flexsurv" uses flexsurvspline() from the flexsurv package to obtain better starting values, at the cost of slower computation.
- ...
Additional arguments passed to the optimization function, nlm() (see Details).
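This page does not say which basis family colsa uses. As an illustration only, the sketch below assumes a Bernstein polynomial basis, a common choice for functions on a bounded interval, to show how n_basis and boundary interact: times are rescaled from boundary onto [0, 1], and each of the n_basis functions is a polynomial of degree n_basis - 1. The helper bernstein_basis() is hypothetical and not part of the package.

```r
# Hypothetical helper (ASSUMPTION: colsa's actual basis family is not stated).
# Evaluates an n_basis-term Bernstein polynomial basis at times t after
# mapping t from [boundary[1], boundary[2]] onto [0, 1].
bernstein_basis <- function(t, n_basis, boundary) {
  stopifnot(length(boundary) == 2, boundary[1] < boundary[2], n_basis >= 1)
  u <- (t - boundary[1]) / (boundary[2] - boundary[1])
  if (any(u < 0 | u > 1)) stop("all times must lie inside 'boundary'")
  degree <- n_basis - 1
  k <- 0:degree
  # One row per time point, one column per basis function
  outer(u, k, function(u, k) choose(degree, k) * u^k * (1 - u)^(degree - k))
}

times <- c(0, 2.5, 5, 10)
B <- bernstein_basis(times, n_basis = 3, boundary = c(0, 10))
B
rowSums(B)  # Bernstein basis functions sum to 1 at every time point
```

In this sketch, times outside boundary are rejected, so the upper bound should cover the largest observed time, as c(0, max(sim$time)) does in the example.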
Value
An object of class "colsa" containing the following components:
- logLik
The log-likelihood of the fitted model.
- theta
The estimated model parameters.
- hessian
The Hessian of the objective function at the solution.
- n_basis
The number of basis functions used.
- n_features
The number of features in the model.
- n_samples
The number of samples in the dataset.
- formula
The formula used to fit the model.
- boundary
The boundary for the basis functions.
- scale
The scaling factor for the number of pre-estimation basis functions.
- call
The matched call.
Details
This function estimates the model parameters with nlm() from the stats package. If the optimization fails to converge, an error is raised. During the pre-estimation stage, the parameters are projected to mitigate the bias introduced at that early stage.
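The pattern described above (an nlm() fit that stops on non-convergence and returns the Hessian at the solution) can be sketched in base R on a toy objective. colsa's actual likelihood is not reproduced here: the censored exponential model is a stand-in, fit_nll() is a hypothetical helper, and reading standard errors off the inverse Hessian assumes the objective is a negative log-likelihood.

```r
# Hypothetical wrapper (not colsa's internals): minimise a negative
# log-likelihood with stats::nlm(), forward extra arguments to nlm,
# stop if nlm reports a problem, and derive standard errors from the
# inverse of the Hessian at the solution.
fit_nll <- function(nll, p0, ...) {
  opt <- nlm(nll, p0, hessian = TRUE, ...)
  # nlm codes 1 and 2 mean the iterate is probably a solution;
  # codes 3, 4 and 5 signal a failed step, the iteration limit, or step-size trouble
  if (!opt$code %in% c(1, 2)) {
    stop("optimization did not converge (nlm code ", opt$code, ")")
  }
  list(
    theta   = opt$estimate,
    logLik  = -opt$minimum,
    hessian = opt$hessian,
    se      = sqrt(diag(solve(opt$hessian)))
  )
}

# Toy stand-in objective: right-censored exponential model with theta = log(rate).
set.seed(1)
obs_time <- rexp(200, rate = 0.5)
status <- rbinom(200, size = 1, prob = 0.8)  # 1 = event, 0 = censored
D <- sum(status)                             # number of events
total_time <- sum(obs_time)                  # total follow-up time
nll <- function(theta) {
  value <- -(D * theta - exp(theta) * total_time)
  attr(value, "gradient") <- -D + exp(theta) * total_time  # analytic gradient
  value
}

res <- fit_nll(nll, p0 = 0, iterlim = 200)
res$theta  # should be close to the closed-form MLE log(D / total_time)
res$se     # should be close to 1 / sqrt(D), since the Hessian there equals D
```

Passing iterlim = 200 shows extra arguments being forwarded to nlm(), in the same spirit as the ... argument of colsa().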
Examples
library(survival)  # provides Surv()
formula <- Surv(time, status) ~ x1 + x2 + x31 + x42 + x43 + x44
boundary <- c(0, max(sim$time))
# Fit the initial model on the first batch (group 1) of the sim data
df_sub <- sim[sim$group == 1, , drop = FALSE]
fit <- colsa(formula, df_sub, n_basis = 3, boundary = boundary)
# Update the fit with each remaining batch in turn
for (batch in 2:6) {
  df_sub <- sim[sim$group == batch, , drop = FALSE]
  fit <- update(fit, newdata = df_sub, n_basis = "auto")
}
summary(fit)
#> Call:
#> update.colsa(object = fit, newdata = df_sub, n_basis = "auto")
#>
#> Number of basis functions: 6
#>
#> coef exp(coef) se z p
#> x1 0.15677 1.16973 0.01016 15.430 < 2e-16 ***
#> x2 -0.17309 0.84106 0.02232 -7.756 8.74e-15 ***
#> x31 0.38400 1.46815 0.05779 6.645 3.04e-11 ***
#> x42 0.27318 1.31413 0.09090 3.005 0.002654 **
#> x43 0.29061 1.33725 0.08325 3.491 0.000482 ***
#> x44 0.17502 1.19128 0.08217 2.130 0.033167 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> exp(coef) exp(-coef) lower .95 upper .95
#> x1 1.1697 0.8549 1.1467 1.1933
#> x2 0.8411 1.1890 0.8051 0.8787
#> x31 1.4681 0.6811 1.3109 1.6442
#> x42 1.3141 0.7610 1.0997 1.5704
#> x43 1.3372 0.7478 1.1359 1.5743
#> x44 1.1913 0.8394 1.0141 1.3994
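The summary columns are consistent with the usual Wald construction (the one summary.coxph() in the survival package uses): z = coef / se, a two-sided normal p-value, and a 95% interval exp(coef ± qnorm(0.975) * se). Treating that as an assumption about how summary() computes them, the sketch below recomputes a few rows from the printed coef and se values.

```r
# Coefficients and standard errors copied from the printed summary above
beta <- c(x1 = 0.15677, x2 = -0.17309, x42 = 0.27318)
se   <- c(x1 = 0.01016, x2 = 0.02232, x42 = 0.09090)

z <- beta / se             # Wald statistic
p <- 2 * pnorm(-abs(z))    # two-sided normal p-value
q <- qnorm(0.975)          # about 1.96 for a 95% interval
tab <- cbind(
  `exp(coef)`  = exp(beta),
  `exp(-coef)` = exp(-beta),
  `lower .95`  = exp(beta - q * se),
  `upper .95`  = exp(beta + q * se),
  z = z,
  p = p
)
round(tab, 4)
```

Small discrepancies in the last printed digit come from the rounding of coef and se to five decimals.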