Performs collaborative inference for the Cox proportional hazards model, tailored for distributed data environments.

Usage

colsa(formula, data, n_basis, boundary, scale = 2, init = "zero", ...)

Arguments

formula

A formula object specifying the model. The response must be a survival object created using the Surv function from the survival package.
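As a minimal sketch of what such a formula looks like, the snippet below builds a formula whose left-hand side is a Surv object and checks the class of the response. The data frame toy and its columns are hypothetical and serve only as illustration; they are not part of this package.

```r
library(survival)  # provides Surv()

# Hypothetical toy data: follow-up time, event indicator
# (1 = event, 0 = censored) and a single covariate.
toy <- data.frame(
  time   = c(5, 8, 3, 12, 7),
  status = c(1, 0, 1, 1, 0),
  x1     = c(0.2, -1.1, 0.5, 1.3, -0.4)
)

# The response side of the formula is a call to Surv().
f <- Surv(time, status) ~ x1

# Evaluate the response in the data and confirm it is a survival object.
resp <- model.response(model.frame(f, data = toy))
is_surv <- inherits(resp, "Surv")
```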

data

A data frame containing the variables in the model.

n_basis

An integer specifying the number of basis functions to use.

boundary

A two-element numeric vector giving the lower and upper bounds for the basis functions.
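The example on this page sets the boundary to c(0, max(sim$time)), that is, from the full data set rather than from a single batch. When batches are held separately, one way to obtain the same value is to combine the per-batch maxima, sketched below. The list batch_times is hypothetical, and the assumption that the boundary should cover every batch is inferred from that example rather than stated by the package.

```r
# Hypothetical event times held by three separate batches (illustrative only).
batch_times <- list(
  c(1.2, 4.5, 3.3),
  c(0.7, 6.1),
  c(2.9, 5.0, 5.8)
)

# Each batch only needs to report its largest observed time.
batch_max <- vapply(batch_times, max, numeric(1))

# Lower bound 0, upper bound the overall maximum follow-up time.
boundary <- c(0, max(batch_max))
```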

scale

A numeric value specifying the scaling factor for the number of pre-estimation basis functions. Default is 2.0.

init

A character string specifying the initialization method. "zero" (the default) starts from zero, which is fast. "flexsurv" takes initial values from flexsurvspline, which gives better starting values at the cost of slower computation.

...

Additional arguments passed to the optimization function (nlm, see Details).

Value

An object of class "colsa" containing the following components:

logLik

The log-likelihood of the fitted model.

theta

The estimated model parameters.

hessian

The Hessian of the objective function at the solution.

n_basis

The number of basis functions used.

n_features

The number of features in the model.

n_samples

The number of samples in the dataset.

formula

The formula used to fit the model.

boundary

The boundary for the basis functions.

scale

The scaling factor for the number of pre-estimation basis functions.

call

The matched call.

Details

Model parameters are estimated with the nlm optimizer. If the optimization fails to converge, an error is raised. During the pre-estimation stage, the parameters are projected to mitigate the bias introduced in that early stage.
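The general pattern of minimizing an objective with nlm and stopping on non-convergence can be sketched as follows. This is not the objective that colsa optimizes: it is a toy exponential survival model with made-up data, and the rule of accepting only nlm codes 1 and 2 is an assumption about what "fails to converge" means.

```r
# Toy right-censored data (illustrative values only).
time   <- c(2, 3, 1, 5, 4)
status <- c(1, 0, 1, 1, 0)

# Negative log-likelihood of an exponential model with rate exp(p).
neg_loglik <- function(p) -sum(status * p - exp(p) * time)

# Minimize and ask nlm to return the Hessian at the solution.
fit <- nlm(neg_loglik, p = 0, hessian = TRUE)

# nlm codes 1 and 2 indicate a probable solution; anything else is
# treated as a failure here (an assumed rule, for illustration).
if (!fit$code %in% c(1, 2)) {
  stop("nlm did not converge (code ", fit$code, ")")
}

# Estimated rate; the closed form is sum(status) / sum(time).
rate_hat <- exp(fit$estimate)

# Standard error of log(rate) from the Hessian of the objective.
se_lograte <- sqrt(1 / fit$hessian[1, 1])
```

The Hessian returned by the optimizer is what makes standard errors available without a second pass over the data, which is presumably why the fitted object stores it.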

Examples

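library(survival)  # provides Surv()

# Model formula and a boundary spanning the follow-up times in sim
formula <- Surv(time, status) ~ x1 + x2 + x31 + x42 + x43 + x44
boundary <- c(0, max(sim$time))

# Fit the model on the first batch
df_sub <- sim[sim$group == 1, , drop = FALSE]
fit <- colsa(formula, df_sub, n_basis = 3, boundary = boundary)

# Update the fit with each remaining batch in turn
for (batch in 2:6) {
  df_sub <- sim[sim$group == batch, , drop = FALSE]
  fit <- update(fit, df_sub, n_basis = "auto")
}
summary(fit)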
#> Call:
#> update.colsa(object = fit, newdata = df_sub, n_basis = "auto")
#> 
#> Number of basis functions:  6 
#> 
#>         coef exp(coef)       se      z        p    
#> x1   0.15677   1.16973  0.01016 15.430  < 2e-16 ***
#> x2  -0.17309   0.84106  0.02232 -7.756 8.74e-15 ***
#> x31  0.38400   1.46815  0.05779  6.645 3.04e-11 ***
#> x42  0.27318   1.31413  0.09090  3.005 0.002654 ** 
#> x43  0.29061   1.33725  0.08325  3.491 0.000482 ***
#> x44  0.17502   1.19128  0.08217  2.130 0.033167 *  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>     exp(coef) exp(-coef) lower .95 upper .95
#> x1  1.1697    0.8549     1.1467    1.1933   
#> x2  0.8411    1.1890     0.8051    0.8787   
#> x31 1.4681    0.6811     1.3109    1.6442   
#> x42 1.3141    0.7610     1.0997    1.5704   
#> x43 1.3372    0.7478     1.1359    1.5743   
#> x44 1.1913    0.8394     1.0141    1.3994
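The columns of the printed summary are consistent with the usual Wald construction. The sketch below reproduces the x1 row from its printed coef and se, assuming a normal approximation and a two-sided 95% interval. Whether summary computes the values in exactly this way is an assumption.

```r
# Values copied from the x1 row of the summary output above.
coef_x1 <- 0.15677
se_x1   <- 0.01016

# Wald statistic and two-sided p-value under a normal approximation.
z <- coef_x1 / se_x1
p <- 2 * pnorm(-abs(z))

# Hazard ratio, its reciprocal and a 95% confidence interval.
hr     <- exp(coef_x1)
hr_inv <- exp(-coef_x1)
ci     <- exp(coef_x1 + c(-1, 1) * qnorm(0.975) * se_x1)

round(c(z = z, hr = hr, hr_inv = hr_inv, lower = ci[1], upper = ci[2]), 4)
```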