
This is the main prediction function in the mbtransfer package. Given an object of class ts_inter (see ts_from_dfs()), it fits a collection of linear gradient-boosting transfer function models. The output is an object of class mbtransfer_model. Each component boosting model is stored in the @parameters slot, a list whose j^th element is the model for the j^th taxon (row) of each time series' values matrix.

Usage

mbtransfer(
  ts_inter,
  P = 1,
  Q = 1,
  nrounds = 500,
  early_stopping_rounds = 5,
  verbose = 0,
  lambda = 0.1,
  alpha = 0.01,
  eta = 0.1,
  interactions = "search",
  nthread = -1,
  ...
)

Arguments

ts_inter

An object of class ts_inter containing the time-varying microbiome community, environmental interventions, and static host features. The columns of each element's values matrix are expected to be consecutive timepoints in a changing community. ts_from_dfs() is a helper that creates these objects from data.frames containing the relevant information.

P

The number of historical community composition timepoints to consider when making predictions.

Q

The number of historical intervention timepoints to consider when making predictions.

nrounds

The maximum number of rounds for each taxon's gradient boosting model. Smaller values will lead to faster training, but at the risk of poorer fits. Defaults to 500.

early_stopping_rounds

If the loss function does not improve after this many rounds, then the model is assumed to have converged and training is stopped. Defaults to 5.

verbose

Should information about each gradient boosting model's performance be printed? Allowable values are 2 (all information), 1 (some information), and 0 (no information, default).

lambda

The l2-regularization value in the linear gradient boosting model. Defaults to 1e-1.

alpha

The l1-regularization value in the linear gradient boosting model. Defaults to 1e-2. This relatively small value leads to less sparse fits, which creates useful variation for downstream mirror statistic calculations.

eta

The learning rate. Defaults to 0.1. This is slower than the xgboost default (0.3), but it improves stability when training on taxa with a wide range of abundances.
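To make the roles of P and Q concrete, the lagged design they imply can be sketched by hand: the prediction for each taxon at time t is based on the previous P community snapshots and the previous Q intervention values. This is an illustrative sketch only; the package's internal feature construction may differ in detail.

```r
# Illustrative sketch of the lag structure implied by P and Q
# (not mbtransfer's internal code).
set.seed(1)
y <- matrix(rnorm(3 * 10), nrow = 3)       # 3 taxa x 10 timepoints
w <- matrix(rbinom(10, 1, 0.5), nrow = 1)  # 1 intervention x 10 timepoints
P <- 2
Q <- 1

lagged_features <- function(y, w, t, P, Q) {
  c(as.vector(y[, (t - P):(t - 1)]),  # P past community snapshots
    as.vector(w[, (t - Q):(t - 1)]))  # Q past intervention values
}

# One row of predictors per predictable timepoint
t_idx <- (max(P, Q) + 1):ncol(y)
X <- t(sapply(t_idx, function(t) lagged_features(y, w, t, P, Q)))
dim(X)  # 8 rows, 3 * P + 1 * Q = 7 columns
```

Larger P and Q widen the design matrix, giving each taxon's boosting model a longer memory of the community and interventions at the cost of more parameters.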

Examples

data(sim_ts)
fit <- mbtransfer(sim_ts)
fit@parameters[[1]]
#> ##### xgb.Booster
#> call:
#>   xgb.train(params = params, data = dtrain, nrounds = nrounds, 
#>     evals = evals, verbose = verbose, print_every_n = print_every_n, 
#>     early_stopping_rounds = early_stopping_rounds, maximize = maximize, 
#>     save_period = save_period, save_name = save_name, xgb_model = xgb_model, 
#>     callbacks = callbacks, booster = "gblinear", alpha = ..2, 
#>     lambda = ..3, eta = ..4, nthread = ..5)
#> # of features: 194 
#> # of rounds:  500 
#> xgb.attributes:
#>    best_iteration, best_score 
#> callbacks:
#>    early_stop, evaluation_log 
#> evaluation_log:
#>      iter train_rmse
#>     <int>      <num>
#>         1   4.693780
#>         2   4.659525
#> ---                 
#>       499   4.284943
#>       500   4.284925
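Since each element of @parameters is an ordinary xgb.Booster, the standard xgboost accessors apply to it. For example, the iteration chosen by early stopping (one of the attributes listed in the printout above) can be read back directly:

```r
# Inspect a component model with standard xgboost accessors.
library(mbtransfer)
library(xgboost)

data(sim_ts)
fit <- mbtransfer(sim_ts)
booster <- fit@parameters[[1]]

# Attribute recorded by the early stopping callback
xgb.attributes(booster)$best_iteration
```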