Generalized Stepwise Regression for Prediction Models in Clustered Data
Source: R/metapred.R
metapred.Rd
Generalized stepwise regression for obtaining a prediction model that is validated with (stepwise) internal-external cross-validation, in order to obtain adequate performance across data sets. Requires data from individuals in multiple studies.
Usage
metapred(
  data,
  strata,
  formula,
  estFUN = "glm",
  scope = NULL,
  retest = FALSE,
  max.steps = 1000,
  center = FALSE,
  recal.int = FALSE,
  cvFUN = NULL,
  cv.k = NULL,
  metaFUN = NULL,
  meta.method = NULL,
  predFUN = NULL,
  perfFUN = NULL,
  genFUN = NULL,
  selFUN = "which.min",
  gen.of.perf = "first",
  ...
)
Arguments
- data
data.frame containing the data. Note that metapred removes observations with missing data listwise for all variables in formula and scope, to ensure that the same data is used in each model in each step. The outcome variable should be numeric or coercible to numeric by as.numeric().
- strata
Character to specify the name of the strata (e.g. studies or clusters) variable.
- formula
formula of the first model to be evaluated. metapred will start at formula and update it using terms of scope. Defaults to the full main effects model, where the first column in data is assumed to be the outcome and all remaining columns (except strata) are predictors. See formula for formulas in general.
- estFUN
Function for estimating the model in the first stage. Currently "lm", "glm" and "logistfirth" are supported.
- scope
formula. The difference between formula and scope defines the range of models examined in the stepwise search. Defaults to NULL, which leads to the intercept-only model. If scope is nested in formula (as with the default intercept-only scope), backward selection is applied. If formula is nested in scope, forward selection is applied. If the two are equal, no stepwise selection is applied.
- retest
Logical. Should added (removed) terms be retested for removal (addition)? TRUE implies bi-directional stepwise search.
- max.steps
Integer. Maximum number of steps (additions or removals of terms) to take. Defaults to 1000, which is essentially as many as it takes. 0 implies no stepwise selection.
- center
Logical. Should numeric predictors be centered around the cluster mean?
- recal.int
Logical. Should the intercept be recalibrated in each validation?
- cvFUN
Cross-validation method, on the study (i.e. cluster or stratum) level. "l1o" for leave-one-out cross-validation (default), "bootstrap" for bootstrap, or "fixed" for one or more data sets which are only used for validation. A user-written function may be supplied as well.
- cv.k
Parameter for cvFUN. For cvFUN="bootstrap", this is the number of bootstraps. For cvFUN="fixed", this is a vector of the indices of the (sorted) data sets. Not used for cvFUN="l1o".
- metaFUN
Function for computing the meta-analytic coefficient estimates in two-stage MA. By default, rma.uni from the metafor package is used. Default settings are univariate random effects, estimated with "REML". The method can be passed through the meta.method argument.
- meta.method
Name of method for meta-analysis. Default is "REML". For more options see rma.uni.
- predFUN
Function for predicting new values. Defaults to the predicted probability of the outcome, using the link function of glm() or lm().
- perfFUN
Function for computing the performance of the prediction models. Default: mean squared error (perfFUN="mse", aka Brier score for binomial outcomes). Other options are "var.e" (variance of prediction error), "auc" (area under the curve), "cal_int" (calibration intercept), "cal_slope" (multiplicative calibration slope) and "cal_add_slope" (additive calibration slope), or a list of these, where only the first is used for model selection.
- genFUN
Function or list of named functions for computing generalizability of the performance. Default: rema, summary statistic of a random effects meta-analysis. Choose "rema_tau" for the heterogeneity estimate of a random effects meta-analysis, genFUN="abs_mean" for the (absolute) mean, or coefficient_of_variation for the coefficient of variation. If a list containing these is supplied, only the first is used for model selection.
- selFUN
Function for selecting the best method. Default: lowest value for genFUN. Should be set to "which.max" if high values for genFUN indicate a good model.
- gen.of.perf
For which performance measures should generalizability measures be computed? "first" (default) for only the first. "respective" for matching the generalizability measure to the performance measure on the same location in the list. "factorial" for applying all generalizability measures to all performance estimates.
- ...
To pass arguments to estFUN (e.g. family = "binomial"), or to other FUNctions.
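To make some of the perfFUN options listed above concrete, the following base-R sketch computes three of them for a binary outcome: the mean squared error (Brier score), the calibration intercept and the multiplicative calibration slope. It uses common textbook definitions. The helper names (mse_sketch, cal_int_sketch, cal_slope_sketch) and the simulated data are hypothetical, and the functions that metapred uses internally may differ in detail.

```r
# Hypothetical helpers illustrating common definitions of three performance
# measures; these are NOT the functions used internally by metapred.

# Mean squared error of predicted probabilities (Brier score for binary y)
mse_sketch <- function(p, y) mean((p - y)^2)

# Calibration intercept: intercept of a logistic model that takes the
# linear predictor as an offset (ideal value: 0)
cal_int_sketch <- function(p, y) {
  lp <- qlogis(p)
  unname(coef(glm(y ~ offset(lp), family = binomial))[1])
}

# Multiplicative calibration slope: coefficient of the linear predictor
# in a logistic recalibration model (ideal value: 1)
cal_slope_sketch <- function(p, y) {
  lp <- qlogis(p)
  unname(coef(glm(y ~ lp, family = binomial))[2])
}

# Simulated data (assumption: outcomes are drawn from the predicted risks)
set.seed(1)
x <- rnorm(500)
p <- plogis(-1 + 0.8 * x)
y <- rbinom(500, 1, p)

perf <- c(mse = mse_sketch(p, y),
          cal_int = cal_int_sketch(p, y),
          cal_slope = cal_slope_sketch(p, y))
perf
```

For measures where higher values indicate better performance (such as the AUC), the selection function would need to be reversed, as described under selFUN.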
Value
A list of class metapred, containing the final model in global.model, and the stepwise tree of estimates of the coefficients, performance measures, and generalizability measures in stepwise.
Details
Use subset.metapred to obtain an individual prediction model from a metapred object.
Note that formula.changes is currently unordered; it does not represent the order of changes in the stepwise procedure.
metapred is still under development; use with care.
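As an illustration of the leave-one-stratum-out ("l1o") form of internal-external cross-validation, the base-R sketch below holds out each study in turn, fits a model on the remaining studies, and computes the mean squared error in the held-out study. It is a simplification under stated assumptions: it fits a single pooled glm on the development studies rather than the two-stage meta-analysis that metapred uses, the data are simulated, and all object names (dat, iecv_mse) are hypothetical.

```r
# Simulated clustered data: 4 studies with study-specific intercepts
# (assumption, for illustration only)
set.seed(42)
n_per_study <- 200
dat <- do.call(rbind, lapply(1:4, function(s) {
  x <- rnorm(n_per_study)
  y <- rbinom(n_per_study, 1, plogis(-1 + 0.2 * s + 0.8 * x))
  data.frame(y = y, x = x, study = s)
}))

# Leave-one-study-out: develop on all other studies, validate on the
# held-out study
studies <- sort(unique(dat$study))
iecv_mse <- sapply(studies, function(s) {
  dev <- dat[dat$study != s, ]
  val <- dat[dat$study == s, ]
  # Pooled fit on the development studies; metapred instead combines
  # study-specific fits by meta-analysis
  fit <- glm(y ~ x, family = binomial, data = dev)
  p <- predict(fit, newdata = val, type = "response")
  mean((p - val$y)^2)  # mean squared error (Brier score) in held-out study
})
names(iecv_mse) <- paste0("study", studies)

iecv_mse             # one performance estimate per held-out study
abs(mean(iecv_mse))  # a simple summary across studies (one reading of an
                     # "absolute mean"; the exact definition is an assumption)
```

A stepwise search would repeat this loop for each candidate model and keep the candidate whose summary across studies is best, for example the lowest value when selFUN = "which.min".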
References
Debray TPA, Moons KGM, Ahmed I, Koffijberg H, Riley RD. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med. 2013;32(18):3158-80.
de Jong VMT, Moons KGM, Eijkemans MJC, Riley RD, Debray TPA. Developing more generalizable prediction models from pooled studies and large clustered data sets. Stat Med. 2021;40(15):3533-59.
Riley RD, Tierney JF, Stewart LA. Individual participant data meta-analysis: a handbook for healthcare research. Hoboken, NJ: Wiley; 2021. ISBN: 978-1-119-33372-2.
Schmid CH, Stijnen T, White IR. Handbook of meta-analysis. First edition. Boca Raton: Taylor and Francis; 2020. ISBN: 978-1-315-11940-3.
See also
forest.metapred for generating a forest plot of prediction model performance
Examples
library(metamisc)  # provides metapred() and the DVTipd data
data(DVTipd)
if (FALSE) {
  # Explore heterogeneity in intercept and association of 'ddimdich'
  library(lme4)  # provides glmer()
  glmer(dvt ~ 0 + cluster + (ddimdich|study), family = binomial(), data = DVTipd)
}
# Scope
f <- dvt ~ histdvt + ddimdich + sex + notraum
# Internal-external cross-validation of a pre-specified model 'f'
fit <- metapred(DVTipd, strata = "study", formula = f, scope = f, family = binomial)
fit
#> Call: metapred(data = DVTipd, strata = "study", formula = f, scope = f,
#> family = binomial)
#>
#> Started with model:
#> dvt ~ histdvt + ddimdich + sex + notraum
#> <environment: 0x559b1011bad8>
#>
#> Generalizability:
#> unchanged
#> 1 0.1484983
#>
#> Cross-validation stopped after 0 steps, as no changes were requested. Final model:
#> Meta-analytic model of prediction models estimated in 4 strata. Coefficients:
#> (Intercept) histdvt ddimdich sex notraum
#> -4.1180636 0.6174010 1.6962441 0.9647970 0.3761707
# Let's try to simplify model 'f' in order to improve its external validity
metapred(DVTipd, strata = "study", formula = f, family = binomial)
#> Call: metapred(data = DVTipd, strata = "study", formula = f, family = binomial)
#>
#> Started with model:
#> dvt ~ histdvt + ddimdich + sex + notraum
#> <environment: 0x559b1011bad8>
#>
#> Generalizability:
#> unchanged
#> 1 0.1484983
#>
#> Generalizability:
#> ddimdich histdvt notraum sex
#> 1 0.136086 0.1375105 0.12977 0.141173
#>
#> Continued with model:
#> dvt ~ histdvt + ddimdich + sex
#> <environment: 0x559b1011bad8>
#>
#> Generalizability:
#> ddimdich histdvt sex
#> 1 0.1366828 0.1279623 0.1319755
#>
#> Continued with model:
#> dvt ~ ddimdich + sex
#> <environment: 0x559b1011bad8>
#>
#> Generalizability:
#> ddimdich sex
#> 1 0.1355548 0.1303254
#>
#> Cross-validation stopped after 3 steps, as no improvement was possible. Final model:
#> Meta-analytic model of prediction models estimated in 4 strata. Coefficients:
#> (Intercept) ddimdich sex
#> -3.6187987 1.7130967 0.8784071
# We can also try to build a generalizable model from scratch
if (FALSE) {
  # Some additional examples:
  metapred(DVTipd, strata = "study", formula = dvt ~ 1, scope = f, family = binomial) # Forwards
  metapred(DVTipd, strata = "study", formula = f, scope = f, family = binomial) # no selection
  metapred(DVTipd, strata = "study", formula = f, max.steps = 0, family = binomial) # no selection
  metapred(DVTipd, strata = "study", formula = f, recal.int = TRUE, family = binomial)
  metapred(DVTipd, strata = "study", formula = f, meta.method = "REML", family = binomial)
}
# By default, metapred assumes the first column is the outcome.
newdat <- data.frame(dvt=0, histdvt=0, ddimdich=0, sex=1, notraum=0)
# Named 'pred' to avoid masking stats::fitted()
pred <- predict(fit, newdata = newdat)