Title: | Exploring Heterogeneity in Meta-Analysis using Random Forests |
---|---|
Description: | Conduct random forests-based meta-analysis, obtain partial dependence plots for metaforest and classic meta-analyses, and cross-validate and tune metaforest- and classic meta-analyses in conjunction with the caret package. A requirement of classic meta-analysis is that the studies being aggregated are conceptually similar, and ideally, close replications. However, in many fields, there is substantial heterogeneity between studies on the same topic. Classic meta-analysis lacks the power to assess more than a handful of univariate moderators. MetaForest, by contrast, has substantial power to explore heterogeneity in meta-analysis. It can identify important moderators from a larger set of potential candidates (Van Lissa, 2020). This is an appealing quality, because many meta-analyses have small sample sizes. Moreover, MetaForest yields a measure of variable importance which can be used to identify important moderators, and offers partial prediction plots to explore the shape of the marginal relationship between moderators and effect size. |
Authors: | Caspar J. van Lissa |
Maintainer: | Caspar J. van Lissa <[email protected]> |
License: | GPL-3 |
Version: | 0.1.5 |
Built: | 2024-11-04 05:20:17 UTC |
Source: | https://github.com/cjvanlissa/metaforest |
Conduct a t-test or z-test for coefficients of a model.
coef_test(x, par1, par2, distribution = "pt")
x |
A model. |
par1 |
Numeric or character. Name or position of the first parameter. |
par2 |
Numeric or character. Name or position of the second parameter. |
distribution |
Character. Which distribution to use. Currently, can be
one of |
Named vector.
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, mods = ~alloc-1, data=dat, method="REML")
coef_test(res, 1, 2)
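The kind of comparison described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `coef_diff_test` is hypothetical, and it assumes the test statistic is the difference between the two estimates divided by the standard error of that difference, referred to a t distribution when degrees of freedom are supplied and to a normal distribution otherwise.

```r
# Hypothetical helper (not part of metaforest): test whether two
# coefficients of a fitted model differ from each other.
coef_diff_test <- function(est, vcov_mat, par1, par2, df = NULL) {
  diff <- est[par1] - est[par2]
  # Standard error of a difference: var(a) + var(b) - 2 * cov(a, b)
  se <- sqrt(vcov_mat[par1, par1] + vcov_mat[par2, par2] -
               2 * vcov_mat[par1, par2])
  stat <- unname(diff / se)
  p <- if (is.null(df)) {
    2 * pnorm(abs(stat), lower.tail = FALSE)          # z-test
  } else {
    2 * pt(abs(stat), df = df, lower.tail = FALSE)    # t-test
  }
  c(statistic = stat, se = unname(se), p = p)
}

# Illustration with a base-R linear model with one coefficient per group
fit <- lm(mpg ~ factor(cyl) - 1, data = mtcars)
result <- coef_diff_test(coef(fit), vcov(fit), 1, 2, df = df.residual(fit))
result
```

Using a model without an intercept (as in the `~alloc-1` example above) gives one coefficient per group, so the comparison of parameters 1 and 2 is a direct comparison of two group means.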
A systematic review and meta-analysis of the effects of performing acts of kindness on the well-being of the actor.
data(curry)
A data.frame with 56 rows and 18 columns.
study_id | factor |
Unique identifier of the study |
effect_id | integer |
Unique identifier of the effect size |
d | numeric |
Standardized mean difference between the control group and intervention group |
vi | numeric |
Variance of the effect size |
n1i | numeric |
Number of participants in the intervention group |
n1c | numeric |
Number of participants in the control group |
sex | numeric |
Percentage of male participants |
age | numeric |
Mean age of participants |
location | character |
Geographical location of the study |
donor | character |
From what population did the donors (helpers) originate? |
donorcode | factor |
From what population did the donors (helpers) originate? Dichotomized to Anxious or Typical |
interventioniv | character |
Description of the intervention / independent variable |
interventioncode | factor |
Description of the intervention / independent variable, categorized to Acts of Kindness, Prosocial Spending, or Other |
control | character |
Description of the control condition |
controlcode | factor |
Description of the control condition, categorized to Neutral Activity, Nothing, or Self Help (performing a kind act for oneself) |
recipients | character |
Who were the recipients of the act of kindness? |
outcomedv | character |
What was the outcome, or dependent variable, of the study? |
outcomecode | factor |
What was the outcome, or dependent variable, of the study? Categorized into Happiness, Life Satisfaction, PN Affect (positive or negative), and Other |
doi:10.1016/j.jesp.2018.02.014
Curry, O. S., Rowland, L. A., Van Lissa, C. J., Zlotowitz, S., McAlaney, J., & Whitehouse, H. (2018). Happy to help? A systematic review and meta-analysis of the effects of performing acts of kindness on the well-being of the actor. Journal of Experimental Social Psychology, 76, 320-329. doi:10.1016/j.jesp.2018.02.014
Extract proximity matrix for a MetaForest object.
extract_proximity(fit, newdata)
fit |
object of class 'MetaForest'. |
newdata |
new data with the same columns as the data used for |
an n x n matrix where position i, j gives the proportion of times observation i and j are in the same terminal node across all trees.
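The definition of the proximity matrix can be illustrated in base R. This sketch assumes the terminal node of each observation in each tree is already available as a matrix (rows are observations, columns are trees); the node values are made up and `proximity_from_nodes` is a hypothetical helper, not a metaforest function.

```r
# Made-up terminal node IDs: 3 observations (rows) in 3 trees (columns)
nodes <- matrix(c(1, 1, 2,
                  1, 2, 2,
                  3, 2, 1), nrow = 3, byrow = TRUE)

# Hypothetical helper: proportion of trees in which observations i and j
# fall into the same terminal node
proximity_from_nodes <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      prox[i, j] <- mean(nodes[i, ] == nodes[j, ])
    }
  }
  prox
}

prox <- proximity_from_nodes(nodes)
prox
```

The result is symmetric with ones on the diagonal, because every observation shares a terminal node with itself in every tree.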
A review of 17 experimental studies published between 1980 and 2005 on the effect of specialized training on the competency of caregivers in childcare.
data(fukkink_lont)
A data.frame with 78 rows and 30 columns.
id_exp | integer |
Unique identifier of the study |
yi | numeric |
Standardized mean difference between the control group and |
vi | numeric |
Variance of the effect size |
Journal | factor |
Publication type (scientific journal or other publications) |
Setting | factor |
Setting (center-based care or family daycare) |
Integrated | factor |
Whether the training was integrated into childcare practice |
Supervision | factor |
Whether supervision was part of the training |
Scope | factor |
Scope of the training (narrow or broad) |
Location | factor |
Location of the training (one-site or multi-site) |
Curriculum | factor |
Fixed curriculum |
Control | factor |
Alternative treatment for control group |
Assignment | factor |
Random assignment or matching (at the level of the individual caregiver or childcare center) |
Train_Knowledge | factor |
Explicit focus on knowledge |
Train_Skills | factor |
Explicit focus on skills |
Train_Attitude | factor |
Explicit focus on attitude |
Video | factor |
Use of video feedback |
Design | factor |
Single group, or two-group experimental design |
Pre_Post | factor |
Pretest/posttest design (yes/no) |
Blind | factor |
Was a blinding procedure used? |
Attrition | numeric |
Attrition from the experimental condition (percentage) |
Pretest_es | numeric |
Pre-test effect size |
Self_report | factor |
Self-report measures of caregiver competencies versus ‘objective’ test or observation by independent observer |
DV_Knowledge | factor |
Test focused on knowledge |
DV_Skills | factor |
Test focused on skills |
DV_Attitude | factor |
Test focused on attitudes |
DV_Aligned | factor |
Test aligned with the content of the training (yes/no) |
Two_group_design | factor |
Single group, or two-group experimental design |
Trainee_Age | numeric |
Trainees’ age |
Trainee_Experience | numeric |
Trainees’ working experience |
n_total | integer |
Total n at post-test |
doi:10.1016/j.ecresq.2007.04.005
Fukkink, R. G., & Lont, A. (2007). Does training matter? A meta-analysis and review of caregiver training studies. Early childhood research quarterly, 22(3), 294-311. doi:10.1016/j.ecresq.2007.04.005
MetaForest uses a weighted random forest to explore heterogeneity in meta-analytic data. MetaForest is a wrapper for ranger (Wright & Ziegler, 2015). As input, MetaForest takes the study effect sizes and their variances (these can be computed, for example, using the metafor package), as well as the moderators that are to be included in the model. By default, MetaForest uses random-effects weights, and estimates the between-studies variance using a restricted maximum-likelihood estimator. However, it may be beneficial to first conduct an unweighted MetaForest, and then use the estimated residual heterogeneity from this model as the estimate of tau2 for a random-effects weighted MetaForest.
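The three weighting schemes can be sketched with the conventional meta-analytic formulas: equal weights for an unweighted analysis, 1/vi for fixed-effects weights, and 1/(vi + tau2) for random-effects weights. The helper `case_weights` is hypothetical, and rescaling the weights to sum to 1 is a choice made here for readability; how MetaForest passes the weights to ranger internally is not shown.

```r
# Hypothetical helper illustrating conventional meta-analytic weights
case_weights <- function(vi, tau2 = 0,
                         whichweights = c("random", "fixed", "unif")) {
  whichweights <- match.arg(whichweights)
  w <- switch(whichweights,
              unif   = rep(1, length(vi)),   # every effect size counts equally
              fixed  = 1 / vi,               # inverse sampling variance
              random = 1 / (vi + tau2))      # adds between-studies variance
  w / sum(w)  # rescaled to sum to 1 (an assumption for readability)
}

vi <- c(0.02, 0.05, 0.10)   # made-up sampling variances
w_unif   <- case_weights(vi, whichweights = "unif")
w_fixed  <- case_weights(vi, whichweights = "fixed")
w_random <- case_weights(vi, tau2 = 0.04, whichweights = "random")
rbind(w_unif, w_fixed, w_random)
```

Random-effects weights are less extreme than fixed-effects weights, because adding tau2 to every sampling variance shrinks the relative differences between precise and imprecise studies.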
MetaForest(
  formula,
  data,
  vi = "vi",
  study = NULL,
  whichweights = "random",
  num.trees = 500,
  mtry = NULL,
  method = "REML",
  tau2 = NULL,
  ...
)
formula |
Formula. Specify a formula for the MetaForest model, for
example, |
data |
A data.frame containing the effect size, moderators, and the variance of the effect size. |
vi |
Character. Specify the name of the column in the |
study |
Character. Optionally, specify the name of the column in the
|
whichweights |
Character. Indicate what type of weights are required.
A random-effects MetaForest is grown by specifying |
num.trees |
Atomic integer. Specify the number of trees in the forest. Defaults to 500. |
mtry |
Atomic integer. Number of candidate moderators available for each split. Defaults to the square root of the number of moderators (rounded down). |
method |
Character. Specify the method by which to estimate the residual
variance. Can be set to one of the following: "DL", "HE", "SJ", "ML", "REML",
"EB", "HS", or "GENQ". Default is "REML". See the
|
tau2 |
Numeric. Specify a predetermined value for the residual heterogeneity. Entering a value here supersedes the estimated tau2 value. Defaults to NULL. |
... |
Additional arguments are passed directly to ranger. It is recommended not to use additional arguments. |
For dependent data, a clustered MetaForest analysis is more appropriate. This is because the predictive performance of a MetaForest analysis is evaluated on out-of-bootstrap cases, and when cases out of the bootstrap sample originate from the same study, the model will be overly confident in its ability to predict their value. When the MetaForest is clustered by the study variable, the dataset is first split into two cross-validation samples by study. All dependent effect sizes from each study are thus included in the same cross-validation sample. Then, two random forests are grown on these cross-validation samples, and for each random forest, the other sample is used to calculate prediction error and variable importance; see doi:10.1007/s11634-016-0276-4.
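The split by study can be sketched in base R. This is an illustration of the idea only: `split_by_study` is a hypothetical helper, the study labels are made up, and assigning half of the studies at random to each sample is an assumption about how the split is drawn.

```r
set.seed(1)
# Made-up study IDs: several effect sizes are nested within each study
study <- rep(c("A", "B", "C", "D", "E", "F"), times = c(3, 1, 2, 4, 1, 2))

# Hypothetical helper: assign whole studies to one of two samples
split_by_study <- function(study) {
  ids <- unique(study)
  first_half <- sample(ids, size = ceiling(length(ids) / 2))
  ifelse(study %in% first_half, 1L, 2L)
}

fold <- split_by_study(study)
table(study, fold)
```

Because the assignment is made per study rather than per row, no study contributes effect sizes to both samples, so each forest is evaluated on studies it has never seen.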
List of length 3. The "forest" element of this list is an object of class "ranger", containing the results of the random forests analysis. The "rma_before" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the raw data, without moderators. The "rma_after" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the residual heterogeneity, or the difference between the effect sizes predicted by MetaForest and the observed effect sizes.
#Example 1:
#Simulate data with a univariate linear model
set.seed(42)
data <- SimulateSMD()
#Conduct unweighted MetaForest analysis
mf.unif <- MetaForest(formula = yi ~ ., data = data$training,
                      whichweights = "unif", method = "DL")
#Print model
mf.unif
#Conduct random-effects weighted MetaForest analysis
mf.random <- MetaForest(formula = yi ~ ., data = data$training,
                        whichweights = "random", method = "DL",
                        tau2 = 0.0116)
#Print summary
summary(mf.random)

#Example 2: Real data from metafor
#Load and clean data
data <- dat.bangertdrowns2004
data[, c(4:12)] <- apply(data[ , c(4:12)], 2, function(x){
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x})
data$subject <- factor(data$subject)
data$yi <- as.numeric(data$yi)
#Conduct MetaForest analysis
mf.bd2004 <- MetaForest(formula = yi~ grade + length + minutes + wic+ meta,
                        data, whichweights = "unif")
#Print MetaForest object
mf.bd2004
#Check convergence plot
plot(mf.bd2004)
#Check summary
summary(mf.bd2004, digits = 4)
#Examine variable importance plot
VarImpPlot(mf.bd2004)
This function allows users to rely on the powerful caret package for cross-validating and tuning a MetaForest analysis. Methods for MetaForest are not included in the caret package, because the interface of caret is not entirely compatible with MetaForest's model call. Specifically, MetaForest is not compatible with the train methods for classes 'formula' or 'recipe', because the variance of the effect size must be a column of the training data x. The name of this column is specified using the argument 'vi'.
ModelInfo_mf()
To train a clustered MetaForest, for nested data structures, simply provide the optional argument 'study' to the train function, to specify the study ID. This should again refer to a column of x.
When training a clustered MetaForest, make sure to use 'index = groupKFold(your_study_id_variable, k = 10)' in trainControl, to sample by study ID when creating cross-validation partitions; otherwise the testing error will be positively biased.
ModelInfo list of length 17.
## Not run: 
# Prepare data
data <- dat.bangertdrowns2004
data[, c(4:12)] <- apply(data[ , c(4:12)], 2, function(x){
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x})
data$subject <- factor(data$subject)
data$yi <- as.numeric(data$yi)
# Load caret
library(caret)
set.seed(999)
# Specify the resampling method as 10-fold CV
fit_control <- trainControl(method = "cv", number = 10)
cv_mf_fit <- train(y = data$yi, x = data[,c(3:13, 16)],
                   method = ModelInfo_mf(), trControl = fit_control)

# Cross-validated clustered MetaForest
data <- get(data(dat.bourassa1996))
data <- escalc(measure = "OR", ai = lh.le, bi = lh.re, ci = rh.le, di= rh.re,
               data = data, add = 1/2, to = "all")
data$mage[is.na(data$mage)] <- median(data$mage, na.rm = TRUE)
data[c(5:8)] <- lapply(data[c(5:8)], factor)
data$yi <- as.numeric(data$yi)
# Set up 10-fold grouped CV
fit_control <- trainControl(method = "cv",
                            index = groupKFold(data$sample, k = 10))
# Set up a custom tuning grid for the three tuning parameters of MetaForest
rf_grid <- expand.grid(whichweights = c("random", "fixed", "unif"),
                       mtry = c(2, 4, 6),
                       min.node.size = c(2, 4, 6))
# Train the model
cv.mf.cluster <- train(y = data$yi,
                       x = data[, c("selection", "investigator", "hand_assess",
                                    "eye_assess", "mage", "sex", "vi", "sample")],
                       study = "sample", method = ModelInfo_mf(),
                       trControl = fit_control,
                       tuneGrid = rf_grid)
## End(Not run)
This function allows users to rely on the powerful caret package for cross-validating and tuning a rma analysis. Methods for rma are not included in the caret package, because the interface of caret is not entirely compatible with rma's model call. Specifically, rma is not compatible with the train methods for classes 'formula' or 'recipe'. The variance of the effect sizes can be passed to the 'weights' parameter of train.
ModelInfo_rma()
When using clustered data (effect sizes within studies), make sure to use 'index = groupKFold(your_study_id_variable, k = 10)' in trainControl, to sample by study ID when creating cross-validation partitions; otherwise the testing error will be positively biased.
ModelInfo list of length 13.
## Not run: 
# Load caret, which provides train() and trainControl()
library(caret)
# Prepare data
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat$yi <- as.numeric(dat$yi)
dat$alloc <- factor(dat$alloc)
# Run rma
rma.model <- rma(y = dat$yi, mods = dat[, c("ablat", "year")], vi = dat$vi)
# R^2 is estimated to be .64
rma.model$R2
# Now, use cross-validation to see how well this model generalizes
# Leave-one-out cross-validation is more appropriate than 10-fold cv because
# the sample size is very small
fit_control <- trainControl(method = "LOOCV")
# Train the model without tuning, because rma has no tuning parameters
cv.mf.cluster <- train(y = dat$yi, x = dat[, c("ablat", "year")],
                       weights = dat$vi,
                       method = ModelInfo_rma(),
                       trControl = fit_control)
# Cross-validated R^2 is .08, suggesting substantial overfitting of the
# original rma model
cv.mf.cluster$results$Rsquared
## End(Not run)
Partial dependence plots
PartialDependence(
  x,
  vars = NULL,
  pi = NULL,
  rawdata = FALSE,
  bw = FALSE,
  resolution = NULL,
  moderator = NULL,
  mod_levels = NULL,
  output = "plot",
  ...
)
x |
Model object. |
vars |
Character vector containing the moderator names for which to plot partial dependence plots. If empty, all moderators are plotted. |
pi |
Numeric (0-1). What percentile interval should be plotted for the
partial dependence predictions? Defaults to NULL. To obtain a 95% interval,
set to |
rawdata |
Logical, indicating whether to plot weighted raw data.
Defaults to FALSE. Uses the same weights as the model object passed to the
|
bw |
Logical, indicating whether the plot should be black and white, or color. |
resolution |
Integer vector of length two, giving the resolution of the partial predictions. The first element indicates the resolution of the partial predictions; for Monte-Carlo integration, the second element gives the number of rows of the data to be sampled without replacement when averaging over values of the other predictors. |
moderator |
Atomic character vector, referencing the name of one
variable in the model. Results in partial prediction plots, conditional on
the moderator. If |
mod_levels |
Vector. If |
output |
Character. What type of output should be returned? Defaults to
|
... |
Additional arguments to be passed to and from functions. |
Plots partial dependence plots (predicted effect size as a function of the value of each predictor variable) for a MetaForest- or rma model object. For rma models, it is advisable to mean-center numeric predictors, and to not include interaction effects, except when the rma model is bivariate, and the plot_int argument is set to TRUE.
A gtable object.
## Not run: 
# Partial dependence plot for MetaForest() model:
set.seed(42)
data <- SimulateSMD(k_train = 200,
                    model = es * x[, 1] + es * x[, 2] + es * x[, 1] * x[, 2])$training
data$X2 <- cut(data$X2, breaks = 2, labels = c("Low", "High"))
mf.random <- MetaForest(formula = yi ~ ., data = data,
                        whichweights = "random", method = "DL",
                        tau2 = 0.2450)
# Examine univariate partial dependence plot for all variables in the model:
PartialDependence(mf.random, pi = .8)
# Examine bivariate partial dependence plot of the interaction between X1 and X2:
pd.plot <- PartialDependence(mf.random, vars = c("X1", "X2"), plot_int = TRUE)
# Save to pdf file
pdf("pd_plot.pdf")
grid::grid.draw(pd.plot)
dev.off()

# Partial dependence plot for metafor rma() model:
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat$yi <- as.numeric(dat$yi)
dat$alloc <- factor(dat$alloc)
dat$ablat_d <- cut(dat$ablat, breaks = 2, labels = c("low", "high"))
# Demonstrate partial dependence plot for a bivariate interaction
rma.model.int <- rma(yi, vi, mods=cbind(ablat, tpos),
                     data=dat, method="REML")
PartialDependence(rma.model.int, rawdata = TRUE, pi = .95,
                  plot_int = TRUE)

# Compare partial dependence for metaforest and rma
dat2 <- dat
dat2[3:7] <- lapply(dat2[3:7],
                    function(x){as.numeric(scale(x, scale = FALSE))})
mf.model.all <- MetaForest(yi ~ ., dat2[, c(3:11)])
rma.model.all <- rma(dat$yi, dat2$vi,
                     mods = model.matrix(yi~., dat2[, c(3:10)])[, -1],
                     method="REML")
PartialDependence(mf.model.all, rawdata = TRUE, pi = .95)
PartialDependence(rma.model.all, rawdata = TRUE, pi = .95)
## End(Not run)
Plots cumulative MSE for a MetaForest object.
## S3 method for class 'MetaForest'
plot(x, y, ...)
x |
MetaForest object. |
y |
not used for plot.MetaForest |
... |
Arguments to be passed to methods, not used for plot.MetaForest |
A ggplot object, visualizing the number of trees on the x-axis, and the cumulative mean of the MSE of that number of trees on the y-axis. As a visual aid to assess convergence, a dashed gray line is plotted at the median cumulative MSE value.
MetaForest prediction
## S3 method for class 'MetaForest'
predict(object, data = NULL, type = "response", ...)
object |
|
data |
New test data of class |
type |
Type of prediction. One of 'response', 'se', 'terminalNodes' with default 'response'. See below for details. |
... |
further arguments passed to or from other methods. |
Object of class MetaForest.prediction with elements
predictions |
Predicted classes/values (only for classification and regression) |
num.trees |
Number of trees. |
num.independent.variables |
Number of independent variables. |
treetype |
Type of forest/tree. Classification, regression or survival. |
num.samples |
Number of samples. |
set.seed(56)
data <- SimulateSMD(k_train = 100, model = es * x[,1] * x[,2])
#Conduct fixed-effects MetaForest analysis
mf.fixed <- MetaForest(formula = yi ~ ., data = data$training,
                       whichweights = "fixed", method = "DL")
predicted <- predict(mf.fixed, data = data$testing)$predictions
r2_cv <- sum((predicted - mean(data$training$yi)) ^ 2)/
  sum((data$testing$yi - mean(data$training$yi)) ^ 2)
Takes a MetaForest object, and applies different algorithms for variable selection.
preselect(x, replications = 100L, algorithm = "replicate", ...)
x |
Model to perform variable selection for. Accepts MetaForest objects. |
replications |
Integer. Number of replications to run for variable preselection. Default: 100. |
algorithm |
Character. Preselection method to apply. Currently, 'replicate', 'recursive', and 'bootstrap' are available. |
... |
Other arguments to be passed to and from functions. |
Currently, available methods under algorithm are:
replicate |
This simply replicates the analysis, which means the forest has access to the full data set, but the trees are grown on different bootstrap samples across replications (thereby varying Monte Carlo error). |
bootstrap |
This replicates the analysis on bootstrapped samples, which means each replication has access to a different sub-sample of the full data set. When selecting this algorithm, cases are either bootstrap-sampled by study, or a new study column is generated, and a clustered MetaForest is grown. Clustering is needed because some of the rows in the data will be duplicated, which would otherwise lead to an under-estimation of the OOB error. |
recursive |
Starting with all moderators, the variable with the most negative variable importance is dropped from the model, and the analysis re-run. This is repeated until only variables with a positive variable importance are left, or no variables are left. The proportion of final models containing each variable reflects its importance. |
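The recursive elimination loop can be sketched in base R. This is an illustration of the stopping rule only: `recursive_drop` and `toy_importance` are hypothetical, and the toy importance values are fixed, whereas in a real analysis the importances would be re-estimated from a newly grown forest after every drop.

```r
# Hypothetical stand-in for "re-run the analysis and read off importance";
# values are made up and do not change between iterations
toy_importance <- function(vars) {
  base <- c(X1 = 0.30, X2 = 0.10, X3 = -0.02, X4 = -0.05)
  base[vars]
}

# Hypothetical helper: drop the variable with the most negative importance
# until all remaining importances are positive, or nothing is left
recursive_drop <- function(vars, importance_fun) {
  while (length(vars) > 0) {
    imp <- importance_fun(vars)
    if (all(imp > 0)) break
    vars <- setdiff(vars, names(imp)[which.min(imp)])
  }
  vars
}

selected <- recursive_drop(c("X1", "X2", "X3", "X4"), toy_importance)
selected
```

Repeating this procedure over many replications and counting how often each variable survives gives the proportion described above.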
An object of class 'mf_preselect'
## Not run: 
data <- get(data(dat.bourassa1996))
data <- escalc(measure = "OR", ai = lh.le, bi = lh.re, ci = rh.le, di= rh.re,
               data = data, add = 1/2, to = "all")
data$mage[is.na(data$mage)] <- median(data$mage, na.rm = TRUE)
data[c(5:8)] <- lapply(data[c(5:8)], factor)
data$yi <- as.numeric(data$yi)
mf.model <- MetaForest(formula = yi~ selection + investigator + hand_assess + eye_assess +
                         mage +sex,
                       data, study = "sample",
                       whichweights = "unif", num.trees = 300)
preselect(mf.model,
          replications = 10,
          algorithm = "bootstrap")
## End(Not run)
Returns a vector of variable names from an mf_preselect object, based on a cutoff criterion provided.
preselect_vars(x, cutoff = NULL, criterion = NULL)
x |
Object of class mf_preselect. |
cutoff |
Numeric. Must be a value between 0 and 1. By default, uses .95 for bootstrapped preselection, and .1 for recursive preselection. |
criterion |
Character. Which criterion to use. See |
For criterion = 'p', the function evaluates the proportion of replications in which a variable achieved a positive (>0) variable importance. For criterion = 'ci', the function evaluates whether the lower bound of a confidence interval of a variable's importance across replications exceeds zero. The width of the confidence interval is determined by cutoff.
For recursive preselection, any variable not included in a final model is assigned zero importance.
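The two criteria can be sketched on a made-up matrix of variable importances (rows are replications, columns are variables). The helpers `select_p` and `select_ci` are hypothetical, and using a percentile interval for the 'ci' criterion is an assumption made for this illustration; the package may construct its interval differently.

```r
set.seed(3)
# Made-up importances across 100 replications for two variables
imp <- cbind(strong = rnorm(100, mean = 0.05, sd = 0.01),
             noise  = rnorm(100, mean = 0.00, sd = 0.01))

# Criterion 'p': proportion of replications with positive importance
select_p <- function(imp, cutoff) {
  names(which(colMeans(imp > 0) >= cutoff))
}

# Criterion 'ci': lower bound of an interval of width `cutoff` exceeds zero
# (percentile interval assumed here)
select_ci <- function(imp, cutoff) {
  lower <- vapply(colnames(imp), function(v) {
    unname(quantile(imp[, v], probs = (1 - cutoff) / 2))
  }, numeric(1))
  names(which(lower > 0))
}

selected_p  <- select_p(imp, cutoff = 0.95)
selected_ci <- select_ci(imp, cutoff = 0.95)
```

Both criteria retain only the variable whose importance is consistently positive; the noise variable is positive in roughly half of the replications and is dropped.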
Character vector.
## Not run: 
data <- get(data(dat.bourassa1996))
data <- escalc(measure = "OR", ai = lh.le, bi = lh.re, ci = rh.le, di= rh.re,
               data = data, add = 1/2, to = "all")
data$mage[is.na(data$mage)] <- median(data$mage, na.rm = TRUE)
data[c(5:8)] <- lapply(data[c(5:8)], factor)
data$yi <- as.numeric(data$yi)
# preselect() takes a fitted MetaForest object, so fit the model first
mf.model <- MetaForest(formula = yi~ selection + investigator + hand_assess + eye_assess +
                         mage +sex,
                       data, study = "sample",
                       whichweights = "unif", num.trees = 300)
preselected <- preselect(mf.model,
                         replications = 10,
                         algorithm = "bootstrap")
preselect_vars(preselected)
## End(Not run)
Prints summary.MetaForest object.
## S3 method for class 'summary.MetaForest'
print(x, digits, ...)
x |
an object used to select a method. |
digits |
minimal number of significant digits, see |
... |
further arguments passed to or from other methods. |
This function simulates a meta-analytic dataset based on the random-effects model. The simulated effect size is Hedges' G, an estimator of the Standardized Mean Difference. The functional form of the model can be specified, and moderators can be either normally distributed or Bernoulli-distributed. See Van Lissa, 2018, for a detailed explanation of the simulation procedure.
SimulateSMD(
  k_train = 20,
  k_test = 100,
  mean_n = 40,
  es = 0.5,
  tau2 = 0.04,
  moderators = 5,
  distribution = "normal",
  model = es * x[, 1]
)
k_train |
Atomic integer. The number of studies in the training dataset. Defaults to 20. |
k_test |
Atomic integer. The number of studies in the testing dataset. Defaults to 100. |
mean_n |
Atomic integer. The mean sample size of each simulated study in the meta-analytic dataset. Defaults to 40. For each simulated study, the sample size n is randomly drawn from a normal distribution with mean mean_n, and sd mean_n/3. |
es |
Atomic numeric vector. The effect size, also known as beta, used in the model statement. Defaults to .5. |
tau2 |
Atomic numeric vector. The residual heterogeneity. Defaults to 0.04. |
moderators |
Atomic integer. The number of moderators to simulate for each study. Make sure that the number of moderators to be simulated is at least as large as the number of moderators referred to in the model parameter. Internally, the matrix of moderators is referred to as "x". Defaults to 5. |
distribution |
Atomic character. The distribution of the moderators. Can be set to either "normal" or "bernoulli". Defaults to "normal". |
model |
Expression. An expression to specify the model from which to
simulate the mean true effect size, mu. This formula may use the terms "es"
(referring to the es parameter of the call to SimulateSMD), and "x[, ]"
(referring to the matrix of moderators, x). Thus, to specify that the mean
effect size, mu, is a function of the effect size and the first moderator,
one would pass the value |
List of length 4. The "training" element of this list is a data.frame with k_train rows. Its columns are the variance of the effect size, vi; the effect size, yi; and the moderators, X. The "testing" element of this list is a data.frame with k_test rows. Its columns are the effect size, yi, and the moderators, X. The "housekeeping" element of this list is a data.frame with k_train + k_test rows. Its columns are n, the sample size of each simulated study; mu_i, the mean true effect size for each simulated study; and theta_i, the true effect size for each simulated study.
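The simulation procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `sketch_smd`, the equal split of n across two groups, the floor of 8 participants per study, and the large-sample variance formula for a standardized mean difference (evaluated at the true effect) are all assumptions made here, and the small-sample correction that turns d into Hedges' g is omitted.

```r
# Hypothetical helper (not part of metaforest): simulate k studies from a
# random-effects model with one normally distributed moderator.
sketch_smd <- function(k = 20, mean_n = 40, es = 0.5, tau2 = 0.04) {
  # Per-study sample size: normal with mean mean_n and sd mean_n / 3, rounded.
  # The floor of 8 is an assumption that avoids degenerate studies.
  n <- pmax(8, round(rnorm(k, mean = mean_n, sd = mean_n / 3)))
  x1 <- rnorm(k)                                     # a single moderator
  mu_i <- es * x1                                    # model = es * x[, 1]
  theta_i <- rnorm(k, mean = mu_i, sd = sqrt(tau2))  # true study effects
  # Assumption: two equal groups; large-sample sampling variance of an SMD,
  # (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)) with n1 = n2 = n / 2.
  n_group <- n / 2
  vi <- 2 / n_group + theta_i^2 / (4 * n_group)
  yi <- rnorm(k, mean = theta_i, sd = sqrt(vi))      # observed effect sizes
  data.frame(yi = yi, vi = vi, X1 = x1, n = n, mu_i = mu_i, theta_i = theta_i)
}

set.seed(1)
sim <- sketch_smd(k = 20)
head(sim)
```

The three levels mirror the "housekeeping" columns: `mu_i` is determined by the moderators, `theta_i` adds residual heterogeneity with variance `tau2`, and `yi` adds sampling error with variance `vi`.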
Van Lissa, C. J. (2020). Small sample meta-analyses: exploring heterogeneity using metaForest. In R. Van De Schoot & M. Miočević (Eds.), Small sample size solutions (open access): A guide for applied researchers and practitioners (pp. 186–202). CRC Press. doi:10.4324/9780429273872-16
Van Lissa, C. J. (2018). MetaForest: Exploring heterogeneity in meta-analysis using random forests. PsyArXiv. doi:10.31234/osf.io/myg6s
set.seed(8)
SimulateSMD()
SimulateSMD(k_train = 50, distribution = "bernoulli")
SimulateSMD(distribution = "bernoulli", model = es * x[, 1] * x[, 2])
Plots variable importance for a MetaForest object.
VarImpPlot(mf, n.var = 30, sort = TRUE, ...)
mf |
MetaForest object. |
n.var |
Number of moderators to plot. |
sort |
Should the moderators be sorted from most to least important? |
... |
Parameters passed to and from other functions. |
A ggplot object.
set.seed(42)
data <- SimulateSMD()
mf.random <- MetaForest(formula = yi ~ ., data = data$training,
                        whichweights = "random", method = "DL", tau2 = 0.0116)
VarImpPlot(mf.random)
VarImpPlot(mf.random, n.var = 2)
VarImpPlot(mf.random, sort = FALSE)
Plots weighted scatterplots for meta-analytic data. Can plot effect size as a function of either continuous (numeric, integer) or categorical (factor, character) predictors.
WeightedScatter(
  data,
  yi = "yi",
  vi = "vi",
  vars = NULL,
  tau2 = NULL,
  summarize = TRUE
)
data |
A data.frame. |
yi |
Character. The name of the column in |
vi |
Character. The name of the column in the |
vars |
Character vector containing the names of specific moderator
variables to plot. When set to |
tau2 |
Numeric. Provide an optional value for tau2. If this value is provided, random-effects weights will be used instead of fixed-effects weights. |
summarize |
Logical. Should summary stats be displayed? Defaults to TRUE. If TRUE, a smooth trend line is displayed for continuous variables, using stats::loess() for fewer than 1000 observations, and mgcv::gam() for larger datasets. For categorical variables, box-and-whiskers plots are displayed. Outliers are omitted, because the raw data fulfill this function. |
A gtable object.
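The effect of the `tau2` argument on the weights can be illustrated with the conventional inverse-variance formulas. This is a sketch that assumes fixed-effects weights of 1 / vi and random-effects weights of 1 / (vi + tau2); the helper name `sketch_weights` and the normalization to sum to one are illustrative choices, and the way the function scales points internally may differ.

```r
# Hypothetical helper (not part of metaforest): inverse-variance weights.
sketch_weights <- function(vi, tau2 = NULL) {
  # Without tau2: fixed-effects weights; with tau2: random-effects weights.
  w <- if (is.null(tau2)) 1 / vi else 1 / (vi + tau2)
  w / sum(w)  # normalize so the weights sum to 1 (illustrative choice)
}

vi <- c(0.01, 0.04, 0.16)
sketch_weights(vi)               # dominated by the most precise study
sketch_weights(vi, tau2 = 0.04)  # weights are spread more evenly
```

Adding `tau2` to every sampling variance shrinks the relative advantage of very precise studies, which is why the points in a random-effects weighted scatterplot tend to look more alike in size.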
## Not run:
set.seed(42)
data <- SimulateSMD(k_train = 100,
                    model = es * x[, 1] + es * x[, 2] + es * x[, 1] * x[, 2])$training
data$X2 <- cut(data$X2, breaks = 2, labels = c("Low", "High"))
data$X3 <- cut(data$X3, breaks = 2, labels = c("Small", "Big"))
WeightedScatter(data, summarize = FALSE)
WeightedScatter(data, vars = c("X3"))
WeightedScatter(data, vars = c("X1", "X3"))
## End(Not run)