source("resources/chapter 18/functions.r")
10 Visualization and interpretation of individualized treatment rule results
In this tutorial, we will walk you through the code that implemented the precision medicine methods and generated the visualization results discussed in Chapter 18 of the book. This tutorial focuses more on helping you understand the code. We will not provide detailed interpretation of the results as they have been covered in the chapter already.
10.1 Introduction
We first load all relevant functions for this chapter.
Subsequently, we use the function simcountdata()
to generate an example dataset with a sample size of N=2000. In this example, we have two disease modifying therapies (DMT1 and DMT0) and the outcome is the number of post-treatment multiple sclerosis relapses during follow-up.
# Randomization seed
base.seed <- 999
set.seed(base.seed)

df.ori <- simcountdata(n = 2000,
                       seed = 63,
                       beta = c(log(0.4), log(0.5), log(1), log(1.1), log(1.2)),
                       beta.x = c(-1.54, -0.01, 0.06, 0.25, 0.5, 0.13, 0.0000003)
                       )$data
The dataset looks as follows:
head(df.ori)
trt ageatindex_centered female prerelapse_num prevDMTefficacy premedicalcost
1 0 2 0 2 Low efficacy 4606.04
2 1 10 1 1 Low efficacy 17065.19
3 1 12 1 2 None 6308.39
4 1 -12 0 0 Low efficacy 16633.97
5 1 13 1 0 Low efficacy 642.96
6 1 14 1 0 Low efficacy 2989.89
numSymptoms postrelapse_num finalpostdayscount group score Iscore
1 0 1 305 Simulated 0.7129792 1
2 1 0 367 Simulated 0.7404238 2
3 0 0 325 Simulated 0.7564233 3
4 0 0 321 Simulated 0.7215764 1
5 0 0 24 Simulated 0.7457823 2
6 0 0 59 Simulated 0.7441632 2
Below is a summary table of the baseline characteristics by treatment group.
 | 0 (N=506) | 1 (N=1494) | Overall (N=2000) |
---|---|---|---|
Age (years) | | | |
Mean (SD) | 45.2 (9.82) | 45.8 (9.73) | 45.7 (9.75) |
Median [Min, Max] | 46.0 [20.0, 64.0] | 46.0 [19.0, 64.0] | 46.0 [19.0, 64.0] |
Gender | | | |
female | 375 (74.1%) | 1123 (75.2%) | 1498 (74.9%) |
male | 131 (25.9%) | 371 (24.8%) | 502 (25.1%) |
Previous number of relapses | | | |
0 | 319 (63.0%) | 973 (65.1%) | 1292 (64.6%) |
1 | 150 (29.6%) | 427 (28.6%) | 577 (28.9%) |
2 | 31 (6.1%) | 76 (5.1%) | 107 (5.4%) |
3 | 5 (1.0%) | 17 (1.1%) | 22 (1.1%) |
4 | 1 (0.2%) | 1 (0.1%) | 2 (0.1%) |
Efficacy of previous disease modifying therapy | | | |
Low efficacy | 216 (42.7%) | 609 (40.8%) | 825 (41.3%) |
Medium and high efficacy | 53 (10.5%) | 179 (12.0%) | 232 (11.6%) |
None | 237 (46.8%) | 706 (47.3%) | 943 (47.2%) |
Previous medical cost (\$) | | | |
Mean (SD) | 13700 (20400) | 14400 (24500) | 14300 (23600) |
Median [Min, Max] | 7320 [343, 264000] | 7560 [110, 556000] | 7470 [110, 556000] |
Previous number of symptoms | | | |
0 | 348 (68.8%) | 995 (66.6%) | 1343 (67.2%) |
1 | 119 (23.5%) | 388 (26.0%) | 507 (25.4%) |
>=2 | 39 (7.7%) | 111 (7.4%) | 150 (7.5%) |
We now define key constants for the case study.
# Baseline characteristics
covars <- c("age.z", "female", "prevtrtB", "prevtrtC", "prevnumsymp1",
            "prevnumsymp2p", "previous_cost.z", "previous_number_relapses")

# Precision medicine methods to be used
pm.methods <- c("all1", "all0", "poisson", "dWOLS", "listDTR2", "contrastReg")

# Precision medicine method labels
method.vec <- c("All 0", "All 1", "Poisson", "dWOLS",
                "Contrast\n Regression", "List DTR\n (2 branches)")

# Number of folds in each CV iteration
n.fold <- 5

# Number of CV iterations
n.cv <- 10

# Sample size of the large independent test set to get true value
big.n <- 100000

# Define formula for the CATE model
cate.formula <- as.formula(paste0("y ~", paste0(covars, collapse = "+"),
                                  "+ offset(log(years))"))

# Define formula for the propensity score model
ps.formula <- trt ~ age.z + prevtrtB + prevtrtC

# Colors
myblue  <- rgb(37, 15, 186, maxColorValue = 255)
mygreen <- rgb(109, 173, 70, maxColorValue = 255)
mygrey  <- rgb(124, 135, 142, maxColorValue = 255)
The data need some preprocessing before analysis. We recategorized treatment, previous treatment, and number of symptoms; scaled medical cost and age; and created a standardized copy of the data.
df <- df.ori %>%
  rename(previous_treatment = prevDMTefficacy,
         age = ageatindex_centered,
         y = postrelapse_num,
         previous_number_relapses = prerelapse_num,
         previous_number_symptoms = numSymptoms,
         previous_cost = premedicalcost) %>%
  mutate(previous_treatment = factor(previous_treatment,
                                     levels = c("None",
                                                "Low efficacy",
                                                "Medium and high efficacy"),
                                     labels = c("drugA", "drugB", "drugC")),
         previous_number_symptoms = factor(previous_number_symptoms,
                                           levels = c("0", "1", ">=2"),
                                           labels = c("0", "1", ">=2")),
         trt = factor(trt, levels = c(0, 1), labels = c("drug0", "drug1")),
         previous_cost.z = scale(log(previous_cost), scale = TRUE),
         age.z = age + 48,
         age.z = scale(age.z, scale = TRUE),
         years = finalpostdayscount / 365.25,
         mlogarr0001 = -log(y / years + 0.001),
         drug1 = as.numeric(trt == "drug1"),
         prevtrtB = as.numeric(previous_treatment == "drugB"),
         prevtrtC = as.numeric(previous_treatment == "drugC"),
         prevnumsymp1 = as.numeric(previous_number_symptoms == "1"),
         prevnumsymp2p = as.numeric(previous_number_symptoms == ">=2")) %>%
  dplyr::select(age.z, female, contains("prevtrt"), previous_cost.z,
                contains("prevnumsymp"),
                previous_number_relapses, trt, drug1, y,
                mlogarr0001, years, Iscore)
# Standardize data: age.z and previous_cost.z are already standardized above,
# so scale the remaining covariates to put all of them on comparable scales
z.labels <- c("age.z", "previous_cost.z")
df.s <- df
df.s[, setdiff(covars, z.labels)] <- scale(df[, setdiff(covars, z.labels)])
10.2 Estimation of individualized treatment rules
The following code provides details of how to implement the precision medicine methods in the example data. Please feel free to jump to the next section if you want to focus on the results. The model results are available online for you to load and save time.
We used the function listdtr() in the listdtr package to estimate individualized treatment rules (ITRs) based on the listDTR method, and the function catefit() in the precmed package to estimate ITRs based on the Poisson and contrast regression methods. These were the methods used in Section 3 of the book, where we talked about directly visualizing the ITR before bringing in the outcomes. The methods are discussed in further detail by Zhao et al. (2013) and Yadlowsky et al. (2020).
library(listdtr)

# Estimated ITR based on the listDTR method with 2 branches
modlist2 <- listdtr(y = df$mlogarr, # larger is more favorable
                    a = df$drug1,
                    x = df[, c("age.z", "female", "prevtrtB",
                               "prevtrtC", "previous_cost.z",
                               "prevnumsymp1", "prevnumsymp2p",
                               "previous_number_relapses")],
                    stage.x = rep(1, 8), maxlen = 2L) # somewhat slow

# Estimated ITR based on the listDTR method with 3 branches
modlist3 <- listdtr(y = df$mlogarr,
                    a = df$drug1,
                    x = df[, c("age.z", "female", "prevtrtB",
                               "prevtrtC", "previous_cost.z",
                               "prevnumsymp1", "prevnumsymp2p",
                               "previous_number_relapses")],
                    stage.x = rep(1, 8), maxlen = 3L) # somewhat slow

# Estimated CATE score based on the Poisson and contrast regression
modpm <- catefit(response = "count",
                 cate.model = cate.formula,
                 ps.model = ps.formula,
                 data = df,
                 higher.y = FALSE,
                 score.method = c("poisson", "contrastReg"),
                 initial.predictor.method = "poisson",
                 seed = 999)

# Estimated CATE score based on the Poisson and contrast regression
# (based on the scaled data so the coefficients are easier to compare)
modpm.s <- catefit(response = "count",
                   cate.model = cate.formula,
                   ps.model = ps.formula,
                   data = df.s,
                   higher.y = FALSE,
                   score.method = c("poisson", "contrastReg"),
                   initial.predictor.method = "poisson",
                   seed = 999)
For the results in Sections 4 and 5, we applied cross validation to mitigate over-fitting. For this chapter, we created our own customized function cvvalue() to estimate the ITR and calculate the estimated value function via cross validation for all methods, including the fixed rules ("all 0" and "all 1"). The results were all saved in objects with the prefix cvmod. The precmed package has a built-in cross validation procedure for CATE estimation, so for the Poisson and contrast regression scores we used its function catecv().
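The function cvvalue() is defined in resources/chapter 18/functions.r and is not reproduced here. For orientation only, below is a minimal sketch of what such a cross-validated value estimation can look like for one simple scoring method; the outcome model, propensity score, and weighting choices are illustrative assumptions rather than the implementation in functions.r, but the seeding scheme and the output structure (one list per CV iteration, each holding per-fold dhat and vhat.dhat elements) mirror how the cvmod objects are consumed later in this tutorial.
# Illustrative sketch only -- NOT the cvvalue() from functions.r
cvvalue_sketch <- function(data, xvar, n.fold = 5, n.cv = 10, seed = 999) {
  # Outcome model: relapse count with an offset for follow-up time
  f <- reformulate(c(xvar, "offset(log(years))"), response = "y")
  out <- list()
  for (cv.i in seq_len(n.cv)) {
    set.seed(seed * 100 + cv.i)   # same seeding scheme as the CV loop further below
    folds <- caret::createFolds(data$drug1, k = n.fold, list = TRUE)
    out[[paste0("cv", cv.i)]] <- setNames(lapply(seq_len(n.fold), function(fold.i) {
      test  <- data[folds[[fold.i]], ]
      train <- data[-folds[[fold.i]], ]
      # Fit one Poisson outcome model per treatment arm on the training folds
      fit1 <- glm(f, family = poisson, data = train[train$drug1 == 1, ])
      fit0 <- glm(f, family = poisson, data = train[train$drug1 == 0, ])
      # Recommend drug 1 when its predicted relapse count is lower
      dhat <- as.numeric(predict(fit1, newdata = test, type = "response") <
                           predict(fit0, newdata = test, type = "response"))
      # Crude propensity score; only patients whose observed treatment follows the rule contribute
      ps <- mean(train$drug1)
      w  <- ifelse(test$drug1 == dhat, 1 / ifelse(test$drug1 == 1, ps, 1 - ps), 0)
      list(dhat = dhat,
           vhat.dhat = data.frame(U = sum(w * test$y), W = sum(w * test$years)))
    }), paste0("fold", seq_len(n.fold)))
  }
  out
}
# Example call (not run): cvmod.sketch <- cvvalue_sketch(df, covars, n.fold, n.cv, base.seed)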
# Run cross validation for each method (used for Sections 4 & 5)
## Estimated CATE scores based on the Poisson and contrast regression with cross-validation
modcv <- catecv(response = "count",
                cate.model = cate.formula,
                ps.model = ps.formula,
                data = df,
                higher.y = FALSE,
                score.method = c("poisson", "contrastReg"),
                initial.predictor.method = "poisson",
                cv.n = n.cv,
                plot.gbmperf = FALSE,
                seed = 999) # somewhat slow

## Estimated value function for each method
cvmodall0 <- cvvalue(data = df, xvar = covars,
                     method = "all0", n.fold = n.fold, n.cv = n.cv,
                     seed = base.seed)
cvmodall1 <- cvvalue(data = df, xvar = covars,
                     method = "all1", n.fold = n.fold, n.cv = n.cv,
                     seed = base.seed)
cvmoddwols <- cvvalue(data = df, xvar = covars,
                      method = "dWOLS", n.fold = n.fold, n.cv = n.cv,
                      seed = base.seed)
cvmodpois <- cvvalue(data = df, xvar = covars,
                     method = "poisson", n.fold = n.fold, n.cv = n.cv,
                     seed = base.seed)
cvmodlist2 <- cvvalue(data = df, xvar = covars,
                      method = "listDTR2", n.fold = n.fold, n.cv = n.cv,
                      seed = base.seed) # very slow
cvmodcontrastreg <- cvvalue(data = df, xvar = covars,
                            method = "contrastReg", n.fold = n.fold,
                            n.cv = n.cv,
                            seed = base.seed) # very slow
As a next step, we need to combine all estimated ITRs and value functions:
# Combine CV results
# Read in each CV result in a loop
vhats.dhat <- dhats <- NULL
mod_names <- c("cvmodall1", "cvmodall0", "cvmoddwols", "cvmodpois", "cvmodcontrastreg", "cvmodlist2")
for (mod in mod_names) {
  thismod <- get(mod)
  for (name in names(thismod)) {
    # Get estimated values, vhat.dhat
    vhats.dhat <- rbind(vhats.dhat,
                        thismod[[name]] %>%
                          map_df(~bind_rows(names(.x) %>% str_detect("vhat.dhat") %>% keep(.x, .)), .id = "fold") %>%
                          mutate(method = mod, cv.i = name))
    # Get estimated rule from CV test fold, dhat
    dhats <- rbind(dhats,
                   thismod[[name]] %>%
                     map_df(~bind_rows(names(.x) %>% str_detect("^dhat$") %>% keep(.x, .)), .id = "fold") %>%
                     mutate(method = mod, cv.i = name))
  }
}

# One time run to get true optimal and worst value
# Simulated data only
trueV <- getTrueOptimalValue(n = big.n, seed = base.seed)
trueWorstV <- getTrueWorstValue(n = big.n, seed = base.seed)
# Preprocess
vhats.dhat %<>%
  mutate(V = U/W,
         VR = (U/W - trueWorstV) / (trueV - trueWorstV)) %>%
  group_by(method) %>%
  summarize(n.batches = n(),
            n.nonnaU = sum(!is.na(U)),
            n.nonnaW = sum(!is.na(W)),
            meanV = mean(V, na.rm = T),
            sdV = sd(V, na.rm = T),
            meanVR = mean(VR, na.rm = T),
            sdVR = sd(VR, na.rm = T),
            .groups = "keep") %>%
  ungroup %>%
  arrange(desc(meanV)) %>%
  mutate(method = case_when(
    method == "cvmodcontrastreg" ~ "Contrast\n Regression",
    method == "cvmodall0" ~ "All 0",
    method == "cvmodall1" ~ "All 1",
    method == "cvmodlist2" ~ "List DTR\n (2 branches)",
    method == "cvmoddwols" ~ "dWOLS",
    method == "cvmodpois" ~ "Poisson"),
    method = factor(method,
                    levels = method.vec,
                    labels = method.vec)
  )

dhats %<>%
  mutate(method = case_when(
    method == "cvmodcontrastreg" ~ "Contrast\n Regression",
    method == "cvmodall0" ~ "All 0",
    method == "cvmodall1" ~ "All 1",
    method == "cvmodlist2" ~ "List DTR\n (2 branches)",
    method == "cvmoddwols" ~ "dWOLS",
    method == "cvmodpois" ~ "Poisson"),
    method = factor(method,
                    levels = method.vec,
                    labels = method.vec)
  )
10.3 Visualization of individualized treatment rules
10.3.1 Direct visualization
10.3.1.1 listDTR
If the PM method already has built-in visualization (especially for tree-based methods), we can visualize the ITR directly. For example, we can simply use the function plot()
to visualize the estimated ITR with the listDTR method.
#modlist3 %>% plot()
We can also create our own visualization like Figure 1A in the chapter.
df.list3 <- df %>%
  mutate(d.list = ifelse(age.z > 0.599 | prevtrtB > 0.5, "Recommend 1", "Recommend 0"), # based on modlist3
         Rule = factor(as.character(d.list), levels = c("Recommend 0", "Recommend 1")),
         prevtrtB = ifelse(prevtrtB == 1, "Previous treatment is drug B", "Previous treatment is not drug B")
  )

## Figure 1A
df.list3 %>%
  ggplot(aes(x = age.z, fill = Rule)) +
  geom_histogram(position = position_dodge2(preserve = 'single'), binwidth = 0.1) +
  facet_wrap(~ prevtrtB, nrow = 2) +
  scale_fill_brewer(palette = "Set1") +
  labs(x = "Standardized age", y = "Count") +
  theme_classic() +
  theme(legend.position = 'top', text = element_text(size = 20))
The subgroup-level annualized relapse rate (ARR) can be calculated based on the listDTR ITR:
df.list3 %>%
  group_by(trt, d.list) %>%
  summarise(ARR = round(sum(y) / sum(years), 2),
            n = n(),
            `prop%` = round(n / nrow(df), 2)*100, .groups = "drop") %>%
  rename("listDTR ITR" = d.list,
         "Observed treatment" = trt) %>%
  kable() %>%
  kable_styling(full_width = F)
Observed treatment | listDTR ITR | ARR | n | prop% |
---|---|---|---|---|
drug0 | Recommend 0 | 0.32 | 197 | 10 |
drug0 | Recommend 1 | 0.31 | 309 | 15 |
drug1 | Recommend 0 | 0.39 | 615 | 31 |
drug1 | Recommend 1 | 0.16 | 879 | 44 |
Patients who received drug 0 and were recommended drug 0 by listDTR had a similar ARR on average to those who received drug 0 but were recommended drug 1 (0.32 vs 0.31). Patients who received drug 1 and were recommended drug 1 by listDTR had a much lower ARR on average than those who received drug 1 but were recommended drug 0 (0.16 vs 0.39).
10.3.1.2 Score-based method
Although some PM methods do not have built-in visualization or are not as “white-box” as more interpretable methods, there still might be ways to visualize the ITR. For example, score-based methods (such as Poisson and contrast regression) produce an estimate of the CATE score for each patient, and a classification tree can be fitted to these scores and visualized. Below is a histogram-density plot of the CATE scores estimated from the Poisson regression and the classification tree fitted to the estimated CATE scores. We pruned the tree so it only had three terminal nodes for simplicity. The rpart.plot package has a built-in visualization function for rpart models, rpart.plot(), which is how Figure 1B in the chapter was generated.
df["score.poisson"] <- modpm$score.poisson

ggplot(df, aes(x = score.poisson)) +
geom_histogram(aes(y = ..density..), colour = "black", fill = "lightblue") +
geom_density(alpha = .2, fill = "white") +
labs(x = "Estimated CATE score from the Poisson regression", y = "Density") +
theme_classic()
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Fit Poisson CATE scores on a classification tree
modtree <- rpart(as.formula(paste0("score.poisson ~", paste0(covars, collapse = "+"))),
                 method = "anova", data = df, control = rpart.control(minsplit = 100, cp = 0.01))
modtree.pr <- prune(modtree, cp = 0.09) # I ended up choosing a higher cp value to have only 3 subgroups
# print(rpart.rules(modtree.pr, cover = TRUE))
## Figure 1B
rpart.plot(modtree.pr, box.palette = "RdBu", type = 5, under = TRUE, tweak = 1.2, compress = TRUE)
The CATE scores are now simplified as a tree classifier. Previous treatment with drug B and age seemed to be important in determining the CATE score values, which also showed up in the ITR estimated from the listDTR method. Patients previously treated with drug B had the lowest CATE score on average (-0.76) and made up 41% of the sample (dark orange). Patients whose previous treatment was not drug B and whose standardized age was at least -0.22 also had a negative CATE score on average (-0.23) and made up 36% of the sample (light orange), but not as low as the dark orange group. Negative CATE scores mean that the number of relapses was expected to be lower for those recommended drug 1 than for those recommended drug 0, so drug 1 was favored for these patients. For the blue group, the average CATE score was 0.13, making up 23% of the sample, and these patients were expected to benefit from drug 0 based on the Poisson CATE scores.
10.3.2 ITR accuracy
The accuracy of an ITR is the proportion of patients whose estimated ITR is the same as the true optimal ITR. The estimated ITRs have been obtained from the PM methods, but we still need to determine the true optimal ITR. This is only possible for simulated data where the decision boundary is known. Based on the data generating mechanism in simcountdata(), Iscore is a score generated from a linear combination of baseline covariates, where lower scores represented that drug 1 was better and higher scores represented that drug 0 was better. We then classified patients into 5 equal-sized subgroups based on the Iscore, where groups 1 and 2 have drug 1 as their true optimal ITR and groups 4 and 5 have drug 0 as their true optimal ITR. Group 3 is considered the neutral group, where patients are indifferent to either drug, so we assign the true optimal ITR to be their observed treatment. Thus, we identify the true optimal ITR for every patient based on this subgrouping, which was derived from their true score Iscore. Since we used cross validation in estimating the ITR, we need to apply the exact same cross-validation splits to the true optimal ITR. This is achieved by specifying the same randomization seed in the cross validation loop (see seed).
## Create new columns
dhats$d <- rep(NA, nrow(dhats)) # true d

# Identify the true optimal treatment
# See simcountdata() in the function script to learn more about Iscore
sim <- df %>%
  mutate(trueT = ifelse(as.numeric(Iscore) < 3, 1, 0),
         trueT = ifelse(Iscore == 3, drug1, trueT)) # neutral group

# Format data
input <- data.frame(y = sim$y,
                    trt = sim$drug1,
                    time = log(sim$years),
                    sim[covars])

# Cross validation loop
for (i in unique(dhats$cv.i)) {
  seed <- base.seed*100 + as.numeric(str_extract(i, "[0-9]+"))
  set.seed(seed)
  # Create CV folds
  folds <- createFolds(input$trt, k = n.fold, list = TRUE)

  for (fold.i in seq(n.fold)) {
    testdata <- sim[folds[[fold.i]],]
    # number of methods which succeeded for the given fold/batch
    ind_result <- which(dhats$fold == paste0("fold", fold.i) &
                          dhats$cv.i == i &
                          !is.na(dhats$dhat))
    nr <- length(ind_result)
    dhats$d[ind_result] <- rep(testdata$trueT, nr/nrow(testdata))
    stopifnot(nr %% nrow(testdata) == 0)
  }
} # end of all cv iterations
Once we have identified the true optimal ITR (\(d^{opt}\)), we can calculate the accuracy in each validation fold for each PM method (\(\hat{d}_{pm}\)). Mathematically, accuracy can be expressed as \[Accuracy_{pm}(\boldsymbol{x}^{val}) = \frac{1}{n^{val}}\sum_{i = 1}^{n^{val}} I\big(\hat{d}_{pm}(\boldsymbol{x}_i^{val}) = d^{opt}(\boldsymbol{x}_i^{val})\big),\] where \(n^{val}\) is the sample size in the validation fold, \(\boldsymbol{x}_i^{val}\) is the baseline characteristics of the \(i\)th patient in the validation fold, and \(pm\) stands for one PM method.
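As a quick numerical illustration of this formula (the two vectors below are hypothetical rules for five patients, not results from the chapter), accuracy is simply the proportion of agreeing recommendations:
dhat.pm <- c(1, 0, 1, 1, 0)   # ITR estimated by a PM method in a validation fold (hypothetical)
d.opt   <- c(1, 0, 0, 1, 0)   # true optimal ITR in the same fold (hypothetical)
mean(dhat.pm == d.opt)        # accuracy = 4/5 = 0.8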
Below is how Figure 2 in the chapter was generated. It summarizes the accuracy across all validation folds as a box plot, so we can also see the variability of accuracy across folds.
##### Accuracy #####
## Calculate % accuracy for each iteration & summary statistics
dhats.accuracy <- dhats %>%
  group_by(method, cv.i, fold) %>%
  summarise(accuracy = sum(dhat == d)/n(), .groups = "drop") %>%
  ungroup

## Make the accuracy plot, Figure 2
dhats.accuracy %>%
  ggplot(aes(x = method, y = accuracy)) +
  geom_boxplot() +
  geom_hline(yintercept = 1, linetype = 2, linewidth = 1, color = "gray") +
  geom_hline(yintercept = 0.5, linetype = 2, linewidth = 1, color = "gray") +
  theme_classic() +
  labs(x = "Method", y = "Accuracy") +
  theme(axis.text = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(angle = 0, size = 15),
        strip.text.x = element_text(size = 15))
10.3.3 ITR agreement
When we do not know the true data generating mechanism, e.g., with real-world data, we cannot compare the estimated ITR with the true optimal ITR. However, we can compare the estimated ITR with another estimated ITR, and this is called agreement. Agreement is the proportion of patients whose estimated ITR under one method is the same as the estimated ITR under another method; thus, agreement is defined between two methods. Mathematically, \[Agreement_{1, 2}(\boldsymbol{x}^{val}) = \frac{1}{n^{val}} \sum_{i = 1}^{n^{val}} I\big( \hat{d}_{1}(\boldsymbol{x}_i^{val}) = \hat{d}_{2} (\boldsymbol{x}_i^{val}) \big), \] where \(n^{val}\) is the sample size in the validation fold, \(\boldsymbol{x}_i^{val}\) is the baseline characteristics of the \(i\)th patient in the validation fold, and \(1, 2\) stand for method 1 and method 2.
##### Agreement #####
dhats.concat <- dhats %>%
  arrange(cv.i, fold, method) %>%
  mutate(iteration.fold = (as.numeric(str_extract(cv.i, "[0-9]+")) - 1) * 10 + as.numeric(str_extract(fold, "[0-9]+"))) %>%
  dplyr::select(method, iteration.fold, dhat) %>%
  group_by(method, iteration.fold) %>%
  mutate(i = 1:n()) %>%
  ungroup

m <- length(method.vec)
dhats.agreement <- matrix(nrow = m, ncol = m)
colnames(dhats.agreement) <- method.vec
rownames(dhats.agreement) <- method.vec
for(k in seq_len(m)){
  for(j in seq(k, m)){
    data.k <- dhats.concat %>% filter(method == method.vec[k])
    data.j <- dhats.concat %>% filter(method == method.vec[j])
    data.jk <- data.k %>% full_join(data.j, by = c("iteration.fold", "i"))
    dhats.agreement[k, j] <- dhats.agreement[j, k] <- sum(data.jk$dhat.x == data.jk$dhat.y, na.rm = T) / sum(is.na(data.jk$dhat.x) == FALSE & is.na(data.jk$dhat.y) == FALSE)
  }
}

# Make the agreement plot, Figure 3
corrplot(dhats.agreement, method = "color", type = "lower",
         addCoef.col = "orange", number.cex = 1.5,
         tl.cex = 1.2, cl.cex = 1.2, tl.col = "black", tl.srt = 0, tl.offset = 1.5)
# Make the agreement plot, Figure 3
corrplot(dhats.agreement, method = "color", type = "lower",
addCoef.col = "orange", number.cex = 1.5,
tl.cex = 1.2, cl.cex = 1.2, tl.col = "black", tl.srt = 0, tl.offset = 1.5)
We used the corrplot package to generate Figure 3 in the chapter, but agreement can also be visualized in other creative ways that you prefer.
10.4 Patient well-being
Patient well-being is evaluated via the value function, which is defined as the expected outcome had all patients followed the specified ITR. Like a fortune teller’s crystal ball, this metric tells us how well the patients would do on average under each ITR. We can then compare across different ITRs and identify an optimal ITR. Cross validation is necessary here to mitigate over-fitting, and we visualized the value function results as error bar plots. The mean and standard deviation of the value functions have been preprocessed previously. We use ggplot() to generate the error bar plots. Figure 4A shows the original value function estimates, and Figure 4B shows the standardized value ratio estimates, which convert the value functions to a ratio where values closer to 1 are always more desirable.
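For reference, the value plotted in Figure 4A is the quantity V = U/W computed in the preprocessing step above from the cvvalue() output. A standard inverse-probability-weighted estimator of this U/W form (an assumption about what cvvalue() stores as U and W; the exact definitions are in functions.r) is \[\hat{V}(\hat{d}) = \frac{U}{W} = \frac{\sum_{i = 1}^{n^{val}} I\big(a_i = \hat{d}(\boldsymbol{x}_i^{val})\big)\, y_i / \hat{\pi}(a_i \mid \boldsymbol{x}_i^{val})}{\sum_{i = 1}^{n^{val}} I\big(a_i = \hat{d}(\boldsymbol{x}_i^{val})\big)\, t_i / \hat{\pi}(a_i \mid \boldsymbol{x}_i^{val})},\] where \(a_i\) is the observed treatment, \(y_i\) the relapse count, \(t_i\) the follow-up years, and \(\hat{\pi}\) the estimated probability of receiving the observed treatment. The value ratio in Figure 4B then rescales this estimate as \(VR = (\hat{V} - V^{worst}) / (V^{opt} - V^{worst})\), exactly as computed earlier with trueV and trueWorstV, so that values closer to 1 are more desirable.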
##### Errorbar plot #####
# Figure 4A
p4a <- vhats.dhat %>%
  ggplot(aes(x = method, y = meanV)) +
  geom_point(size = 8, shape = 16, color = "navy") +
  geom_errorbar(aes(ymin = meanV - sdV, ymax = meanV + sdV), width = 0.3, size = 2, position = position_dodge(0.9), color = "navy") +
  theme_classic() + xlab("") + ylab("Cross-validated value (mean +- SD)") +
  theme(axis.text = element_text(size = 15), axis.title.y = element_text(size = 15)) +
  geom_hline(yintercept = vhats.dhat$meanV[which(vhats.dhat$method == "All 1")], linetype = 2, size = 1.5, color = "gray")
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
##### Value ratio #####
# Figure 4B
p4b <- vhats.dhat %>%
  dplyr::select(method, contains("VR"), n.nonnaU) %>%
  ggplot(aes(x = method, y = meanVR)) +
  geom_point(size = 8, color = "navy") +
  geom_errorbar(aes(ymin = meanVR - sdVR, ymax = meanVR + sdVR), width = 0.3, size = 2, position = position_dodge(0.9), color = "navy") +
  geom_hline(yintercept = 1, color = "gray", linetype = 2, size = 1) +
  geom_hline(yintercept = 0, color = "gray", linetype = 2, size = 1) +
  scale_y_continuous(breaks = seq(0, 1, length = 6)) +
  theme_classic() +
  labs(x = "", y = "Value ratio of cross-validated estimated decision rule (mean +- SD)") +
  theme(axis.text = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        strip.text.x = element_text(size = 12))
# Figure 4
ggarrange(p4a, p4b, ncol = 2, nrow = 1, labels = c("A", "B"))
10.5 Responder diagnostics
10.5.1 Validation
The package we used for the two score-based methods (Poisson and contrast regression), precmed, has built-in visualization tools to diagnose the results: validation box plots via boxplot(), validation curves via plot(), and area between curves (ABC) statistics via abc().
##### Validation of ITR scores #####
# Figure 5A
p5a <- boxplot(modcv, ylab = "Rate ratio between T=1 and T=0 in each subgroup")
# Figure 5B
p5b <- plot(modcv, ylab = "Rate ratio between T=1 and T=0 in each subgroup")
# Figure 5
ggarrange(p5a, p5b, ncol = 1, nrow = 2, labels = c("A", "B"))
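The ABC statistics mentioned above are not shown as a chapter figure, but they can be requested in the same way; a minimal example, assuming abc() is applied directly to the catecv() output just like boxplot() and plot():
# Area between curves (ABC) statistics for each scoring method across CV iterations
abc(modcv)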
The precmed package implements more PM methods than Poisson and contrast regression that you can try, such as negative binomial and two regressions. See its documentation for more details.
10.5.2 Univariate comparison of patient characteristics
The 60/40 cutoff was used in the chapter to split patients into “high responders” and “standard responders”. The function CreateTableOne()
in the tableone
package was used to generate a table comparing side-by-side the baseline characteristics between the two responder groups.
##### Side-by-side baseline characteristic comparison between responder subgroups #####
cutoff <- quantile(modpm$score.poisson, 0.6) # 60/40 high vs standard responder split
df["responder to T=1"] <- ifelse(modpm$score.poisson < cutoff, "High", "Standard")
df["age.z"] <- as.numeric(df$age.z)
df["previous_cost.z"] <- as.numeric(df$previous_cost.z)

labs <- list("age.z" = "Standardized baseline age", "female" = "Female",
             "prevtrtB" = "Previous treatment drug B", "prevtrtC" = "Previous treatment drug C",
             "prevnumsymp1" = "Previous number of symptoms == 1",
             "prevnumsymp2p" = "Previous number of symptoms >= 2",
             "previous_cost.z" = "Standardized previous medical cost\n(excluding medication)",
             "previous_number_relapses" = "Previous number of relapses")

tab <- CreateTableOne(vars = covars, strata = "responder to T=1", data = df, test = F) %>% print(smd = T)
Stratified by responder to T=1
High Standard SMD
n 1200 800
age.z (mean (SD)) 0.29 (0.94) -0.44 (0.92) 0.789
female (mean (SD)) 0.69 (0.46) 0.84 (0.36) 0.384
prevtrtB (mean (SD)) 0.67 (0.47) 0.02 (0.15) 1.867
prevtrtC (mean (SD)) 0.02 (0.13) 0.26 (0.44) 0.750
prevnumsymp1 (mean (SD)) 0.32 (0.47) 0.15 (0.35) 0.431
prevnumsymp2p (mean (SD)) 0.05 (0.22) 0.11 (0.31) 0.208
previous_cost.z (mean (SD)) -0.08 (1.00) 0.12 (0.99) 0.203
previous_number_relapses (mean (SD)) 0.41 (0.64) 0.47 (0.68) 0.104
We can directly present the table or visualize the comparison with error bar plots, which is what the chapter presented (Figure 6A). Here we show both. Tables are helpful when the specific numbers are important, but readers have to make the comparisons mentally to see which value is higher; plots are helpful when you want readers to quickly identify the larger differences rather than focus on the specific values.
smd <- as_tibble(tab, rownames = "var") %>%
  rowwise() %>%
  mutate(variable = as.factor(str_extract(var, ".*(?= \\(mean \\(SD\\)\\))"))) %>%
  filter(!is.na(variable)) %>%
  arrange(desc(SMD)) %>%
  mutate(smd = paste0("SMD =", SMD),
         variable = labs[[variable]]) %>%
  dplyr::select(variable, smd) %>%
  mutate(ID = 1)

levels <- unique(smd$variable)

p6a <- df %>%
  mutate(ID = 1:n()) %>%
  dplyr::select(all_of(covars), ID, contains("responder")) %>%
  melt(id = c("ID", "responder to T=1")) %>%
  rowwise() %>%
  mutate(variable = labs[[variable]]) %>%
  left_join(smd, by = c("variable", "ID")) %>%
  mutate(variable2 = factor(variable, levels = levels)) %>%
  ggplot(aes(x = reorder(variable2, desc(variable2)), color = `responder to T=1`, y = value, group = `responder to T=1`)) +
  stat_summary(fun = mean, geom = "point", size = 4, position = position_dodge(width = 0.5)) +
  stat_summary(fun.data = mean_sdl, geom = "errorbar", position = position_dodge(width = 0.5), width = 0.3, size = 1.2) +
  geom_hline(yintercept = 0, color = "gray", linetype = "dashed") +
  geom_text(aes(label = smd), hjust = -0.5, y = -1.5, color = "darkgray", size = 3.5) +
  # facet_wrap(~ variable2, nrow = 4) +
  labs(x = "Baseline patient characteristic",
       y = "Mean +- SD") +
  coord_flip() +
  scale_color_brewer(palette = "Set2") +
  theme_classic() +
  theme(legend.position = "top",
        axis.text = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        strip.text.x = element_text(size = 12))
# Figure 6A
ggarrange(p6a, nrow = 1, labels = c("A"))
We can also show the density of the ITR scores obtained from the score-based methods. The scores can be found in modpm, and we used a density plot to visualize them (Figure 6B).
##### Density of ITR score #####
dataplot <- data.frame(score = factor(rep(c("Naive Poisson", "Contrast Regression"),
                                          each = length(modpm$score.poisson))),
                       value = c(modpm$score.poisson, modpm$score.contrastReg))

p6b <- dataplot %>%
  ggplot(aes(x = value, fill = score)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("dodgerblue", "gray30")) +
  geom_vline(xintercept = 0, color = "darkgray", linetype = "dashed", size = 1) +
  labs(x = "Estimated CATE score", y = "Density", fill = "Method") +
  theme_classic() +
  theme(legend.position = c(0.2, 0.8),
        axis.text = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        strip.text.x = element_text(size = 12))
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.
The ITR scores are essentially a linear combination of the baseline characteristics, so it may also be of interest to look at the corresponding coefficients (or weights), which show how much each baseline variable contributed to the ITR score. To make the coefficients comparable across the different scales of the baseline variables, we used the scaled data; the model result modpm.s was used to extract the coefficients, which we visualized as a bar plot. The coefficients can be presented in a table as well.
# Coefficients
coef <- modpm.s$coefficients

p6c <- coef %>%
  as_tibble(rownames = "varname") %>%
  melt(id.vars = "varname") %>%
  filter(variable == "poisson", varname != "(Intercept)") %>%
  mutate(absval = abs(value),
         sign = ifelse(value > 0, "+", "-")) %>%
  arrange(absval) %>%
  mutate(varname = factor(varname, levels = unique(varname))) %>%
  ggplot(aes(x = varname, y = absval, fill = sign)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_brewer(palette = "Set1") +
  scale_x_discrete(labels = labs) +
  coord_flip() +
  labs(y = "Absolute value of the estimated coefficient of CATE scores\nbased on Poisson regression", x = "Baseline patient characteristic") +
  theme_minimal() +
  theme(legend.position = c(0.8, 0.2),
        axis.text = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        strip.text.x = element_text(size = 12))

# Figure 6B, 6C
ggarrange(p6b, p6c, nrow = 2, labels = c("B", "C"))

# Coefficients presented as a table
coef %>% round(2) %>% kable() %>% kable_styling(full_width = F)
 | poisson | contrastReg | SE_contrastReg |
---|---|---|---|
(Intercept) | -0.30 | -0.25 | 0.46 |
age.z | -0.21 | -0.23 | 0.17 |
female | 0.20 | 0.29 | 0.43 |
prevtrtB | -0.63 | -0.86 | 0.36 |
prevtrtC | 0.19 | 0.21 | 0.54 |
prevnumsymp1 | -0.14 | -0.42 | 0.39 |
prevnumsymp2p | 0.20 | 1.05 | 0.58 |
previous_cost.z | 0.06 | 0.13 | 0.18 |
previous_number_relapses | 0.08 | 0.26 | 0.23 |
Version info
This chapter was rendered using the following version of R and its packages:
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fastDummies_1.7.3 reshape2_1.4.4 truncnorm_1.0-9 table1_1.4.3
[5] kableExtra_1.4.0 knitr_1.45 ggpubr_0.6.0 MASS_7.3-58.2
[9] corrplot_0.92 caret_6.0-94 lattice_0.20-45 gbm_2.1.9
[13] tableone_0.13.2 rpart.plot_3.1.2 rpart_4.1.19 precmed_1.0.0
[17] DTRreg_2.0 magrittr_2.0.3 lubridate_1.9.3 forcats_1.0.0
[21] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[25] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 ggsignif_0.6.4 class_7.3-21
[4] ggridges_0.5.6 proxy_0.4-27 rstudioapi_0.15.0
[7] farver_2.1.1 listenv_0.9.1 prodlim_2023.08.28
[10] fansi_1.0.6 xml2_1.3.6 codetools_0.2-19
[13] splines_4.2.3 Formula_1.2-5 jsonlite_1.8.8
[16] pROC_1.18.5 broom_1.0.5 geepack_1.3.10
[19] data.tree_1.1.0 DiagrammeR_1.0.11 clipr_0.8.0
[22] compiler_4.2.3 randomForestSRC_3.2.3 backports_1.4.1
[25] Matrix_1.6-5 fastmap_1.1.1 survey_4.4-1
[28] cli_3.6.2 visNetwork_2.1.2 htmltools_0.5.7
[31] tools_4.2.3 gtable_0.3.4 glue_1.7.0
[34] MESS_0.5.12 Rcpp_1.0.12 carData_3.0-5
[37] vctrs_0.6.5 svglite_2.1.3 nlme_3.1-162
[40] iterators_1.0.14 timeDate_4032.109 gower_1.0.1
[43] xfun_0.42 globals_0.16.3 timechange_0.3.0
[46] lifecycle_1.0.4 geeM_0.10.1 mosaicCore_0.9.4.0
[49] rstatix_0.7.2 future_1.33.1 scales_1.3.0
[52] ipred_0.9-14 hms_1.1.3 parallel_4.2.3
[55] RColorBrewer_1.1-3 yaml_2.3.8 labelled_2.12.0
[58] gam_1.22-3 stringi_1.8.3 highr_0.10
[61] foreach_1.5.2 e1071_1.7-14 hardhat_1.3.1
[64] lava_1.8.0 shape_1.4.6.1 systemfonts_1.0.6
[67] rlang_1.1.3 pkgconfig_2.0.3 evaluate_0.23
[70] labeling_0.4.3 recipes_1.0.10 htmlwidgets_1.6.4
[73] cowplot_1.1.3 tidyselect_1.2.1 parallelly_1.37.1
[76] ggformula_0.12.0 plyr_1.8.9 R6_2.5.1
[79] generics_0.1.3 DBI_1.2.2 pillar_1.9.0
[82] haven_2.5.4 withr_3.0.0 survival_3.5-3
[85] abind_1.4-5 nnet_7.3-18 future.apply_1.11.1
[88] car_3.1-2 utf8_1.2.4 tzdb_0.4.0
[91] rmarkdown_2.26 grid_4.2.3 data.table_1.15.2
[94] ModelMetrics_1.2.2.2 digest_0.6.35 stats4_4.2.3
[97] munsell_0.5.0 glmnet_4.1-8 viridisLite_0.4.2
[100] mitools_2.4