Stochastic treatment regimes present a relatively simple manner in which to assess the effects of continuous treatments by way of parameters that examine the effects induced by the counterfactual shifting of the observed values of a treatment of interest. Here, we present an implementation of a new algorithm for computing targeted minimum loss-based estimates of treatment shift parameters defined based on a shifting function d(A, W). For a technical presentation of the algorithm, the interested reader is invited to consult Díaz and van der Laan (2018). For additional background on Targeted Learning and previous work on stochastic treatment regimes, please consider consulting van der Laan and Rose (2011), van der Laan and Rose (2018), and Díaz and van der Laan (2012).
To start, let’s load the packages we’ll use and set a seed for simulation:
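A minimal setup sketch follows. The package names are those referenced throughout this document (data.table for the data object, plus sl3, tmle3, and tmle3shift); the seed value is an arbitrary assumption chosen for illustration, so numerical results printed later need not be reproduced exactly by it.

```r
# packages referenced in this walkthrough
library(data.table)  # data.table() is used to assemble the observed data
library(sl3)         # learners (Lrnr_*) and the Super Learner
library(tmle3)       # tmle3() wrapper and the step-by-step TMLE machinery
library(tmle3shift)  # shift specs such as tmle_vimshift_delta()

# arbitrary seed (an assumption of this sketch, not a recorded value)
set.seed(429153)
```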
Consider n observed units O1, …, On, where each random variable O = (W, A, Y) corresponds to a single observational unit. Let W denote baseline covariates (e.g., age, sex, education level), A an intervention variable of interest (e.g., nutritional supplements), and Y an outcome of interest (e.g., disease status). Though it need not be the case, let A be continuous-valued, i.e., A ∈ ℝ. Let Oi ∼ 𝒫 ∈ ℳ, where ℳ is the nonparametric statistical model defined as the set of continuous densities on O with respect to some dominating measure. To formalize the definition of stochastic interventions and their corresponding causal effects, we introduce a nonparametric structural equation model (NPSEM), based on Pearl (2000), to define how the system changes under posited interventions. We denote the observed data structure O = (W, A, Y).
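For the time ordering (W, A, Y), an NPSEM of this kind is conventionally written as a system of three structural equations. The sketch below uses assumed notation: the deterministic functions $f_W, f_A, f_Y$ and the exogenous errors $U_W, U_A, U_Y$ are standard symbols introduced here for illustration.

```latex
\begin{align*}
  W &= f_W(U_W) \\
  A &= f_A(W, U_A) \\
  Y &= f_Y(A, W, U_Y)
\end{align*}
```

A stochastic intervention then replaces the equation for A with a draw from a modified treatment distribution, leaving the equations for W and Y unchanged.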
Letting A denote a continuous-valued treatment, we assume that the distribution of A conditional on W = w has support in the interval (l(w), u(w)); for convenience, let this support be a.e. That is, the minimum natural value of treatment A for an individual with covariates W = w is l(w); similarly, the maximum is u(w). Then, a simple stochastic intervention may be defined in terms of a shift δ, where 0 ≤ δ ≤ u(w) is an arbitrary pre-specified value that defines the degree to which the observed value A is to be shifted, where possible.
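One way to write such a shift, consistent with the additive form d(A, W) = A + δ used later in this document, is sketched below. The exact piecewise form is an assumption made for illustration: the treatment is shifted only when the shifted value remains inside the support.

```latex
d(a, w) =
\begin{cases}
  a + \delta, & \text{if } a + \delta < u(w) \\
  a,          & \text{otherwise}
\end{cases}
```

Under this form, units whose shifted treatment would leave the interval (l(w), u(w)) simply keep their natural value of treatment, which is what "where possible" expresses.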
# simulate simple data for tmle-shift sketch
n_obs <- 1000 # number of observations
n_w <- 1 # number of baseline covariates
tx_mult <- 2 # multiplier for the effect of W = 1 on the treatment
# baseline covariates -- simple, binary
W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5)))
# create treatment based on baseline W
A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1))
# create outcome as a linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 0.5)
The above composes our observed data structure O = (W, A, Y).
To formally express this fact using the tlverse grammar introduced by the tmle3 package, we create a single data object and specify the functional relationships between the nodes in the directed acyclic graph (DAG) via nonparametric structural equation models (NPSEMs), reflected in the node list that we set up:
# organize data and nodes for tmle3
data <- data.table(W, A, Y)
node_list <- list(W = "W", A = "A", Y = "Y")
# inspect the first few rows of the observed data
head(data)
## W A Y
## <num> <num> <num>
## 1: 1 2.4031607 3.7157578
## 2: 1 4.4973744 5.9651611
## 3: 1 2.0330871 2.2531970
## 4: 0 -0.8089023 -0.8849531
## 5: 1 1.8432067 2.7193091
## 6: 1 1.3555863 2.5705832
We now have an observed data structure (data) and a specification of the role that each variable in the data set plays as the nodes in a DAG.
In order to specify a grid of shifts δ to be used in defining a set of stochastic intervention policies in an a priori manner, let us consider an arbitrary scalar δ that defines a counterfactual outcome ψn = Qn(d(A, W), W), where, for simplicity, let d(A, W) = A + δ. A simplified expression of the auxiliary covariate for the TML estimator of ψ is $H_n = \frac{g^{\star}(a \mid w)}{g(a \mid w)}$, where g⋆(a ∣ w) defines the treatment mechanism with the stochastic intervention implemented. To ascertain whether a given choice of the shift δ is admissible (that is, whether such an intervention may be implemented while avoiding violations of the positivity assumption), define a bound $C(\delta) := \frac{g^{\star}(a \mid w)}{g(a \mid w)} \leq M$, where g⋆(a ∣ w) is a function of δ in part, and M is a potentially user-specified upper bound of C(δ). Then, C(δ) may be interpreted as a measure of the influence of a given observation, providing a way to limit the maximum influence of any single observation through the choice of the shift δ and the setting of the bound M.
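To make the role of C(δ) concrete, the sketch below evaluates the density ratio for the additive shift on data simulated as in this document. It rests on two assumptions made purely for illustration: the true conditional density A | W ~ N(2W, 1) is used in place of an estimated g, and the intervened density is taken to be g⋆(a ∣ w) = g(a − δ ∣ w). The names g_natural, density_ratio, and M are hypothetical.

```r
# Sketch: density ratio C(delta) under the additive shift d(A, W) = A + delta
set.seed(7491)  # arbitrary seed for this sketch
n_obs <- 1000
W <- rbinom(n_obs, 1, 0.5)
A <- rnorm(n_obs, mean = 2 * W, sd = 1)

# natural conditional density g(a | w), assumed known here (not estimated)
g_natural <- function(a, w) dnorm(a, mean = 2 * w, sd = 1)

# ratio g*(a | w) / g(a | w), assuming g*(a | w) = g(a - delta | w)
density_ratio <- function(a, w, delta) {
  g_natural(a - delta, w) / g_natural(a, w)
}

M <- 2  # hypothetical upper bound on the ratio
ratio_by_delta <- sapply(c(-1, 0, 1), function(delta) {
  density_ratio(A, W, delta)
})
colnames(ratio_by_delta) <- c("delta=-1", "delta=0", "delta=1")

# proportion of observations whose ratio exceeds the bound M, per shift
prop_exceeding <- colMeans(ratio_by_delta > M)
print(prop_exceeding)
```

With δ = 0 the ratio is identically one, so no observation can exceed the bound; larger shifts in absolute value push more observations past M.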
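We formalize and extend the procedure to determine an acceptable set of values for the shift δ in the sequel. Specifically, let there be a shift d(a, w) = a + δ(a, w), where the shift δ(a, w) is defined at the level of the observation: it equals δ whenever that value is admissible for (a, w) and is otherwise truncated to a limiting value, δmin or δmax.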
The above provides a strategy for implementing a shift at the level of a given observation (a, w), thereby allowing for all observations to be shifted to an appropriate value, whether δmin, δ, or δmax. For the purpose of using such a shift in practice, the present software provides the functions shift_additive_bounded and shift_additive_bounded_inv, which define a variation of this shift:

$$\delta(a, w) = \begin{cases} \delta, & C(\delta) \leq M \\ 0, & \text{otherwise,} \end{cases}$$

which corresponds to an intervention in which the natural value of treatment of a given observational unit is shifted by a value δ in the case that the ratio of the intervened density g⋆(a ∣ w) to the natural density g(a ∣ w) (that is, C(δ)) does not exceed a bound M. In the case that the ratio C(δ) exceeds the bound M, the stochastic intervention policy does not apply to the given unit and they remain at their natural value of treatment a.
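The logic of this bounded shift can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: bounded_shift is a hypothetical helper, the true simulation density A | W ~ N(2W, 1) stands in for an estimated g, and the intervened density is assumed to be g(a − δ ∣ w).

```r
# Sketch of a bounded additive shift (hypothetical helper, not package code)
set.seed(7491)  # arbitrary seed for this sketch
n_obs <- 1000
W <- rbinom(n_obs, 1, 0.5)
A <- rnorm(n_obs, mean = 2 * W, sd = 1)

# natural conditional density g(a | w), assumed known for illustration
g_natural <- function(a, w) dnorm(a, mean = 2 * w, sd = 1)

bounded_shift <- function(a, w, delta, max_ratio) {
  # C(delta): ratio of intervened to natural density at (a, w)
  ratio <- g_natural(a - delta, w) / g_natural(a, w)
  # shift only where the ratio respects the bound; otherwise keep a as is
  ifelse(ratio <= max_ratio, a + delta, a)
}

A_shifted <- bounded_shift(A, W, delta = 1, max_ratio = 2)

# proportion of units to which the shift was actually applied
prop_shifted <- mean(A_shifted != A)
print(prop_shifted)
```

Units whose ratio exceeds max_ratio keep their natural treatment value, mirroring the description above; the max_ratio argument plays the role of the bound M.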
To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the sl3 R package. For a complete guide on using the sl3 R package, consider consulting its documentation, or that of the tlverse ecosystem, of which sl3 is a core engine.
Using the framework provided by the sl3 package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). To estimate the treatment mechanism (often denoted “g” in the targeted learning literature), we must make use of learning algorithms specifically suited to conditional density estimation; a list of such learners may be extracted from sl3 by using sl3_list_learners("density"):
## [1] "Lrnr_density_discretize" "Lrnr_density_hse"
## [3] "Lrnr_density_semiparametric" "Lrnr_haldensify"
## [5] "Lrnr_solnp_density"
To proceed, we’ll select two of the above learners: Lrnr_haldensify, for using the highly adaptive lasso for conditional density estimation, based on an algorithm given by Díaz and van der Laan (2011), and Lrnr_density_semiparametric, an approach for semiparametric conditional density estimation:
# learners used for conditional density regression (i.e., propensity score)
# highly adaptive lasso conditional density estimator
haldensify_lrnr <- Lrnr_haldensify$new(
  n_bins = 3, grid_type = "equal_mass",
  lambda_seq = exp(seq(-1, -9, length.out = 100))
)
# semiparametric density learner with a mean model only
hse_lrnr <- Lrnr_density_semiparametric$new(mean_learner = Lrnr_glm$new())
# semiparametric density learner with both mean and variance models
mvd_lrnr <- Lrnr_density_semiparametric$new(
  mean_learner = Lrnr_glm$new(),
  var_learner = Lrnr_mean$new()
)
# Super Learner over the candidate density learners
sl_lrn_dens <- Lrnr_sl$new(
  learners = list(haldensify_lrnr, hse_lrnr, mvd_lrnr),
  metalearner = Lrnr_solnp_density$new()
)
We also require an approach for estimating the outcome regression (often denoted “Q” in the targeted learning literature). For this, we build a Super Learner composed of an intercept model, a GLM, and the xgboost algorithm for gradient boosting:
# learners used for conditional expectation regression (e.g., outcome)
mean_lrnr <- Lrnr_mean$new()
glm_lrnr <- Lrnr_glm$new()
xgb_lrnr <- Lrnr_xgboost$new()
# Super Learner combining the candidates via non-negative least squares
sl_lrn <- Lrnr_sl$new(
  learners = list(mean_lrnr, glm_lrnr, xgb_lrnr),
  metalearner = Lrnr_nnls$new()
)
We can make the above explicit with respect to standard notation by bundling the ensemble learners into a list:
# specify outcome and treatment regressions and create learner list
Q_learner <- sl_lrn
g_learner <- sl_lrn_dens
learner_list <- list(Y = Q_learner, A = g_learner)
The learner_list object above specifies the role that each of the ensemble learners we’ve generated is to play in computing initial estimators to be used in building a TMLE for the parameter of interest here. In particular, it makes explicit the fact that our Q_learner is used in fitting the outcome regression while our g_learner is used in fitting our treatment mechanism.
To start, we will initialize a specification for the TMLE of our parameter of interest (called a tmle3_Spec in the tlverse nomenclature) simply by calling tmle_vimshift_delta. We specify the argument shift_grid = seq(-1, 1, by = 1) when initializing the tmle3_Spec object to communicate that we’re interested in assessing the mean counterfactual outcome over a grid of shifts -1, 0, 1 on the scale of the treatment A (note that this grid of shifts is an arbitrarily chosen set of values for this example).
# what's the grid of shifts we wish to consider?
delta_grid <- seq(-1, 1, 1)
# initialize a tmle specification
tmle_spec <- tmle_vimshift_delta(
  shift_fxn = shift_additive_bounded,
  shift_fxn_inv = shift_additive_bounded_inv,
  shift_grid = delta_grid,
  max_shifted_ratio = 2
)
As seen above, the tmle_vimshift specification object (like all tmle3_Spec objects) does not store the data for our specific analysis of interest. Later, we’ll see that passing a data object directly to the tmle3 function, alongside the instantiated tmle_spec, will serve to construct a tmle3_Task object internally (see the tmle3 documentation for details).
One may walk through the step-by-step procedure for fitting the TML estimator of the mean counterfactual outcome under each shift in the grid, using the machinery exposed by the tmle3 R package (see below); however, the step-by-step procedure is more often not of interest.
# define data (from tmle3_Spec base class)
tmle_task <- tmle_spec$make_tmle_task(data, node_list)
# define likelihood (from tmle3_Spec base class)
likelihood_init <- tmle_spec$make_initial_likelihood(tmle_task, learner_list)
# define update method (fluctuation submodel and loss function)
updater <- tmle_spec$make_updater()
likelihood_targeted <- Targeted_Likelihood$new(likelihood_init, updater)
# invoke params specified in spec
tmle_params <- tmle_spec$make_params(tmle_task, likelihood_targeted)
updater$tmle_params <- tmle_params
# fit TML estimator update
tmle_fit <- fit_tmle3(tmle_task, likelihood_targeted, tmle_params, updater)
# extract results from tmle3_Fit object
tmle_fit
Instead, one may invoke the tmle3 wrapper function (a user-facing convenience utility) to fit the series of TML estimators (one for each parameter defined by the grid delta) in a single function call, tmle3(tmle_spec, data, node_list, learner_list):
## A tmle3_Fit that took 1 step(s)
## type param init_est tmle_est se lower upper
## <char> <char> <num> <num> <num> <num> <num>
## 1: TSM E[Y_{A=NULL}] 0.5647509 0.5737838 0.05873221 0.4586708 0.6888968
## 2: TSM E[Y_{A=NULL}] 1.5436331 1.5436275 0.05987214 1.4262803 1.6609748
## 3: TSM E[Y_{A=NULL}] 2.5582052 2.5596756 0.05973218 2.4426027 2.6767486
## 4: MSM_linear MSM(intercept) 1.5555297 1.5590290 0.05917357 1.4430509 1.6750071
## 5: MSM_linear MSM(slope) 0.9967272 0.9929459 0.00648008 0.9802452 1.0056466
## psi_transformed lower_transformed upper_transformed
## <num> <num> <num>
## 1: 0.5737838 0.4586708 0.6888968
## 2: 1.5436275 1.4262803 1.6609748
## 3: 2.5596756 2.4426027 2.6767486
## 4: 1.5590290 1.4430509 1.6750071
## 5: 0.9929459 0.9802452 1.0056466
Remark: The print method of the resultant tmle_fit object conveniently displays the results from computing our TML estimator.
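The lower and upper columns of the printed table appear consistent with Wald-style 95% confidence intervals of the form tmle_est ± z × se. The sketch below checks this against the three TSM rows; the values are copied from the printed output, and reading those rows as the shifts δ = -1, 0, 1 (in grid order) is an assumption.

```r
# values copied from the printed tmle3_Fit (TSM rows)
tmle_est <- c(0.5737838, 1.5436275, 2.5596756)
se <- c(0.05873221, 0.05987214, 0.05973218)

# Wald-style 95% interval: estimate +/- z * standard error
z <- qnorm(0.975)
ci <- data.frame(
  delta = c(-1, 0, 1),  # assumed mapping of rows to the shift grid
  lower = tmle_est - z * se,
  upper = tmle_est + z * se
)
print(ci)
```

Up to rounding in the printed digits, the computed limits agree with the lower and upper columns shown above.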
In the directly preceding section, we consider estimating the mean counterfactual outcome ψn under several values of the intervention δ, taken from the aforementioned δ-grid. We now turn our attention to an approach for obtaining inference on a single summary measure of these estimated quantities. In particular, we propose summarizing the estimates ψn through a marginal structural model (MSM), obtaining inference by way of a hypothesis test on a parameter of this working MSM. For a data structure O = (W, A, Y), let ψδ(P0) be the mean outcome under a shift δ of the treatment, so that we have ψ⃗δ = (ψδ : δ) with corresponding estimators ψ⃗n, δ = (ψn, δ : δ). Further, let β(ψ⃗δ) = ϕ((ψδ : δ)).
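For a given MSM $m_{\beta}(\delta)$, we have that $\beta_0 = \operatorname{argmin}_{\beta} \sum_{\delta} (\psi_{\delta}(P_0) - m_{\beta}(\delta))^2 h(\delta)$, which is the solution to the first-order condition $\sum_{\delta} h(\delta) (\psi_{\delta}(P_0) - m_{\beta}(\delta)) \frac{d}{d\beta} m_{\beta}(\delta) = 0$.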
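Now, say $\vec{\psi} = (\psi(\delta) : \delta)$ is d-dimensional; then the efficient influence function of the MSM parameter β (assuming a linear MSM) may be written as a linear combination of the efficient influence functions of the individual parameters ψδ.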
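A sketch of this expression can be obtained by differentiating the estimating equation that defines β with respect to the vector of ψδ values, using the fact that the gradient of a linear MSM does not depend on β. The symbol $\mathrm{EIF}_{\psi_{\delta}}$, denoting the efficient influence function of ψδ, is notation assumed here.

```latex
\mathrm{EIF}_{\beta}(O) =
\left( \sum_{\delta} h(\delta)
  \frac{d}{d\beta} m_{\beta}(\delta)
  \frac{d}{d\beta} m_{\beta}(\delta)^t \right)^{-1}
\sum_{\delta} h(\delta)
  \frac{d}{d\beta} m_{\beta}(\delta) \,
  \mathrm{EIF}_{\psi_{\delta}}(O)
```

For the linear MSM $m_{\beta}(\delta) = \beta_0 + \beta_1 \delta$, the gradient is simply $(1, \delta)^t$, so the leading matrix is the weighted cross-product matrix of the design over the shift grid.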
In an effort to generalize still further, consider the case where ψδ(P0) ∈ (0, 1), that is, where ψδ(P0) corresponds to the probability of some event of interest. In such a case, it would be more natural to consider a logistic MSM, $m_{\beta}(\delta) = \frac{1}{1 + \exp(-f_{\beta}(\delta))}$, where $f_{\beta}$ is linear in β.
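Inference from a working MSM is rather straightforward. To wit, the limiting distribution of the estimator βn of the MSM parameter may be expressed as $\sqrt{n}(\beta_n - \beta_0) \to N(0, \Sigma)$, where Σ is the covariance matrix of the efficient influence function of β.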
Note that in the above, a working MSM is fit to the individual TML estimates of the mean counterfactual outcome under a given value of the shift δ in the supplied grid. The parameter of interest β of the MSM is asymptotically linear (and, in fact, a TML estimator) as a consequence of its construction from individual TML estimators. In smaller samples, it may be prudent to perform a TML estimation procedure that targets the parameter β directly, as opposed to constructing it from several independently targeted TML estimates. An approach for constructing such an estimator is proposed in the sequel.
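The fit of a linear working MSM to the individual TML estimates can be sketched with ordinary weighted least squares. The estimates are copied from the printed fit earlier in this document; treating them as the shifts δ = -1, 0, 1 and using uniform weights h(δ) = 1 are assumptions of this sketch.

```r
# TML estimates of the counterfactual mean at each shift (copied from the
# printed fit); the mapping to delta = -1, 0, 1 is assumed
delta <- c(-1, 0, 1)
psi_n <- c(0.5737838, 1.5436275, 2.5596756)
h <- rep(1, length(delta))  # uniform weights h(delta), an assumption

# linear working MSM m_beta(delta) = beta0 + beta1 * delta, fit by weighted
# least squares, i.e., minimizing sum_delta h(delta) * (psi - m_beta(delta))^2
msm_fit <- lm(psi_n ~ delta, weights = h)
beta_n <- coef(msm_fit)
print(beta_n)
```

Under these assumptions the fitted intercept and slope agree, up to rounding, with the tmle_est values reported in the MSM(intercept) and MSM(slope) rows of the earlier output.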
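Let $C = \left(\sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \frac{d}{d\beta} m_{\beta}(\delta)^t \right)$ denote the weighted sum, over the shift grid, of outer products of the gradient of the MSM. Suppose a simple working MSM $\mathbb{E} Y_{g^0_{\delta}} = \beta_0 + \beta_1 \delta$; then a TML estimator may be constructed to target β0 and β1 directly.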
To construct a targeted maximum likelihood estimator that directly targets the parameters of the working marginal structural model, we may use the tmle_vimshift_msm Spec (instead of the tmle_vimshift_delta Spec that appears above):
# what's the grid of shifts we wish to consider?
delta_grid <- seq(-1, 1, 1)
# initialize a tmle specification
tmle_msm_spec <- tmle_vimshift_msm(
  shift_fxn = shift_additive_bounded,
  shift_fxn_inv = shift_additive_bounded_inv,
  shift_grid = delta_grid,
  max_shifted_ratio = 2
)
# fit the TML estimator and examine the results
tmle_msm_fit <- tmle3(tmle_msm_spec, data, node_list, learner_list)
tmle_msm_fit
## 2% of observations outside training support...predictions trimmed.
## Iter: 1 fn: 1384.1240 Pars: 0.39096 0.29352 0.31552
## Iter: 2 fn: 1384.1240 Pars: 0.39096 0.29352 0.31552
## solnp--> Completed in 2 iterations
## A tmle3_Fit that took 100 step(s)
## type param init_est tmle_est se lower upper
## <char> <char> <num> <num> <num> <num> <num>
## 1: MSM_linear MSM(intercept) 1.553775 1.5540594 0.059376693 1.4376832 1.670436
## 2: MSM_linear MSM(slope) 0.999536 0.9995316 0.006167658 0.9874432 1.011620
## psi_transformed lower_transformed upper_transformed
## <num> <num> <num>
## 1: 1.5540594 1.4376832 1.670436
## 2: 0.9995316 0.9874432 1.011620