Skip to main content

Bayesian nonparametric generative models for causal inference with missing at random covariates

Jason Roy, Kirsten J. Lum, Bret Zeldow , Jordan D. Dworkin, Vincent Lo Re III, and Michael J. Daniels

We propose a general Bayesian nonparametric (BNP) approach to causal inference in the point treatment setting. The joint distribution of the observed data (outcome, treatment, and confounders) is modeled using an enriched Dirichlet process. The combination of the observed data model and causal assumptions allows us to identify any type of causal effect—differences, ratios, or quantile effects, either marginally or for subpopulations of interest. The proposed BNP model is well‐suited for causal inference problems, as it does not require parametric assumptions about the distribution of confounders and naturally leads to a computationally efficient Gibbs sampling algorithm. By flexibly modeling the joint distribution, we are also able to impute (via data augmentation) values for missing covariates within the algorithm under an assumption of ignorable missingness, obviating the need to create separate imputed data sets. This approach for imputing the missing covariates has the additional advantage of guaranteeing congeniality between the imputation model and the analysis model, and because we use a BNP approach, parametric models are avoided for imputation. The performance of the method is assessed using simulation studies. The method is applied to data from a cohort study of human immunodeficiency virus/hepatitis C virus co‐infected patients.

Link