Multivariate and Propensity Score Matching Software for Causal Inference

Jasjeet S. Sekhon

This website is for the distribution of "Matching", an R package for estimating causal effects by multivariate and propensity score matching. The package provides functions for multivariate and propensity score matching and for finding optimal balance using a genetic search algorithm. It also provides a variety of univariate and multivariate tests to determine whether balance has been obtained. These tests can also be used to determine whether an experiment or quasi-experiment is balanced on baseline covariates.

For an introduction to the package with documentation and examples, please see "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software, 42(7): 1-52. 2011. The following two papers provide examples in which GenMatch() recovers experimental benchmarks using observational data: "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies" and "A New Non-Parametric Matching Method for Bias Adjustment with Applications to Economic Evaluations".

Match() is the fastest multivariate and propensity score matching function I know of. Maximum speed is achieved with the replace=FALSE and/or ties=FALSE options (see the Match() help page for details), but the most reliable estimates are obtained with the default settings: replace=TRUE and ties=TRUE. GenMatch() supports the use of multiple computers, CPUs or cores to perform parallel computations, and examples are provided both for using multiple chips on the same computer and for using multiple computers. A Change Log is available which tracks changes across versions.
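As a rough illustration of what the replace and ties options control, here is a minimal Python sketch of one-to-one nearest-neighbour matching on a single covariate, estimating the average treatment effect on the treated (ATT). It is not the code inside Match(); the function name att_nearest_neighbor and the toy data are hypothetical. With ties=True, every control tied for closest is used with weight 1/k; with ties=False only one of them is kept. As a further simplification, matching without replacement here always keeps a single match.

```python
def att_nearest_neighbor(y, treat, x, replace=True, ties=True):
    """Toy ATT estimate from 1-nearest-neighbour matching on one covariate.

    y, treat, x: equal-length lists (outcome, 0/1 treatment, scalar covariate).
    replace=True lets a control be reused for several treated units.
    ties=True uses every control tied for closest, each weighted 1/k;
    otherwise only the first (lowest-index) closest control is kept.
    """
    controls = [i for i, t in enumerate(treat) if t == 0]
    treated = [i for i, t in enumerate(treat) if t == 1]
    available = set(controls)  # only shrinks when replace=False
    effects = []
    for i in treated:
        pool = controls if replace else sorted(available)
        if not pool:
            raise ValueError("more treated than control units; use replace=True")
        best = min(abs(x[i] - x[j]) for j in pool)
        matches = [j for j in pool if abs(x[i] - x[j]) == best]
        if not ties or not replace:
            # Simplification: without replacement we also keep just one match.
            matches = matches[:1]
        # Counterfactual = mean outcome of the k matched controls (weight 1/k each).
        counterfactual = sum(y[j] for j in matches) / len(matches)
        effects.append(y[i] - counterfactual)
        if not replace:
            available.discard(matches[0])
    return sum(effects) / len(effects)


# Toy data: units 0-1 are treated, units 2-4 are controls.
y = [10, 12, 7, 9, 8]
treat = [1, 1, 0, 0, 0]
x = [1.0, 2.0, 0.0, 2.0, 3.0]
print(att_nearest_neighbor(y, treat, x))              # 2.5
print(att_nearest_neighbor(y, treat, x, ties=False))  # 3.0
```

The first treated unit (x = 1) is equally close to the controls at x = 0 and x = 2, so ties=True averages both of their outcomes while ties=False silently discards one of them. That discarded information is why the defaults are slower but more reliable.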

The easiest way to install the latest version is to type the following in an R session:
> install.packages("Matching", dependencies=TRUE)

Also make sure that the latest version of rgenoud, the genetic optimization package used by GenMatch(), is installed:
> install.packages("rgenoud")

Alternatively, the package may be directly downloaded from CRAN.

The package includes the following main user-exposed functions, two replication datasets and three demos:
GenMatch(): finds optimal balance using multivariate matching where a genetic search algorithm determines the weight each covariate is given. The user can choose which function of covariate balance to optimize from a list or provide one of her own.

Match(): performs multivariate and propensity score matching.

MatchBalance(): provides a variety of univariate and multivariate tests to determine if balance exists.

Matchby(): a wrapper for Match() that separates the matching problem into subgroups defined by a factor. For large datasets it is much faster than calling Match() directly.

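qqstats(): summary statistics of the differences between two empirical distributions.
ks.boot(): a bootstrap version of the Kolmogorov-Smirnov test.
balanceUV(): univariate balance tests.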
Gerber, Green and Imai data
LaLonde data
AbadieImbens demo
DehejiaWahba demo
GerberGreenImai demo
Examples of how to use multiple chips on the same computer to perform parallel computations
Examples of how to use multiple computers to perform parallel calculations
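To give a feel for the kind of univariate statistics that helpers such as qqstats() and ks.boot() report, here is a minimal Python sketch; it is not their code. ecdf_diffs returns the mean, median and maximum absolute gap between two empirical CDFs (the maximum is the two-sample Kolmogorov-Smirnov statistic D), and ks_boot_pvalue computes a bootstrap p-value by resampling both groups from the pooled data. The function names, the dictionary keys and the default nboots are illustrative assumptions.

```python
import random


def ecdf_diffs(a, b):
    """Mean, median and max absolute gap between the empirical CDFs of a and b.

    Gaps are evaluated at each distinct pooled value; the maximum gap is the
    two-sample Kolmogorov-Smirnov statistic D.
    """
    def ecdf(sample, v):
        return sum(1 for s in sample if s <= v) / len(sample)

    gaps = sorted(abs(ecdf(a, v) - ecdf(b, v)) for v in sorted(set(a) | set(b)))
    n = len(gaps)
    mid = n // 2
    median = gaps[mid] if n % 2 else (gaps[mid - 1] + gaps[mid]) / 2
    return {"meandiff": sum(gaps) / n, "mediandiff": median, "maxdiff": gaps[-1]}


def ks_boot_pvalue(a, b, nboots=500, seed=1):
    """Bootstrap p-value for D: resample both groups from the pooled data
    (imposing the null of equal distributions) and count how often the
    resampled D is at least as large as the observed D."""
    rng = random.Random(seed)
    observed = ecdf_diffs(a, b)["maxdiff"]
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(nboots):
        a_star = [rng.choice(pooled) for _ in a]
        b_star = [rng.choice(pooled) for _ in b]
        if ecdf_diffs(a_star, b_star)["maxdiff"] >= observed:
            hits += 1
    return hits / nboots
```

For example, ecdf_diffs([1, 2], [3, 4]) gives a mean and median gap of 0.5 and a maximum gap of 1.0, because the two samples do not overlap at all. This quadratic-time version is only meant for small illustrative samples.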

The package is under active development so please check back for updates. Please cite the software as follows:
Sekhon, Jasjeet S. 2011. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software. 42(7): 1-52.

GenMatch() can use multiple chips on the same computer, or multiple computers, to perform parallel computations. Examples are provided for using multiple chips on the same computer, and the Journal of Statistical Software article gives examples for using multiple computers.

The following paper describes GenMatch() in detail and discusses its theoretical properties: "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." The paper presents Monte Carlo experiments that illustrate GenMatch's properties, as well as real-data examples in which GenMatch recovers the experimental benchmark. Also see the paper entitled "A New Non-Parametric Matching Method for Bias Adjustment with Applications to Economic Evaluations," where GenMatch is used to recover another experimental benchmark.
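GenMatch() matches on a distance in which a genetic search determines the weight given to each covariate. The minimal Python sketch below shows only why those weights matter: changing them changes which control is nearest. It is not GenMatch()'s implementation. As a simplifying assumption, each covariate is scaled by its own standard deviation, so correlations between covariates are ignored. The function names and toy numbers are hypothetical, and the genetic search over weights is not shown.

```python
import math


def weighted_distance(xi, xj, weights, sds):
    """Weighted distance sqrt(sum_k w_k * ((xi_k - xj_k) / sd_k) ** 2).

    Simplification: each covariate is scaled by its own standard deviation,
    i.e. covariances between covariates are ignored.
    """
    return math.sqrt(sum(w * ((a - b) / s) ** 2
                         for a, b, w, s in zip(xi, xj, weights, sds)))


def nearest_control(x_treated, controls, weights, sds):
    """Index of the control closest to x_treated under the given weights."""
    return min(range(len(controls)),
               key=lambda j: weighted_distance(x_treated, controls[j], weights, sds))


controls = [[1.0, 0.0], [0.0, 1.5]]
sds = [1.0, 1.0]
print(nearest_control([0.0, 0.0], controls, [1.0, 1.0], sds))   # 0
print(nearest_control([0.0, 0.0], controls, [10.0, 1.0], sds))  # 1
```

With equal weights the first control is closer (distance 1 versus 1.5). Up-weighting the first covariate tenfold makes the second control closer (1.5 versus about 3.16). A search over weights exploits exactly this effect, choosing the weights that yield the best balance after matching.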

Also see my "Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference" paper, which critically reviews various ways to measure balance. It advocates cumulative probability distribution functions of standardized statistics as balance metrics. Formal hypothesis tests of balance, although common in the matching literature, should not be conducted, because no measure of balance is a monotonic function of bias and because balance should be optimized without limit. However, descriptive measures of discrepancy ignore information related to bias that is captured by probability distribution functions of standardized statistics.

The rbounds package by Luke Keele implements a number of Rosenbaum's methods of sensitivity analysis for matched data. It supports sensitivity analyses for matched data with binary, ordinal or continuous outcomes, and for matched data with multiple control units matched to each treated unit. The package is designed to work with the object returned by the Match() function.
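To make the balance-metric argument concrete, here is a minimal Python sketch that uses p-values as descriptive balance scores rather than as pass/fail tests. For each covariate it standardizes the treated-minus-control difference in means, maps that statistic through the normal CDF to a two-sided p-value, and reports the smallest value across covariates (higher means better balance). The normal approximation and the function names are assumptions made for illustration. The sketch is not the set of tests MatchBalance() runs.

```python
import math
from statistics import mean, variance


def diff_in_means_pvalue(t, c):
    """Two-sided p-value of a Welch-style standardized difference in means,
    using a normal approximation rather than the t distribution."""
    se = math.sqrt(variance(t) / len(t) + variance(c) / len(c))
    if se == 0:
        return 1.0 if mean(t) == mean(c) else 0.0
    z = (mean(t) - mean(c)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def worst_balance(treated_cols, control_cols):
    """Smallest p-value across covariates: a descriptive balance score
    (higher is better), not a hypothesis test to be passed or failed."""
    return min(diff_in_means_pvalue(t, c)
               for t, c in zip(treated_cols, control_cols))
```

Reporting the minimum rather than thresholding at 0.05 reflects the view that balance should be improved without limit. A matching procedure can keep trying to raise the worst score instead of stopping once some test "passes".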

Significant performance enhancements were provided by Nate Begeman (Mac OS X Performance Group at Apple). "Matching" also relies on a modified version of the Scythe Statistical Library developed by Andrew Martin, Kevin Quinn and Daniel Pemstein; my modified version of the library is included in the "Matching" package.
