mRSC
Authors: | Muhammad Jehangir Amjad, Devavrat Shah, Dennis Shen, Vishal Misra |
---|---|
Year of publication: | 2019 |
Subject: |
Computer science
Generalization; Inference; 010501 environmental sciences; computer.software_genre; Missing data; 01 natural sciences; 010104 statistics & probability; Causal inference; Metric (mathematics); Consistent estimator; Data mining; Noise (video); 0101 mathematics; Time series; computer; 0105 earth and related environmental sciences |
Source: | SIGMETRICS (Abstracts) |
DOI: | 10.1145/3309697.3331507 |
Description: | When evaluating the impact of a policy (e.g., gun control) on a metric of interest (e.g., crime rate), it may not be possible or feasible to conduct a randomized control trial. In such settings, where only observational data is available, synthetic control (SC) methods \cite{abadie1, abadie2, abadie3} provide a popular data-driven approach to estimate a "synthetic" or "virtual" control by combining measurements of "similar" alternatives or units (called "donors"). Recently, robust synthetic control (RSC) \cite{rsc1} was proposed as a generalization of SC to overcome the challenges of missing data and high levels of noise, while removing the reliance on expert domain knowledge for selecting donors. However, both SC and RSC (and its variants) suffer from poor estimation when the pre-intervention period is too short. As the main contribution of this work, we propose a generalization of unidimensional RSC to multi-dimensional Robust Synthetic Control, mRSC. Our proposed mechanism, mRSC, incorporates multiple types of measurements (or metrics) in addition to the measurement of interest for estimating a synthetic control, thus overcoming the challenge of poor inference due to limited amounts of pre-intervention data. We show that the mRSC algorithm, when using $K$ relevant metrics, leads to a consistent estimator of the synthetic control for the target unit of interest under any metric. Our finite-sample analysis suggests that the mean-squared error (MSE) of our predictions decays to zero at a rate faster than the RSC algorithm by a factor of $K$ and $\sqrt{K}$ for the training (pre-intervention) and testing (post-intervention) periods, respectively. Additionally, we propose a principled scheme to combine multiple metrics of interest via a diagnostic test that evaluates if adding a metric can be expected to result in improved inference. Our mechanism for validating mRSC performance is also an important and related contribution of this work: time series prediction. 
We propose a method to predict the future evolution of a time series based on limited data when the notion of time is relative and not absolute, i.e., where we have access to a donor pool that has already undergone the desired future evolution. We conduct extensive experimentation to establish the efficacy of mRSC in three different scenarios: predicting the evolution of a metric of interest using synthetically generated data from a known factor model, and forecasting weekly sales and score trajectories of a Walmart store and a cricket game, respectively. |
Database: | OpenAIRE |
External link: |
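
The description above names the idea of mRSC (pooling several metrics to estimate a synthetic control) but does not spell out the steps. The following is a minimal sketch only, assuming the common robust synthetic control recipe of de-noising the donor matrix by hard singular value thresholding and then fitting donor weights by least squares on the pre-intervention period. The function names (`hard_threshold`, `mrsc_sketch`), the array layout, and the synthetic factor-model data are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def hard_threshold(matrix, rank):
    """Keep only the top `rank` singular values (assumed de-noising step)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def mrsc_sketch(donors, target_pre, rank):
    """Illustrative multi-metric synthetic control.

    donors:     array of shape (K, N, T): K metrics, N donor units, T time steps
    target_pre: array of shape (K, t0): target unit before the intervention
    Returns an array of shape (K, T): the estimated control for every metric.
    """
    K, N, T = donors.shape
    t0 = target_pre.shape[1]

    # Place the K metrics side by side so each donor row holds all its series.
    stacked = donors.transpose(1, 0, 2).reshape(N, K * T)
    denoised = hard_threshold(stacked, rank).reshape(N, K, T)

    # Fit one set of donor weights on the pre-intervention data of all metrics.
    pre = denoised[:, :, :t0].reshape(N, K * t0)
    weights, *_ = np.linalg.lstsq(pre.T, target_pre.reshape(K * t0), rcond=None)

    # Apply the same weights to the full (pre and post) donor trajectories.
    return np.einsum("n,nkt->kt", weights, denoised)


# Hypothetical noiseless data from a low-rank factor model.
rng = np.random.default_rng(0)
K, N, T, t0, r = 2, 8, 20, 6, 3
unit_factors = rng.normal(size=(N + 1, r))
time_factors = rng.normal(size=(r, K, T))
data = np.einsum("nr,rkt->knt", unit_factors, time_factors)  # (K, N + 1, T)
target, donors = data[:, 0, :], data[:, 1:, :]

estimate = mrsc_sketch(donors, target[:, :t0], rank=r)
```

In this toy setting the weights are fit on `K * t0` pre-intervention observations rather than `t0`, which is the intuition behind using extra metrics when the pre-intervention period is short. With noisy or missing data the choice of `rank` and any regularization would matter, and neither is addressed by this sketch.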