Paper Title
Treatment Effect Estimation with Efficient Data Aggregation
Paper Authors
Paper Abstract
Data aggregation, also known as meta-analysis, is widely used to combine knowledge about parameters shared across multiple studies (e.g., the average treatment effect). In this paper, we introduce an attractive data aggregation scheme that pools summary statistics from various existing studies. Our scheme informs the design of new validation studies and yields unbiased estimators for the shared parameters. In our setup, each existing study applies a LASSO regression to select a parsimonious model from a large set of covariates. It is well known that post-hoc estimators in the selected model tend to be biased. We show that a novel technique called \textit{data carving} yields a new unbiased estimator by aggregating simple summary statistics from all existing studies. Our estimator has two key features: (a) it makes the fullest possible use of data from all studies, without the risk of bias from model selection; (b) it offers the added benefit of individual data privacy, because raw data from these studies need not be shared or stored for efficient estimation.
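To make the idea of pooling summary statistics concrete, here is a minimal sketch of the classical inverse-variance (fixed-effect) meta-analysis estimate. It is not the data-carving estimator described in the abstract: it assumes each study reports an unbiased estimate and its standard error, so it does nothing to correct the bias introduced when a study first selects its model with LASSO. The function name `pool_fixed_effect` and the inputs are hypothetical, chosen for illustration only.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Pool per-study estimates with inverse-variance weights.

    Illustrative baseline only (an assumption of this sketch, not the
    paper's method): each study shares just two summary statistics,
    its point estimate and its standard error. No raw data is needed.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Weight each study by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical summary statistics from three studies.
    effect, se = pool_fixed_effect([0.8, 1.1, 0.95], [0.2, 0.3, 0.25])
    print(f"pooled effect = {effect:.3f}, standard error = {se:.3f}")
```

The sketch mirrors the privacy point in the abstract, since only summary statistics cross study boundaries. What it lacks is any adjustment for model selection, which is the gap the abstract says data carving addresses.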
