Paper title
Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles
Paper authors
Paper abstract
We study the statistical properties of learning to defer (L2D) to multiple experts. In particular, we address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts. First, we derive two consistent surrogates -- one based on a softmax parameterization, the other on a one-vs-all (OvA) parameterization -- that are analogous to the single-expert losses proposed by Mozannar and Sontag (2020) and Verma and Nalisnick (2022), respectively. We then study the frameworks' ability to estimate P(m_j = y | x), the probability that the j-th expert will correctly predict the label for x. Theory shows that the softmax-based loss causes miscalibration to propagate between the estimates while the OvA-based loss does not (though in practice, we find there are trade-offs). Lastly, we propose a conformal inference technique that chooses a subset of experts to query when the system defers. We perform empirical validation on tasks for galaxy, skin lesion, and hate speech classification.
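
To make the quantity P(m_j = y | x) concrete, the following is a minimal sketch of how such an estimate might be read out under each parameterization. The abstract does not state the read-out formulas, so both forms are assumptions: a model with K class outputs followed by J deferral outputs (one per expert), a softmax read-out that divides each deferral probability by the total probability mass left on the classes, and an OvA read-out that applies a sigmoid to each deferral logit independently. The function names and the example logits are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def expert_correctness_softmax(logits, num_classes):
    """Assumed softmax read-out of P(m_j = y | x) for each expert j.

    logits: K class logits followed by J deferral logits.
    Each deferral probability is divided by the mass left on the classes,
    so the denominator is shared by all experts' estimates.
    """
    p = softmax(np.asarray(logits, dtype=float))
    defer = p[num_classes:]
    return defer / (1.0 - defer.sum())


def expert_correctness_ova(logits, num_classes):
    """Assumed OvA read-out: an independent sigmoid per deferral logit."""
    z = np.asarray(logits, dtype=float)[num_classes:]
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical example: K = 3 classes, J = 2 experts.
example_logits = [2.0, 1.0, 0.5, 1.5, 3.0]
soft_est = expert_correctness_softmax(example_logits, num_classes=3)
ova_est = expert_correctness_ova(example_logits, num_classes=3)
print("softmax read-out:", soft_est)
print("OvA read-out:    ", ova_est)
```

In this sketch the OvA estimates always lie in (0, 1), whereas the softmax ratio is not bounded above by 1 (the second expert's value in the example exceeds 1), which is one way the two read-outs can differ as probability estimates.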
