Paper Title
Local Multi-Label Explanations for Random Forest
Paper Authors
Paper Abstract
Multi-label classification is a challenging task, particularly in domains where the number of labels to be predicted is large. Deep neural networks are often effective at multi-label classification of image and text data. On tabular data, however, conventional machine learning algorithms, such as tree ensembles, appear to outperform the competition. Random forest, a popular ensemble algorithm, has found use in a wide range of real-world problems. These include fraud detection in the financial domain, crime hotspot detection in the legal sector, and, in the biomedical field, disease probability prediction when patient records are accessible. Because such applications affect people's lives, these domains usually require decision-making systems to be explainable. Random forest falls short on this property, especially when a large number of tree predictors is used. This issue was addressed for single-label classification and regression in a recent study named LionForests. In this work, we adapt that technique to multi-label classification problems by employing three different strategies regarding the labels that the explanation covers. Finally, we provide a set of qualitative and quantitative experiments to assess the efficacy of this approach.
