Paper Title
Benchmarking BioRelEx for Entity Tagging and Relation Extraction
Paper Authors
Paper Abstract
Extracting relationships and interactions between biological entities remains an extremely challenging problem, yet it has not received as much attention as extraction in other, more generic domains. Besides the lack of annotated data, limited benchmarking is still a major reason for slow progress. To fill this gap, we compare multiple existing entity and relation extraction models on BioRelEx, a recently introduced public dataset of sentences annotated with biological entities and relations. Our straightforward benchmarking shows that span-based multi-task architectures such as DYGIE achieve absolute improvements of 4.9% in entity tagging and 6% in relation extraction over the previous state of the art, and that incorporating domain-specific information, such as embeddings pre-trained on related domains, further boosts performance.
