ZHONG L, WU J, LI Q, et al. A Comprehensive Survey on Automatic Knowledge Graph Construction[J/OL]. arXiv preprint arXiv:2302.05019, 2023. http://arxiv.org/abs/2302.05019. DOI:10.48550/arXiv.2302.05019.
What sets this survey apart from similar work is its comprehensive, systematic review of automatic knowledge graph construction. It summarizes more than 300 methods covering the entire construction pipeline, from knowledge acquisition through knowledge refinement to knowledge evolution, and classifies and compares them in detail by data environment, motivation, and architecture. The survey also briefly introduces the available resources (representative KG projects, datasets, and construction tools) to help readers build practical knowledge graph systems, and closes with an in-depth discussion of the field's challenges and future directions.
Since every chapter of the paper already lists a large number of existing KG projects and tools, readers are encouraged to consult the original paper for the tools relevant to their area of interest; they are not enumerated again here.
Among other things, the paper provides:
- a comparison of existing surveys on knowledge graph construction;
- representative KG projects in each direction;
- off-the-shelf KG tools for data preprocessing, knowledge acquisition, and knowledge refinement.
Beyond the rule-based and statistics-based methods mentioned earlier under "semi-structured data preprocessing", deep learning methods have made remarkable progress on NER in recent years. These methods typically cast NER as sequence labeling (mapping a word sequence to a tag sequence), using recurrent neural networks (RNN), long short-term memory networks (LSTM), or gated recurrent units (GRU) to encode the text, with a conditional random field (CRF) layer on top to output the entity tags. Other approaches employ convolutional neural networks (CNN), graph convolutional networks (GCN), and attention mechanisms, which capture local and global context more effectively.
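To make the CRF layer concrete, the sketch below implements Viterbi decoding, the inference step of a linear-chain CRF that picks the highest-scoring tag sequence given per-token emission scores and tag-transition scores. The function name, toy tags, and all scores are illustrative assumptions, not taken from the survey; in a real tagger the scores would come from the trained network.

```python
def viterbi_decode(emissions, transitions, tags):
    """Viterbi inference for a linear-chain CRF: return the single
    highest-scoring tag sequence for one sentence.

    emissions   -- emissions[i][t]: score of tag t at token i
    transitions -- transitions[(prev, cur)]: tag-bigram score (default 0)
    tags        -- the full tag set, e.g. BIO labels
    """
    # score[i][t] = best score of any tag path ending in t at token i
    score = [{t: emissions[0][t] for t in tags}]
    back = []  # back[i][t] = best predecessor of tag t at token i + 1
    for i in range(1, len(emissions)):
        score.append({})
        back.append({})
        for cur in tags:
            prev = max(tags,
                       key=lambda p: score[i - 1][p] + transitions.get((p, cur), 0.0))
            back[-1][cur] = prev
            score[-1][cur] = (score[i - 1][prev]
                              + transitions.get((prev, cur), 0.0)
                              + emissions[i][cur])
    # Backtrack from the best final tag
    path = [max(tags, key=lambda t: score[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy example for "John Smith runs"; the transition scores reward
# B-PER -> I-PER and forbid the invalid bigram O -> I-PER.
tags = ["O", "B-PER", "I-PER"]
emissions = [{"O": 1.0, "B-PER": 2.0, "I-PER": 0.0},
             {"O": 1.0, "B-PER": 0.0, "I-PER": 2.0},
             {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0}]
transitions = {("B-PER", "I-PER"): 1.0, ("O", "I-PER"): -10.0}
print(viterbi_decode(emissions, transitions, tags))  # ['B-PER', 'I-PER', 'O']
```

This is exactly why the CRF layer helps over per-token argmax: the transition scores let the decoder rule out ill-formed label sequences such as an `I-PER` that does not follow a `B-PER`.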
Pretrained language models such as BERT (Bidirectional Encoder Representations from Transformers) also perform strongly on NER. By pretraining on large corpora, these models learn rich language representations, which are then fine-tuned on a specific NER task to adapt to the entity recognition needs of a particular domain.
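One practical step after fine-tuning such a token classifier is converting its token-level BIO predictions back into entity spans. The plain-Python sketch below shows one way to do this; the function name and example sentence are my own illustration, not from the survey.

```python
def bio_to_spans(tokens, labels):
    """Group token-level BIO labels (e.g. the output of a fine-tuned
    BERT token classifier) into (text, type, start, end) entity spans."""
    spans, start, etype = [], None, None
    for i, label in enumerate(labels + ["O"]):  # trailing "O" flushes the last span
        # Close the open span on "O", on "B-", or on an "I-" of a different type
        if start is not None and (label == "O" or label.startswith("B-")
                                  or label[2:] != etype):
            spans.append((" ".join(tokens[start:i]), etype, start, i))
            start, etype = None, None
        if label.startswith("B-"):
            start, etype = i, label[2:]
        elif label.startswith("I-") and start is None:
            start, etype = i, label[2:]  # tolerate I- without B- (a common model slip)
    return spans

tokens = ["Barack", "Obama", "visited", "Paris"]
labels = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, labels))
# [('Barack Obama', 'PER', 0, 2), ('Paris', 'LOC', 3, 4)]
```

Note that with a subword tokenizer like BERT's WordPiece, predictions must first be aligned back to whole words (e.g. keeping only the label of each word's first subtoken) before this grouping step.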