Building a Knowledge Based Summarization System for Text Data Mining
Author: | Ben Choi, Andrey Timofeyev |
---|---|
Contributors: | Computer Science [Louisiana], College of Engineering and Science [Louisiana], Louisiana Tech University, Andreas Holzinger, Peter Kieseberg, A Min Tjoa, Edgar Weippl, TC 5, TC 8, TC 12, WG 8.4, WG 8.9, WG 12.9 |
Language: | English |
Year of publication: | 2018 |
Subject: |
Artificial intelligence; Knowledge representation and reasoning; Computer science; [SHS.INFO] Humanities and Social Sciences/Library and information sciences; Knowledge-based systems; Automatic summarization; Knowledge acquisition; Knowledge extraction; Knowledge base; Text summarization; Domain knowledge; [INFO] Computer Science [cs]; Inference engine; Data mining; Natural language processing |
Source: | Lecture Notes in Computer Science; 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Aug 2018, Hamburg, Germany, pp. 118-133, ⟨10.1007/978-3-319-99740-7_8⟩; ISBN: 9783319997391 |
DOI: | 10.1007/978-3-319-99740-7_8 |
Description: | Part 1: MAKE-Main Track; International audience; This paper provides details on building a knowledge-based automatic summarization system for mining text data. The system mines text data in documents and webpages to create abstractive summaries by generalizing new concepts, deriving main topics, and creating new sentences. It makes use of the domain knowledge provided by the Cyc development platform, which consists of the world’s largest knowledge base and one of the most powerful inference engines. The system extracts syntactic structures and semantic features by employing natural language processing techniques and the Cyc knowledge base and reasoning engine. The system creates a summary of the given documents in three stages: knowledge acquisition, knowledge discovery, and knowledge representation for human readers. Knowledge acquisition derives the syntactic structure of each sentence in the documents and maps the words and their syntactic relationships into the Cyc knowledge base. Knowledge discovery abstracts novel concepts and derives the main topics of the documents by exploring the ontology of the mapped concepts and by clustering the concepts. Knowledge representation creates new English sentences to summarize the documents. This system has been implemented and integrated with the Cyc knowledge-based system. The implementation encodes a process consisting of seven stages: syntactic analysis, mapping words to Cyc, concept propagation, concept weights and relations accumulation, topic derivation, subject identification, and new sentence generation. The implementation has been tested on various documents and webpages. The test performance data suggests that such a system could benefit from running on parallel and distributed computing platforms. 
The test results showed that the system is capable of creating new sentences that include abstracted concepts not explicitly mentioned in the original documents and that contain information synthesized from different parts of the documents to compose a summary. |
Database: | OpenAIRE |
External link: |
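
The description above names concept propagation, weight accumulation, and topic derivation as stages of the implementation. The following is a minimal sketch of that general idea only, not the authors' code and not the Cyc API: it assumes a hypothetical toy "is-a" ontology (`TOY_ONTOLOGY`), a hypothetical `decay` factor, and hypothetical function names, and it treats the heaviest concept that never appears in the text as the abstracted topic.

```python
from collections import Counter

# Hypothetical toy "is-a" ontology (child -> parent). It stands in for Cyc,
# whose real knowledge base and API are not used here.
TOY_ONTOLOGY = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def ancestors(concept, ontology):
    """Return the chain of increasingly general concepts above `concept`."""
    chain = []
    while concept in ontology:
        concept = ontology[concept]
        chain.append(concept)
    return chain


def propagate_weights(words, ontology, decay=0.5):
    """Accumulate concept weights, passing a decayed share to each ancestor."""
    known = set(ontology) | set(ontology.values())
    weights = Counter()
    for word in words:
        if word not in known:
            continue  # the word could not be mapped to a concept
        weights[word] += 1.0
        share = 1.0
        for parent in ancestors(word, ontology):
            share *= decay  # assumed: more general concepts receive less weight
            weights[parent] += share
    return weights


def derive_topics(weights, mentioned, top_n=1):
    """Pick the heaviest concepts that were never mentioned explicitly."""
    abstracted = {c: w for c, w in weights.items() if c not in mentioned}
    return sorted(abstracted, key=lambda c: (-abstracted[c], c))[:top_n]


words = ["dog", "wolf", "dog", "cat"]
weights = propagate_weights(words, TOY_ONTOLOGY)
topics = derive_topics(weights, mentioned=set(words))
```

With this toy input, the words "dog" and "wolf" both contribute weight to "canine", so a concept absent from the text can emerge as the topic. How the actual system weights, clusters, and selects concepts is not specified in this record.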