Incremental knowledge acquisition for natural language processing

Author: Pham, Son Bao, Computer Science & Engineering, Faculty of Engineering, UNSW
Language: English
Year of publication: 2006
Subject:
Description: Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and are usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather statistics from a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual approach seems to be the only choice. One typical problem with manual approaches is that the combination of multiple patterns, possibly used at different stages of processing, often causes unintended side effects. Existing approaches, however, focus on how to use linguistic patterns for processing text rather than on the practical problem of acquiring those patterns. A systematic way to support the efficient manual acquisition of linguistic patterns is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks.
KAFTIE addresses difficulties that existing approaches often face when manually constructing knowledge bases of linguistic patterns, or rules in general, by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty of choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort of creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE therefore makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
Database: Networked Digital Library of Theses & Dissertations