Advances in discriminative dependency parsing

Author: Koo, Terry (Terry Y.)
Language: English
Year of publication: 2010
Subject:
Document type: Thesis
Description: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 167-176).
Achieving a greater understanding of natural language syntax and parsing is a critical step in producing useful natural language processing systems. In this thesis, we focus on the formalism of dependency grammar because it allows one to model important head-modifier relationships with a minimum of extraneous structure. Recent research in dependency parsing has highlighted the discriminative structured prediction framework (McDonald et al., 2005a; Carreras, 2007; Suzuki et al., 2009), which is characterized by two advantages: first, the availability of powerful discriminative learning algorithms like log-linear and max-margin models (Lafferty et al., 2001; Taskar et al., 2003), and second, the ability to use arbitrarily defined feature representations. This thesis explores three advances in the field of discriminative dependency parsing. First, we show that the classic Matrix-Tree Theorem (Kirchhoff, 1847; Tutte, 1984) can be applied to the problem of non-projective dependency parsing, enabling both log-linear and max-margin parameter estimation in this setting. Second, we present novel third-order dependency parsing algorithms that extend the amount of context available to discriminative parsers while retaining computational complexity equivalent to existing second-order parsers. Finally, we describe a simple but effective method for augmenting the features of a dependency parser with information derived from standard clustering algorithms; our semi-supervised approach is able to deliver consistent benefits regardless of the amount of available training data.
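The Matrix-Tree Theorem mentioned in the abstract can be illustrated with a small sketch. This is not taken from the thesis: it shows only the standard directed form of the theorem, assuming a dummy root at index 0 that may take any number of modifiers, and a hypothetical table `weights[h][m]` of non-negative arc weights. Under those assumptions, the sum over all non-projective dependency trees of the product of their arc weights (the partition function a log-linear model needs) equals the determinant of a Laplacian matrix. The function names and the brute-force check are illustrative only.

```python
from itertools import product


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def partition_function(weights):
    """Sum of arc-weight products over all trees rooted at node 0.

    weights[h][m] is the (assumed) weight of the arc head h -> modifier m.
    """
    n = len(weights) - 1  # number of words, excluding the root
    lap = [[0.0] * n for _ in range(n)]
    for m in range(1, n + 1):
        for h in range(n + 1):
            if h == m:
                continue
            lap[m - 1][m - 1] += weights[h][m]      # total weight entering m
            if h >= 1:
                lap[h - 1][m - 1] -= weights[h][m]  # negated arc weight
    return determinant(lap)


def brute_force_partition(weights):
    """Enumerate every head assignment; keep only the cycle-free ones."""
    n = len(weights) - 1
    total = 0.0
    for heads in product(range(n + 1), repeat=n):  # heads[m-1] is head of m
        if any(heads[m - 1] == m for m in range(1, n + 1)):
            continue
        is_tree = True
        for m in range(1, n + 1):
            seen, node = set(), m
            while node != 0:  # walk up towards the root
                if node in seen:
                    is_tree = False
                    break
                seen.add(node)
                node = heads[node - 1]
            if not is_tree:
                break
        if is_tree:
            score = 1.0
            for m in range(1, n + 1):
                score *= weights[heads[m - 1]][m]
            total += score
    return total
```

With three words and every arc weight set to 1, both functions should agree on 16, the number of labeled trees on four nodes given by Cayley's formula. The determinant costs cubic time in the sentence length, whereas the enumeration grows exponentially, which is why the determinant form makes this kind of estimation tractable.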
by Terry Koo.
Ph.D.
Database: Networked Digital Library of Theses & Dissertations