PostCAT - Posterior Constrained Alignment Toolkit

Authors: João Graça, Kuzman Ganchev, Ben Taskar
Year of publication: 2009
Subject:
Source: The Prague Bulletin of Mathematical Linguistics. 91
ISSN: 1804-0462, 0032-6585
Description: In this paper we present a new open-source toolkit for statistical word alignment, the Posterior Constrained Alignment Toolkit (PostCAT). The toolkit implements three well-known word alignment algorithms (IBM Model 1, IBM Model 2, HMM) as well as six new models. In addition to the usual Viterbi decoding scheme, the toolkit provides posterior decoding with several options for tuning the threshold. The toolkit also provides an implementation of alignment symmetrization heuristics and a set of utilities for analyzing and pretty-printing alignments. The new models have already been shown to improve intrinsic alignment metrics and to lead to better translations when integrated into a state-of-the-art machine translation system. The toolkit is developed in Java and is available as source code on its website. We encourage other researchers to build on our work by modifying the toolkit and using it in their research.
Database: OpenAIRE
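
Note: The posterior decoding mentioned in the description can be illustrated with a minimal Java sketch. It assumes that a model has already produced a matrix of link posteriors, where entry (i, j) is the probability that source word i aligns to target word j, and it keeps every link whose posterior reaches a chosen threshold. The class name, method name, and sample values below are hypothetical and are not taken from the PostCAT source code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of threshold-based posterior decoding.
// This is not PostCAT's API; names and data are assumptions for the sketch.
final class PosteriorDecodingSketch {

    /**
     * Returns every link (i, j) whose posterior is at least the threshold.
     * posteriors[i][j] is assumed to be the probability that source word i
     * aligns to target word j.
     */
    static List<int[]> decode(double[][] posteriors, double threshold) {
        List<int[]> links = new ArrayList<>();
        for (int i = 0; i < posteriors.length; i++) {
            for (int j = 0; j < posteriors[i].length; j++) {
                if (posteriors[i][j] >= threshold) {
                    links.add(new int[] {i, j});
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up posteriors for a 2-word source and 3-word target sentence.
        double[][] posteriors = {
            {0.90, 0.05, 0.05},
            {0.10, 0.45, 0.45},
        };
        for (int[] link : decode(posteriors, 0.4)) {
            System.out.println(link[0] + "-" + link[1]);
        }
    }
}
```

Unlike Viterbi decoding, which returns a single best alignment, this scheme can link one word to several words or to none. With the sample matrix, a threshold of 0.4 keeps three links (0-0, 1-1, 1-2), while a threshold of 0.5 keeps only 0-0, so the threshold trades recall against precision.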