Search results

Academic article

Approximating Probabilistic Models as Weighted Finite Automata

Authors: Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol

Published in: Computational Linguistics, Vol 47, Iss 2, Pp 221-254 (2021)

Abstract: Weighted finite automata (WFAs) are often used to represent probabilistic models, such as n-gram language models, because among other things, they are efficient for recognition tasks in time and space. The probabilistic source to be represent

External link: https://doaj.org/article/8e316cf7fcdd4ed29b9d0947f464f413

Academic article

Phonotactic Complexity and Its Trade-offs

Authors: Tiago Pimentel, Brian Roark, Ryan Cotterell

Published in: Transactions of the Association for Computational Linguistics, Vol 8, Pp 1-18 (2020)

Abstract: We present methods for calculating a measure of phonotactic complexity—bits per phoneme— that permits a straightforward cross-linguistic comparison. When given a word, represented as a sequence of phonemic segments such as symbols in the

External link: https://doaj.org/article/947852ef831b483fa5eb29a65f0d4dce

Academic article

Neural Models of Text Normalization for Speech Applications

Authors: Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, Brian Roark

Published in: Computational Linguistics, Vol 45, Iss 2, Pp 293-337 (2019)

Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech a

External link: https://doaj.org/article/90dc08a28df744cfa0e164d9471b7751

E-book

Computational Approaches to Morphology and Syntax

Authors: Brian Roark, Richard Sproat

The book will appeal to scholars and advanced students of morphology, syntax, computational linguistics and natural language processing (NLP). It provides a critical and practical guide to computational techniques for handling morphological and synta

Approximating Probabilistic Models as Weighted Finite Automata

Authors: Brian Roark, Michael Riley, Vlad Schogol, Ananda Theertha Suresh

Published in: Computational Linguistics, pp. 1-34

Weighted finite automata (WFAs) are often used to represent probabilistic models, such as n-gram language models, because among other things, they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a W

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4198d4fef00946049ff78090860ca299
https://doi.org/10.1162/coli_a_00401

Finding Concept-specific Biases in Form–Meaning Associations

Authors: Brian Roark, Damián E. Blasi, Ryan Cotterell, Tiago Pimentel, Søren Wichmann

Published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87f2ae863ba8bd73c29b53ca164ee684
https://hdl.handle.net/20.500.11850/518985

Structured abbreviation expansion in context

Authors: Kyle Gorman, Christo Kirov, Brian Roark, Richard Sproat

Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages. We consider the task of reversing these abbreviations in context to recover normalized, expanded versions of abbreviated messages. The problem is

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::954d664dfcd5b184aba34b91fd572f81

Finite-state script normalization and processing utilities: The Nisaba Brahmic library

Authors: Lawrence Wolf-Sonkin, Alexander Gutkin, Brian Roark, Cibu Johny

Published in: EACL (System Demonstrations)

This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for supporting crucial operations on Brahmic scripts, such as NFC, visual

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::37da5a5d65f3fd39fe61dcbdb63659e6
https://doi.org/10.18653/v1/2021.eacl-demos.3

Language-agnostic Multilingual Modeling

Authors: Anjuli Kannan, Brian Roark, Bhuvana Ramabhadran, Jesse Emond, Arindrima Datta

Published in: ICASSP

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarc

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6a151204da3e0d461f9892fc4c305984
http://arxiv.org/abs/2004.09571

Phonotactic Complexity and Its Trade-offs

Authors: Ryan Cotterell, Tiago Pimentel, Brian Roark

Published in: Transactions of the Association for Computational Linguistics, Vol 8, Pp 1-18 (2020)

We present methods for calculating a measure of phonotactic complexity—bits per phoneme— that permits a straightforward cross-linguistic comparison. When given a word, represented as a sequence of phonemic segments such as symbols in the internat

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6f8b92f552ab34c7fbabbe6c0d63df38
https://hdl.handle.net/20.500.11850/462324
