Towards Functionally Similar Corpus Resources for Translation
Author: | Serge Sharoff, Maria Kunilovskaya |
---|---|
Year of publication: | 2019 |
Subject: |
060201 languages & linguistics; Computer science; business.industry; 05 social sciences; 02 engineering and technology; Translation (geometry); computer.software_genre; Set (abstract data type); Recurrent neural network; British National Corpus; 0602 languages and literature; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 0202 electrical engineering, electronic engineering, information engineering; Translation studies; Feature (machine learning); 020201 artificial intelligence & image processing; Artificial intelligence; Representation (mathematics); business; computer; Natural language processing; Contrastive analysis |
Source: | RANLP |
DOI: | 10.26615/978-954-452-056-4_069 |
Description: | The paper describes a computational approach to producing functionally comparable monolingual corpus resources for translation studies and contrastive analysis. We exploit a text-external approach, based on a set of Functional Text Dimensions that model text functions, so that each text can be represented as a vector in a multidimensional space of text functions. These vectors can be used to find reasonably homogeneous subsets of functionally similar texts across different corpora. Our models for predicting text functions are based on recurrent neural networks and traditional feature-based machine learning approaches. In addition to using the categories of the British National Corpus as our test case, we investigated the functional comparability of the English parts of two parallel corpora, CroCo (English-German) and RusLTC (English-Russian), and applied our models to define functionally similar clusters in them. Our results show that the Functional Text Dimensions provide a useful description of text categories, while allowing a more flexible representation of texts with hybrid functions. |
Database: | OpenAIRE |
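
The description above says each text is represented as a vector in a space of text functions, and that these vectors are used to find functionally similar texts across corpora. The sketch below illustrates that idea only; it is not the authors' implementation. The dimension labels, the scores, the corpus contents, the use of cosine similarity, and the 0.9 threshold are all assumptions made for illustration, and the predicted scores would in practice come from a trained model rather than being written by hand.

```python
import math

# Hypothetical dimension labels, invented for illustration; the record does
# not list the actual Functional Text Dimensions used in the paper.
DIMENSIONS = ["argumentative", "informative", "instructive", "narrative"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_texts(query, corpus, threshold=0.9):
    """Return sorted ids of corpus texts whose vectors are close to the query vector."""
    return sorted(
        text_id for text_id, vec in corpus.items() if cosine(query, vec) >= threshold
    )


# Made-up scores standing in for model predictions, one value per dimension.
corpus_a = {"a1": [0.9, 0.1, 0.0, 0.1], "a2": [0.0, 0.2, 0.1, 0.9]}
corpus_b = {"b1": [0.8, 0.2, 0.1, 0.0], "b2": [0.1, 0.1, 0.9, 0.0]}

# For every text in corpus A, list the functionally similar texts in corpus B.
matches = {text_id: similar_texts(vec, corpus_b) for text_id, vec in corpus_a.items()}
print(matches)
```

With these toy values, only the mostly argumentative text `a1` finds a counterpart (`b1`) in the second corpus. A text with hybrid functions would simply carry high scores on more than one dimension, which a single category label could not express.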