The First Parallel Multilingual Corpus of Persian: Toward a Persian BLARK

Authors: Qasemizadeh, Behrang; Rahimi, Saeed; Bakhtiari, Behrooz Mahmoodi
Year of publication: 2014
Subject:
Document type: Working Paper
Description: In this article, we introduce the first parallel corpus pairing Persian with more than 10 European languages. The article describes the first steps toward preparing a Basic Language Resources Kit (BLARK) for Persian. So far, we have proposed a morphosyntactic specification of Persian based on the EAGLES/MULTEXT guidelines and the language-specific resources of MULTEXT-East. The article introduces the Persian language, with emphasis on its orthography and morphosyntactic features, and then proposes a new part-of-speech categorization and an orthography for Persian in digital environments. Finally, the corpus and its statistics are analyzed.
Database: arXiv