Syntax description synthesis using gradient boosted trees
Authors: | Arseny Astashkin, Kirill Chuvilin |
---|---|
Language: | English |
Publication year: | 2017 |
Subject: | |
Source: | Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 776, Iss 20, Pp 32-39 (2017) |
Document type: | article |
ISSN: | 2305-7254 2343-0737 |
DOI: | 10.23919/FRUCT.2017.8071289 |
Description: | The article considers partially formalized text documents, for which it is not possible to construct a formal grammar. An external syntax description is therefore used to build the syntax tree. Preparing such descriptions manually is labor-intensive and requires a high level of expertise, so the authors propose using machine learning methods to automate the process. The training set is composed from documents with known syntax descriptions. Each document is represented as a syntax tree using the TEXnous parser. Each node of these trees represents a syntax element, and the set of nodes forms the training set. A way of describing a single syntax element is proposed so that the formal descriptions of syntax elements constitute the space of classes. In the article, this space is limited to the set of parser modes used during document analysis. A set of scientific articles is used for the experiments. The XGBoost implementation of gradient boosted trees is chosen for the resulting classification problem. |
Database: | Directory of Open Access Journals |
External link: |
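
The description says that every node of a document's syntax tree becomes one training example and that the class is the parser mode. A minimal sketch of that flattening step follows. It is an illustration only: the `Node` class, the chosen features (`kind`, `name`, `depth`, `parent_kind`, `n_children`) and the mode names `"text"` and `"math"` are all assumptions made here, not the TEXnous data model or the feature set used in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One syntax element in a hypothetical parsed document tree."""
    kind: str   # assumed element category, e.g. "command" or "environment"
    name: str   # assumed element name, e.g. "frac"; empty for plain text
    mode: str   # parser mode at this node, used here as the class label
    children: List["Node"] = field(default_factory=list)


def flatten(root: Node) -> Tuple[List[Dict[str, object]], List[str]]:
    """Walk the tree depth-first and emit one (features, label) pair per node."""
    rows: List[Dict[str, object]] = []
    labels: List[str] = []
    stack: List[Tuple[Node, Optional[Node], int]] = [(root, None, 0)]
    while stack:
        node, parent, depth = stack.pop()
        rows.append({
            "kind": node.kind,
            "name": node.name,
            "depth": depth,
            "parent_kind": parent.kind if parent else "",
            "n_children": len(node.children),
        })
        labels.append(node.mode)
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, node, depth + 1))
    return rows, labels


# Toy document: a text run followed by an equation containing one command.
doc = Node("document", "", "text", [
    Node("text", "", "text"),
    Node("environment", "equation", "math", [
        Node("command", "frac", "math"),
    ]),
])
rows, labels = flatten(doc)
```

Rows of this shape, once the categorical fields are encoded as numbers, could be passed to a gradient boosted tree classifier such as `xgboost.XGBClassifier`; the encoding and the model settings are left out here because the record does not state them.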