Popis: |
Content features and characteristics of an English-language text are determined by studying the relations between lemmas and synsets recognized by linguistic packages. Results in the form of keyword lists, ontology elements, and meaningful clusters of concepts were obtained using the example of «Address by President of the Russian Federation 2013/2014». The study was carried out with the DKPro Core and NLTK packages.
As the volume of electronic information and its availability via the Internet grow, knowledge extraction from natural-language texts has become one of the most pressing research areas in computational linguistics. Linguistic packages for natural-language processing make it possible to implement alternative methods of finding meaningful information in the text under analysis. Most known methods, which rely on statistical patterns and/or morphological and syntactic analysis of the text, have a number of problems. An alternative way to address knowledge extraction from textual information is to formalize imaginative methods of analysis and synthesis of natural-language constructions. This article reviews formal methods for extracting knowledge from texts, including determining the content features and characteristics of an English text from the relations between lemmas and synsets recognized by linguistic packages. The results are lists of keywords, ontology content elements, and meaningful clusters of concepts, obtained using the example of «Address by President of the Russian Federation 2013/2014». The research was conducted with the popular free resources DKPro Core and NLTK. Software for determining the keywords of the studied text was developed on the DKPro Core platform, and the text ontologies were built on the NLTK platform. The results yielded shared formal parameters and characteristics for the two official texts, as well as parameters that indicate a shift in the emphasis of the information.
The approach and the tools developed can be useful to specialists in linguistic examination and to a wide range of researchers in computational linguistics. |