Agent data merging

Authors: Kristina Ionkina, Evgeniy Tretyakov, Ekaterina Lopatina, Evgeniy Antonov
Publication year: 2020
Subject:
Source: Procedia Computer Science. 169:473-478
ISSN: 1877-0509
Description: This article addresses collecting data in a given domain from multiple Internet information sources using agent-based technologies, with the aim of obtaining reliable and up-to-date data. The agent-based approach is illustrated by collecting data on nuclear power plants operating around the world. Three open information sources were selected for data extraction; they were analyzed, and the features of how each structures its data were identified. The article describes the following tools for developing the software agents: browser control to simulate human behavior, HTML markup analysis using the XPath query language, and data extraction from PDF documents using regular expressions. In addition, the article considers the software architecture and the database schema. Running the software yielded data on 789 nuclear power plants.
Database: OpenAIRE
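
Illustrative note: the abstract names two extraction techniques (XPath queries over HTML markup and regular expressions over PDF text) whose results are then merged. The following is a minimal sketch of that idea, not the authors' implementation. All names (`plants_from_html`, `capacities_from_text`, `merge`), the sample markup, the sample text, and the record fields are hypothetical and invented for illustration; the sketch also assumes well-formed markup and an already extracted PDF text layer, and it uses only the limited XPath subset of Python's standard library.

```python
import re
import xml.etree.ElementTree as ET

# Invented, well-formed markup standing in for a source page (not real data).
PAGE = """
<html><body>
  <table id="plants">
    <tr><th>Name</th><th>Country</th></tr>
    <tr><td>Example Plant A</td><td>Country X</td></tr>
    <tr><td>Example Plant B</td><td>Country Y</td></tr>
  </table>
</body></html>
"""

# Invented text standing in for the text layer of a PDF document (not real data).
PDF_TEXT = """
Example Plant A    Capacity: 1000 MW
Example Plant C    Capacity: 450 MW
"""

# One record per line: a name, a gap of two or more spaces, then the capacity.
CAPACITY_RE = re.compile(
    r"^(?P<name>.+?)\s{2,}Capacity:\s*(?P<mw>\d+)\s*MW$", re.MULTILINE
)


def plants_from_html(markup):
    """Extract {name: {"country": ...}} using an XPath-style query."""
    root = ET.fromstring(markup.strip())
    records = {}
    # ElementTree supports a subset of XPath, including [@attr='value'].
    for row in root.findall(".//table[@id='plants']/tr"):
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if len(cells) == 2:  # the header row has no <td> cells and is skipped
            records[cells[0]] = {"country": cells[1]}
    return records


def capacities_from_text(text):
    """Extract {name: {"capacity_mw": ...}} using a regular expression."""
    return {
        m.group("name").strip(): {"capacity_mw": int(m.group("mw"))}
        for m in CAPACITY_RE.finditer(text)
    }


def merge(*sources):
    """Combine per-source records by plant name; later sources add fields."""
    merged = {}
    for source in sources:
        for name, fields in source.items():
            merged.setdefault(name, {}).update(fields)
    return merged


if __name__ == "__main__":
    combined = merge(plants_from_html(PAGE), capacities_from_text(PDF_TEXT))
    for name, fields in sorted(combined.items()):
        print(name, fields)
```

Keying the merge on the plant name is a simplifying assumption; real sources may spell names differently and would need a normalization or matching step before merging.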