Agent data merging
Authors: | Kristina Ionkina, Evgeniy Tretyakov, Ekaterina Lopatina, Evgeniy Antonov |
---|---|
Year of publication: | 2020 |
Subject: |
Information retrieval
Data collection; Markup language; business.industry; Computer science; computer.internet_protocol; 020206 networking & telecommunications; 02 engineering and technology; Query language; Software; Data extraction; Software agent; 0202 electrical engineering, electronic engineering, information engineering; General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; The Internet; business; Software architecture; computer; General Environmental Science; XPath |
Source: | Procedia Computer Science. 169:473-478 |
ISSN: | 1877-0509 |
Description: | This article addresses agent-based collection of reliable, up-to-date data for a given field from multiple information sources on the Internet. The approach is illustrated by collecting data on nuclear power plants operating around the world. Three open information sources were selected for data extraction; each was analyzed to identify how it structures the data it provides. The article describes the tools used to develop the software agents: browser control for simulating human behavior, HTML markup analysis using the XPath query language, and data extraction from PDF documents using regular expressions. It also presents the software architecture and the database schema. Running the software yielded data on 789 nuclear power plants. |
Database: | OpenAIRE |
External link: |
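
Two of the techniques named in the description, XPath queries over markup and regular expressions over text extracted from PDF documents, can be illustrated with a minimal sketch. This is not the authors' implementation: the sample markup, the sample text, the field names, and the functions `extract_plants` and `extract_capacities` are all hypothetical. The sketch uses only the Python standard library, whose `xml.etree.ElementTree` supports a limited XPath subset and requires well-formed markup; a real agent would probably need an HTML-tolerant parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a page that lists plants.
SAMPLE_HTML = """
<html><body>
  <table id="plants">
    <tr><td class="name">Example Plant A</td><td class="country">Country X</td></tr>
    <tr><td class="name">Example Plant B</td><td class="country">Country Y</td></tr>
  </table>
</body></html>
"""

# Hypothetical text, as it might look after being extracted from a PDF report.
SAMPLE_PDF_TEXT = (
    "Example Plant A  Net capacity: 1000 MW\n"
    "Example Plant B  Net capacity: 1200 MW"
)


def extract_plants(markup):
    """Select table rows with an XPath expression and read two cells per row."""
    root = ET.fromstring(markup)
    rows = root.findall(".//table[@id='plants']/tr")
    return [
        {
            "name": row.findtext("td[@class='name']"),
            "country": row.findtext("td[@class='country']"),
        }
        for row in rows
    ]


# One record per line: a plant name, then a labelled capacity in megawatts.
CAPACITY_RE = re.compile(
    r"^(?P<name>.+?)\s+Net capacity:\s*(?P<mw>\d+)\s*MW$", re.MULTILINE
)


def extract_capacities(text):
    """Map each plant name found in the text to its capacity as an integer."""
    return {m.group("name"): int(m.group("mw")) for m in CAPACITY_RE.finditer(text)}


if __name__ == "__main__":
    print(extract_plants(SAMPLE_HTML))
    print(extract_capacities(SAMPLE_PDF_TEXT))
```

Keeping each source behind its own small extraction function, with results returned as plain records keyed by plant name, is one way the outputs of several agents could later be merged into a single database table.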