On the enrichment of time series with textual data for forecasting agricultural commodity prices

Authors: Ivan José Reis Filho, Ricardo Marcondes Marcacini, Solange Oliveira Rezende
Year of publication: 2022
Subject:
Source: Repositório Institucional da USP (Biblioteca Digital da Produção Intelectual)
Universidade de São Paulo (USP)
Description: Forecasting models in the financial market generally use quantitative time-series data. However, external factors such as weather events, economic crises, and the foreign exchange market can influence the series. This information is not explicit in the time series itself, yet it can affect the predicted values. Textual data can be a source of knowledge about such external factors and is potentially helpful for time-series forecasting models. Some studies have presented text mining techniques that combine textual and time-series data. However, the existing representations have limitations, such as the curse of dimensionality and data sparsity. This work investigates the use of a finite set of domain-specific terms to address these problems by representing textual data in a low-dimensional space. We consider thirty-three keywords that are potentially important in the domain and use text mining techniques to enrich the time series with them. Four regression models were applied to the proposed representation to predict the future daily prices of corn and soybeans. The experimental setup reflects a real market scenario, using a daily sliding-window strategy and step-forward forecasting. The proposed representation achieves better accuracy in some forecasting scenarios. The results indicate that textual data are a promising alternative for enriching time-series representations and reducing uncertainty in forecasting models.
• We show an approach to enriching time series using domain-specific terms;
• The proposed representation combines quantitative data with qualitative market factors;
• Regression models learn a forecasting function from the enriched time series.
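The enrichment and evaluation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The keyword list, the number of price lags, the choice to pair each target with the previous day's text, and all function names are assumptions made for illustration; this record does not list the thirty-three keywords or name the four regression models.

```python
from collections import Counter
import re

# Hypothetical keywords (assumption): the paper's thirty-three terms are not given here.
KEYWORDS = ["drought", "export", "harvest"]


def keyword_vector(text, keywords=KEYWORDS):
    """Count how often each keyword occurs in one day's text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[k] for k in keywords]


def enrich(prices, daily_texts, n_lags=2):
    """Build (features, target) rows: lagged prices plus keyword counts -> next price.

    Assumption: the text of day t-1 is paired with the price target of day t,
    so only information available before the forecast day is used.
    """
    rows = []
    for t in range(n_lags, len(prices)):
        lags = list(prices[t - n_lags:t])
        features = lags + keyword_vector(daily_texts[t - 1])
        rows.append((features, prices[t]))
    return rows


def sliding_windows(rows, window):
    """Yield (train_rows, test_row): fit on the last `window` rows, test on the next."""
    for end in range(window, len(rows)):
        yield rows[end - window:end], rows[end]


# Toy data, invented for the example.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 102.7]
texts = [
    "harvest begins",
    "drought fears grow, drought spreads",
    "export ban lifted",
    "quiet session",
    "harvest and export data released",
    "no news",
]

rows = enrich(prices, texts)
splits = list(sliding_windows(rows, window=3))
```

Each feature vector has a fixed length of `n_lags + len(KEYWORDS)`, which is what keeps the representation low-dimensional compared with a full bag-of-words. Any regression model could then be fitted on `train_rows` and evaluated on `test_row` for each split.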
Database: OpenAIRE