Showing 1 - 10
of 325
for the search: '"Lotufo, Roberto A."'
Author:
Fernandes, Leandro Carísio, Dobins, Guilherme Zeferino Rodrigues, Lotufo, Roberto, Pereira, Jayr Alencar
This paper introduces PublicHearingBR, a Brazilian Portuguese dataset designed for summarizing long documents. The dataset consists of transcripts of public hearings held by the Brazilian Chamber of Deputies, paired with news articles and structured …
External link:
http://arxiv.org/abs/2410.07495
Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens. However, they often fail on tasks that require repetitive use of simple rules, even on sequences that are much shorter …
External link:
http://arxiv.org/abs/2410.06396
Author:
Fernandes, Leandro Carísio, Guedes, Gustavo Bartz, Laitz, Thiago Soares, Almeida, Thales Sales, Nogueira, Rodrigo, Lotufo, Roberto, Pereira, Jayr
Document summarization is the task of shortening texts into concise and informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a …
External link:
http://arxiv.org/abs/2408.16444
This work presents Retail-GPT, an open-source RAG-based chatbot designed to enhance user engagement in retail e-commerce by guiding users through product recommendations and assisting with cart operations. The system is cross-platform and adaptable to …
External link:
http://arxiv.org/abs/2408.08925
Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose …
External link:
http://arxiv.org/abs/2407.14467
Despite advancements in Natural Language Processing (NLP) and the growing availability of pretrained models, the English language remains the primary focus of model development. Continued pretraining on language-specific corpora provides a practical …
External link:
http://arxiv.org/abs/2406.10806
Multilingual pretraining has been a successful solution to the challenges posed by the lack of resources for languages. These models can transfer knowledge to target languages with minimal or no examples. Recent research suggests that monolingual models …
External link:
http://arxiv.org/abs/2404.08191
Author:
Bueno, Mirelle, de Oliveira, Eduardo Seiti, Nogueira, Rodrigo, Lotufo, Roberto A., Pereira, Jayr Alencar
Despite Portuguese being one of the most spoken languages in the world, there is a lack of high-quality information retrieval datasets in that language. We present Quati, a dataset specifically designed for the Brazilian Portuguese language. It comprises …
External link:
http://arxiv.org/abs/2404.06976
Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens. However, they often fail on tasks that require repetitive use of simple rules, even on sequences that are much shorter …
External link:
http://arxiv.org/abs/2402.07859
ExaRanker recently introduced an approach to training information retrieval (IR) models that incorporates natural language explanations as additional labels. The method addresses the challenge of limited labeled examples, leading to improvements in the …
External link:
http://arxiv.org/abs/2402.06334