Showing 1 - 10 of 146 for search: '"PAPOTTI, PAOLO"'
In this work we study how diffusion-based generative models produce high-dimensional data, such as an image, by implicitly relying on a manifestation of a low-dimensional set of latent abstractions, that guide the generative process. We present a nov
External link:
http://arxiv.org/abs/2410.03368
Author:
Corallo, Giulio, Papotti, Paolo
Recent large language model applications, such as Retrieval-Augmented Generation and chatbots, have led to an increased need to process longer input contexts. However, this requirement is hampered by inherent limitations. Architecturally, models are
External link:
http://arxiv.org/abs/2408.00167
We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three main steps: retrieving joinable tables, merging information, and predic
External link:
http://arxiv.org/abs/2402.06282
Two-sample testing decides whether two datasets are generated from the same distribution. This paper studies variable selection for two-sample testing, the task being to identify the variables (or dimensions) responsible for the discrepancies between
External link:
http://arxiv.org/abs/2311.01537
Published in:
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23) (2023) 11-24
The detection of fake news has received increasing attention over the past few years, but there are more subtle ways of deceiving one's audience. In addition to the content of news stories, their presentation can also be made misleading or biased. In
External link:
http://arxiv.org/abs/2305.15790
In many use-cases, information is stored in text but not available in structured data. However, extracting data from natural language text to precisely fit a schema, and thus enable querying, is a challenging task. With the rise of pre-trained Large
External link:
http://arxiv.org/abs/2304.00472
Author:
Garcia-Pueyo, Lluís, Tsaparas, Panayiotis, Bhaskar, Anand, Kumar, Prathyusha Senthil, van Zwol, Roelof, Sellis, Timos, McCosker, Anthony, Papotti, Paolo
This is the proposal for the third edition of the Workshop on Integrity in Social Networks and Media, Integrity 2022, following the success of the first two Workshops held in conjunction with the 13th & 14th ACM Conference on Web Search and Data Mini
External link:
http://arxiv.org/abs/2209.11867
Published in:
Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM 2022)
Fact-checking is one of the effective solutions in fighting online misinformation. However, traditional fact-checking is a process requiring scarce expert human resources, and thus does not scale well on social media because of the continuous flow of
External link:
http://arxiv.org/abs/2208.09214
Entity resolution is a widely studied problem with several proposals to match records across relations. Matching textual content is a widespread task in many applications, such as question answering and search. While recent methods achieve promising
External link:
http://arxiv.org/abs/2112.08776
Published in:
EMNLP-2021
While pre-trained language models (PLMs) are the go-to solution to tackle many natural language processing problems, they are still very limited in their ability to capture and to use common-sense knowledge. In fact, even if information is available
External link:
http://arxiv.org/abs/2109.13006