Showing 1 - 10 of 107,761 for search: '"real-world scenarios"'
The advent of Multimodal Large Language Models, leveraging the power of Large Language Models, has recently demonstrated superior multimodal understanding and reasoning abilities, heralding a new era for artificial general intelligence. However, achi
External link:
http://arxiv.org/abs/2412.04447
Author:
Wu, Junchao, Zhan, Runzhe, Wong, Derek F., Yang, Shu, Yang, Xinyi, Yuan, Yulin, Chao, Lidia S.
Detecting text generated by large language models (LLMs) is of great recent interest. With zero-shot methods like DetectGPT, detection capabilities have reached impressive levels. However, the reliability of existing detectors in real-world applicati
External link:
http://arxiv.org/abs/2410.23746
Author:
Zhou, Ruiwen, Hua, Wenyue, Pan, Liangming, Cheng, Sitao, Wu, Xiaobao, Yu, En, Wang, William Yang
This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. Covering three practical domains -- airline baggage fees, NBA transa
External link:
http://arxiv.org/abs/2412.08972
Recent Text-to-SQL methods leverage large language models (LLMs) by incorporating feedback from the database management system. While these methods effectively address execution errors in SQL queries, they struggle with database mismatches -- errors
External link:
http://arxiv.org/abs/2408.16991
Author:
Zhang, Yi-Fan, Zhang, Huanyu, Tian, Haochen, Fu, Chaoyou, Zhang, Shuangqing, Wu, Junfei, Li, Feng, Wang, Kun, Wen, Qingsong, Zhang, Zhang, Wang, Liang, Jin, Rong, Tan, Tieniu
Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure
External link:
http://arxiv.org/abs/2408.13257
Achieving robust speech separation for overlapping speakers in various acoustic environments with noise and reverberation remains an open challenge. Although existing datasets are available to train separators for specific scenarios, they do not effe
External link:
http://arxiv.org/abs/2408.16126
Author:
Aykac, Deniz, Brogan, Joel, Barber, Nell, Shivers, Ryan, Zhang, Bob, Sacca, Dallas, Tipton, Ryan, Jager, Gavin, Garret, Austin, Love, Matthew, Goddard, Jim, Cornett III, David, Bolme, David S.
The considerable body of data available for evaluating biometric recognition systems in Research and Development (R&D) environments has contributed to the increasingly common problem of target performance mismatch. Biometric algorithms are frequentl
External link:
http://arxiv.org/abs/2409.01540
From SAE Level 3 of automation onwards, drivers are allowed to engage in activities that are not directly related to driving during their travel. However, in level 3, a misunderstanding of the capabilities of the system might lead drivers to engage i
External link:
http://arxiv.org/abs/2408.09833
Author:
Hasan, S M Rakib, Dhakal, Aakar
In the era of the internet and smart devices, the detection of malware has become crucial for system security. Malware authors increasingly employ obfuscation techniques to evade advanced security solutions, making it challenging to detect and elimin
External link:
http://arxiv.org/abs/2404.02372
Author:
Ochieng, Millicent, Gumma, Varun, Sitaram, Sunayana, Wang, Jindong, Chaudhary, Vishrav, Ronen, Keshet, Bali, Kalika, O'Neill, Jacki
The deployment of Large Language Models (LLMs) in real-world applications presents both opportunities and challenges, particularly in multilingual and code-mixed communication settings. This research evaluates the performance of seven leading LLMs in
External link:
http://arxiv.org/abs/2406.00343