Showing 1 - 10
of 24,997
for search: '"Mahajan, P"'
Authors:
Kim, Joongwon, Goyal, Anirudh, Zhang, Aston, Xiong, Bo, Hou, Rui, Kambadur, Melanie, Mahajan, Dhruv, Hajishirzi, Hannaneh, Tan, Liang
Preference learning is a widely adopted post-training technique that aligns large language models (LLMs) to human preferences and improves specific downstream task capabilities. In this work we systematically investigate how specific attributes of pr
External link:
http://arxiv.org/abs/2412.15282
Authors:
Yang, Chenxi, Saxena, Divyanshu, Dwivedula, Rohit, Mahajan, Kshiteej, Chaudhuri, Swarat, Akella, Aditya
Learning-based congestion controllers offer better adaptability compared to traditional heuristic algorithms. However, the inherent unreliability of learning techniques can cause learning-based controllers to behave poorly, creating a need for formal
External link:
http://arxiv.org/abs/2412.10915
Natural Language Inference (NLI) tasks require identifying the relationship between sentence pairs, typically classified as entailment, contradiction, or neutrality. While the current state-of-the-art (SOTA) model, Entailment Few-Shot Learning (EFL),
External link:
http://arxiv.org/abs/2412.09263
It is shown that in the spacetime dominated by a cosmological constant, in the far region of a Schwarzschild-de Sitter black hole, a seed magnetic field can be generated in an ambient plasma (in a state of no magnetic field) by a general-relativistic
External link:
http://arxiv.org/abs/2412.07516
In India, the majority of farmers are classified as small or marginal, making their livelihoods particularly vulnerable to economic losses due to market saturation and climate risks. Effective crop planning can significantly impact their expected inc
External link:
http://arxiv.org/abs/2412.02057
In this paper, we investigate the concentration properties of cumulative rewards in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We introduce a unified approach to characterize reward concentration in MDP
External link:
http://arxiv.org/abs/2411.18551
Authors:
Yu, Yue, Chen, Zhengxing, Zhang, Aston, Tan, Liang, Zhu, Chenguang, Pang, Richard Yuanzhe, Qian, Yundi, Wang, Xuewei, Gururangan, Suchin, Zhang, Chao, Kambadur, Melanie, Mahajan, Dhruv, Hou, Rui
Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF). However, current reward models mainly produce scalar scores and struggle to incorporate crit
External link:
http://arxiv.org/abs/2411.16646
Authors:
Goyal, Sahil, Mahajan, Abhinav, Mishra, Swasti, Udhayanan, Prateksha, Shukla, Tripti, Joseph, K J, Srinivasan, Balaji Vasan
Graphic designs are an effective medium for visual communication. They range from greeting cards to corporate flyers and beyond. Of late, machine learning techniques are able to generate such designs, which accelerates the rate of content production
External link:
http://arxiv.org/abs/2411.14959
We present research that aims to bridge the users of American Sign Language and the users of spoken language and Indian Sign Language (ISL). The research enabled us to create a novel framework that we have developed for
External link:
http://arxiv.org/abs/2411.12685
Dimerization and subsequent aggregation of polymers and biopolymers often occur under nonequilibrium conditions. When the initial state of the polymer is not collapsed or the final folded native state, the dynamics of dimerization can follow a course
External link:
http://arxiv.org/abs/2411.11811