Showing 1 - 10 of 557 for query: '"WANG Haoxiang"'
Published in:
康复学报, Vol 33, pp. 7-13 (2023)
Objective: To evaluate hospitalized patients with diabetes mellitus with the International Classification of Functioning, Disability and Health Rehabilitation Set (ICF-RS) and to explore factors that affect patients' functions. Methods: A total of 419 pat…
External link:
https://doaj.org/article/70342a933aab41928b198d0cbe3528f8
Reward models (RM) capture the values and preferences of humans and play a central role in Reinforcement Learning with Human Feedback (RLHF) to align pretrained large language models (LLMs). Traditionally, training these models relies on extensive hu…
External link:
http://arxiv.org/abs/2409.06903
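For context on the preference-modeling setup this abstract refers to, the following is a minimal sketch of the Bradley-Terry-style pairwise loss commonly used to train reward models from human comparison data; the function name and tensor conventions are illustrative assumptions, not code from this paper.

import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen: torch.Tensor,
                         reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the probability that the preferred
    # response scores higher, i.e. minimize -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()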
Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference data. Conve…
External link:
http://arxiv.org/abs/2406.12845
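As a complement, a trained reward model is often used at inference time to rank candidate generations (best-of-n selection). The sketch below illustrates that generic usage under an assumed interface in which reward_model(prompt, response) returns a scalar score; it is not taken from this paper.

def best_of_n(prompt, candidates, reward_model):
    # Score every candidate response with the reward model and
    # return the highest-scoring one.
    return max(candidates, key=lambda response: reward_model(prompt, response))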
Author:
Dong, Hanze, Xiong, Wei, Pang, Bo, Wang, Haoxiang, Zhao, Han, Zhou, Yingbo, Jiang, Nan, Sahoo, Doyen, Xiong, Caiming, Zhang, Tong
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literatu…
External link:
http://arxiv.org/abs/2405.07863
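The loop below is a schematic sketch of one way an online iterative preference-learning workflow can be organized (sample, score, pair, update); the callable arguments (policy, reward_model, dpo_update) are placeholders, and the details are assumptions rather than this report's exact recipe.

def online_iterative_loop(policy, reward_model, prompts, dpo_update,
                          n_iterations=3, samples_per_prompt=8):
    # Each iteration: sample responses from the current policy, score them
    # with a reward model standing in for human feedback, build
    # (chosen, rejected) pairs, and update the policy on those pairs.
    for _ in range(n_iterations):
        pairs = []
        for prompt in prompts:
            responses = [policy.generate(prompt) for _ in range(samples_per_prompt)]
            ranked = sorted(responses, key=lambda r: reward_model(prompt, r))
            pairs.append((prompt, ranked[-1], ranked[0]))  # best vs. worst
        policy = dpo_update(policy, pairs)
    return policy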
Author:
Yu, Xiaohang, Yang, Zhengxian, Pan, Shi, Han, Yuqi, Wang, Haoxiang, Zhang, Jun, Yan, Shi, Lin, Borong, Yang, Lei, Yu, Tao, Fang, Lu
We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3…
External link:
http://arxiv.org/abs/2403.09973
Author:
Wang, Haoxiang, Lin, Yong, Xiong, Wei, Yang, Rui, Diao, Shizhe, Qiu, Shuang, Zhao, Han, Zhang, Tong
Fine-grained control over large language models (LLMs) remains a significant challenge, hindering their adaptability to diverse user needs. While Reinforcement Learning from Human Feedback (RLHF) shows promise in aligning LLMs, its reliance on scalar…
External link:
http://arxiv.org/abs/2402.18571
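One way to move beyond a single scalar reward, which this abstract alludes to, is to score responses along several attributes and let the user supply a preference direction over them. The sketch below shows that generic idea under assumed names; it is not presented as this paper's exact formulation.

import numpy as np

def directional_reward(reward_vector: np.ndarray, preference: np.ndarray) -> float:
    # Combine a multi-attribute reward vector (e.g. helpfulness, verbosity)
    # with a user-chosen preference direction via a dot product; normalizing
    # the preference keeps the trade-off on the unit sphere.
    unit_preference = preference / np.linalg.norm(preference)
    return float(unit_preference @ reward_vector)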
Real-world applications of machine learning models often confront data distribution shifts, wherein discrepancies exist between the training and test data distributions. In the common multi-domain multi-class setup, as the number of classes and domai…
External link:
http://arxiv.org/abs/2402.02851
Domain generalization asks for models trained over a set of training environments to generalize well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) have been proposed for domain generalization.
External link:
http://arxiv.org/abs/2311.00966
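For reference, the Invariant Risk Minimization objective named in this abstract (Arjovsky et al., 2019) requires the classifier to be simultaneously optimal in every training environment, and is usually optimized through the IRMv1 penalty:

\begin{aligned}
&\min_{\Phi,\, w} \; \sum_{e \in \mathcal{E}_{\mathrm{tr}}} R^e(w \circ \Phi)
\quad \text{s.t.} \quad w \in \arg\min_{\bar{w}} R^e(\bar{w} \circ \Phi) \;\; \forall e \in \mathcal{E}_{\mathrm{tr}}, \\
&\text{(IRMv1)} \quad \min_{\Phi} \; \sum_{e \in \mathcal{E}_{\mathrm{tr}}} R^e(\Phi) + \lambda \left\| \nabla_{w \mid w = 1.0} \, R^e(w \cdot \Phi) \right\|^2 ,
\end{aligned}

where R^e is the risk in environment e, \Phi the learned representation, and w the classifier on top of it.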
Author:
Wang, Haoxiang, Vasu, Pavan Kumar Anasosalu, Faghri, Fartash, Vemulapalli, Raviteja, Farajtabar, Mehrdad, Mehta, Sachin, Rastegari, Mohammad, Tuzel, Oncel, Pouransari, Hadi
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP exce…
External link:
http://arxiv.org/abs/2310.15308
Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is l…
External link:
http://arxiv.org/abs/2310.13852
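When the source-target shift is large, one common remedy in the literature is gradual self-training through a sequence of intermediate domains rather than one-off adaptation. The sketch below illustrates that generic strategy with placeholder fit/predict helpers; it is not asserted to be this paper's algorithm.

def gradual_self_training(model, intermediate_domains, fit, predict):
    # Adapt step by step: pseudo-label each intermediate domain with the
    # current model, retrain on those pseudo-labels, and move on to the next
    # (slightly more shifted) domain until the target is reached.
    for domain in intermediate_domains:  # ordered from near-source to target
        pseudo_labels = predict(model, domain)
        model = fit(model, domain, pseudo_labels)
    return model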