Showing 1 - 10 of 1,499 for search: '"Baifeng An"'
Author:
Niu, Dantong, Sharma, Yuvan, Biamby, Giscard, Quenum, Jerome, Bai, Yutong, Shi, Baifeng, Darrell, Trevor, Herzig, Roei
In recent years, instruction-tuned Large Multimodal Models (LMMs) have been successful at several tasks, including image captioning and visual question answering; yet leveraging these models remains an open question for robotics. Prior LMMs for robot
External link:
http://arxiv.org/abs/2406.11815
Scaling up the size of vision models has been the de facto standard to obtain more powerful visual representations. In this work, we discuss the point beyond which larger vision models are not necessary. First, we demonstrate the power of Scaling on
External link:
http://arxiv.org/abs/2403.13043
Author:
Radosavovic, Ilija, Zhang, Bike, Shi, Baifeng, Rajasegaran, Jathushan, Kamat, Sarthak, Darrell, Trevor, Sreenath, Koushil, Malik, Jitendra
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal
External link:
http://arxiv.org/abs/2402.19469
Author:
Fu, Letian, Lian, Long, Wang, Renhao, Shi, Baifeng, Wang, Xudong, Yala, Adam, Darrell, Trevor, Efros, Alexei A., Goldberg, Ken
In this work, we re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE). We decompose this decoding mechanism for masked patch reconstruction in MAE into self-attention and cross-attention. Our investigations sugge
External link:
http://arxiv.org/abs/2401.14391
Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA). By generating and executing bespoke code for each question, these methods demonstrate impressive compositional and reasoning capabilities, especially in
External link:
http://arxiv.org/abs/2312.02249
Text-conditioned diffusion models have emerged as a promising tool for neural video generation. However, current models still struggle with intricate spatiotemporal prompts and often generate restricted or incorrect motion. To address these limitatio
External link:
http://arxiv.org/abs/2309.17444
The Lombard effect refers to individuals' unconscious modulation of vocal effort in response to variations in the ambient noise levels, intending to enhance speech intelligibility. The impact of different decibel levels and types of background noise
External link:
http://arxiv.org/abs/2309.07419
This study investigates the Lombard effect, where individuals adapt their speech in noisy environments. We introduce an enhanced Mandarin Lombard grid (EMALG) corpus with meaningful sentences, enhancing the Mandarin Lombard grid (MALG) corpus. EMALG
External link:
http://arxiv.org/abs/2309.06858
Author:
Hongyu Zhou, Licheng Tan, Baifeng Zhang, Dora Lai Wan Kwong, Ching Ngar Wong, Yu Zhang, Beibei Ru, Yingchen Lyu, Kin To Hugo Siu, Jie Luo, Yuma Yang, Qin Liu, Yixin Chen, Weiguang Zhang, Chaohui He, Peng Jiang, Yanru Qin, Beilei Liu, Xin-Yuan Guan
Published in:
Nature Communications, Vol 15, Iss 1, Pp 1-19 (2024)
Abstract: Emerging evidence suggests that cancer cells may disseminate early, prior to the formation of traditional macro-metastases. However, the mechanisms underlying the seeding and transition of early disseminated cancer cells (DCCs) into metastat
External link:
https://doaj.org/article/141f9175d6024219ae7a8318b41af7f6
Published in:
BMC Musculoskeletal Disorders, Vol 25, Iss 1, Pp 1-10 (2024)
Abstract: Background: In patients with ossification of the posterior longitudinal ligament of the cervical spine (OPLL), high spinal cord signal (HCS) is frequently observed in the spinal cord of the corresponding segment. However, studies on the diffe
External link:
https://doaj.org/article/b582d296f1674586a1266e96c20f1b53