Search results

Report

LLaVA-SLT: Visual Language Tuning for Sign Language Translation

Authors: Liang, Han; Huang, Chengyu; Xu, Yuecheng; Tang, Cheng; Ye, Weicai; Zhang, Juze; Chen, Xin; Yu, Jingyi; Xu, Lan

In the realm of Sign Language Translation (SLT), reliance on costly gloss-annotated datasets has posed a significant barrier. Recent advancements in gloss-free SLT methods have shown promise, yet they often largely lag behind gloss-based approaches i

External link: http://arxiv.org/abs/2412.16524

Report

CADSpotting: Robust Panoptic Symbol Spotting on Large-Scale CAD Drawings

Authors: Mu, Jiazuo; Yang, Fuyi; Zhang, Yanshun; Zhang, Junxiong; Luo, Yongjian; Xu, Lan; Shi, Yujiao; Yu, Jingyi; Zhang, Yingliang

We introduce CADSpotting, an efficient method for panoptic symbol spotting in large-scale architectural CAD drawings. Existing approaches struggle with the diversity of symbols, scale variations, and overlapping elements in CAD designs. CADSpotting o

External link: http://arxiv.org/abs/2412.07377

Report

Unsupervised Multi-Parameter Inverse Solving for Reducing Ring Artifacts in 3D X-Ray CBCT

Authors: Wu, Qing; Wei, Hongjiang; Yu, Jingyi; Zhang, Yuyao

Ring artifacts are prevalent in 3D cone-beam computed tomography (CBCT) due to non-ideal responses of X-ray detectors, severely degrading imaging quality and reliability. Current state-of-the-art (SOTA) ring artifact reduction (RAR) algorithms rely o

External link: http://arxiv.org/abs/2412.05853

Report

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Authors: Wu, Shijie; Zhu, Yihang; Huang, Yunao; Zhu, Kaizhen; Gu, Jiayuan; Yu, Jingyi; Shi, Ye; Wang, Jingya

Diffusion-based policies have shown impressive performance in robotic manipulation tasks while struggling with out-of-domain distributions. Recent efforts attempted to enhance generalization by improving the visual feature encoding for diffusion poli

External link: http://arxiv.org/abs/2412.03142

Report

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

Authors: Yu, Chunlin; Wang, Hanqing; Shi, Ye; Luo, Haoyang; Yang, Sibei; Yu, Jingyi; Wang, Jingya

3D affordance segmentation aims to link human instructions to touchable regions of 3D objects for embodied manipulations. Existing efforts typically adhere to single-object, single-affordance paradigms, where each affordance type or explicit instruct

External link: http://arxiv.org/abs/2412.01550

Report

NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Authors: Pan, Bikang; Li, Qun; Tang, Xiaoying; Huang, Wei; Fang, Zhen; Liu, Feng; Wang, Jingya; Yu, Jingyi; Shi, Ye

The emergence of vision-language foundation models, such as CLIP, has revolutionized image-text representation, enabling a broad range of applications via prompt learning. Despite its promise, real-world datasets often contain noisy labels that can d

External link: http://arxiv.org/abs/2412.01256

Report

AerialGo: Walking-through City View Generation from Aerial Perspectives

Authors: Zhao, Fuqiang; Guo, Yijing; Yang, Siyuan; Chen, Xi; Wang, Luo; Xu, Lan; Zhang, Yingliang; Shi, Yujiao; Yu, Jingyi

High-quality 3D urban reconstruction is essential for applications in urban planning, navigation, and AR/VR. However, capturing detailed ground-level data across cities is both labor-intensive and raises significant privacy concerns related to sensit

External link: http://arxiv.org/abs/2412.00157

Report

SMGDiff: Soccer Motion Generation using diffusion probabilistic models

Authors: Yang, Hongdi; Li, Chengyang; Wu, Zhenxuan; Li, Gaozheng; Wang, Jingya; Yu, Jingyi; Su, Zhuo; Xu, Lan

Soccer is a globally renowned sport with significant applications in video games and VR/AR. However, generating realistic soccer motions remains challenging due to the intricate interactions between the human player and the ball. In this paper, we in

External link: http://arxiv.org/abs/2411.16216

Report

Label-Free Intraoperative Mean-Transition-Time Image Generation Using Statistical Gating and Deep Learning

Authors: Shi, Yan; Zhao, Denghui; Yu, Jingyi; Ni, Wei; Li, Pengcheng; Gu, Yun; Miao, Peng; Tong, Shanbao

It is of paramount importance to visualize blood dynamics intraoperatively, as this enables the accurate diagnosis of intraoperative conditions and facilitates informed surgical decision-making. Indocyanine green (ICG) fluorescence imaging represents

External link: http://arxiv.org/abs/2411.16039

Report

Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization

Authors: Liu, Weihang; Zheng, Xue Xian; Yu, Jingyi; Lou, Xin

Recently popular radiance field models, exemplified by Neural Radiance Fields (NeRF), Instant-NGP and 3D Gaussian Splatting, are designed to represent 3D content by training a model for each individual scene. This unique characteristic of scene

External link: http://arxiv.org/abs/2410.19483
