Search results

Report

Enhanced third-harmonic generation and degenerate four-wave mixing in an all-dielectric metasurfaces via Brillouin zone folding-induced bound states in the continuum

Authors: Qin, Meibao; Wu, Feng; Liu, Tingting; Zhang, Dandan; Xiao, Shuyuan

Bound states in the continuum (BICs) exhibit significant electric field confinement capabilities and have recently been employed to enhance nonlinear optics response at the nanoscale. In this study, we achieve substantial enhancement of third-harmoni

External link: http://arxiv.org/abs/2411.12639

Report

CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis

Authors: Wang, Fuying; Wu, Feng; Tang, Yihan; Yu, Lequan

Integrating multimodal Electronic Health Records (EHR) data, such as numerical time series and free-text clinical reports, has great potential in predicting clinical outcomes. However, prior work has primarily focused on capturing temporal interactio

External link: http://arxiv.org/abs/2411.00696

Report

MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

Authors: Liu, Haoyang; Wang, Jie; Zhang, Wanbo; Geng, Zijie; Kuang, Yufei; Li, Xijun; Li, Bin; Zhang, Yongdong; Wu, Feng

Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challengi

External link: http://arxiv.org/abs/2410.22806

Report

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Authors: Xing, Long; Huang, Qidong; Dong, Xiaoyi; Lu, Jiajie; Zhang, Pan; Zang, Yuhang; Cao, Yuhang; He, Conghui; Wang, Jiaqi; Wu, Feng; Lin, Dahua

In large vision-language models (LVLMs), images serve as inputs that carry a wealth of information. As the idiom "A picture is worth a thousand words" implies, representing a single image in current LVLMs can require hundreds or even thousands of tok

External link: http://arxiv.org/abs/2410.17247

Report

Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

Authors: Lv, Qitan; Wang, Jie; Chen, Hanzhu; Li, Bin; Zhang, Yongdong; Wu, Feng

Generation of plausible but incorrect factual information, often termed hallucination, has attracted significant research interest. Retrieval-augmented language model (RALM) -- which enhances models with up-to-date knowledge -- emerges as a promising

External link: http://arxiv.org/abs/2410.15116

Report

D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Authors: Peng, Yansong; Li, Hebei; Wu, Peixi; Zhang, Yueyi; Sun, Xiaoyan; Wu, Feng

We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR)

External link: http://arxiv.org/abs/2410.13842

Report

USTC-TD: A Test Dataset and Benchmark for Image and Video Coding in 2020s

Authors: Li, Zhuoyuan; Liao, Junqi; Tang, Chuanbo; Zhang, Haotian; Li, Yuqi; Bian, Yifan; Sheng, Xihua; Feng, Xinmin; Li, Yao; Gao, Changsheng; Li, Li; Liu, Dong; Wu, Feng

Image/video coding has been a remarkable research area for both academia and industry for many years. Testing datasets, especially high-quality image/video datasets are desirable for the justified evaluation of coding-related research, practical appl

External link: http://arxiv.org/abs/2409.08481

Report

ClickAttention: Click Region Similarity Guided Interactive Segmentation

Authors: Xu, Long; Li, Shanghong; Chen, Yongquan; Chen, Junkang; Huang, Rui; Wu, Feng

Interactive segmentation algorithms based on click points have garnered significant attention from researchers in recent years. However, existing studies typically use sparse click maps as model inputs to segment specific target objects, which primar

External link: http://arxiv.org/abs/2408.06021

Report

NVC-1B: A Large Neural Video Coding Model

Authors: Sheng, Xihua; Tang, Chuanbo; Li, Li; Liu, Dong; Wu, Feng

The emerging large models have achieved notable progress in the fields of natural language processing and computer vision. However, large models for neural video coding are still unexplored. In this paper, we try to explore how to build a large neura

External link: http://arxiv.org/abs/2407.19402
