Výsledky vyhledávání - "Zhu Zhaoqing"

Report

Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding

Autor: Shao, Zirui, Luo, Chuwei, Zhu, Zhaoqing, Xing, Hangdi, Yu, Zhi, Zheng, Qi, Bu, Jiajun

Multimodal large language models (MLLMs) have shown impressive capabilities in document understanding, a rapidly growing research area with significant industrial demand in recent years. As a multimodal task, document understanding requires models to

Externí odkaz: http://arxiv.org/abs/2411.07722

Zobrazit plný text záznamu

Report

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

Autor: Shen, Yufan, Luo, Chuwei, Zhu, Zhaoqing, Chen, Yang, Zheng, Qi, Yu, Zhi, Bu, Jiajun, Yao, Cong

Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on document visual question answering (VQA) task, particularly after training on document instruction datasets. An effective evalu

Externí odkaz: http://arxiv.org/abs/2407.12358

Zobrazit plný text záznamu

Report

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Autor: Luo, Chuwei, Shen, Yufan, Zhu, Zhaoqing, Zheng, Qi, Yu, Zhi, Yao, Cong

Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored

Externí odkaz: http://arxiv.org/abs/2404.05225

Zobrazit plný text záznamu

Report

CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition

Autor: Li, Hanting, Niu, Hongjing, Zhu, Zhaoqing, Zhao, Feng

Facial expression recognition (FER) is an essential task for understanding human behaviors. As one of the most informative behaviors of humans, facial expressions are often compound and variable, which is manifested by the fact that different people

Externí odkaz: http://arxiv.org/abs/2303.00193

Zobrazit plný text záznamu

Report

Intensity-Aware Loss for Dynamic Facial Expression Recognition in the Wild

Autor: Li, Hanting, Niu, Hongjing, Zhu, Zhaoqing, Zhao, Feng

Compared with the image-based static facial expression recognition (SFER) task, the dynamic facial expression recognition (DFER) task based on video sequences is closer to the natural expression recognition scene. However, DFER is often more challeng

Externí odkaz: http://arxiv.org/abs/2208.10335

Zobrazit plný text záznamu

Report

NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition

Autor: Li, Hanting, Sui, Mingzhe, Zhu, Zhaoqing, zhao, Feng

Dynamic facial expression recognition (DFER) in the wild is an extremely challenging task, due to a large number of noisy frames in the video sequences. Previous works focus on extracting more discriminative features, but ignore distinguishing the ke

Externí odkaz: http://arxiv.org/abs/2206.04975

Zobrazit plný text záznamu

Report

AFNet-M: Adaptive Fusion Network with Masks for 2D+3D Facial Expression Recognition

Autor: Sui, Mingzhe, Li, Hanting, Zhu, Zhaoqing, Zhao, Feng

2D+3D facial expression recognition (FER) can effectively cope with illumination changes and pose variations by simultaneously merging 2D texture and more robust 3D depth information. Most deep learning-based approaches employ the simple fusion strat

Externí odkaz: http://arxiv.org/abs/2205.11785

Zobrazit plný text záznamu

Report

MMNet: Muscle motion-guided network for micro-expression recognition

Autor: Li, Hanting, Sui, Mingzhe, Zhu, Zhaoqing, Zhao, Feng

Publikováno v: Proc. 31st Int'l Joint Conf. Artificial Intelligence (IJCAI), 2022

Facial micro-expressions (MEs) are involuntary facial motions revealing peoples real feelings and play an important role in the early intervention of mental illness, the national security, and many human-computer interaction systems. However, existin

Externí odkaz: http://arxiv.org/abs/2201.05297

Zobrazit plný text záznamu

Report

MFEViT: A Robust Lightweight Transformer-based Network for Multimodal 2D+3D Facial Expression Recognition

Autor: Li, Hanting, Sui, Mingzhe, Zhu, Zhaoqing, Zhao, Feng

Vision transformer (ViT) has been widely applied in many areas due to its self-attention mechanism that help obtain the global receptive field since the first layer. It even achieves surprising performance exceeding CNN in some vision tasks. However,

Externí odkaz: http://arxiv.org/abs/2109.13086

Zobrazit plný text záznamu

Akademický článek

Tento výsledek nelze pro nepřihlášené uživatele zobrazit.
K zobrazení výsledku je třeba se přihlásit.

Vyhledávací nástroje:

Upřesnit hledání