Search results - "dense video captioning"

Academic article

Dense video captioning based on local attention

Authors: Yong Qian, Yingchi Mao, Zhihao Chen, Chang Li, Olano Teah Bloh, Qian Huang

Published in: IET Image Processing, Vol 17, Iss 9, Pp 2673-2685 (2023)

Dense video captioning aims to locate multiple events in an untrimmed video and generate captions for each event. Previous methods experienced difficulties in establishing the multimodal feature relationship between frames and captions, resu

External link: https://doaj.org/article/81daafde3214439294b202442683190a

Academic article

Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods

Authors: Mohammad Saif Wajid, Hugo Terashima‐Marin, Peyman Najafirad, Mohd Anas Wajid

Published in: Engineering Reports, Vol 6, Iss 1, Pp n/a-n/a (2024)

Generating an image/video caption has always been a fundamental problem of Artificial Intelligence, which is usually performed using the potential of Deep Learning Methods, Computer Vision, Knowledge Graphs, and Natural Language Processing (

External link: https://doaj.org/article/c2188da8af704ceaaa93d5b7fd089b82

Academic article

Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph

Authors: Shixing Han, Jin Liu, Jinyingming Zhang, Peizhu Gong, Xiliang Zhang, Huihua He

Published in: Complex & Intelligent Systems, Vol 9, Iss 5, Pp 4995-5012 (2023)

Dense video captioning (DVC) aims at generating description for each scene in a video. Despite attractive progress for this task, previous works usually only concentrate on exploiting visual features while neglecting audio information in the

External link: https://doaj.org/article/c077acec0ce14cd09db77ae66c88199a

Academic article

PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach

Authors: Wangyu Choi, Jiasi Chen, Jongwon Yoon

Published in: IEEE Access, Vol 11, Pp 128162-128174 (2023)

In recent times, there has been a notable increase in efforts to simultaneously comprehend vision and language, driven by the availability of video-related datasets and advancements in language models within the domain of natural language processing.

External link: https://doaj.org/article/f87ea683316e4289a61f19cf9550ca74

Academic article

Step by Step: A Gradual Approach for Dense Video Captioning

Authors: Wangyu Choi, Jiasi Chen, Jongwon Yoon

Published in: IEEE Access, Vol 11, Pp 51949-51959 (2023)

Dense video captioning aims to localize and describe events for storytelling in untrimmed videos. It is a conceptually very challenging task that requires concise, relevant, and coherent captioning based on high-quality event localization. Unlike sim

External link: https://doaj.org/article/ec03f9c784474968b92bac91ed86dcf4

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Authors: Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid

Published in: CVPR 2023-IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2023, Vancouver, Canada

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily-available at scale. The Vid2Seq architecture augments a language model with special time tokens, allowing it t

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e67c8cfe1931fcccfa44cf37962312ff