Showing 1 - 6 of 6 for search: '"Sone, Kazoo"'
Improving Faithfulness in Abstractive Summarization with Contrast Candidate Generation and Selection
Despite significant progress in neural abstractive summarization, recent studies have shown that the current models are prone to generating summaries that are unfaithful to the original context. To address the issue, we study contrast candidate gener…
External link:
http://arxiv.org/abs/2104.09061
Author:
Zhu, Wanrong, Qi, Yuankai, Narayana, Pradyumna, Sone, Kazoo, Basu, Sugato, Wang, Xin Eric, Wu, Qi, Eckstein, Miguel, Wang, William Yang
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments. Multiple setups have been proposed, and researchers apply new model architectures or training techniq…
External link:
http://arxiv.org/abs/2103.16561
Author:
Zhu, Wanrong, Wang, Xin Eric, Narayana, Pradyumna, Sone, Kazoo, Basu, Sugato, Wang, William Yang
A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings. To do this, it is critical to ensure that our evaluation protocols are correct, and benchmark…
External link:
http://arxiv.org/abs/2010.03644
Author:
Zhu, Wanrong, Wang, Xin Eric, Fu, Tsu-Jui, Yan, An, Narayana, Pradyumna, Sone, Kazoo, Basu, Sugato, Wang, William Yang
One of the most challenging topics in Natural Language Processing (NLP) is visually-grounded language understanding and reasoning. Outdoor vision-and-language navigation (VLN) is such a task where an agent follows natural language instructions and na…
External link:
http://arxiv.org/abs/2007.00229
Multi-sentence summarization is a well studied problem in NLP, while generating image descriptions for a single image is a well studied problem in Computer Vision. However, for applications such as image cluster labeling or web page summarization, su…
External link:
http://arxiv.org/abs/2006.08686
There is a recent surge of interest in cross-modal representation learning corresponding to images and text. The main challenge lies in mapping images and text to a shared latent space where the embeddings corresponding to a similar semantic concept…
External link:
http://arxiv.org/abs/1911.05978