Showing 1 - 10 of 102 for search: '"Ishii Masato"'
Author:
Jha, Saurav, Yang, Shiqi, Ishii, Masato, Zhao, Mengjie, Simon, Christian, Mirza, Muhammad Jehanzeb, Gong, Dong, Yao, Lina, Takahashi, Shusuke, Mitsufuji, Yuki
Personalized text-to-image diffusion models have grown popular for their ability to efficiently acquire a new concept from user-defined text descriptions and a few images. However, in the real world, a user may wish to personalize a model on multiple
External link:
http://arxiv.org/abs/2410.00700
In this work, we build a simple but strong baseline for sounding video generation. Given base diffusion models for audio and video, we integrate them with additional modules into a single model and train it to make the model jointly generate audio an
External link:
http://arxiv.org/abs/2409.17550
In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides each single-modal
External link:
http://arxiv.org/abs/2405.17842
Author:
Yang, Shiqi, Zhong, Zhi, Zhao, Mengjie, Takahashi, Shusuke, Ishii, Masato, Shibuya, Takashi, Mitsufuji, Yuki
In recent years, diffusion-based generative models have gained huge attention in both visual and audio generation, owing to their realistic generation results and wide range of personalized applications. Compared to the considerable advancements of text2i
External link:
http://arxiv.org/abs/2405.14598
We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Imag
External link:
http://arxiv.org/abs/2303.15780
We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses search space
External link:
http://arxiv.org/abs/2303.13121
Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise gui
External link:
http://arxiv.org/abs/2212.02024
Author:
Ishii, Masato
We propose a novel semi-supervised learning (SSL) method that adopts selective training with pseudo labels. In our method, we generate hard pseudo-labels and also estimate their confidence, which represents how likely each pseudo-label is to be corre
External link:
http://arxiv.org/abs/2103.08193
Transformer architectures have brought about fundamental changes to the field of computational linguistics, which had been dominated by recurrent neural networks for many years. Their success also implies drastic changes in cross-modal tasks with language and
External link:
http://arxiv.org/abs/2103.04037
Author:
Narihira, Takuya, Alonsogarcia, Javier, Cardinaux, Fabien, Hayakawa, Akio, Ishii, Masato, Iwaki, Kazunori, Kemp, Thomas, Kobayashi, Yoshiyuki, Mauch, Lukas, Nakamura, Akira, Obuchi, Yukio, Shin, Andrew, Suzuki, Kenji, Tiedmann, Stephen, Uhlich, Stefan, Yashima, Takuya, Yoshiyama, Kazuki
While there exists a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation in distributed settings, and compatibility between
External link:
http://arxiv.org/abs/2102.06725