Showing 1 - 10 of 23 for the search: '"Lai, Zihang"'
Author:
Lai, Zihang
Open-vocabulary semantic segmentation models aim to accurately assign a semantic label to each pixel in an image from a set of arbitrary open-vocabulary texts. In order to learn such pixel-level alignment, current approaches typically rely on a combi…
External link:
http://arxiv.org/abs/2401.12217
Recently, Neural Radiance Fields (NeRF) has shown promising performance on reconstructing 3D scenes and synthesizing novel views from a sparse set of 2D images. Albeit effective, the performance of NeRF is highly influenced by the quality of trainin…
External link:
http://arxiv.org/abs/2209.08546
Author:
Han, Yizeng, Pu, Yifan, Lai, Zihang, Wang, Chaofei, Song, Shiji, Cao, Junfeng, Huang, Wenhui, Deng, Chao, Huang, Gao
Early exiting is an effective paradigm for improving the inference efficiency of deep networks. By constructing classifiers with varying resource demands (the exits), such networks allow easy samples to be output at early exits, removing the need for…
External link:
http://arxiv.org/abs/2209.08310
The paper presents a scalable approach for learning spatially distributed visual representations over individual tokens and a holistic instance representation simultaneously. We use self-attention blocks to represent spatially distributed tokens, fol…
External link:
http://arxiv.org/abs/2206.04667
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given. Current UDA approaches learn domain-invariant features by aligning source and target featu…
External link:
http://arxiv.org/abs/2202.06687
Author:
Wang, Yulin, Yue, Yang, Lin, Yuanze, Jiang, Haojun, Lai, Zihang, Kulikov, Victor, Orlov, Nikita, Shi, Humphrey, Huang, Gao
Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing the spatial redundancy. As a representative work, the adaptive focus method (AdaFocus) has achieved a favorable trade-off between…
External link:
http://arxiv.org/abs/2112.14238
A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on temporal continuity in videos, our work assumes that the 3D scene structure in nearby vide…
External link:
http://arxiv.org/abs/2110.02951
The ability to find correspondences in visual data is the essence of most computer vision tasks. But what are the right correspondences? The task of visual correspondence is well defined for two different images of the same object instance. In the case of tw…
External link:
http://arxiv.org/abs/2109.01097
Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervis…
External link:
http://arxiv.org/abs/2002.07793
Author:
Lai, Zihang, Xie, Weidi
The objective of this paper is self-supervised learning of feature embeddings that are suitable for matching correspondences along the videos, which we term correspondence flow. By leveraging the natural spatial-temporal coherence in videos, we propo…
External link:
http://arxiv.org/abs/1905.00875