Showing 1 - 2 of 2 for search: '"Baade, Alan"'
Language models require tokenized inputs. However, tokenization strategies for continuous data like audio and vision are often based on simple heuristics such as fixed-size convolutions or discrete clustering, which do not necessarily align with the
External link:
http://arxiv.org/abs/2410.04029
In this paper, we propose a simple yet powerful improvement over the recent Self-Supervised Audio Spectrogram Transformer (SSAST) model for speech and audio classification. Specifically, we leverage the insight that the SSAST uses a very high masking
External link:
http://arxiv.org/abs/2203.16691