Showing 1 - 10 of 13,240 results for search: '"Donahue, P."'
We demonstrate that vision language models (VLMs) are capable of recognizing the content in audio recordings when given corresponding spectrogram images. Specifically, we instruct VLMs to perform audio classification tasks in a few-shot setting by pr
External link:
http://arxiv.org/abs/2411.12058
We present the MIDInfinite, a web application capable of generating symbolic music using a large-scale generative AI model locally on commodity hardware. Creating this demo involved porting the Anticipatory Music Transformer, a large language model (
External link:
http://arxiv.org/abs/2411.09625
We propose an efficient workflow for high-quality offline alignment of in-the-wild performance audio and corresponding sheet music scans (images). Recent work on audio-to-score alignment extends dynamic time warping (DTW) to be theoretically able to
External link:
http://arxiv.org/abs/2411.07428
The goal of multi-objective optimization (MOO) is to learn under multiple, potentially conflicting, objectives. One widely used technique to tackle MOO is through linear scalarization, where one fixed preference vector is used to combine the objectiv
External link:
http://arxiv.org/abs/2410.21764
Author:
Turner, David J., Pilling, Jessica E., Donahue, Megan, Giles, Paul A., Romer, Kathy, Gupta, Agrim, Wallage, Toby, Wang, Ray
We introduce a new, open-source, Python module for the acquisition and processing of archival data from many X-ray telescopes - Democratising Archival X-ray Astronomy (hereafter referred to as DAXA). Our software is built to increase access to, and u
External link:
http://arxiv.org/abs/2410.11954
Author:
Carr, Chris G., Donahue, Carly M., Viens, Loic, Beardslee, Luke B., McGhee, Elisa A., Danielson, Lisa R.
On 24 September 2023, the Origins, Spectral Interpretation, Resource Identification, and Security Regolith Explorer (OSIRIS-REx) Sample Return Capsule entered the Earth's atmosphere after successfully collecting samples from an asteroid. The known tr
External link:
http://arxiv.org/abs/2410.03034
Music foundation models possess impressive music generation capabilities. When people compose music, they may infuse their understanding of music into their work, by using notes and intervals to craft melodies, chords to build progressions, and tempo
External link:
http://arxiv.org/abs/2410.00872
There has been a surge of interest in language model agents that can navigate virtual environments such as the web or desktop. To navigate such environments, agents benefit from information on the various elements (e.g., buttons, text, or images) pre
External link:
http://arxiv.org/abs/2409.12089
Author:
Donahue, Evan
Over the past decade, reactive frameworks and languages have become the dominant programming paradigm in front-end web development. In this paradigm, user actions change application state, and those changes propagate reactively to derived state and t
External link:
http://arxiv.org/abs/2408.17044
Author:
Ma, Yinghao, Øland, Anders, Ragni, Anton, Del Sette, Bleiz MacSen, Saitis, Charalampos, Donahue, Chris, Lin, Chenghua, Plachouras, Christos, Benetos, Emmanouil, Shatri, Elona, Morreale, Fabio, Zhang, Ge, Fazekas, György, Xia, Gus, Zhang, Huan, Manco, Ilaria, Huang, Jiawen, Guinot, Julien, Lin, Liwei, Marinelli, Luca, Lam, Max W. Y., Sharma, Megha, Kong, Qiuqiang, Dannenberg, Roger B., Yuan, Ruibin, Wu, Shangda, Wu, Shih-Lun, Dai, Shuqi, Lei, Shun, Kang, Shiyin, Dixon, Simon, Chen, Wenhu, Huang, Wenhao, Du, Xingjian, Qu, Xingwei, Tan, Xu, Li, Yizhi, Tian, Zeyue, Wu, Zhiyong, Wu, Zhizheng, Ma, Ziyang, Wang, Ziyu
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models
External link:
http://arxiv.org/abs/2408.14340