Continual pre-training mitigates forgetting in language and vision.
Authors: Cossu A; University of Pisa, Largo B. Pontecorvo, 3, Pisa, 56127, Italy. Electronic address: andrea.cossu@unipi.it., Carta A; University of Pisa, Largo B. Pontecorvo, 3, Pisa, 56127, Italy., Passaro L; University of Pisa, Largo B. Pontecorvo, 3, Pisa, 56127, Italy., Lomonaco V; University of Pisa, Largo B. Pontecorvo, 3, Pisa, 56127, Italy., Tuytelaars T; KU Leuven, Kasteelpark Arenberg 10, Leuven, 3001, Belgium., Bacciu D; University of Pisa, Largo B. Pontecorvo, 3, Pisa, 56127, Italy.
Language: English
Source: Neural Networks: The Official Journal of the International Neural Network Society [Neural Netw] 2024 Nov; Vol. 179, pp. 106492. Date of Electronic Publication: 2024 Jul 01.
DOI: 10.1016/j.neunet.2024.106492
Abstract: Pre-trained models are commonly used in Continual Learning to initialize the model before training on the stream of non-stationary data. However, pre-training is rarely applied during Continual Learning. We investigate the characteristics of the Continual Pre-Training scenario, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks. We introduce an evaluation protocol for Continual Pre-Training which monitors forgetting against a Forgetting Control dataset not present in the continual stream. We disentangle the impact on forgetting of three main factors: the input modality (NLP, Vision), the architecture type (Transformer, ResNet), and the pre-training protocol (supervised, self-supervised). Moreover, we propose a Sample-Efficient Pre-training method (SEP) that speeds up the pre-training phase. We show that the pre-training protocol is the most important factor accounting for forgetting. Surprisingly, we discovered that self-supervised continual pre-training in both NLP and Vision is sufficient to mitigate forgetting without the use of any Continual Learning strategy. Other factors, like model depth, input modality, and architecture type, are not as crucial. Competing Interests: Declaration of competing interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2024 The Author(s). Published by Elsevier Ltd. All rights reserved.) An illustrative sketch of the evaluation protocol described in this abstract follows the record below.
Database: MEDLINE
External link:
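
The abstract describes an evaluation protocol in which a model is continually pre-trained on a stream of data and, after each stage, fine-tuned to downstream tasks while forgetting is monitored against a Forgetting Control dataset that never appears in the stream. The following is a minimal sketch of that loop, not the authors' implementation: the functions `pretrain_step` and `finetune_and_eval`, and all dataset names, are hypothetical placeholders used only to illustrate the structure of the protocol.

```python
from copy import deepcopy

def pretrain_step(model, chunk):
    """Hypothetical stand-in for one (self-)supervised pre-training pass on a stream chunk."""
    model["seen"].append(chunk)
    return model

def finetune_and_eval(model, dataset):
    """Hypothetical stand-in: fine-tune a copy of the model on `dataset` and report a result."""
    return {"dataset": dataset, "pretrained_on": list(model["seen"])}

# Continual stream of pre-training chunks and a Forgetting Control dataset
# that is never part of the stream (names are illustrative only).
stream = ["chunk_1", "chunk_2", "chunk_3"]
forgetting_control = "control_set"
downstream_tasks = ["task_A", "task_B"]

model = {"seen": []}
history = []

for chunk in stream:
    # Continual pre-training on the incoming chunk.
    model = pretrain_step(model, chunk)
    for task in downstream_tasks + [forgetting_control]:
        # Fine-tuning always starts from a copy, so downstream training
        # does not leak back into the continually pre-trained model.
        history.append(finetune_and_eval(deepcopy(model), task))

# Forgetting is then read off the control-set results across stages.
print(history[-1])
```

In this sketch, forgetting would be measured by comparing the control-set evaluations recorded after each pre-training stage; the actual metrics, models, and pre-training objectives used in the paper are not reproduced here.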