Meta-learning for real-world class incremental learning: a transformer-based approach

Authors: Sandeep Kumar, Amit Sharma, Vikrant Shokeen, Ahmad Taher Azar, Syed Umar Amin, Zafar Iqbal Khan
Language: English
Year of publication: 2024
Subject:
Source: Scientific Reports, Vol 14, Iss 1, Pp 1-20 (2024)
Document type: article
ISSN: 2045-2322
DOI: 10.1038/s41598-024-71125-8
Description: Abstract: Modern state-of-the-art (SoTA) deep learning (DL) models for natural language processing (NLP) have hundreds of millions of parameters, making them extremely complex. Large datasets are required to train these models, and while pretraining has reduced this requirement, human-labelled datasets are still necessary for fine-tuning. Few-shot learning (FSL) techniques, such as meta-learning, try to train models from smaller datasets to mitigate this cost. However, the tasks used to evaluate these meta-learners frequently diverge from the real-world problems they are meant to solve. This work applies meta-learning to a problem that is more pertinent to the real world: class incremental learning (IL). In this scenario, the model learns to classify newly introduced classes after its training is complete. Meta-learners can generalise from a small number of samples to classes that have never been seen before, which makes them especially useful for class IL. The proposed method emulates class IL using proxy new classes, which allows a meta-learner to complete the task without retraining. To generate predictions, a transformer-based aggregation function that modifies data from examples across all classes is proposed for the meta-learner. The principal contributions of the model are that it considers the entire support and query sets concurrently and that it prioritises attention to crucial samples, such as the query, to increase their impact during inference. The results show that the model surpasses prevailing benchmarks in the industry. Notably, most meta-learners demonstrate significant generalisation in the context of class IL even without specific training for this task. This paper establishes a high-performing baseline for subsequent transformer-based aggregation techniques, thereby emphasising the practical significance of meta-learners in class IL.
Database: Directory of Open Access Journals