Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

Authors: Jurenka, Irina, Kunesch, Markus, McKee, Kevin R., Gillick, Daniel, Zhu, Shaojian, Wiltberger, Sara, Phal, Shubham Milind, Hermann, Katherine, Kasenberg, Daniel, Bhoopchand, Avishkar, Anand, Ankit, Pîslar, Miruna, Chan, Stephanie, Wang, Lisa, She, Jennifer, Mahmoudieh, Parsa, Rysbek, Aliya, Ko, Wei-Jen, Huber, Andrea, Wiltshire, Brett, Elidan, Gal, Rabin, Roni, Rubinovitz, Jasmin, Pitaru, Amit, McAllister, Mac, Wilkowski, Julia, Choi, David, Engelberg, Roee, Hackmon, Lidan, Levin, Adva, Griffin, Rachel, Sears, Michael, Bar, Filip, Mesar, Mia, Jabbour, Mana, Chaudhry, Arslan, Cohan, James, Thiagarajan, Sridhar, Levine, Nir, Brown, Ben, Gorur, Dilan, Grant, Svetlana, Hashimshoni, Rachel, Weidinger, Laura, Hu, Jieru, Chen, Dawn, Dolecki, Kuba, Akbulut, Canfer, Bileschi, Maxwell, Culp, Laura, Dong, Wen-Xin, Marchal, Nahema, Van Deman, Kelsie, Misra, Hema Bajaj, Duah, Michael, Ambar, Moran, Caciularu, Avi, Lefdal, Sandra, Summerfield, Chris, An, James, Kamienny, Pierre-Alexandre, Mohdi, Abhinit, Strinopoulous, Theofilos, Hale, Annie, Anderson, Wayne, Cobo, Luis C., Efron, Niv, Ananda, Muktha, Mohamed, Shakir, Heymans, Maureen, Ghahramani, Zoubin, Matias, Yossi, Gomes, Ben, Ibrahim, Lila
Year of publication: 2024
Subject:
Document type: Working Paper
Description: A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily due to the difficulties with verbalising pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, reinforced by the challenges in defining excellent pedagogy. Here we present our work collaborating with learners and educators to translate high-level principles from learning science into a pragmatic set of seven diverse educational benchmarks, spanning quantitative, qualitative, automatic and human evaluations; and to develop a new set of fine-tuning datasets to improve the pedagogical capabilities of Gemini, introducing LearnLM-Tutor. Our evaluations show that LearnLM-Tutor is consistently preferred over a prompt-tuned Gemini by educators and learners on a number of pedagogical dimensions. We hope that this work can serve as a first step towards developing a comprehensive educational evaluation framework, and that this can enable rapid progress within the AI and EdTech communities towards maximising the positive impact of gen AI in education.
Database: arXiv