How much does a language model forget when finetuned on new tasks? We show that both model size and optimization matter, and that forgetting can be nearly eliminated with self-generated replay! arxiv.org/abs/2605.26097 w/ Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, and Pavel Izma…
Who's Who of AI
Andrew Gordon Wilson
Machine Learning Professor
https://cims.nyu.edu/~andrewgw
What they're sharing
[2605.26097] Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay arxiv.org