On Teacher Hacking in Language Model Distillation
Daniil Tiapkin, Daniele Calandriello, Johan Ferret, Sarah Perrin, Nino Vieillard, Alexandre Ramé, Mathieu Blondel
4 February, 2025