Selected Publications

Gemma 3 Technical Report

We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. In particular, our novel post-training recipe significantly improves the math, chat, …

On Teacher Hacking in Language Model Distillation

We study teacher hacking: whether over-optimizing the distillation objective against a fixed teacher harms ground-truth performance.
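
As a rough illustration, teacher hacking concerns a proxy objective such as the standard distillation loss. A minimal sketch, assuming PyTorch; the names are illustrative, not the paper's code:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL from the (fixed) teacher to the student.

    This is the proxy being optimized; teacher hacking is the regime where
    this loss keeps improving while ground-truth performance degrades.
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), summed over the vocabulary, averaged over the batch.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t
```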

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

We improve DiLoCo in three ways. First, we synchronize only subsets of parameters in sequence. Second, we allow workers to continue training while synchronizing. Third, we quantize the data exchanged by workers.
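
To make the three changes concrete, here is a minimal single-process sketch, assuming parameters stored as dicts of numpy arrays; all names are illustrative, and the communication that would overlap with local training is shown as a plain function call:

```python
import numpy as np

def quantize(delta, num_bits=4):
    """Lossily quantize the exchanged deltas (third change)."""
    scale = np.abs(delta).max() + 1e-12
    levels = 2 ** (num_bits - 1) - 1
    return np.round(delta / scale * levels) / levels * scale

def sync_one_fragment(worker_params, global_params, fragments, outer_step):
    """Synchronize a single parameter fragment per outer step (first change).

    In the real setting this exchange runs in the background while workers
    keep taking local optimization steps (second change).
    """
    frag = fragments[outer_step % len(fragments)]  # fragments in sequence
    deltas = [quantize(w[frag] - global_params[frag]) for w in worker_params]
    global_params[frag] = global_params[frag] + np.mean(deltas, axis=0)
    for w in worker_params:  # broadcast the merged fragment back
        w[frag] = global_params[frag].copy()
```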

Diversity-Rewarded CFG Distillation

We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of classifier-free guidance (CFG) while addressing its limitations using RL and weight averaging.
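
For context, CFG combines a conditional and an unconditional prediction at inference time; distillation trains a single model to imitate this combined distribution so the extra forward pass is no longer needed at deployment. A minimal sketch of the standard CFG combination (the teacher signal being distilled):

```python
def cfg_logits(cond_logits, uncond_logits, gamma=3.0):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one; gamma > 1 improves quality
    at the cost of diversity, which the RL diversity reward counteracts."""
    return uncond_logits + gamma * (cond_logits - uncond_logits)
```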

Gemma 2: Improving Open Language Models at a Practical Size

We introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters.

WARP: On the Benefits of Weight Averaged Rewarded Policies

To improve the trade-off between KL and reward during RLHF, we leverage the ability to merge LLMs by weight averaging.
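
The core primitive is simple; below is a minimal sketch of linearly interpolating a fine-tuned policy back toward its initialization, with illustrative names (WARP itself combines several such averaging stages):

```python
def interpolate_toward_init(init_state, tuned_state, eta):
    """eta = 0 recovers the initial policy (zero KL, low reward);
    eta = 1 the RL-tuned policy (high reward, high KL); intermediate
    values of eta trace a trade-off front between the two."""
    return {name: (1 - eta) * init_state[name] + eta * tuned_state[name]
            for name in init_state}
```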

WARM: On the Benefits of Weight Averaged Reward Models

We introduce a new strategy for reward modeling in alignment via RLHF: we merge multiple reward models into one that’s more reliable and robust, and thus mitigates reward hacking.
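
A minimal sketch of the merging step, assuming reward models fine-tuned from a shared pretrained initialization (a prerequisite for weight averaging to be meaningful); names are illustrative:

```python
def average_reward_models(state_dicts):
    """Uniformly average the weights of several reward models into one."""
    n = len(state_dicts)
    return {name: sum(sd[name] for sd in state_dicts) / n
            for name in state_dicts[0]}
```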

Diverse and Efficient Ensembling of Deep Networks

During my PhD, I analyzed how ensembling via weight averaging can improve out-of-distribution generalization and alignment. This thesis received the 2024 SSFAM award for the best ML thesis in France.

Beyond task performance: evaluating and reducing the flaws of large multimodal models with in-context learning

We investigate large multimodal models and their limitations, such as hallucinations and lack of explainability. We then show that multimodal in-context learning can reduce some of these flaws.

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

We introduce rewarded soup, a new strategy to trade off multiple rewards when fine-tuning foundation models with RLHF; we first learn one network for each reward, and then linearly interpolate their weights.
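
A minimal sketch, assuming per-reward experts fine-tuned from the same initialization; sweeping the convex coefficients reveals a front of reward trade-offs without any retraining:

```python
def rewarded_soup(expert_states, coeffs):
    """Linearly interpolate the weights of per-reward expert policies."""
    assert abs(sum(coeffs) - 1.0) < 1e-6, "coefficients must sum to 1"
    return {name: sum(c * sd[name] for c, sd in zip(coeffs, expert_states))
            for name in expert_states[0]}
```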

Talks/Posts

PRAIRIE Artificial Intelligence Summer School: Key Takeaways [Medium]
Semi-supervised Learning for Multilingual Sentence Representation [PDF] [Video]

Teaching

Deep Learning for Computer Vision
Deep Learning
Mathematics