Publications | Alexandre Ramé

Gemma 3 Technical Report

We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion …

Gemma Team

PDF

On Teacher Hacking in Language Model Distillation

We study teacher hacking: does over-optimization of the distillation objective harm the ground-truth performance?

Daniil Tiapkin, Daniele Calandriello, Johan Ferret, Sarah Perrin, Nino Vieillard, Alexandre Ramé, Mathieu Blondel

PDF

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

We improve DiLoCo in three ways. First, we synchronize only subsets of parameters in sequence. Second, we allow workers to continue …

Arthur Douillard, Yanislav Donchev, Keith Rush, Satyen Kale, Zachary Charles, Zachary Garrett, Gabriel Teston, Dave Lacey, Ross McIlroy, Jiajun Shen, Alexandre Ramé, Arthur Szlam, Marc’Aurelio Ranzato, Paul Barham

PDF

Diversity-Rewarded CFG Distillation

We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its …

Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin, Alexandre Ramé

PDF

Gemma 2: Improving Open Language Models at a Practical Size

We introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion …

Gemma Team

PDF

WARP: On the Benefits of Weight Averaged Rewarded Policies

To improve the trade-off between KL and reward during RLHF, we leverage the ability to merge LLMs by weight averaging.

Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem

PDF Slides

WARM: On the Benefits of Weight Averaged Reward Models

We introduce a new strategy for reward modeling in alignment via RLHF: we merge multiple reward models into one that’s more …

Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret

PDF Poster Slides

Diverse and Efficient Ensembling of Deep Networks

During my PhD, I analyzed how ensembling via weight averaging can improve out-of-distribution generalization and alignment. This …

Alexandre Ramé

PDF Slides Video

Beyond task performance: evaluating and reducing the flaws of large multimodal models with in-context-learning

We investigate large multimodal models and their limitations such as hallucinations and lack of explainability. We then show that …

Mustafa Shukor, Alexandre Ramé, Corentin Dancette, Matthieu Cord

PDF Code Project

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

We introduce rewarded soup, a new strategy to trade off between multiple rewards when fine-tuning foundation models with RLHF; we first …

Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord

PDF Code Project Poster Slides

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

UnIVAL is a 0.25B-parameter unified model that is multitask pretrained on image and video-text data and targets image, video and …

Mustafa Shukor, Corentin Dancette, Alexandre Ramé, Matthieu Cord

PDF Code Project

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

We propose a new fine-tuning strategy that improves OOD generalization in computer vision by recycling and averaging weights …

Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz

PDF Code Poster Slides

Diverse Weight Averaging for Out-of-Distribution Generalization

To improve out-of-distribution generalization on DomainBed, we average diverse weights obtained from different training runs; this …

Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord

PDF Code Poster Slides

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

We introduce and motivate a new regularization that enforces invariance in the domain-level gradient variances across the different …

Alexandre Ramé, Corentin Dancette, Matthieu Cord

PDF Code Poster Slides

DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion

We propose a new dynamic transformer architecture for continual learning with state-of-the-art performance.

Arthur Douillard, Alexandre Ramé, Guillaume Couairon, Matthieu Cord

PDF Code

MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

We introduce a new generalized framework for learning multi-input multi-output subnetworks and study how to best mix the inputs. We …

Alexandre Ramé, Rémy Sun, Matthieu Cord

PDF Code Poster Slides

DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

Driven by arguments from information theory, we introduce a new learning strategy for deep ensembles that increases diversity among …

Alexandre Ramé, Matthieu Cord

PDF Poster Slides Video OpenReview

CORE: Color Regression for Multiple Colors Fashion Garments

We detect continuous colors for fashion garments using a new architecture.

Alexandre Ramé, Arthur Douillard, Charles Ollion

PDF Poster

OMNIA Faster R-CNN: Detection in the Wild through Dataset Merging and Soft Distillation

We improve the performance of object detectors by combining different datasets through soft distillation.

Alexandre Ramé, Emilien Garreau, Hedi Ben-Younes, Charles Ollion

PDF Video

Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction

We present a method to learn a visual representation adapted for e-commerce products.

Charles Corbiere, Hedi Ben-Younes, Alexandre Ramé, Charles Ollion

PDF