Selected Publications

Gemma 2: Improving Open Language Models at a Practical Size

We introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters.

WARP: On the Benefits of Weight Averaged Rewarded Policies

To improve the trade-off between KL and reward during RLHF, we leverage the ability to merge LLMs by weight averaging.

WARM: On the Benefits of Weight Averaged Reward Models

We introduce a new strategy for reward modeling in alignment via RLHF: we merge multiple reward models into one that’s more reliable and robust, and thus mitigates reward hacking.

Diverse and Efficient Ensembling of Deep Networks

During my PhD, I analyzed how ensembling via weight averaging can improve out-of-distribution generalization and alignment. This thesis received the 2024 award for the best ML thesis in France from SSFAM.

Beyond task performance: evaluating and reducing the flaws of large multimodal models with in-context-learning

We investigate large multimodal models and their limitations such as hallucinations and lack of explainability. We then show that multimodal in-context learning can reduce some of these flaws.

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

We introduce rewarded soup, a new strategy to trade off between multiple rewards when fine-tuning foundation models with RLHF: we first learn one network for each reward, and then linearly interpolate their weights.

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

UnIVAL is a 0.25B-parameter unified model that is multitask pretrained on image and video-text data and targets image, video and audio-text downstream tasks.

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

We propose a new fine-tuning strategy that improves OOD generalization in computer vision by recycling and averaging weights specialized on diverse auxiliary tasks.

Diverse Weight Averaging for Out-of-Distribution Generalization

To improve out-of-distribution generalization on DomainBed, we average diverse weights obtained from different training runs; this strategy is motivated by an extension of the bias-variance theory to weight averaging.

DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion

We propose a new dynamic transformer architecture for continual learning with state-of-the-art performance.


PRAIRIE Artificial Intelligence Summer School: Key Takeaways (Medium)
Semi-supervised Learning for Multilingual Sentence Representation (PDF, Video)


Deep Learning for Computer Vision
Deep Learning