Publications

We introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion …

To improve the trade-off between KL divergence and reward during RLHF, we leverage the ability to merge LLMs by weight averaging.
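As a minimal sketch of the underlying operation (not the paper's full procedure), merging checkpoints by weight averaging can look as follows; the function name, uniform coefficients, and floating-point parameters are illustrative assumptions.

```python
def average_weights(state_dicts, coeffs=None):
    """Average checkpoints that share one architecture.

    state_dicts: list of state_dicts with identical keys and shapes,
    assumed to hold floating-point parameters.
    coeffs: interpolation coefficients summing to 1 (uniform by default).
    """
    if coeffs is None:
        coeffs = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(c * sd[key] for c, sd in zip(coeffs, state_dicts))
        for key in state_dicts[0]
    }

# merged = average_weights([model_a.state_dict(), model_b.state_dict()])
# model.load_state_dict(merged)
```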

We introduce a new strategy for reward modeling in alignment via RLHF: we merge multiple reward models into one that’s more …

During my PhD, I analyzed how ensembling via weight averaging can improve out-of-distribution generalization and alignment. This …

We investigate large multimodal models and their limitations, such as hallucinations and lack of explainability. We then show that …

We introduce rewarded soup, a new strategy to trade off between multiple rewards when fine-tuning foundation models with RLHF; we first …
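A minimal sketch of the interpolation step, assuming two policies already fine-tuned on different rewards; the helper and the commented evaluation loop below are illustrative placeholders.

```python
def interpolate_state_dicts(sd_a, sd_b, lam):
    """Linear interpolation (1 - lam) * sd_a + lam * sd_b of two
    checkpoints sharing the same architecture."""
    return {key: (1.0 - lam) * sd_a[key] + lam * sd_b[key] for key in sd_a}

# Sweeping lam traces a front of reward trade-offs at inference time,
# without any retraining: lam = 0 recovers policy A, lam = 1 policy B.
# for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
#     policy.load_state_dict(interpolate_state_dicts(sd_a, sd_b, lam))
#     evaluate(policy)
```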

UnIVAL is a 0.25B-parameter unified model that is multitask pretrained on image and video-text data and targets image, video and …

We propose a new fine-tuning strategy that improves OOD generalization in computer vision by recycling and averaging weights …

To improve out-of-distribution generalization on DomainBed, we average diverse weights obtained from different training runs; this …

We introduce and motivate a new regularization that enforces invariance in the domain-level gradient variances across the different …
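A minimal sketch of such a gradient-variance matching penalty, assuming per-sample gradients have already been collected for each training domain (e.g., only for the classifier head, as the flat matrices below suggest); names and shapes are illustrative.

```python
import torch

def gradient_variance_penalty(per_domain_grads):
    """Match domain-level gradient variances to their mean.

    per_domain_grads: list with one tensor per training domain, each of
    shape (n_samples, n_params): per-sample gradients of the loss.
    Returns the mean squared distance between each domain's
    per-parameter gradient variance and the average variance.
    """
    variances = [grads.var(dim=0) for grads in per_domain_grads]
    mean_variance = torch.stack(variances).mean(dim=0)
    return torch.stack(
        [((v - mean_variance) ** 2).sum() for v in variances]
    ).mean()
```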

We propose a new dynamic transformer architecture for continual learning with state-of-the-art performance.

We introduce a new generalized framework for learning multi-input multi-output subnetworks and study how best to mix the inputs. We …
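A hedged sketch of the multi-input multi-output structure; in the paper, mixing operates on features from input-specific encoders and goes beyond the linear interpolation used here, and all names below are illustrative.

```python
import torch.nn as nn

class MultiInputMultiOutputNet(nn.Module):
    """Two inputs are encoded separately, mixed into one shared
    representation, and decoded by two heads, one per input. Each
    head is trained to predict the label of its own input."""

    def __init__(self, encoder_a, encoder_b, backbone, feat_dim, n_classes):
        super().__init__()
        self.encoder_a, self.encoder_b = encoder_a, encoder_b
        self.backbone = backbone
        self.head_a = nn.Linear(feat_dim, n_classes)
        self.head_b = nn.Linear(feat_dim, n_classes)

    def forward(self, x_a, x_b, lam=0.5):
        # Linear mixing is the simplest choice of input mixing;
        # richer (e.g., patch-based) mixing is studied in the paper.
        mixed = lam * self.encoder_a(x_a) + (1.0 - lam) * self.encoder_b(x_b)
        features = self.backbone(mixed)
        return self.head_a(features), self.head_b(features)
```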

Driven by arguments from information theory, we introduce a new learning strategy for deep ensembles that increases diversity among …

We detect continuous colors for fashion garments using a new architecture.

We improve the performance of object detectors by combining different datasets through soft distillation.
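A minimal sketch of a standard soft-distillation objective (temperature-softened KL divergence between teacher and student predictions); the detector-specific formulation in the paper may differ, and the temperature is an illustrative default.

```python
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    scaled by temperature**2 to keep gradient magnitudes comparable."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
```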

We present a method to learn a visual representation adapted for e-commerce products.