Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
