1 article on synthetic data.
RLHF is the right idea with the wrong implementation cost for most teams. DPO flips the math — here's how I'd use it to align a healthcare AI model on clinician feedback without burning a month on reward model engineering.