MaxwellJryao

Jiarui Yao MaxwellJryao

CS PhD at UIUC; former B.Eng. at IIIS, Tsinghua University

39 followers · 61 following

Tsinghua University

Pinned

Online-DPO-R1 Public

Forked from RLHFlow/Online-DPO-R1

Codebase for Iterative DPO Using Rule-based Rewards

Python

RLHFlow/RLHF-Reward-Modeling Public

Recipes for training reward models for RLHF.

Python · 1.5k stars · 110 forks

shizhediao/Post-Training-Data-Flywheel Public

We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.

Python · 65 stars · 5 forks