MaxwellJryao
  • Tsinghua University

Pinned

  1. Online-DPO-R1 Public

    Forked from RLHFlow/Online-DPO-R1

    Codebase for Iterative DPO Using Rule-based Rewards

    Python

  2. RLHFlow/RLHF-Reward-Modeling Public

    Recipes to train reward models for RLHF.

    Python · 1.5k stars · 110 forks

  3. shizhediao/Post-Training-Data-Flywheel Public

    We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.

    Python · 65 stars · 5 forks