DavidBellamy/README.md

Python PyTorch vLLM Kubernetes Docker W&B

Hi, I'm David Bellamy.

I am a Post-Training & Reasoning Researcher and a Harvard PhD Statistician.

I am currently on an engineering sabbatical, building bare-metal reasoning stacks to understand the numerics of Reinforcement Learning for LLMs. My focus is on scaling inference-time compute and designing non-parametric value aggregation methods.


Featured Projects

| Project | Description | Stack |
| --- | --- | --- |
| grpo-gsm8k | DeepSeek-R1 Reproduction: A bare-metal implementation of Group Relative Policy Optimization (GRPO) on GSM8k. Decoupled training loop (Torch) and inference (vLLM) on distributed GPUs. | PyTorch, vLLM |
| suttonbarto | RL Theory: Rigorous Python implementations and mathematical proofs for exercises in Sutton & Barto's Reinforcement Learning. | Python, LaTeX, NumPy |
| Labrador | ML4H Best Paper: Code for Limits of Masked Language Modeling, benchmarking Transformers vs. XGBoost on tabular EHR data. | TensorFlow |
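
For readers new to GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no learned value network is needed. The sketch below illustrates only that normalization step. It is not code from the grpo-gsm8k repository: the function name `group_relative_advantages`, the `eps` parameter, and the choice of population standard deviation are assumptions made for this example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration. Each advantage is the reward minus
    the group mean, divided by the group standard deviation (plus eps so that
    a group with identical rewards does not divide by zero).
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption, not from the repo
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one GSM8k-style prompt: 1.0 = correct, 0.0 = wrong.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative advantages. When every completion in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.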

Research Highlights

  • DeepSeek-R1 Replication: Achieved 83.2% Pass@1 on GSM8k using GRPO, matching SFT baselines while recovering reasoning capabilities. Read the W&B Report.
  • Optimization: Implemented length-aware batch packing for SFT, reducing padding overhead from 50% to 21%.
  • Best Paper Award (ML4H 2024): Demonstrated empirical limits of transfer learning in medical tabular data.
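
The padding reduction mentioned above can be illustrated with a simple length-bucketing sketch: sort examples by token length before forming batches, so that each batch pads to a maximum close to its own members. This is an assumed simplification and may differ from the packing scheme actually used. The helper names `make_batches` and `padding_fraction` are hypothetical, and "padding overhead" is taken here to mean the fraction of padded token slots that hold padding.

```python
def make_batches(lengths, batch_size, sort_by_length):
    """Return batches as lists of example indices.

    With sort_by_length=True, examples are grouped with others of similar
    length; otherwise they are batched in their original order.
    """
    order = list(range(len(lengths)))
    if sort_by_length:
        order.sort(key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_fraction(lengths, batches):
    """Fraction of token slots that are padding when each batch is padded
    to the length of its longest member."""
    real = sum(lengths)
    padded = sum(max(lengths[i] for i in batch) * len(batch) for batch in batches)
    return 1.0 - real / padded


# Toy token lengths, chosen only for illustration.
lengths = [10, 100, 12, 98]
naive = padding_fraction(lengths, make_batches(lengths, 2, sort_by_length=False))
bucketed = padding_fraction(lengths, make_batches(lengths, 2, sort_by_length=True))
print(f"naive: {naive:.1%}, length-aware: {bucketed:.1%}")
```

Fully sorted batches remove most padding but also remove randomness from batch composition, so a practical pipeline would likely shuffle within length buckets or shuffle the order of the batches themselves.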

Pinned

  1. grpo-gsm8k (Public)

    RL post-training of open LLMs for math reasoning

    Python 2

  2. suttonbarto (Public)

    Solutions to the exercises in Sutton & Barto's textbook Reinforcement Learning: An Introduction

    Python

  3. labrador (Public)

    Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data.

    Python 14 3

  4. beamlab-hsph/Neural-Moment-Matching-Regression (Public)

    Code for our NeurIPS 2022 work titled "Deep Learning Methods for Proximal Inference via Maximum Moment Restriction"

    Python 4 3