Highlights
AI
DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference
CUDA Templates and Python DSLs for High-Performance Linear Algebra
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Lightweight coding agent that runs in your terminal
Model Context Protocol Servers
On-device AI across mobile, embedded and edge for PyTorch
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Qwen3-Coder is the code version of Qwen3, the large language model series developed by the Qwen team.
Autonomous coding agent as an SDK, IDE extension, or CLI assistant.
A high-throughput and memory-efficient inference and serving engine for LLMs
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
An elegant PyTorch deep reinforcement learning library.
A framework for efficient model inference with omni-modality models
SGLang is a high-performance serving framework for large language models and multimodal models.
Fully featured mobile and web client for Codex and Claude Code, with realtime voice and encryption
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
FlashMLA: Efficient Multi-head Latent Attention Kernels
A kernel library written in tilelang
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Open-source, community-driven agent harness






