
Pull requests: NVIDIA/TransformerEngine

Migrate norms and softmax kernels to NVRTC (label: community-contribution)
#3156 opened Jun 30, 2026 by CarlosGomes98 Contributor
2 of 13 tasks
[PyTorch][torch.compile] Add TensorProto mechanism
#3153 opened Jun 29, 2026 by pggPL Collaborator
4 of 13 tasks
[PyTorch][torch.compile] Make quantizers opaque value objects
#3152 opened Jun 29, 2026 by pggPL Collaborator
8 of 13 tasks
Enable FA4 for context-parallel attention
#3149 opened Jun 26, 2026 by sudhakarsingh27 Member Draft
7 of 13 tasks
[Draft] Use vendored cuDNN frontend for Python
#3148 opened Jun 26, 2026 by vcherepanov-nv Collaborator
1 of 13 tasks
Add MXFP8 support with cuBLASMp (label: community-contribution)
#3145 opened Jun 25, 2026 by almogsegal Contributor
13 tasks
Add multi_tensor_raw_moments kernel (label: community-contribution)
#3144 opened Jun 25, 2026 by philipcmonk Draft
6 of 13 tasks
docs: document attention backend selection (label: documentation)
#3142 opened Jun 24, 2026 by sbhavani Collaborator
4 of 13 tasks
[Common] Fix Build: NCCL EP build to respect MAX_JOBS
#3138 opened Jun 22, 2026 by phu0ngng Collaborator Draft
7 of 13 tasks
[Common] Experimental CuTeDSL MXFP8 backends in C++ via TVM-FFI
#3137 opened Jun 21, 2026 by kainzhong Collaborator Draft
13 tasks
[Common/PyTorch] Grouped-quantize kernels for 1D and 2D FP8 block-scaling (labels: 2.17, FP8, MoE, performance)
#3135 opened Jun 17, 2026 by denera Collaborator
8 of 13 tasks
Single-launch CUTLASS grouped GEMM for per-tensor NVFP4 (label: community-contribution)
#3134 opened Jun 17, 2026 by cael-ling Contributor
9 of 13 tasks
Enable NVFP4 RHT amax for grouped SReLU MLP (label: community-contribution)
#3133 opened Jun 16, 2026 by sraman-rgb Contributor
13 tasks
[Common] Support scaled & clamped swiglu, srelu for BF16 (label: community-contribution)
#3132 opened Jun 16, 2026 by zhongbozhu Collaborator
13 tasks
feat: add SM_121 (GB10 consumer Blackwell) support for FA4 (label: community-contribution)
#3125 opened Jun 12, 2026 by TyGu1
docs(readme): update latest news (label: documentation)
#3121 opened Jun 11, 2026 by sbhavani Collaborator
6 of 13 tasks
TE EP integration to MoEBlock
#3116 opened Jun 10, 2026 by tdophung Collaborator
6 of 13 tasks
[JAX] Collective Gemm test fixes
#3115 opened Jun 10, 2026 by jberchtold-nvidia Collaborator
13 tasks
Abstract CUDA hardcodes into configurable te_device_type / te_platform (label: community-contribution)
#3113 opened Jun 10, 2026 by lxd-cumt
Add entrypoint for flagos multi-backend plugin system (label: community-contribution)
#3107 opened Jun 9, 2026 by lxd-cumt
Introduce Mega-C++ to reduce CPU overhead (label: community-contribution)
#3099 opened Jun 6, 2026 by zhongbozhu Collaborator
3 of 17 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py (label: community-contribution)
#3095 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks