yuguo-Jack

Pinned

  1. mori Public

    Forked from ROCm/mori

    Modular RDMA Interface

    C++

  2. composable_kernel Public

    Forked from ROCm/composable_kernel

    Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

    C++

  3. cuda-optimized-skill Public

    Forked from KernelFlow-ops/cuda-optimized-skill

    A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It helps improve custom GPU operators with reproducible workflo…

    Python 3

  4. TileKernels Public

    Forked from deepseek-ai/TileKernels

    A kernel library written in tilelang

    Python

  5. DeepEP Public

    Forked from deepseek-ai/DeepEP

    DeepEP: an efficient expert-parallel communication library

    Cuda

  6. DeepGEMM Public

    Forked from deepseek-ai/DeepGEMM

    DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

    Assembly 1