Bias92
🎯 Focusing

Pinned

  1. sdpa-attention-benchmark Public

    Benchmark PyTorch SDPA backends (math vs flash) on RTX 4060 Ti with Nsight Systems profiling

    Python 2

  2. flashattn-cuda Public

    FlashAttention CUDA kernel implementation from scratch: forward/backward, ncu profiling, 8 optimization attempts (RTX 4060 Ti)

    Cuda 4

  3. fused-qkv-int8-attention Public

    Fused INT8 KV-cache dequantization + FlashAttention-style tiled decode attention CUDA kernel on A100

    Python