Bias92

🎯 Focusing

MLSys | HIU undergrad

7 followers · 5 following

Pinned

sdpa-attention-benchmark Public

Benchmark PyTorch SDPA backends (math vs flash) on RTX 4060 Ti with Nsight Systems profiling

Python · 2

flashattn-cuda Public

FlashAttention CUDA kernel implementation from scratch: forward/backward, ncu profiling, 8 optimization attempts (RTX 4060 Ti)

Cuda · 4

fused-qkv-int8-attention Public

Fused INT8 KV-cache dequantization + FlashAttention-style tiled decode attention CUDA kernel on A100

Python