Kimon N. qflen

trying to be 100x

11 followers · 2 following

ASML
Amsterdam, NL
kimon.space

Pinned

nsa-from-scratch Public

From-scratch reimplementation of DeepSeek's Native Sparse Attention (arXiv:2502.11089) in Triton + CUDA Hopper WGMMA. 7.07x faster than FlashAttention-3 at 64k context. Five-model training fleet, p…

Python 6
tinycompress Public

Implemented and benchmarked LLM inference compression: int4/int8 quantization, GPTQ-like calibration, int8 KV cache, pruning, distillation, speculative decoding, torch.compile, and ONNX. Every numb…

Python
pytorch Public

Forked from pytorch/pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python
transformers Public

Forked from huggingface/transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python