qflen

Pinned

  1. nsa-from-scratch Public

    From-scratch reimplementation of DeepSeek's Native Sparse Attention (arXiv:2502.11089) in Triton + CUDA Hopper WGMMA. 7.07x faster than FlashAttention-3 at 64k context. Five-model training fleet, p…

    Python 6

  2. tinycompress Public

    Implemented and benchmarked LLM inference compression: int4/int8 quantization, GPTQ-like calibration, int8 KV cache, pruning, distillation, speculative decoding, torch.compile, and ONNX. Every numb…

    Python

  3. pytorch Public

    Forked from pytorch/pytorch

    Tensors and dynamic neural networks in Python with strong GPU acceleration

    Python

  4. transformers Public

    Forked from huggingface/transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Python