🏀
I may be slow to respond.
  • autra tech
  • Beijing
  • 18:16 (UTC +08:00)


Pinned

  1. Megatron-LM

    Forked from NVIDIA/Megatron-LM

    Ongoing research training transformer models at scale

    Python

  2. how-to-optim-algorithm-in-cuda

    Forked from BBuf/how-to-optim-algorithm-in-cuda

    How to optimize some algorithms in CUDA.

    Cuda

  3. TensorRT-LLM

    Forked from NVIDIA/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

    C++

  4. flash-attention

    Forked from Dao-AILab/flash-attention

    Fast and memory-efficient exact attention

    Python

  5. vllm

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

  6. llama2.c

    Forked from karpathy/llama2.c

    Inference Llama 2 in one file of pure C

    C