  • shenzhen
LessUp/README.md

Focused on AI infrastructure, CUDA kernels, and high-performance systems engineering

🔬 Focus: AI Infrastructure · CUDA Kernels · LLM Inference · HPC Systems
🌱 Currently: Building high-throughput inference pipelines and GPU-first systems
🤝 Open to: AI infrastructure, performance engineering, research collaboration, and open-source collaboration





Profile  Selected Work  Background  Stack  Signals  Connect



👨‍💻 About Me

I build AI infrastructure and GPU-first high-performance systems with C++/CUDA, Python, and Go. My main focus is AI infrastructure, GPU operator optimization, and high-performance systems engineering in practice.

  • 🔥 GPU Kernel Engineering: CUDA/Triton kernels for FlashAttention, GEMM, quantization, and memory-aware operator design
  • 🧠 AI Inference Systems: lightweight LLM runtimes, KV Cache, W8A16/FP8 quantization, and inference path optimization
  • High-Performance Computing: simulation, rendering, and image-processing pipelines tuned for throughput and scalability
  • 🌐 Real-time Systems: RTC signaling, streaming applications, and digital human platforms with system-level integration

Currently: inference acceleration, kernel fusion, and end-to-end GPU system design.



🚀 Selected Work

Featured Projects: Start here for the quickest overview of my work in bioinformatics, HPC, AI inference, and developer tooling.
To gauge my technical focus and representative work quickly, begin with the four projects below.
They are the best entry points for collaboration, hiring conversations, and technical review.

High-performance FASTQ compression with a 3.97x compression ratio and O(1) random access. C++23 with oneTBB, ABC+SCM algorithms.

Stars C++23

High-performance FASTQ QC toolkit (stat/filter/trim) with zero-copy I/O and a TBB pipeline, written in C++23.

Stars C++23

End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024 entry)

Stars R

Systematic bioinformatics knowledge base for the Chinese-language community

Stars MDX

🧬 Bioinformatics & Genomics

Systematic bioinformatics knowledge base for the Chinese-language community

MDX Bioinformatics

End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024)

R Metagenomics

High-performance FASTQ compression with a 3.97x compression ratio and O(1) random access. C++23, ABC+SCM.

C++23 oneTBB

High-performance FASTQ QC toolkit (stat/filter/trim) with zero-copy I/O and a TBB pipeline, written in C++23.

C++23 Zero-Copy

Curated bioinformatics algorithms knowledge base with complexity analysis, CLI maintenance tools, and bilingual docs.

Python Algorithms

⚡ CUDA & HPC

Bilingual CUDA SGEMM optimization tutorial and reference implementation, from naive kernels to Tensor Core WMMA.

CUDA Tensor Core

High-performance C++ optimization guide with examples of lock-free data structures, SIMD, and memory optimization.

C++17 SIMD

Header-only C++23 bit manipulation library with SIMD acceleration (SSE2/AVX2/AVX-512/NEON).

C++23 SIMD

Classic lossless compression algorithms in C++17, Go, and Rust with cross-language binary verification.

Go Rust

Chinese translation of the "Art of High-Performance Computing" textbook series, covering MPI, OpenMP, CUDA, and scientific computing (CC-BY 4.0)

MPI OpenMP

Compression knowledge base: algorithm theory, performance benchmarks, and C++ examples

C++17 Algorithms

🤖 AI & Developer Tooling

Curated collection of Cursor AI coding rules: 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps

Stars JavaScript

Archive-grade .mdc rule library for Cursor AI: 26 production-ready rules with a low-drift design

Stars JavaScript

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Python Claude

Offline-first bookmark cleaner and classifier: rules-first, ML-assisted, LLM-optional

Python ML

Multi-model real-time visual recognition system with a REST API and WebSocket streaming inference

Python YOLOv8

Privacy-first diagram editor with local WASM rendering, Kroki full mode, sharing, and export.

TypeScript WASM

🌐 Applications

Browser-native 3D digital human engine with voice, vision, and dialogue. Zero-config, offline-ready.

Stars TypeScript

Lightweight WebRTC demo: Go signaling server plus vanilla JavaScript client, developed OpenSpec-first

Go WebRTC

Browser-based memory training PWA with FSRS-4.5 spaced repetition, N-back training, and adaptive difficulty

JavaScript PWA


🎓 Background & Experience

🎓 Education

Xidian University

Background in a computer science-related field.

💼 Experience

Mindray · ZEGO · BGI

Engineering across medical imaging, RTC systems, and genomic-scale data workflows.


🛠️ Tech Stack

Focused on the technologies most closely tied to the core projects: AI Infrastructure · CUDA Kernel Engineering · LLM Inference · HPC Systems

Category | Technologies
AI Infrastructure | AI Infrastructure
CUDA Kernel Engineering | CUDA C++ Tensor Core WMMA FlashAttention
LLM Inference Optimization | Triton TensorRT KV Cache W8A16 / FP8
HPC Performance Engineering | SIMD Intel oneTBB Memory Optimization Performance Profiling

📊 Signals & Activity

LessUp's GitHub stats

GitHub Activity Graph (public, UTC)
Public contributions only · aggregated by UTC · may lag by a few hours

📫 Collaboration & Contact

Reach out if you're building AI infrastructure, inference acceleration, GPU systems, or performance-critical tooling.
Open to technical collaboration, engineering roles, research discussions, and thoughtful open-source work.
Email   GitHub

Pinned

  1. awesome-cursorrules-zh (Public)

    Curated collection of Cursor AI coding rules: 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps

    JavaScript · 209 stars · 29 forks

  2. meta-human (Public)

    Browser-native 3D digital human engine with voice, vision, and dialogue. Zero-config, offline-ready AI avatar platform.

    TypeScript · 21 stars · 7 forks

  3. ⚡ GLM Coding Rush: Auto-Purchase Userscript for GLM Coding | Auto-unlock sold-out · High-speed retry · Scheduled trigger · Payment guard · Bilingual panel | Tampermonkey/Violentmonkey | Click Raw to install

    // ==UserScript==
    // @name         GLM Coding Rush - 智谱编程助手抢购脚本
    // @namespace    https://gist.github.com/LessUp
    // @version      1.1.0
    // @description  One-click purchase script for Zhipu GLM Coding: auto-unlock sold-out button / high-speed retry engine / bizId double check / automatic recovery from error dialogs / payment dialog protection / second-level scheduled trigger / draggable floating panel
  4. micos-2024 (Public)

    End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024 Entry)

    R · 11 stars · 3 forks

  5. cpp-high-performance-guide (Public)

    High-performance C++ optimization guide with lock-free data structures, SIMD, and memory optimization examples

    C++ · 6 stars · 1 fork

  6. wiki-bioinfo (Public)

    Systematic bioinformatics knowledge base for the Chinese-language community

    MDX · 6 stars · 2 forks