Tags · AmesianX/TurboQuant

v1.9.0

TurboQuant v1.9.0: DeepSeek-V4-Flash FP4 on 2x DGX Spark, Tensor-Parallel + Multi-Slot + MTP

Jun 22, 2026
c35c6a5

v1.8.0

TurboQuant v1.8.0: DeepSeek-V4-Flash Full CUDA Port + MTP Self-Speculative Decoding

Jun 13, 2026
e2155b6

v1.7.0

devops: fix pip regression on ubuntu 24.04 — drop pip self-upgrade

Previous commit's 'pip install --upgrade pip' broke cuda12.8/13.1
(ubuntu 24.04): pip can't uninstall the debian-managed pip 24.0
(RECORD file not found) and PEP 668 blocks it. Drop the pip upgrade
entirely; instead try each install with --break-system-packages and
fall back to plain pip. Works on both 22.04 (old pip, no PEP 668)
and 24.04 (new pip, PEP 668 enforced).
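
The try-then-fall-back order described above can be sketched in Python. This is a minimal illustration, not the project's actual install script: the helper name `pip_install` and its injectable `run` parameter are hypothetical. It assumes that an older pip rejects the unknown `--break-system-packages` option with a non-zero exit code, which is what triggers the fallback.

```python
import subprocess
import sys


def pip_install(packages, run=subprocess.run):
    """Install packages, preferring --break-system-packages, else plain pip.

    `run` is injectable (hypothetical design choice) so the fallback logic
    can be exercised without actually invoking pip.
    """
    base = [sys.executable, "-m", "pip", "install"]
    attempts = [
        # Needed where PEP 668 marks the environment as externally managed.
        base + ["--break-system-packages", *packages],
        # Older pip does not know the flag and exits non-zero, so retry plain.
        base + list(packages),
    ]
    for cmd in attempts:
        if run(cmd).returncode == 0:
            return cmd  # report which form succeeded
    raise RuntimeError(f"pip install failed for: {' '.join(packages)}")
```

Trying the flagged form first means no pip version check is needed; the exit code of the first attempt decides which path is taken.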

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

May 28, 2026
b2bc866

v1.6.0

TurboQuant v1.6.0 — Polar Derotate + Tangent Residual

Apr 14, 2026
2fb6c5c

v1.5.3

TurboQuant v1.5.3: Double WHT per-head for D=64 + tbq4 35/35

Key changes:
- Cross-head WHT abandoned (Q-K domain mismatch at D=64)
- Double WHT per-head: S1→WHT64→S2→WHT64 (kurtosis 0.375→0.047)
- QJL re-enabled for K at D=64 (critical for multi-turn, 9+ turns)
- TBQ_TUNING: all D=64 K/V dispatch + instance combinations
- Recommended: --cache-type-k tbq4 --cache-type-v tbq3 (35/35 math)
- tbqp3 K: Korean ✅, multi-turn ✅, matrix math ❌
- math_bench.py: API key added
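
The S1→WHT64→S2→WHT64 chain above can be sketched in pure Python. This is an illustration under stated assumptions, not the project's kernel: S1 and S2 are taken to be fixed random ±1 sign vectors, the WHT is orthonormal (scaled by 1/sqrt(64)), and "kurtosis" is read as excess kurtosis. All function names are hypothetical.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of 2)."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in v]


def random_signs(n, seed):
    """Fixed pseudo-random +/-1 vector (assumed form of S1 and S2)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def double_wht(x, s1, s2):
    """Apply S1 -> WHT -> S2 -> WHT to one head vector."""
    y = fwht([a * s for a, s in zip(x, s1)])
    return fwht([a * s for a, s in zip(y, s2)])


def inverse_double_wht(y, s1, s2):
    """Undo double_wht: the orthonormal WHT and the sign flips are self-inverse."""
    x = fwht(y)
    x = [a * s for a, s in zip(x, s2)]
    x = fwht(x)
    return [a * s for a, s in zip(x, s1)]


def excess_kurtosis(v):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    n = len(v)
    mean = sum(v) / n
    m2 = sum((a - mean) ** 2 for a in v) / n
    m4 = sum((a - mean) ** 4 for a in v) / n
    return m4 / (m2 * m2) - 3.0


D = 64
s1, s2 = random_signs(D, seed=1), random_signs(D, seed=2)
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(D)]
x[5] = 12.0  # one outlier channel makes the input heavy-tailed
y = double_wht(x, s1, s2)
print(excess_kurtosis(x), excess_kurtosis(y))
```

Because every step is orthogonal, the transform preserves the vector norm and is exactly invertible; only the distribution of energy across the 64 channels changes.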

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apr 11, 2026
6809894

v1.5.2

feat: v1.5.2 — PPL 21%→8%, attention sharpening + V rotation bugfix

Attention sharpening (α = 1+1/(2×SQNR)) compensates softmax flattening
from 3-bit quantization noise. TBQP3 α=1.036, TBQ3 α=1.016. Derived
from MMSE theory, not empirically tuned.
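
As a worked example of the formula above, the sketch below computes α from an SQNR value and inverts it to see which SQNR each quoted α implies. It assumes SQNR is a linear power ratio (not dB) and that α multiplies the attention logits before softmax. Neither detail is stated in the commit message, and the function names are hypothetical.

```python
import math


def sharpening_alpha(sqnr):
    """alpha = 1 + 1/(2*SQNR), with SQNR as a linear power ratio (assumed)."""
    return 1.0 + 1.0 / (2.0 * sqnr)


def implied_sqnr(alpha):
    """Invert the formula: SQNR = 1 / (2*(alpha - 1))."""
    return 1.0 / (2.0 * (alpha - 1.0))


def softmax(logits, alpha=1.0):
    """Softmax over alpha-scaled logits; alpha > 1 sharpens the distribution."""
    scaled = [alpha * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


for name, alpha in (("TBQP3", 1.036), ("TBQ3", 1.016)):
    sqnr = implied_sqnr(alpha)
    print(name, round(sqnr, 2), round(10 * math.log10(sqnr), 2), "dB")
```

Under these assumptions, α=1.036 corresponds to an SQNR of roughly 13.9 (about 11.4 dB) and α=1.016 to 31.25 (about 14.9 dB).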

V rotation bug: attn_rot_v was enabled but IWHT decode has no inverse
rotation — V output was corrupted. K rotation is safe (cancels in Q·K
dot product via orthogonal transform property).
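
The asymmetry between K and V can be checked numerically. The sketch below uses a 4×4 orthonormal Hadamard matrix as a stand-in for the rotation (the real head dimension and rotation are not reproduced here, and all names are hypothetical). Rotating both Q and K leaves the score unchanged, while a rotated V yields a rotated output that must be multiplied by the transpose to recover the true result.

```python
# 4x4 orthonormal Hadamard matrix used as an example rotation R.
H4 = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(weights, values):
    """Weighted sum of value vectors (attention output for one query)."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# K side: (R q) . (R k) = q^T R^T R k = q . k for orthogonal R.
q = [1.0, -2.0, 0.5, 3.0]
k = [0.3, 1.2, -0.7, 2.0]
score_plain = dot(q, k)
score_rot = dot(matvec(H4, q), matvec(H4, k))

# V side: sum_i p_i (R v_i) = R (sum_i p_i v_i), so the output stays rotated.
weights = [0.7, 0.3]
values = [[1.0, 0.0, 2.0, -1.0], [0.0, 3.0, -1.0, 1.0]]
out_plain = attend(weights, values)
out_rot = attend(weights, [matvec(H4, v) for v in values])
out_fixed = matvec(transpose(H4), out_rot)  # apply R^T to undo the rotation

print(score_plain, score_rot)
print(out_plain, out_rot, out_fixed)
```

Without the final `transpose(H4)` step the output stays in the rotated basis, which matches the corruption described above for V.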

Also: per-block norm for TBQ3 D=512, 1.15x V hack removed, tbq4 OOB fix.

PPL (wikitext-2, ctx=2048, Gemma 4 26B MoE):
  v1.5.1: 509.9 (1.21x vs f16)
  v1.5.2: 454.7 (1.08x vs f16)

Math bench (35 problems × 10 runs):
  tbqp3/tbq3: avg 19.1/35, peak 23/35
  f16:        avg 20.1/35, peak 21/35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apr 6, 2026
cf2170e

v1.5.1

docs: v1.5.1 README — f16-equivalent quality with 4.2x compression

- Add v1.5.1 release notes (Korean + English)
- SWA f16 bypass, V 512-WHT, QJL D=512, attn_rot auto-management
- Final benchmark: tbqp3/tbq3 avg 37.4 > f16 avg 36.6 (10 runs each)
- TBQP auto-disables attn_rot_k (prevents triple rotation)
- Update v1.5.0 D=512 limitation note (resolved in v1.5.1)
- Add performance-critical code documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apr 6, 2026
a6ab2a7

v1.5.0

docs: update Gemma 4 benchmarks with PPL, Pauli, MoE results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apr 4, 2026
5816eb3

v1.4.2

v1.4.2: MMA tensor core + V type fix

Apr 4, 2026
d60dbde

v1.4.1

ci: add V100 (sm_70) to CUDA build architectures

Community member @nenkoru reported V100 compatibility issues.
Added compute capability 7.0 to the build matrix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apr 3, 2026
6a50008
