Tags: AmesianX/TurboQuant

v1.9.0

TurboQuant v1.9.0: DeepSeek-V4-Flash FP4 on 2x DGX Spark (Tensor-Parallel + Multi-Slot + MTP)

v1.8.0

TurboQuant v1.8.0: DeepSeek-V4-Flash Full CUDA Port + MTP Self-Speculative Decoding

v1.7.0

devops: fix pip regression on ubuntu 24.04 — drop pip self-upgrade

Previous commit's 'pip install --upgrade pip' broke cuda12.8/13.1
(ubuntu 24.04): pip can't uninstall the debian-managed pip 24.0
(RECORD file not found) and PEP 668 blocks it. Drop the pip upgrade
entirely; instead try each install with --break-system-packages and
fall back to plain pip. Works on both 22.04 (old pip, no PEP 668)
and 24.04 (new pip, PEP 668 enforced).
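
The try-then-fall-back install described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual devops script: the function name `pip_install` and the injectable `run` parameter are hypothetical. It relies on the fact that a pip too old to know `--break-system-packages` exits non-zero on the unknown option, which triggers the plain-pip fallback.

```python
import subprocess
import sys
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def pip_install(packages: Sequence[str], run: Runner = _run) -> list[str]:
    """Try pip with --break-system-packages, then plain pip.

    Returns the command line that succeeded (hypothetical helper).
    """
    base = [sys.executable, "-m", "pip", "install"]
    attempts = [
        base + ["--break-system-packages", *packages],  # PEP 668 systems
        base + list(packages),                          # older pip, no PEP 668
    ]
    for cmd in attempts:
        if run(cmd) == 0:
            return cmd
    raise RuntimeError(f"pip install failed for: {', '.join(packages)}")
```

Passing a custom `run` callable makes the fallback order easy to check without touching the real environment.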

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

v1.6.0

TurboQuant v1.6.0 — Polar Derotate + Tangent Residual

v1.5.3

TurboQuant v1.5.3: Double WHT per-head for D=64 + tbq4 35/35

Key changes:
- Cross-head WHT abandoned (Q-K domain mismatch at D=64)
- Double WHT per-head: S1→WHT64→S2→WHT64 (kurtosis 0.375→0.047)
- QJL re-enabled for K at D=64 (critical for multi-turn, 9+ turns)
- TBQ_TUNING: all D=64 K/V dispatch + instance combinations
- Recommended: --cache-type-k tbq4 --cache-type-v tbq3 (35/35 math)
- tbqp3 K: Korean ✅, multi-turn ✅, matrix math ❌
- math_bench.py: API key added
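
The S1→WHT64→S2→WHT64 chain listed above can be illustrated with a small pure-Python sketch. It assumes that S1 and S2 are random ±1 sign diagonals and that the WHT is orthonormal (scaled by 1/sqrt(D)); the commit message states neither, and the helper names here are hypothetical rather than taken from the TurboQuant code. The `excess_kurtosis` helper is one way to measure how Gaussian-like the rotated values are; whether the quoted 0.375→0.047 figures are excess kurtosis is also an assumption.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def random_signs(n, seed):
    """A fixed random +/-1 diagonal (assumed form of S1 and S2)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def double_wht(x, s1, s2):
    """Apply S1 -> WHT -> S2 -> WHT to one head vector."""
    y = fwht([a * s for a, s in zip(x, s1)])
    return fwht([a * s for a, s in zip(y, s2)])


def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0
```

Because every stage is orthogonal, applying the same `double_wht` to both Q and K leaves their dot product unchanged, which is what allows the rotation to stay inside a single head at D=64.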

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

v1.5.2

feat(v1.5.2): PPL overhead vs f16 21%→8%, attention sharpening + V rotation bugfix

Attention sharpening (α = 1+1/(2×SQNR)) compensates softmax flattening
from 3-bit quantization noise. TBQP3 α=1.036, TBQ3 α=1.016. Derived
from MMSE theory, not empirically tuned.
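
As a worked illustration of the formula above, the sketch below computes α from an SQNR and applies it to attention logits. It assumes that SQNR is a linear power ratio (not dB) and that α multiplies the logits before softmax; the commit message does not say where the factor is applied, and the function names are hypothetical. Inverting the formula for the quoted values gives an implied SQNR of about 13.9 for α=1.036 and 31.25 for α=1.016.

```python
import math


def sharpening_alpha(sqnr):
    """alpha = 1 + 1 / (2 * SQNR), with SQNR as a linear ratio (assumption)."""
    if sqnr <= 0:
        raise ValueError("SQNR must be positive")
    return 1.0 + 1.0 / (2.0 * sqnr)


def sqnr_from_alpha(alpha):
    """Invert the formula to recover the SQNR implied by a given alpha."""
    return 1.0 / (2.0 * (alpha - 1.0))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sharpened_softmax(logits, alpha):
    """Scale logits by alpha before softmax (assumed application point)."""
    return softmax([alpha * v for v in logits])
```

With α slightly above 1, the largest logit gains probability mass relative to plain softmax, which counteracts the flattening that quantization noise introduces.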

V rotation bug: attn_rot_v was enabled but IWHT decode has no inverse
rotation — V output was corrupted. K rotation is safe (cancels in Q·K
dot product via orthogonal transform property).
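
The asymmetry between K and V can be written out briefly. The symbols are introduced here for illustration: R is the orthogonal rotation (R^T R = I), a_i are the softmax attention weights, and Q is assumed to be rotated by the same R as K.

```latex
% Scores: the rotation cancels, so no inverse is needed.
(Rq)^{\top}(Rk) = q^{\top} R^{\top} R\, k = q^{\top} k

% Output: the rotation survives the weighted sum of values.
o' = \sum_i a_i \,(R v_i) = R \sum_i a_i v_i = R\,o
\quad\Longrightarrow\quad
o = R^{\top} o'
```

Without the final multiplication by R^T, the attention output stays in the rotated basis, which is consistent with the corrupted V output described above.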

Also: per-block norm for TBQ3 D=512, 1.15x V hack removed, tbq4 OOB fix.

PPL (wikitext-2, ctx=2048, Gemma 4 26B MoE):
  v1.5.1: 509.9 (1.21x vs f16)
  v1.5.2: 454.7 (1.08x vs f16)

Math bench (35 problems × 10 runs):
  tbqp3/tbq3: avg 19.1/35, peak 23/35
  f16:        avg 20.1/35, peak 21/35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

v1.5.1

docs: v1.5.1 README — f16-equivalent quality with 4.2x compression

- Add v1.5.1 release notes (Korean + English)
- SWA f16 bypass, V 512-WHT, QJL D=512, attn_rot auto-management
- Final benchmark: tbqp3/tbq3 avg 37.4 > f16 avg 36.6 (10 runs each)
- TBQP auto-disables attn_rot_k (prevents triple rotation)
- Update v1.5.0 D=512 limitation note (resolved in v1.5.1)
- Add performance-critical code documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

v1.5.0

docs: update Gemma 4 benchmarks with PPL, Pauli, MoE results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

v1.4.2

v1.4.2: MMA tensor core + V type fix

v1.4.1

ci: add V100 (sm_70) to CUDA build architectures

Community member @nenkoru reported V100 compatibility issues.
Added compute capability 7.0 to the build matrix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>