Tags: AmesianX/TurboQuant
devops: fix pip regression on ubuntu 24.04 (drop pip self-upgrade)

Previous commit's 'pip install --upgrade pip' broke cuda12.8/13.1 (ubuntu 24.04): pip can't uninstall the debian-managed pip 24.0 (RECORD file not found) and PEP 668 blocks it. Drop the pip upgrade entirely; instead try each install with --break-system-packages and fall back to plain pip. Works on both 22.04 (old pip, no PEP 668) and 24.04 (new pip, PEP 668 enforced).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
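The fallback described in this commit can be sketched as follows. The commit's own install script is not shown on this page, so this is only an illustration in Python: the function name `pip_install` and the injectable `run` parameter are hypothetical, added so the logic can be exercised without touching a real environment.

```python
import subprocess
import sys


def pip_install(package, run=subprocess.run):
    """Try pip with --break-system-packages, then fall back to plain pip.

    Returns the argv list that succeeded, or None if both attempts failed.
    `run` is a hypothetical injection point (defaults to subprocess.run).
    """
    base = [sys.executable, "-m", "pip", "install"]
    attempts = [
        # Accepted by newer pip; required where PEP 668 marks the
        # environment as externally managed (e.g. ubuntu 24.04).
        base + ["--break-system-packages", package],
        # Older pip does not know the flag and exits non-zero,
        # so retry without it (e.g. ubuntu 22.04).
        base + [package],
    ]
    for argv in attempts:
        if run(argv).returncode == 0:
            return argv
    return None
```

Trying the flagged form first means no pip version detection is needed: the older pip simply fails on the unknown option and the plain invocation takes over.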
TurboQuant v1.5.3: Double WHT per-head for D=64 + tbq4 35/35

Key changes:
- Cross-head WHT abandoned (Q-K domain mismatch at D=64)
- Double WHT per-head: S1→WHT64→S2→WHT64 (kurtosis 0.375→0.047)
- QJL re-enabled for K at D=64 (critical for multi-turn, 9+ turns)
- TBQ_TUNING: all D=64 K/V dispatch + instance combinations
- Recommended: --cache-type-k tbq4 --cache-type-v tbq3 (35/35 math)
- tbqp3 K: Korean ✅, multi-turn ✅, matrix math ❌
- math_bench.py: API key added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
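The S1→WHT64→S2→WHT64 pipeline can be illustrated with a minimal pure-Python sketch. It assumes S1 and S2 are random ±1 sign vectors and that the WHT is orthonormally scaled; the commit does not spell out either detail, and the function names here are hypothetical. The sketch only demonstrates that the composite transform is orthogonal (invertible, and dot-product preserving when applied to both operands); it does not reproduce the kurtosis figures.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def double_wht(x, s1, s2):
    """Apply sign flip S1, WHT, sign flip S2, WHT (assumed reading of the commit)."""
    t = fwht([v * s for v, s in zip(x, s1)])
    return fwht([v * s for v, s in zip(t, s2)])


def inverse_double_wht(y, s1, s2):
    """Undo double_wht: the scaled WHT and the sign flips are each self-inverse."""
    t = fwht(y)
    t = fwht([v * s for v, s in zip(t, s2)])
    return [v * s for v, s in zip(t, s1)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


rng = random.Random(0)
D = 64  # head dimension from the commit message
s1 = [rng.choice((-1.0, 1.0)) for _ in range(D)]
s2 = [rng.choice((-1.0, 1.0)) for _ in range(D)]
q = [rng.gauss(0.0, 1.0) for _ in range(D)]
k = [rng.gauss(0.0, 1.0) for _ in range(D)]

q_rot = double_wht(q, s1, s2)
k_rot = double_wht(k, s1, s2)
k_back = inverse_double_wht(k_rot, s1, s2)
```

Because every stage is orthogonal, `dot(q_rot, k_rot)` equals `dot(q, k)` up to floating-point error, provided both vectors go through the same S1 and S2.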
feat: v1.5.2, PPL 21%→8%, attention sharpening + V rotation bugfix

Attention sharpening (α = 1+1/(2×SQNR)) compensates for softmax flattening from 3-bit quantization noise. TBQP3 α=1.036, TBQ3 α=1.016. Derived from MMSE theory, not empirically tuned.

V rotation bug: attn_rot_v was enabled but IWHT decode has no inverse rotation, so V output was corrupted. K rotation is safe (cancels in Q·K dot product via orthogonal transform property).

Also: per-block norm for TBQ3 D=512, 1.15x V hack removed, tbq4 OOB fix.

PPL (wikitext-2, ctx=2048, Gemma 4 26B MoE):
  v1.5.1: 509.9 (1.21x vs f16)
  v1.5.2: 454.7 (1.08x vs f16)

Math bench (35 problems × 10 runs):
  tbqp3/tbq3: avg 19.1/35, peak 23/35
  f16: avg 20.1/35, peak 21/35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
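The sharpening formula α = 1+1/(2×SQNR) can be tried out with a short sketch. Two assumptions are made here that the commit message does not confirm: SQNR is treated as a linear power ratio (not dB), and α is applied as a multiplier on the attention logits before softmax. The function names are hypothetical.

```python
import math


def sharpening_alpha(sqnr):
    """alpha = 1 + 1/(2*SQNR), with SQNR assumed to be a linear power ratio."""
    return 1.0 + 1.0 / (2.0 * sqnr)


def implied_sqnr(alpha):
    """Invert the formula: the SQNR that would produce a given alpha."""
    return 1.0 / (2.0 * (alpha - 1.0))


def softmax(logits, alpha=1.0):
    """Numerically stable softmax over alpha-scaled logits (assumed placement of alpha)."""
    m = max(logits)
    exps = [math.exp(alpha * (v - m)) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Alphas quoted in the commit message; under the linear-ratio assumption
# they correspond to SQNR of roughly 13.9 (TBQP3) and roughly 31 (TBQ3).
sqnr_tbqp3 = implied_sqnr(1.036)
sqnr_tbq3 = implied_sqnr(1.016)

logits = [2.0, 1.0, 0.5, -1.0]
plain = softmax(logits)
sharpened = softmax(logits, alpha=sharpening_alpha(sqnr_tbqp3))
```

With α slightly above 1, the largest attention weight in `sharpened` is a little higher than in `plain`, which is the direction needed to counteract flattening caused by quantization noise.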
docs: v1.5.1 README (f16-equivalent quality with 4.2x compression)

- Add v1.5.1 release notes (Korean + English)
- SWA f16 bypass, V 512-WHT, QJL D=512, attn_rot auto-management
- Final benchmark: tbqp3/tbq3 avg 37.4 > f16 avg 36.6 (10 runs each)
- TBQP auto-disables attn_rot_k (prevents triple rotation)
- Update v1.5.0 D=512 limitation note (resolved in v1.5.1)
- Add performance-critical code documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>