A lightweight PyTorch library for efficient model quantization and memory optimization. Perfect for running large language models on consumer hardware.
- 🎯 8-bit & 4-bit quantization primitives
- 💾 Memory-efficient optimizers
- 🚀 LLM.int8() inference support
- 🔄 QLoRA-style fine-tuning
- 🖥️ Cross-platform hardware support
```python
import torch
from Quanta.functional.quantization import quantize_8bit, dequantize_8bit

# Quantize a weight tensor (any float tensor, e.g. a layer's weight matrix)
q_tensor, scale, zero_point = quantize_8bit(model_weights)
```

🚧 **Early Development** - Currently implementing core quantization features.
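To illustrate what an 8-bit quantize/dequantize round trip does, here is a minimal self-contained sketch in plain PyTorch using standard affine (asymmetric) quantization. The function names `quantize_8bit_ref` and `dequantize_8bit_ref` are hypothetical reference implementations for illustration; Quanta's actual `quantize_8bit` may differ in signature, rounding mode, and scale selection.

```python
import torch

def quantize_8bit_ref(x: torch.Tensor):
    """Reference affine 8-bit quantization (illustrative, not Quanta's API).

    Maps the float range [x.min(), x.max()] onto the uint8 range [0, 255].
    """
    qmin, qmax = 0, 255
    x_min, x_max = x.min(), x.max()
    # One quantization step in float units; clamp to avoid division by zero
    # when the tensor is constant.
    scale = ((x_max - x_min) / (qmax - qmin)).clamp(min=1e-8)
    # Integer offset that maps the float value 0.0 into the uint8 grid.
    zero_point = (qmin - x_min / scale).round().clamp(qmin, qmax)
    q = (x / scale + zero_point).round().clamp(qmin, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize_8bit_ref(q: torch.Tensor, scale, zero_point) -> torch.Tensor:
    """Inverse mapping: uint8 codes back to approximate float values."""
    return (q.float() - zero_point) * scale

w = torch.randn(4, 4)
q, s, zp = quantize_8bit_ref(w)
w_hat = dequantize_8bit_ref(q, s, zp)
# Reconstruction error is bounded by roughly one quantization step.
assert (w - w_hat).abs().max() <= s + 1e-6
```

The key design point is that only the uint8 tensor plus two scalars (`scale`, `zero_point`) need to be stored, cutting memory for float32 weights by nearly 4x at the cost of bounded rounding error.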
MIT License
Inspired by bitsandbytes