489 questions
0 votes · 0 answers · 32 views
How to convert a QAT (quantization-aware trained) TensorFlow graph into a TFLite model?
I am quantizing a neural network using QAT and want to convert it to TFLite.
Quantization nodes get added to the skeleton graph and we get a new graph.
I am able to load the trained QAT ...
0 votes · 0 answers · 18 views
Stable Diffusion v1.4 PTQ on both weight and activation
I'm currently working on quantizing the Stable Diffusion v1.4 checkpoint without relying on external libraries such as torch.quantization or other quantization toolkits. I’m exploring two scenarios:
...
0 votes · 0 answers · 58 views
Error about bitsandbytes from Hugging Face
For the code below, which is a standard snippet from the Hugging Face website, I'm getting the error:
ImportError: Using `bitsandbytes` 4-bit quantization requires the latest version
of bitsandbytes: `pip ...
0 votes · 0 answers · 18 views
Sub-4-bit quantized model on an NVIDIA GPU
I was trying to run deepseek-r1-distill-llama70b-bf16.gguf (131gb on disk) on two A6000 gpus (48gb vram each) with llama.cpp. It runs with partial gpu offload but the gpu utilization is at 9-10% and ...
0 votes · 0 answers · 66 views
How do I resolve "ImportError: Using bitsandbytes 4-bit quantization requires the latest version of bitsandbytes" despite having version 0.45.3 installed?
I am trying to use the bitsandbytes library for 4-bit quantization in my model loading function, but I keep encountering an ImportError. The error message says, "Using bitsandbytes 4-bit ...
0 votes · 0 answers · 27 views
ONNX Runtime quantization script for MatMulNBits: what is the type after conversion?
In the onnxruntime documentation, for quantization here:
https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantize-to-int4uint4
It sets accuracy_level=4 which means it's a ...
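For context on what MatMulNBits stores: the weights become 4-bit integer codes packed two per byte, with a scale per block. The following is a hypothetical pure-Python sketch of that layout, not the actual onnxruntime implementation; the block size and fixed zero-point of 8 are assumptions for illustration.

```python
# Hypothetical sketch of block-wise 4-bit weight quantization, similar in
# spirit to MatMulNBits; not the actual onnxruntime code.

def quantize_int4(weights, block_size=32):
    """Quantize floats to unsigned 4-bit codes with a per-block scale
    and a fixed zero-point of 8, packing two codes per byte."""
    packed, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = (max(abs(w) for w in block) / 7) or 1.0  # avoid zero scale
        scales.append(scale)
        # Map each float to an integer code in [0, 15] around zero-point 8.
        codes = [max(0, min(15, round(w / scale) + 8)) for w in block]
        # Pack two 4-bit codes per byte, low nibble first.
        for i in range(0, len(codes), 2):
            lo = codes[i]
            hi = codes[i + 1] if i + 1 < len(codes) else 8  # pad = zero-point
            packed.append(lo | (hi << 4))
    return packed, scales

def dequantize_int4(packed, scales, block_size=32):
    """Unpack nibbles and rescale back to approximate floats."""
    out = []
    bytes_per_block = (block_size + 1) // 2
    for b, scale in enumerate(scales):
        for byte in packed[b * bytes_per_block:(b + 1) * bytes_per_block]:
            out.append(((byte & 0x0F) - 8) * scale)
            out.append(((byte >> 4) - 8) * scale)
    return out
```

The `accuracy_level` setting in the real script controls the compute precision of the matmul kernel, which is a separate question from this storage type.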
1 vote · 0 answers · 39 views
Issues with MP3-like Compression: Quantization and File Size
I’m trying to implement an MP3-like compression algorithm for audio and have followed the general steps, but I’m encountering a few issues with the quantization step. Here's the overall process I'm ...
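For reference on the quantization step: MP3 uses a non-uniform quantizer that compresses coefficient magnitudes with a 3/4 power law before rounding, so large coefficients get coarser steps. A minimal sketch of that idea (simplified; the real MP3 formula also subtracts a small rounding offset and uses per-band scalefactors):

```python
def quantize_coeff(x, step):
    """MP3-style non-uniform quantization: compress the magnitude with
    a 3/4 power law before rounding, so large coefficients are stored
    with relatively coarser precision. Simplified sketch only."""
    sign = -1 if x < 0 else 1
    return sign * int(round((abs(x) / step) ** 0.75))

def dequantize_coeff(q, step):
    """Inverse: expand the integer code with the 4/3 power and rescale."""
    sign = -1 if q < 0 else 1
    return sign * (abs(q) ** (4.0 / 3.0)) * step
```

On file size: quantization alone does not shrink the stream; the integer codes still need entropy coding (Huffman, in MP3) before the output gets small.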
0 votes · 0 answers · 17 views
Kernel Dies When Testing a Quantized ResNet101 Model in PyTorch
Issue: I am encountering a kernel-death problem during inference when using a quantized ResNet101 model in PyTorch. The model trains and quantizes successfully, but the kernel dies when ...
0 votes · 1 answer · 223 views
Trying to quantize YOLOv11 in tensorflow, is this topology normal?
I'm trying to quantize the YOLO v11 model in tensorflow and get this as a result:
The target should be int8. Is this normal behaviour? When running it with TFLite Micro on an ESP32 I quickly run out of ...
0 votes · 0 answers · 21 views
Unit testing PNG quantization by Sharp in Jest
I am fetching PNG files from a 3rd party API endpoint and quantizing them using Sharp before sending the response to the client. How can I unit test the quantization process?
My intention was to have ...
0 votes · 0 answers · 160 views
Structured Pruning of Yolov8
I have to run the object detector on Raspberry Pi 4b for real-time object detection. For this task, I have decided to use yolov8n. I have to run the detector in real-time, and since I don't have any ...
0 votes · 0 answers · 34 views
Reference code to convert microsoft/Multilingual-MiniLM-L12-H384 to int8 quantization
I am trying to convert microsoft/Multilingual-MiniLM-L12-H384 into int8 with a post-training quantization approach. Can someone point out references?
I have referred to TensorFlow blogs and converter APIs and ...
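As a starting point, the core of int8 post-training quantization is choosing a scale that maps the largest weight magnitude to 127. A dependency-free sketch of per-tensor symmetric quantization (real converters such as TFLite's also calibrate activation ranges on sample data, which this omits):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: pick a scale so the
    largest |w| maps to 127, then round each weight to an int8 code."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]
```

For the actual model conversion, the TFLite converter's post-training quantization path or ONNX Runtime's static quantization tooling are the usual references.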
0 votes · 0 answers · 22 views
While trying to implement QLoRA using the Trainer class, getting a casting error
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=['q_lin', 'v_lin'],
    lora_dropout=0.1,
    bias='all'
)

class distilbertMultiClass(nn.Module):
    def __init__(self, model, ...
1 vote · 0 answers · 69 views
Transforming a picture into a posterized image with matching grid overlay and symbols
First of all, I want to help my mom with her embroidery projects and secondly, I want to get better in Python. So I don't need an exact solution. But it would be great to be pointed in the right ...
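One pointer in that direction: posterizing is uniform color quantization applied per channel, and Pillow's `ImageOps.posterize` or `Image.quantize` can do the heavy lifting. A dependency-free sketch of the per-channel snapping plus a symbol assignment for the chart (the level count of 4 and the A, B, C symbols are arbitrary choices for illustration):

```python
def posterize_channel(value, levels=4):
    """Snap one 0-255 channel value to the nearest of `levels` evenly
    spaced values (uniform quantization of a single channel)."""
    step = 255 / (levels - 1)
    return int(round(round(value / step) * step))

def posterize_pixel(rgb, levels=4):
    """Posterize an (R, G, B) pixel; the distinct outputs form the
    reduced palette that embroidery symbols can be assigned to."""
    return tuple(posterize_channel(c, levels) for c in rgb)

def palette_symbols(pixels):
    """Map each distinct posterized color to a symbol for the grid."""
    symbols = {}
    for p in pixels:
        if p not in symbols:
            symbols[p] = chr(ord('A') + len(symbols))
    return symbols
```

From there, the grid overlay is just sampling the posterized image at stitch-sized intervals and printing the symbol for each cell's color.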
3 votes · 1 answer · 496 views
RuntimeError: "Unused kwargs" and "frozenset object has no attribute discard" with BitsAndBytes bf16 Quantized Model in Hugging Face Gradio App
I'm encountering a RuntimeError while running a BitsAndBytes bf16 quantized Gemma-2-2b model on Hugging Face Spaces with a Gradio UI. The error specifically mentions unused kwargs and an ...