477 questions
-1 votes · 0 answers · 20 views
YOLOv8s TensorRT INT8 engine produces wrong bounding boxes with saturated confidence scores on Jetson Orin
I'm trying to quantize a YOLOv8s model to INT8 using TensorRT on a Jetson Orin (JetPack, TensorRT 8.6.2, Ultralytics 8.2.83, CUDA 12.2). The FP16 engine works correctly but the INT8 engine produces ...
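Boxes that go wrong only in the INT8 engine, with confidences pinned at the top of the range, usually point at calibration: if the calibrator underestimates a tensor's dynamic range, everything above that range saturates. A minimal NumPy sketch of symmetric INT8 quantization (just the arithmetic, not TensorRT's implementation) makes the effect visible:

```python
import numpy as np

def quantize_int8(x, amax):
    """Symmetric INT8 quantization: scale derived from the calibrated amax."""
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.1, 0.5, 0.9, 3.0], dtype=np.float32)

# Well-calibrated range covers the data: values round-trip closely.
q_good, s_good = quantize_int8(x, amax=3.0)

# Underestimated range (calibration never saw the large activations):
# everything above amax saturates to 127, flattening the output.
q_bad, s_bad = quantize_int8(x, amax=0.5)
```

If the calibration set is not representative of inference-time inputs, this kind of saturation is a plausible cause of the symptom described above.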
3 votes · 0 answers · 32 views
How to quantize the MLP in an MoE to 4 bits?
I'm doing some research on information encoding with LLMs and need a way to quantize the weights of the MLP layers (MoE experts) to 4 bits, or even to customized mixed precision. Consider
from ...
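For reference, per-channel symmetric k-bit weight quantization is only a few lines; a hedged NumPy sketch (function names are mine, not from any library) where varying `bits` per layer gives a simple form of mixed precision:

```python
import numpy as np

def quantize_kbit(w, bits=4):
    """Per-output-channel symmetric quantization of a weight matrix.

    With bits=4 the integer grid is [-8, 7]; passing a different `bits`
    value per layer is a simple way to get mixed precision.
    """
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0             # avoid div-by-zero for all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)   # one expert's MLP weight
q, scale = quantize_kbit(w, bits=4)
w_hat = dequantize(q, scale)
```

For real 4-bit storage you would additionally pack two nibbles per byte; libraries like bitsandbytes handle that packing for you.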
0 votes · 0 answers · 31 views
Post-training quantized model gets the error "Copying from quantized Tensor to non-quantized Tensor is not allowed" even though I'm not copying tensors
I got a pretrained ResNet-18 model from this lane detection repo in order to use it as an ADAS (advanced driver assistance system) function for an electric-car competition. My current goal is ...
0 votes · 1 answer · 124 views
Apply quantization to a CNN
I want to apply a quantization function to a deep CNN. The CNN is used for an image classification task (4 classes), and my data consists of 224×224 images. When I run this code, I get an error. ...
2 votes · 0 answers · 99 views
Issue Replicating TF-Lite Conv2D Quantized Inference Output
I am trying to reproduce the exact layer-wise output of a quantized EfficientNet model (TFLite model, TensorFlow 2.17) by re-implementing Conv2D, DepthwiseConv2D, FullyConnected, Add, Mul, Sub and ...
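When replicating TFLite's integer kernels, the usual stumbling block is requantization: TFLite multiplies the int32 accumulator by a fixed-point multiplier M0 · 2^shift with M0 in [0.5, 1) stored as a Q31 integer, rather than by a float. A simplified sketch of that decomposition (using plain round-half-up instead of gemmlowp's rounding-doubling high-mul, so it can differ from TFLite by 1 LSB in tie cases):

```python
import numpy as np

def quantize_multiplier(real_multiplier):
    """Decompose a positive real multiplier into M0 * 2**shift with
    M0 in [0.5, 1), stored as a Q31 fixed-point integer."""
    m0, shift = np.frexp(real_multiplier)      # real = m0 * 2**shift
    q_m0 = int(round(float(m0) * (1 << 31)))
    shift = int(shift)
    if q_m0 == (1 << 31):                      # m0 rounded up to 1.0
        q_m0 //= 2
        shift += 1
    return q_m0, shift

def multiply_by_quantized_multiplier(acc, q_m0, shift):
    """Integer-only equivalent of round(acc * real_multiplier)."""
    prod = int(acc) * q_m0                     # wide integer product
    total_shift = 31 - shift
    rounding = 1 << (total_shift - 1)
    return (prod + rounding) >> total_shift    # arithmetic shift floors
```

Matching the reference kernels exactly also requires applying the per-channel multipliers, zero-point offsets, and saturating casts in the same order TFLite does.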
0 votes · 2 answers · 237 views
Why does TFLite INT8 quantization decompose BatchMatMul (from Einsum) into many FullyConnected layers?
I’m debugging a model-conversion (onnx2tf) and post-training quantization issue involving Einsum, BatchMatMul, and FullyConnected layers across different model formats.
Pipeline:
ONNX → TF ...
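One way to see why such a decomposition is value-preserving: a BatchMatMul is exactly a stack of independent matmuls, which a converter can lower to one FullyConnected op per slice (each then carrying its own quantization parameters). A small NumPy check of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 3, 5))   # batch of 4 [3x5] matrices
b = rng.normal(size=(4, 5, 2))

# One BatchMatMul, expressed as an einsum.
batched = np.einsum('bij,bjk->bik', a, b)

# The same result as 4 independent matmuls, one per batch element --
# the shape a converter can emit as separate FullyConnected layers.
unrolled = np.stack([a[i] @ b[i] for i in range(a.shape[0])])
```

This only illustrates the numerics; whether onnx2tf actually unrolls a given Einsum this way depends on its lowering rules for that pattern.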
0 votes · 0 answers · 58 views
Error while converting quantized Torch model to ONNX
I’m applying QAT to a YOLOv8n model with the following configuration:
QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=...
1 vote · 0 answers · 42 views
Quantization in TensorFlow 2: instance error
I am trying to quantize a model in TensorFlow using tfmot.
This is a sample model,
inputs = keras.layers.Input(shape=(512, 512, 1))
x = keras.layers.Conv2D(3, kernel_size=1, padding='same')(inputs)
x =...
0 votes · 1 answer · 313 views
RuntimeError: CUDA error: named symbol not found when using TorchAoConfig with Qwen2.5-VL-7B-Instruct model
I'm trying to load the Qwen2.5-VL-7B-Instruct model from Hugging Face with 4-bit weight-only quantization using TorchAoConfig (similar to how it's described in the documentation here), but I'm getting ...
1 vote · 0 answers · 152 views
Fine-tuned LLaMA 2–7B with QLoRA, but reloading fails: missing 4-bit metadata. Likely saved after LoRA + resize. Need the proper 4-bit save method
I’ve been working on fine-tuning LLaMA 2–7B using QLoRA with bitsandbytes 4-bit quantization and ran into a weird issue. I did adaptive pretraining on Arabic data with a custom tokenizer (vocab size ~...
0 votes · 2 answers · 69 views
Straight-through estimation for vector quantization inside a recurrent neural network
In my model, I use vector quantization (VQ) inside a recurrent neural network. The VQ is trained using straight-through estimation, with that particular code being identical to [1]:
...
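For context, the forward pass of a VQ layer is just a nearest-neighbour codebook lookup; the straight-through estimator only changes the backward pass. A NumPy sketch of the forward (the STE trick `z + stop_gradient(q - z)` is noted in the docstring, since NumPy has no autograd):

```python
import numpy as np

def vq_forward(z, codebook):
    """Nearest-neighbour vector quantization (forward pass only).

    z:        (batch, d) encoder outputs
    codebook: (K, d) learned code vectors

    In training, the straight-through estimator replaces q with
    z + stop_gradient(q - z), so gradients flow to z as if the
    quantizer were the identity.
    """
    # Squared distances between each z and each code vector.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0]], dtype=np.float32)
z = np.array([[0.1, -0.2], [0.9, 1.2]], dtype=np.float32)
q, idx = vq_forward(z, codebook)
```

Inside an RNN, the subtlety is that the STE is applied at every timestep, so gradient bias from the estimator can accumulate across the unrolled sequence.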
0 votes · 0 answers · 246 views
Cannot use bitsandbytes for quantization of an LLM
I am using an LLM and want to use quantization to speed up inference. I am running on an NVIDIA Jetson AGX Orin, which has an ARM-based architecture. I use this code
model_name = "tiiuae/...
0 votes · 0 answers · 41 views
Mismatch between PyTorch inference and manual implementation
I’m trying to manually reproduce the inference forward pass to understand exactly how quantized inference works. To do so, I trained and quantized a model in PyTorch using QAT, then manually simulated the ...
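As a baseline for this kind of comparison, the standard affine-quantized linear layer can be written in a few lines of NumPy. All names and qparams below are made up for illustration; PyTorch's fbgemm/qnnpack kernels use fixed-point requantization internally, so an exact bit-match additionally requires reproducing their rounding:

```python
import numpy as np

def quantized_linear(q_x, q_w, s_x, zp_x, s_w, s_y, zp_y):
    """Affine-quantized linear layer: real = scale * (q - zero_point).
    Weights symmetric (zero_point 0); activations asymmetric uint8."""
    # int32 accumulation, zero points subtracted up front
    acc = (q_x.astype(np.int32) - zp_x) @ q_w.astype(np.int32).T
    # requantize with a single float multiplier, then round and re-offset
    y = np.round(acc * (s_x * s_w / s_y)) + zp_y
    return np.clip(y, 0, 255).astype(np.uint8)

# toy qparams, chosen by hand for the example
q_x = np.array([[12, 8]], dtype=np.uint8)            # one input row
q_w = np.array([[3, -4], [2, 6]], dtype=np.int8)     # 2 out-features x 2 in
y = quantized_linear(q_x, q_w, s_x=0.1, zp_x=10, s_w=0.05, s_y=0.05, zp_y=20)
```

Off-by-one mismatches against PyTorch almost always come from the rounding mode or the order in which zero points and clamping are applied.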
1 vote · 0 answers · 109 views
How to convert a QAT (quantization-aware trained) TensorFlow graph into a TFLite model?
I am quantizing a neural network using QAT and I want to convert it to TFLite.
Quantization nodes get added to the skeleton graph and we get a new graph.
I am able to load the trained QAT ...
0 votes · 0 answers · 47 views
Stable Diffusion v1.4 PTQ on both weights and activations
I'm currently working on quantizing the Stable Diffusion v1.4 checkpoint without relying on external libraries such as torch.quantization or other quantization toolkits. I’m exploring two scenarios:
...
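For the activation side, a from-scratch PTQ pipeline usually starts with a calibration observer that tracks the running range of each tensor. A minimal min/max observer sketch in NumPy (asymmetric uint8; class and method names are mine, and real calibrators such as percentile or entropy-based ones are more robust to outliers):

```python
import numpy as np

class MinMaxObserver:
    """Running min/max observer for asymmetric uint8 activation quantization."""

    def __init__(self):
        self.lo, self.hi = np.inf, -np.inf

    def observe(self, x):
        """Update the running range from one calibration batch."""
        self.lo = min(self.lo, float(x.min()))
        self.hi = max(self.hi, float(x.max()))

    def qparams(self, qmin=0, qmax=255):
        """Derive (scale, zero_point); the range must include 0 so that
        real zero is exactly representable."""
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        zp = int(round(qmin - lo / scale))
        return scale, zp

obs = MinMaxObserver()
obs.observe(np.array([-1.0, 0.5]))   # calibration batch 1
obs.observe(np.array([2.0]))         # calibration batch 2
scale, zp = obs.qparams()
```

For weight-only quantization the observer is unnecessary (weights are static); it is the activation scenario that needs a calibration pass over representative inputs.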