
I have trained a YOLOv8 model to detect handwritten digits on paper using the MNIST dataset. The images in the dataset are 416x416, and once trained the model gets really good metrics on everything. However, when I try to run inference on an image that is not 416x416 (for example, a white paper with handwritten digits), it does not detect anything. Does this mean that with YOLO models, once trained on images of one size, inference must be run on images of the same size? That would not make much sense: YOLOv8 can detect dogs, and I am sure that if I feed it an image of a different size containing a dog, it will still detect it.

The images I have used for training are like this one: [training image]

And the image I tried to run inference on was: [inference image]

  • I'm afraid the problem is not about image resolution. If you train the model to detect a single digit per input image, it may not be able to correctly process a sample with many digits and non-digits on it. The model needs to see similar multi-character examples during training. Commented Aug 2, 2024 at 14:07
  • As for the image resolution, you can run inference at different resolutions; it makes sense to specify it with the imgsz parameter during prediction, e.g. imgsz=864 (the sample image width in this case, or something close to it). Commented Aug 2, 2024 at 14:12
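To see why resolution alone is rarely the blocker: YOLO-style pipelines letterbox every input to the training/inference imgsz before the network sees it, so a differently sized photo is resized and padded rather than rejected. Below is a minimal NumPy sketch of that letterbox step, an illustrative reimplementation for intuition, not the library's actual code; the function name and padding value 114 are assumptions modeled on common YOLO preprocessing.

```python
import numpy as np

def letterbox(img, new_size=416, pad_value=114):
    """Resize an image to fit inside a new_size x new_size square,
    preserving aspect ratio, then pad the remainder (YOLO-style letterboxing)."""
    h, w = img.shape[:2]
    scale = new_size / max(h, w)
    nh, nw = round(h * scale), round(w * scale)
    # Nearest-neighbour resize in plain NumPy (a real pipeline would use cv2.resize)
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Paste the resized image onto a constant-colour square canvas
    canvas = np.full((new_size, new_size, img.shape[2]), pad_value, dtype=img.dtype)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

# A wide 300x900 "photo of a paper" still ends up as a 416x416 network input:
photo = np.full((300, 900, 3), 255, dtype=np.uint8)
out = letterbox(photo, 416)
print(out.shape)  # (416, 416, 3)
```

Because of this step, a digit that filled most of a 416x416 training image shrinks to a small region of the letterboxed photo, which is another reason the detector can miss it unless similar scales appeared in training.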
