
I am having a hard time converting a given TensorFlow model into a TFLite model and then using it. I already posted a question where I described my problem, but I did not share the model I was working with because I am not allowed to. Since I did not find an answer that way, I tried to convert a public model (ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu).

Here is a colab tutorial from the Object Detection API. I just ran the whole script without changes (it is the same model) and downloaded the generated models (with and without metadata). I uploaded them here together with a sample picture from the COCO17 train dataset.

I tried to use those models directly in Python, but the results look like garbage.

Here is the code I used; I followed this guide. I changed the indexes for rects, scores and classes because otherwise the results were not in the right format.

import cv2
import tensorflow as tf

# interpreter = tf.lite.Interpreter("original_models/model.tflite")
interpreter = tf.lite.Interpreter("original_models/model_with_metadata.tflite")

interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

size = 640

def draw_rect(image, box):
    y_min = int(max(1, (box[0] * size)))
    x_min = int(max(1, (box[1] * size)))
    y_max = int(min(size, (box[2] * size)))
    x_max = int(min(size, (box[3] * size)))
    
    # draw a rectangle on the image
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (255, 255, 255), 2)

file = "images/000000000034.jpg"


img = cv2.imread(file)
new_img = cv2.resize(img, (size, size))
new_img = cv2.cvtColor(new_img, cv2.COLOR_BGR2RGB)

interpreter.set_tensor(input_details[0]['index'], [new_img.astype("f")])

interpreter.invoke()
rects = interpreter.get_tensor(
    output_details[1]['index'])

scores = interpreter.get_tensor(
    output_details[0]['index'])

classes = interpreter.get_tensor(
    output_details[3]['index'])


for index, score in enumerate(scores[0]):
    draw_rect(new_img, rects[0][index])
    # print(rects[0][index])
    print("scores: ", scores[0][index])
    print("class id: ", classes[0][index])
    print("______________________________")


cv2.imshow("image", new_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

This leads to the following console output:

scores:  0.20041436
class id:  51.0
______________________________
scores:  0.08925027
class id:  34.0
______________________________
scores:  0.079722285
class id:  34.0
______________________________
scores:  0.06676647
class id:  71.0
______________________________
scores:  0.06626186
class id:  15.0
______________________________
scores:  0.059938848
class id:  86.0
______________________________
scores:  0.058229476
class id:  34.0
______________________________
scores:  0.053791136
class id:  37.0
______________________________
scores:  0.053478718
class id:  15.0
______________________________
scores:  0.052847564
class id:  43.0
______________________________

and the resulting image

[model output image]

I tried different images from the original training dataset and never got good results. I think the output layer is broken, or maybe some postprocessing is missing?
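
One related thing I am unsure about is the input preprocessing: SSD MobileNet feature extractors are usually fed inputs normalized to [-1, 1], while my code passes raw 0-255 float pixels. I do not know whether the exported TFLite graph already normalizes internally; a minimal sketch of doing it manually before set_tensor would be:

import cv2
import numpy as np

# Scale the 0-255 RGB pixels to [-1, 1] before handing them to the interpreter.
# Whether the exported TFLite graph already does this internally is an assumption
# that would need to be checked against the export script.
img = cv2.cvtColor(cv2.resize(cv2.imread(file), (size, size)), cv2.COLOR_BGR2RGB)
input_tensor = np.expand_dims(img.astype(np.float32), axis=0)
input_tensor = input_tensor * (2.0 / 255.0) - 1.0
interpreter.set_tensor(input_details[0]['index'], input_tensor)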

I also tried to use the conversion method given in the official TensorFlow documentation:

import tensorflow as tf

saved_model_dir = 'tf_models/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/'

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # path to the SavedModel directory
tflite_model = converter.convert()

# Save the model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

But when I try to use the model, I get a ValueError: Cannot set tensor: Dimension mismatch. Got 640 but expected 1 for dimension 1 of input 0.

Does anyone have an idea what I am doing wrong?

Update: Following Farmaker's advice, I tried changing the input dimensions of the model generated by the short conversion script above. The shape before was:

[{'name': 'serving_default_input_tensor:0',
  'index': 0,
  'shape': array([1, 1, 1, 3], dtype=int32),
  'shape_signature': array([ 1, -1, -1,  3], dtype=int32),
  'dtype': numpy.uint8,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

So adding one dimension would not have been enough. Therefore I used interpreter.resize_tensor_input(0, [1, 640, 640, 3]). Now feeding an image through the net works.
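
Roughly, the resize-and-run sequence looks like this (a sketch; allocate_tensors() has to be called after resizing, and the input dtype is uint8 according to the details above):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter("model.tflite")
interpreter.resize_tensor_input(0, [1, 640, 640, 3])
interpreter.allocate_tensors()  # allocate after resizing the input

input_details = interpreter.get_input_details()
# new_img is the 640x640 RGB image from the code above; add an explicit batch dimension
interpreter.set_tensor(input_details[0]['index'],
                       np.expand_dims(new_img, axis=0).astype(np.uint8))
interpreter.invoke()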

Unfortunately, I still can't make any sense of the output. Here is the print of the output details:

[{'name': 'StatefulPartitionedCall:6',
  'index': 473,
  'shape': array([    1, 51150,     4], dtype=int32),
  'shape_signature': array([    1, 51150,     4], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:0',
  'index': 2233,
  'shape': array([1, 1], dtype=int32),
  'shape_signature': array([ 1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:5',
  'index': 2198,
  'shape': array([1], dtype=int32),
  'shape_signature': array([1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:7',
  'index': 493,
  'shape': array([    1, 51150,    91], dtype=int32),
  'shape_signature': array([    1, 51150,    91], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:1',
  'index': 2286,
  'shape': array([1, 1, 1], dtype=int32),
  'shape_signature': array([ 1, -1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:2',
  'index': 2268,
  'shape': array([1, 1], dtype=int32),
  'shape_signature': array([ 1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:4',
  'index': 2215,
  'shape': array([1, 1], dtype=int32),
  'shape_signature': array([ 1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:3',
  'index': 2251,
  'shape': array([1, 1, 1], dtype=int32),
  'shape_signature': array([ 1, -1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]  
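
To see which output is which more easily, one can group the outputs by name after an invoke(); I assume the two tensors with 51150 rows are the raw (pre-NMS) boxes and per-class scores, while the outputs whose dynamic dimensions collapsed to 1 are the post-processed ones:

# Print every output by name and shape to see which index carries what.
outputs = {d['name']: interpreter.get_tensor(d['index'])
           for d in interpreter.get_output_details()}
for name, value in sorted(outputs.items()):
    print(name, value.shape)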

I added the tflite model generated this way to the Google Drive as well.

Update 2: I added a directory to the Google Drive which contains a notebook that uses the full-size model and produces the correct output. If you execute the whole notebook, it should save the following image to your disk.

[expected output image]

  • The last one seems like a batch-size dimension. You can use tensorflow.org/api_docs/python/tf/expand_dims. So basically you have something like [640, 640, 3] and you have to make it [1, 640, 640, 3].
    – Farmaker
    Commented Sep 25, 2021 at 4:57
  • Thank you for your help again! Unfortunately it is still not really working. I updated my question and added the tflite model to the Google Drive link. Maybe you have another idea?
    – Burschken
    Commented Sep 25, 2021 at 9:40
  • From the colab you uploaded I can build and verify that the model creates bounding boxes correctly. The problem is that, if you look inside the detect_fn function, you have to do a preprocess, predict and postprocess step. These steps have to be done with the interpreter as well. Check to find out where and what these steps are inside the Object Detection API. From my experience you will have a really hard time. You have to combine the TensorFlow model and the interpreter with those steps... or you have to change to an easier API for TFLite.
    – Farmaker
    Commented Sep 28, 2021 at 9:48
  • 1
    Usually at the master branch there is a colab notebook or a .py file with end to end inference as an example. I do not see something like that.
    – Farmaker
    Commented Sep 28, 2021 at 10:02
  • 1
    Take a look at this github.com/tensorflow/models/tree/master/research/… if any of the examples suits you..ping me again.
    – Farmaker
    Commented Sep 28, 2021 at 10:05

2 Answers


For models from the Object Detection API to work well with TFLite, you have to convert them to a TFLite-friendly graph that includes the custom post-processing op.

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md

(TF1 doc)
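
Roughly, the flow described in that guide is a two-step one; a sketch (paths are placeholders):

# Step 1 (shell, from the linked guide): export a TFLite-friendly SavedModel
# that contains the custom detection post-processing op.
#   python object_detection/export_tflite_graph_tf2.py \
#       --pipeline_config_path <path>/pipeline.config \
#       --trained_checkpoint_dir <path>/checkpoint \
#       --output_directory <path>/tflite_export

# Step 2: convert the exported SavedModel (not the original saved_model) to TFLite.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('<path>/tflite_export/saved_model')
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)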

You can also try using TensorFlow Lite Model Maker.

  • Hi, I used the first link to create the tflite model. I will check out the other links tomorrow. Thanks in advance!
    – Burschken
    Commented Sep 27, 2021 at 15:08

I followed the exact procedure you show (the standard procedure mentioned in the TensorFlow docs).

First, the output returned by the tflite model has a different format (different indexing) than what is described in the official documentation.

  boxes = get_output_tensor(interpreter, 1)
  classes = get_output_tensor(interpreter, 3)
  scores = get_output_tensor(interpreter, 0)
  count = int(get_output_tensor(interpreter, 2))
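
(get_output_tensor here is the small helper from the TFLite object detection examples; roughly:)

import numpy as np

def get_output_tensor(interpreter, index):
    # Return the output tensor at the given index, with the batch dimension squeezed away.
    output_details = interpreter.get_output_details()[index]
    return np.squeeze(interpreter.get_tensor(output_details['index']))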

Second, the number of returned bounding boxes is always 10, and I can't figure out how to change that to the custom number of objects in my dataset.

Finally, the way I solved it is just by retrieving the bounding boxes using index 1 and filtering them using the scores. However, the results I get are far from the original model's. Moreover, the tflite model takes much more time than the original model, which is the opposite of what TFLite is meant for. That is probably because I run it on my laptop, i.e. on an x86 instruction set (TFLite is optimized to run on ARM CPUs instead, e.g. mobile devices or a Raspberry Pi).
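
A rough sketch of the filtering I mean, using the tensors from the snippet above (the 0.5 threshold is arbitrary):

# Keep only the detections whose confidence exceeds a chosen threshold.
threshold = 0.5
results = []
for i in range(count):
    if scores[i] >= threshold:
        results.append({'box': boxes[i], 'class_id': int(classes[i]), 'score': scores[i]})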

  • I'd consider removing the last sentence. Answers should provide answers, not ask follow-up questions or request additional information. This still provides your findings and your solution, and I know you're just acknowledging the limits of your understanding in soliciting further clarification from the community. But that might be mistaken as this not being an answer. Commented Sep 30, 2021 at 1:57
  • There is an issue with TensorFlow version 2.6.0 and the order of TFLite outputs. Consider rolling back to version 2.5.0 to do the conversion to tflite and use it.
    – Farmaker
    Commented Sep 30, 2021 at 9:28
  • Thank you a LOT! Indeed, that solved the indexing issue. However, I am still trying to figure out why TFLite always returns 10 objects when my images can contain at most two objects and therefore two classes. In the returned 10 bounding boxes I can see something similar to the original model's output, however the scores are kind of random, therefore I am not able to properly filter them out.
    – cricket
    Commented Sep 30, 2021 at 11:10
