I'm working on implementing image segmentation using my own custom TFLite model, following the code example from MediaPipe. Here's my code:

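import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# model_path and image_path are placeholders for the custom .tflite file and the input image.
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.ImageSegmenterOptions(
    base_options=base_options,
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
    output_confidence_masks=True,
    output_category_mask=False
)

mp_image = mp.Image.create_from_file(image_path)
with vision.ImageSegmenter.create_from_options(options) as segmenter:
    segmentation_result = segmenter.segment(mp_image)
    # Only the first confidence mask is read here.
    output_mask = segmentation_result.confidence_masks[0]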

I've encountered two issues with the above code:

  1. The model has two outputs:

    Output 0: Name = Identity0, Shape = [1, 1], Type = numpy.float32

    Output 1: Name = Identity1, Shape = [1, x, y, z], Type = numpy.float32 (where x * y * z == image_width * image_height * image_channels, with image_channels = 1)

    How can I retrieve both outputs instead of just one?

  2. The confidence_masks values are almost identical (min/max = 0.0701157/0.070115715), which seems unusual. The original image contains a person, and the output is correct when using my custom TFLite model with tf.lite.Interpreter.get_tensor().
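For reference, reading every output directly with tf.lite.Interpreter might look like the sketch below. It assumes the model has a single input tensor; run_all_outputs, model_path and input_array are hypothetical names used only for illustration, and the output names in the returned dict are whatever the model file reports.

```python
import numpy as np


def run_all_outputs(model_path, input_array):
    """Run a TFLite model once and return every output tensor, keyed by name."""
    # Imported here so the helper can be defined without TensorFlow installed.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Assumption: the model has exactly one input tensor.
    input_detail = interpreter.get_input_details()[0]
    interpreter.set_tensor(
        input_detail["index"],
        np.asarray(input_array, dtype=input_detail["dtype"]),
    )
    interpreter.invoke()

    # Collect every output (e.g. Identity0 and Identity1), not just the first.
    return {
        detail["name"]: interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    }
```

Printing the shape of each returned array is a quick way to confirm that both the [1, 1] tensor and the [1, x, y, z] tensor come back from the same invocation.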

I know that many frameworks support models with multiple inputs and outputs, so I'm confused about what I might be missing. Here are my specific questions:

  1. Do I need to add special metadata to the TFLite model file?
  2. How should I modify the original MediaPipe code to handle multiple outputs?

1 Answer

Why do you set output_category_mask=False while expecting two outputs? With that setting, you are explicitly asking the segmenter to return only one output.

Please check the documentation and source code.

output_confidence_masks: Whether to output confidence masks.

output_category_mask: Whether to output category mask.

2 Comments

Thanks for your reply. I've tried setting both to True, but the issue persists. The confidence_masks and category_mask are two representations of the same mask (float32 and uint8, respectively). What I really need are two distinct outputs with different shapes, which is a different scenario.
Well, since you are getting a list of binary masks as output and you want an output of shape (1, C, H, W), you could stack the masks along the channel dimension and then add a batch dimension.
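A minimal NumPy sketch of that stacking step, using stand-in arrays rather than real segmenter output. stack_masks is a hypothetical helper name; with MediaPipe, each array would presumably come from mask.numpy_view() on the entries of segmentation_result.confidence_masks, assuming each view is a 2-D (H, W) array.

```python
import numpy as np


def stack_masks(masks):
    """Stack C masks of shape (H, W) into one array of shape (1, C, H, W)."""
    stacked = np.stack(masks, axis=0)  # (C, H, W)
    return stacked[np.newaxis, ...]    # (1, C, H, W)


# Stand-in data: two 4x6 float32 masks instead of real segmenter output.
masks = [
    np.zeros((4, 6), dtype=np.float32),
    np.ones((4, 6), dtype=np.float32),
]
batched = stack_masks(masks)
print(batched.shape)  # (1, 2, 4, 6)
```

This only rearranges the masks the segmenter already returns; it does not recover a separate model output such as the [1, 1] tensor.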
