MediaPipe Image Segmentation with Dual-Output TFLite Model

Question

I'm working on implementing image segmentation using my own custom TFLite model, following the code example from MediaPipe. Here's my code:

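import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# base_options setup is assumed here; model_path and image_path are placeholders.
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.ImageSegmenterOptions(
    base_options=base_options,
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
    output_confidence_masks=True,
    output_category_mask=False
)

mp_image = mp.Image.create_from_file(image_path)
with vision.ImageSegmenter.create_from_options(options) as segmenter:
    segmentation_result = segmenter.segment(mp_image)
    # Take the first confidence mask from the result.
    output_mask = segmentation_result.confidence_masks[0]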

I've encountered two issues with the above code:

1. The model has two outputs:

   Output 0: Name = Identity0, Shape = [1, 1], Type = numpy.float32

   Output 1: Name = Identity1, Shape = [1, x, y, z], Type = numpy.float32 (where x * y * z == image_width * image_height * image_channel, with image_channel = 1)

   How can I retrieve both outputs instead of just one?

2. The confidence_masks values are almost identical (min/max = 0.0701157/0.070115715), which seems unusual. The original image contains a person, and the output is correct when I run my custom TFLite model and read the result with tf.lite.Interpreter.get_tensor().
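
For comparison, the tf.lite.Interpreter route that does return both tensors can be sketched roughly as follows. This is a minimal sketch and not MediaPipe code: read_all_outputs is a hypothetical helper name, and it assumes the model has a single input and that the input array already has the shape the model expects. It relies only on the standard interpreter calls (allocate_tensors, get_input_details, get_output_details, set_tensor, invoke, get_tensor).

```python
import numpy as np


def read_all_outputs(interpreter, input_array):
    """Run one inference and return every output tensor, keyed by name.

    `interpreter` is expected to be a tf.lite.Interpreter (or compatible).
    `read_all_outputs` is a hypothetical helper, not a library API.
    """
    interpreter.allocate_tensors()
    input_detail = interpreter.get_input_details()[0]  # assumes a single input
    # Cast to the dtype the model declares for its input.
    data = np.asarray(input_array, dtype=input_detail["dtype"])
    interpreter.set_tensor(input_detail["index"], data)
    interpreter.invoke()
    # get_output_details() lists every output, so both Identity tensors appear.
    return {
        detail["name"]: interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    }
```

With TensorFlow installed, this would be called as read_all_outputs(tf.lite.Interpreter(model_path=model_path), image_array), where model_path and image_array are placeholders.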

I know that many frameworks support models with multiple inputs and outputs, so I'm confused about what I might be missing. Here are my specific questions:

1. Do I need to add special metadata to the TFLite model file?
2. How should I modify the original MediaPipe code to handle multiple outputs?

anxiousPI · Accepted Answer · 2025-04-03 09:29:32Z

Why do you have output_category_mask=False while expecting two outputs? With that setting, you are specifically asking the model to return only one output.

Please check the documentation and source code.

output_confidence_masks: Whether to output confidence masks.

output_category_mask: Whether to output category mask.


2 Comments

lcljesse Over a year ago

Thanks for your reply. I've tried setting both to true, but the issue persists. Actually, the confidence_masks and category_mask are two different representations of the same mask, uint8/fp32. What I really need are two distinct outputs with different shapes. This is a different scenario.

anxiousPI Over a year ago

Well, since you are getting a list of binary masks as output and you want an output of shape (1, C, H, W), you could stack the masks along the channel dimension and then add a batch dimension as the final step.
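
The stacking described in this comment could look roughly like the sketch below. It assumes each entry of confidence_masks is an mp.Image whose numpy_view() returns an (H, W) array, or an (H, W, 1) array that gets squeezed here. stack_confidence_masks is a hypothetical helper name. It only rearranges the masks that the segmenter already returns.

```python
import numpy as np


def stack_confidence_masks(masks):
    """Stack per-category masks into one array of shape (1, C, H, W)."""
    planes = []
    for mask in masks:
        plane = np.asarray(mask.numpy_view())
        if plane.ndim == 3 and plane.shape[-1] == 1:
            plane = plane[..., 0]  # drop a trailing single-channel axis
        planes.append(plane)
    stacked = np.stack(planes, axis=0)  # (C, H, W)
    return stacked[np.newaxis, ...]     # (1, C, H, W)
```

In the question's code this would be called as stack_confidence_masks(segmentation_result.confidence_masks).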
