How to integrate a lightweight image-to-text model into a React Native app?

Question

I am trying to integrate an image-to-text model into a React Native mobile app.

My requirements: The model should support image + text input → text output. It should be lightweight enough to run on mobile devices.

What I tried

moondream-0.5b (ONNX conversion)

Tried converting it to ONNX.
Faced issues with the tokenizer for encoding/decoding.
The output tokens were irrelevant.

microsoft/florence-base (fine-tuned, .pt format)

Fine-tuned it for my use case.
Converted to .pt.
Integration failed due to an error: “corrupted PyTorch model”.

desertnaut · Accepted Answer · 2025-09-23 21:03:32Z

It seems like you're looking for an "all-in-one" answer. Reworking one of your initial attempts might get you there, but I'm a fan of breaking things up. I have a work requirement related to expense tracking, so I've been researching OCR for mobile and found:

https://github.com/a7medev/react-native-ml-kit (also available on NPM)

You could then run a mini LLM on a cheap or free server (AWS free tier, Google Cloud free tier, or a cheap Heroku plan) and send it the extracted text along with a text prompt, which takes the heavy load off the user's mobile device.
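As a rough illustration of this split, here is a minimal sketch that runs OCR on the device and forwards the extracted text plus the prompt to a server. All names (`OcrFn`, `PostJsonFn`, `askAboutImage`) and the `/answer` endpoint with its `{ answer: string }` response are hypothetical; the OCR and HTTP steps are passed in as functions so the flow is not tied to any particular library.

```typescript
// Hypothetical names throughout: OcrFn, PostJsonFn, askAboutImage and the
// "/answer" endpoint are illustrative assumptions, not part of any library.

/** Extracts plain text from an image on the device. */
type OcrFn = (imageUri: string) => Promise<string>;

/** Sends a JSON body to a URL and resolves with the parsed JSON response. */
type PostJsonFn = (url: string, body: unknown) => Promise<unknown>;

/** Response shape assumed for the hypothetical server. */
interface AnswerResponse {
  answer: string;
}

/** Runtime check that the server replied with the assumed shape. */
function isAnswerResponse(value: unknown): value is AnswerResponse {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { answer?: unknown }).answer === "string"
  );
}

/**
 * Runs OCR on the device, then asks the server-side LLM a question
 * about the extracted text.
 */
async function askAboutImage(
  imageUri: string,
  prompt: string,
  ocr: OcrFn,
  postJson: PostJsonFn,
  serverUrl: string
): Promise<string> {
  const extractedText = (await ocr(imageUri)).trim();
  if (extractedText.length === 0) {
    // Nothing to send, so skip the network round trip.
    throw new Error("No text found in image");
  }

  const response = await postJson(`${serverUrl}/answer`, {
    text: extractedText,
    prompt,
  });
  if (!isAnswerResponse(response)) {
    throw new Error("Unexpected server response");
  }
  return response.answer;
}
```

In the app, `ocr` would presumably wrap the text-recognition call from the library linked above, and `postJson` would be a thin wrapper around `fetch`; check the library's README for the exact call, since this sketch does not depend on it.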

Consider whether you truly want everything on a mobile device.

Even after quantizing a model, you'll still be looking at about 50-100MB for the model alone, which makes for a pretty large app. I believe the Google Play Store has a limit of around 150MB, beyond which you have to do some funky file splitting (I think).
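To put rough numbers on this, the weight size of a quantized model can be estimated as parameter count × bits per weight ÷ 8. The sketch below is a back-of-the-envelope helper with hypothetical names; it counts weights only (no tokenizer, metadata, or runtime), and the 150MB budget is just the figure recalled above, not a verified store limit.

```typescript
// Hypothetical helpers for a back-of-the-envelope size check.
const BYTES_PER_MB = 1_000_000;

/** Estimated size of the weights alone, in bytes. */
function estimateWeightBytes(parameterCount: number, bitsPerWeight: number): number {
  return Math.ceil((parameterCount * bitsPerWeight) / 8);
}

/** True if the estimated weights fit within the given budget in MB. */
function fitsBudget(parameterCount: number, bitsPerWeight: number, budgetMb: number): boolean {
  return estimateWeightBytes(parameterCount, bitsPerWeight) <= budgetMb * BYTES_PER_MB;
}

// Assumed budget: the ~150MB figure mentioned above (unverified).
const ASSUMED_BUDGET_MB = 150;

// A 100M-parameter model at 8 bits per weight is about 100MB of weights.
const smallModelFits = fitsBudget(100_000_000, 8, ASSUMED_BUDGET_MB);

// A 0.5B-parameter model at 4 bits per weight is about 250MB of weights.
const halfBillionFits = fitsBudget(500_000_000, 4, ASSUMED_BUDGET_MB);
```

By this arithmetic, the 50-100MB range corresponds to roughly 50-100 million parameters at 8 bits per weight, so a model in the 0.5B class would need either more aggressive compression or a separate download after install.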
