
I am building a Flutter app that has to run three separate TensorFlow Lite models on-device:

  1. An embedding model
  2. An action video detection model
  3. A DistilGPT2 RAG model

Currently, I bundle all .tflite models inside the assets/ folder and load them using tflite_flutter.
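
For reference, this is roughly what my loading code looks like (simplified; the asset file names are illustrative):

```dart
import 'package:tflite_flutter/tflite_flutter.dart';

// All three interpreters are loaded from assets bundled in the APK.
late Interpreter embeddingModel;
late Interpreter videoModel;
late Interpreter gpt2Model;

Future<void> loadModels() async {
  embeddingModel = await Interpreter.fromAsset('assets/embedding.tflite');
  videoModel = await Interpreter.fromAsset('assets/video_detection.tflite');
  gpt2Model = await Interpreter.fromAsset('assets/distilgpt2.tflite');
}
```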

Because all three models are bundled in the app, the APK/IPA size has become very large and runtime performance is also suffering.

What I've tried so far:

  • Applied post-training quantization (int8 and float16) to reduce model size.
  • Loaded the models with tflite_flutter in separate isolates.
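
For the isolate part, my setup looks roughly like this (simplified sketch; the asset path is illustrative):

```dart
import 'package:tflite_flutter/tflite_flutter.dart';

// Move inference off the UI isolate so heavy models don't block frames.
Future<IsolateInterpreter> loadModelInIsolate(String assetPath) async {
  final interpreter = await Interpreter.fromAsset(assetPath);
  // IsolateInterpreter runs invocations in a background isolate.
  return IsolateInterpreter.create(address: interpreter.address);
}
```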

Even so, the app is still very large, and running the models (particularly the video detection model and DistilGPT2) still causes noticeable lag.

My questions:

  1. What are the best practices for running multiple TFLite models in a Flutter app without bloating the app size?
  2. For a video model and a language model such as DistilGPT2, what is the best way to optimize on-device performance?

Environment:

  • Flutter 3.x
  • TensorFlow Lite
  • Target: Android

Any advice, optimization suggestions, or example strategies would be highly appreciated.
