- ForeAct is a visual foresight planner that lets vision-language-action models (VLAs) anticipate future observations, enabling more informed decision-making.
- ForeAct is general and plug-and-play: state-of-the-art VLAs can seamlessly incorporate ForeAct without any architectural modification.
- ForeAct is highly efficient, generating a high-fidelity 640 $\times$ 480 future observation in just 0.33s on a single H100 GPU.

`git clone https://github.com/mit-han-lab/foreact`

`cd foreact`

`bash environment_setup.sh foreact`

Download the pretrained weights and prepare your own real-world data (or use our processed real-world data). Update the relevant paths in `configs/finetune.yaml`, then launch:

`bash scripts/run_finetune.sh`

### CLI

`python app_cli.py --checkpoint_path path/to/model --prompt "" --input_image path/to/image --output_dir ./results`
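
To run the CLI over a whole folder of images, a small wrapper can call it once per file. The following is a minimal sketch, not part of the repository: it assumes `app_cli.py` accepts exactly the four flags shown above and that it is run from the repository root. The helper names `build_command` and `run_batch`, the set of image suffixes, and the one-output-folder-per-image layout are all hypothetical choices made for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: these are the image types you want to process.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_command(checkpoint_path, prompt, input_image, output_dir):
    """Build the argument list for one app_cli.py call (flags taken from the README)."""
    return [
        sys.executable, "app_cli.py",
        "--checkpoint_path", str(checkpoint_path),
        "--prompt", prompt,
        "--input_image", str(input_image),
        "--output_dir", str(output_dir),
    ]


def run_batch(checkpoint_path, prompt, image_dir, output_root, dry_run=False):
    """Call the CLI once per image in image_dir; return the commands that were built."""
    commands = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        # Hypothetical layout: one output folder per input image, named after the file.
        out_dir = Path(output_root) / image.stem
        cmd = build_command(checkpoint_path, prompt, image, out_dir)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the CLI exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_batch("path/to/model", "your instruction", "path/to/images", "./results", dry_run=True)` returns the commands without executing them, which is a cheap way to check the paths before loading the model. Note that each call reloads the checkpoint, so this is convenient rather than fast.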
### Gradio

`python app.py --checkpoint_path path/to/model`

Examples of policy training are provided in `./third-party/lerobot`.

Thanks to metaquery, diffusers, and lerobot for their wonderful open-source codebases.

