MMSciBench

Paper | MMSciBench Dataset
MMSciBench is a benchmark of Chinese multimodal mathematics and physics problems that evaluates the scientific reasoning capabilities of language models. This repository contains the code for the benchmark; the dataset is available on Hugging Face as the MMSciBench Dataset.
If you use this benchmark in your research, please cite our paper:
```bibtex
@article{ye2025mmscibench,
  title={MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems},
  author={Ye, Xinwu and Li, Chengfan and Chen, Siming and Wei, Wei and Tang, Xiangru},
  journal={Findings of the Association for Computational Linguistics: ACL 2025},
  year={2025}
}
```
Clone the repository:

```shell
git clone https://github.com/xinwuye/MMSciBench-code.git
cd MMSciBench-code
```

The dataset for MMSciBench is available on Hugging Face: MMSciBench Dataset.
To evaluate models on the benchmark, run the following scripts:

```shell
python exp.py
python exp_hf.py
python eval1.py
```

Once evaluation is complete, the results will be saved.
This project is licensed under the Apache-2.0 License.
We welcome contributions! To contribute:
- Fork the repository.
- Create a new branch: `git checkout -b feature-branch-name`
- Commit your changes: `git commit -m 'Add a new feature'`
- Push to the branch: `git push origin feature-branch-name`
- Open a pull request.
We thank the open-source community and previous works that inspired this benchmark.
For questions or collaborations, please open an issue.