Medical Multilingual Benchmark

Mar 24, 2026 · Admin

About

The task of MMedBench is to answer medical multiple-choice questions in 6 different languages. In addition, each question is paired with a rationale explaining the correct choice. For more details about MMedBench, please refer to this paper: MMedBench Paper

Dataset

MMedBench is a medical multiple-choice dataset covering 6 different languages. It contains 45k samples in the training set and 8,518 samples in the test set. Each question is accompanied by a correct answer and a high-quality rationale. Please visit our GitHub repository to download the dataset: Data & Code Repository

Submission

To submit your model, please follow the instructions in the GitHub repository.

Citation

If you use MMedBench in your research, please cite our paper:

```bibtex
@misc{qiu2024building,
  title={Towards Building Multilingual Language Model for Medicine},
  author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
  year={2024},
  eprint={2402.13963},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

Leaderboard

| Rank | Model          | Size | Accuracy (%) | Rationale (BLEU-1)¹ |
|------|----------------|------|--------------|---------------------|
| 1    | GPT-4          | NA   | 74.27        | NA                  |
| 2    | MMed-Llama 3   | 8B   | 67.75        | 47.21               |
| 3    | MMedLM 2       | 7B   | 67.30        | 48.81               |
| 4    | Llama 3        | 8B   | 62.79        | 46.76               |
| 5    | Mistral        | 7B   | 60.73        | 45.37               |
| 6    | InternLM 2     | 7B   | 58.59        | 46.52               |
| 7    | BioMistral     | 7B   | 57.45        | 45.93               |
| 8    | Gemini-1.0 pro | NA   | 55.20        | 7.28                |
| 9    | MMedLM         | 7B   | 55.01        | 45.05               |
| 10   | MEDITRON       | 7B   | 52.23        | 45.08               |
| 11   | GPT-3.5        | NA   | 51.82        | 26.01               |
| 12   | InternLM       | 7B   | 45.67        | 42.12               |
| 13   | BLOOMZ         | 7B   | 45.10        | 43.22               |
| 14   | LLaMA 2        | 7B   | 42.26        | 44.24               |
| 15   | Med-Alpaca     | 7B   | 41.11        | 43.49               |
| 16   | PMC-LLaMA      | 7B   | 40.04        | 43.16               |
| 17   | ChatDoctor     | 7B   | 39.53        | 42.21               |

¹ BLEU-1 is evaluated on manually checked samples in the test set.
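To make the two leaderboard metrics concrete, here is a minimal sketch of how Accuracy and BLEU-1 could be computed over paired predictions and gold samples. This is not the official evaluation script (which lives in the GitHub repository); the sample schema with `"answer"` and `"rationale"` keys is an assumption for illustration, and the BLEU-1 here is a simple unigram precision with a brevity penalty rather than the exact implementation used by the benchmark.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Unigram BLEU: clipped unigram precision times a brevity penalty.

    A simplified stand-in for the benchmark's BLEU-1; the official script
    may tokenize and smooth differently.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    # Clipped overlap: each candidate token counts at most as often as in the reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    # Brevity penalty discourages trivially short rationales.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


def evaluate(predictions, gold):
    """Return (accuracy %, mean BLEU-1 %) over paired samples.

    Each sample is assumed to be a dict with "answer" (chosen option)
    and "rationale" (free-text explanation) -- a hypothetical schema.
    """
    n = len(gold)
    correct = sum(p["answer"] == g["answer"] for p, g in zip(predictions, gold))
    bleu = sum(bleu1(p["rationale"], g["rationale"]) for p, g in zip(predictions, gold))
    return 100.0 * correct / n, 100.0 * bleu / n


# Example usage with a single made-up sample:
gold = [{"answer": "B", "rationale": "aspirin irreversibly inhibits cox"}]
pred = [{"answer": "B", "rationale": "aspirin irreversibly inhibits cox"}]
print(evaluate(pred, gold))  # a perfect match scores (100.0, 100.0)
```

Accuracy is averaged over all test questions, while (per the footnote above) the benchmark's rationale BLEU is computed only on manually checked test samples.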
