Abstract
This paper outlines our submission to the MEDIQA2024 Multilingual and Multimodal Medical Answer Generation (M3G) shared task. We report results for two standalone solutions under the English category of the task: the first involves two consecutive calls to the Claude 3 Opus API, and the second involves training an image-disease-label joint embedding in the style of CLIP for image classification. These two solutions scored 1st and 2nd place respectively on the competition leaderboard, substantially outperforming the next best solution. Additionally, we discuss insights gained from post-competition experiments. While the performance of both solutions has significant room for improvement, given the difficulty of the shared task and the challenging nature of medical visual question answering in general, we identify the multi-stage LLM approach and the CLIP image classification approach as promising avenues for further investigation.
URL
https://arxiv.org/abs/2404.14567