Abstract
Vision-language models have made significant strides recently, demonstrating superior performance across a range of tasks, e.g., optical character recognition and complex diagram analysis. Building on this trend, we introduce a new vision-language model, POINTS1.5, designed to excel in various real-world applications. POINTS1.5 is an enhancement of POINTS1.0 and incorporates several key innovations: i) We replace the original CLIP vision encoder, which had a fixed image resolution, with a NaViT-style vision encoder that supports native dynamic high resolution. This allows POINTS1.5 to process images of any resolution without splitting them into tiles. ii) We add bilingual support to POINTS1.5, significantly enhancing its capability in Chinese. Because open-source Chinese datasets for vision-language models are scarce, we collect numerous images from the Internet and annotate them using a combination of manual and automatic methods. iii) We propose a set of rigorous filtering methods for visual instruction tuning datasets. We comprehensively evaluate all of these filtering methods and choose the most effective ones to obtain the final visual instruction tuning set. Thanks to these innovations, POINTS1.5 significantly outperforms POINTS1.0 and demonstrates strong performance across a range of real-world applications. Notably, POINTS1.5-7B is trained on fewer than 4 billion tokens and ranks first on the OpenCompass leaderboard among models with fewer than 10 billion parameters.
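To make the contrast in innovation i) concrete, the following is a minimal sketch, not the POINTS1.5 implementation. It compares how many fixed-size tiles a tile-based pipeline would cut an image into with the single patch grid a native-resolution encoder could use. The function names, the patch size of 14, and the tile size of 336 are illustrative assumptions; the abstract does not state these values.

```python
import math


def native_patch_grid(height, width, patch_size=14):
    """Patch grid for an image kept at its native resolution.

    Assumption: the image is padded up to the next multiple of patch_size.
    patch_size=14 is an illustrative value, not taken from the paper.
    """
    rows = math.ceil(height / patch_size)
    cols = math.ceil(width / patch_size)
    return rows, cols


def tile_grid(height, width, tile_size=336):
    """Tile grid a fixed-resolution encoder would need to cover the image.

    tile_size=336 is an illustrative value, not taken from the paper.
    """
    rows = math.ceil(height / tile_size)
    cols = math.ceil(width / tile_size)
    return rows, cols


def sequence_length(height, width, patch_size=14):
    """Number of patch tokens produced for one native-resolution image."""
    rows, cols = native_patch_grid(height, width, patch_size)
    return rows * cols


if __name__ == "__main__":
    for h, w in [(336, 336), (500, 1200), (1080, 1920)]:
        t_rows, t_cols = tile_grid(h, w)
        print(
            f"{h}x{w}: native grid {native_patch_grid(h, w)}, "
            f"{sequence_length(h, w)} patch tokens in one sequence; "
            f"tiling would need {t_rows * t_cols} separate tiles"
        )
```

In this sketch the native-resolution path yields one variable-length patch sequence per image, while the tiled path splits the same image into several fixed-size crops that are encoded independently.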
URL
https://arxiv.org/abs/2412.08443