Abstract
Recent studies have demonstrated that incorporating Chain-of-Thought (CoT) reasoning into the detection process can enhance a model's ability to detect synthetic images. However, excessively long reasoning incurs substantial resource overhead in token consumption and latency, and is largely redundant when handling obvious forgeries. To address this issue, we propose Fake-HR1, a large-scale hybrid-reasoning model that is, to the best of our knowledge, the first to adaptively decide whether reasoning is necessary based on the characteristics of the generative detection task. To achieve this, we design a two-stage training framework: we first perform Hybrid Fine-Tuning (HFT) for cold-start initialization, then apply online reinforcement learning with Hybrid-Reasoning Grouped Policy Optimization (HGRPO) to implicitly learn when each reasoning mode is appropriate. Experimental results show that Fake-HR1 adaptively performs reasoning across different types of queries, surpassing existing LLMs in both reasoning ability and generative detection performance while significantly improving response efficiency.
URL
https://arxiv.org/abs/2602.10042