Abstract
As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.
URL
https://arxiv.org/abs/2411.18526