Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

Abstract

Featurizing microscopy images for use in biological research remains a significant challenge, especially for large-scale experiments spanning millions of images. This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs) when training with increasingly large model backbones and microscopy datasets. Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases. Additionally, we develop a new channel-agnostic MAE architecture (CA-MAE) that accepts images with different numbers and orders of channels at inference time. We demonstrate that CA-MAEs generalize effectively by running inference and evaluation on a microscopy image dataset (JUMP-CP) generated under different experimental conditions, and with a different channel structure, than our pretraining data (RPI-93M). Our findings motivate continued research into scaling self-supervised learning on microscopy data in order to create powerful foundation models of cellular biology that have the potential to catalyze advancements in drug discovery and beyond.
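
To make the channel-agnostic idea concrete, here is a minimal NumPy sketch of one way a tokenizer can accept any number or order of channels: each channel is split into patches separately, and a single shared projection embeds every patch. This is an illustration under stated assumptions, not the authors' implementation. The function names (`channel_agnostic_tokens`, `random_keep_indices`), the patch size, the embedding width, and the 0.75 mask ratio are all hypothetical choices made for the example; the abstract does not specify them.

```python
import numpy as np


def channel_agnostic_tokens(image, patch_size, proj):
    """Embed every channel's patches with one shared projection.

    image: array of shape (C, H, W), with any number of channels C.
    proj:  array of shape (patch_size * patch_size, dim), shared by all channels
           (an assumption of this sketch).
    Returns an array of shape (C * num_patches_per_channel, dim).
    """
    c, h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (C, H, W) -> (C, gh, p, gw, p) -> (C, gh, gw, p, p)
    patches = image.reshape(c, gh, patch_size, gw, patch_size)
    patches = patches.transpose(0, 1, 3, 2, 4)
    # Flatten each single-channel patch into one row.
    patches = patches.reshape(c * gh * gw, patch_size * patch_size)
    return patches @ proj


def random_keep_indices(num_tokens, mask_ratio, rng):
    """Pick the tokens that stay visible to the encoder, MAE-style."""
    num_keep = int(round(num_tokens * (1.0 - mask_ratio)))
    return np.sort(rng.permutation(num_tokens)[:num_keep])


rng = np.random.default_rng(0)
patch_size, dim = 8, 16                      # illustrative sizes only
proj = rng.normal(size=(patch_size * patch_size, dim))

six_channel = rng.normal(size=(6, 32, 32))   # e.g. a 6-channel image
five_channel = rng.normal(size=(5, 32, 32))  # e.g. a 5-channel image

tokens_six = channel_agnostic_tokens(six_channel, patch_size, proj)
tokens_five = channel_agnostic_tokens(five_channel, patch_size, proj)
visible = random_keep_indices(tokens_six.shape[0], 0.75, rng)
```

Because the projection never sees more than one channel at a time, the same weights apply to the 6-channel and 5-channel inputs; only the token count changes. Reordering the channels simply reorders the corresponding blocks of tokens.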

URL

https://arxiv.org/abs/2404.10242

PDF

https://arxiv.org/pdf/2404.10242.pdf