Abstract
This paper introduces the Sparse Tsetlin Machine (STM), a novel Tsetlin Machine (TM) that processes sparse data efficiently. The conventional TM does not exploit data characteristics such as sparsity, which is common in NLP applications and other bag-of-words representations. Consequently, a TM must initialize, store, and process a large number of zero values, resulting in excessive memory usage and computation time. Previous attempts at creating a sparse TM have largely been unsuccessful, primarily because they could not identify which literals are sufficient for TM training. By introducing Active Literals (AL), the STM can focus exclusively on literals that actively contribute to the current data representation, significantly decreasing memory footprint and computation time while demonstrating competitive classification performance.
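To make the cost of zero values concrete, the following is a minimal sketch of a dense versus a sparse bag-of-words input, and of evaluating one conjunctive clause on each. It is not the STM or its Active Literals mechanism, which the abstract does not specify; the vocabulary, the example clause, and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: NOT the paper's STM or Active Literals algorithm.
# The vocabulary, clause, and helper names below are assumptions.

def to_dense(doc_tokens, vocab):
    """Dense binary vector: one slot per vocabulary word, mostly zeros."""
    vec = [0] * len(vocab)
    for tok in doc_tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1
    return vec

def to_sparse(doc_tokens, vocab):
    """Sparse form: only the indices of words that are present."""
    return {vocab[tok] for tok in doc_tokens if tok in vocab}

def clause_dense(vec, positive, negated):
    """Conjunction of literals evaluated on the dense vector."""
    return all(vec[i] == 1 for i in positive) and all(vec[i] == 0 for i in negated)

def clause_sparse(active, positive, negated):
    """Same conjunction evaluated on the set of active indices."""
    return positive <= active and not (negated & active)

words = ["good", "bad", "movie", "plot", "actor", "boring", "great", "slow"]
vocab = {w: i for i, w in enumerate(words)}

doc = ["great", "movie", "good"]
dense = to_dense(doc, vocab)
sparse = to_sparse(doc, vocab)

# Hypothetical clause: "great" AND "movie" AND NOT "boring"
positive = {vocab["great"], vocab["movie"]}
negated = {vocab["boring"]}

print(len(dense), "dense slots vs", len(sparse), "stored indices")
print(clause_dense(dense, positive, negated), clause_sparse(sparse, positive, negated))
```

Both evaluations agree, but the sparse form stores only the present features. With a realistic vocabulary of tens of thousands of words and documents of a few dozen tokens, the gap between the two representations grows accordingly.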
URL
https://arxiv.org/abs/2405.02375