Abstract
Detecting sets of relevant patterns in a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can be assessed from very different points of view. Rule-based languages like Answer Set Programming (ASP) seem well suited for specifying user-provided criteria to assess pattern utility in the form of constraints; moreover, the declarativity of ASP makes it very easy to switch between several criteria in order to analyze the dataset from different points of view. In this paper, we take steps toward extending the notion of High Utility Pattern Mining (HUPM); in particular, we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we use it as a building block for the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, an extensive experimental evaluation demonstrates, from both a quantitative and a qualitative point of view, the effectiveness of the proposed approach. Under consideration in Theory and Practice of Logic Programming (TPLP).
URL
https://arxiv.org/abs/2303.13191