Ontology-based Design of Experiments on Big Data Solutions

2019-04-18 07:52:54
Maximilian Zocholl, Elena Camossi, Anne-Laure Jousselme, Cyril Ray

Abstract

Big data solutions are designed to cope with data of huge Volume and wide Variety that must be ingested at high Velocity and may have Veracity issues; these challenging characteristics are usually referred to as the "4Vs of Big Data". To evaluate possibly complex big data solutions, stress tests must assess a large number of combinations of sub-components together with the possible big data variations. A formalization of the Design of Experiments (DoE) on big data solutions aims to ensure the reproducibility of the experiments, facilitate their partitioning into sub-experiments, and guarantee the consistency of their outcomes in a global assessment. In this paper, an ontology-based approach is proposed to support the evaluation of a big data system in two ways. First, the approach formalizes a decomposition and recombination of the big data solution, allowing component evaluation results to be aggregated at the inter-component level. Second, existing work on DoE is translated into an ontology that supports the selection of experiments. The proposed ontology-based approach makes it possible to combine knowledge from the evaluation domain and the application domain. It exploits domain-specific and inter-domain restrictions on the factor combinations to reduce the number of experiments. Unlike existing approaches, the proposed use of ontologies is not limited to the assertional description and exploitation of past experiments but offers richer terminological descriptions for developing a DoE from scratch. As an application example, a maritime big data solution for detecting and predicting suspicious vessel behaviour through mobility analysis is selected. The article concludes with a sketch of future work.
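
The idea of using restrictions on factor combinations to reduce the number of experiments can be illustrated with a minimal sketch. It is not the authors' ontology-based method: the factor names, levels, and restriction rules below are hypothetical, and plain Python predicates stand in for the restrictions that the paper expresses in an ontology.

```python
from itertools import product

# Hypothetical factors and levels, loosely inspired by the "4Vs".
# None of these names or values are taken from the paper.
factors = {
    "volume": ["small", "large"],
    "velocity": ["batch", "stream"],
    "veracity": ["clean", "noisy"],
    "detector": ["rule_based", "learned"],
}

# Hypothetical restrictions; each returns True when a run is admissible.
restrictions = [
    # Assumption: the rule-based detector is only evaluated on clean data.
    lambda run: not (run["detector"] == "rule_based" and run["veracity"] == "noisy"),
    # Assumption: large volumes are only ingested as a stream.
    lambda run: not (run["volume"] == "large" and run["velocity"] == "batch"),
]


def full_factorial(factors):
    """Enumerate every combination of factor levels as a dict per run."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def admissible(runs, restrictions):
    """Keep only the runs that satisfy all restrictions."""
    return [run for run in runs if all(rule(run) for rule in restrictions)]


all_runs = full_factorial(factors)
kept_runs = admissible(all_runs, restrictions)
print(f"full factorial: {len(all_runs)} runs, after restrictions: {len(kept_runs)} runs")
```

With these illustrative rules, the full factorial design of 16 runs shrinks to 9 admissible runs. In the approach described in the abstract, such restrictions would come from knowledge of the evaluation and application domains rather than from hand-written predicates.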

URL

https://arxiv.org/abs/1904.08626

PDF

https://arxiv.org/pdf/1904.08626.pdf

