Monitoring the WLCG infrastructure requires gathering and analyzing high volumes of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks, in order to provide a uniform and flexible interface for scientists and sites. The current architecture, in which relational database systems are used to store, process and serve monitoring data, has limitations in coping with the foreseen growth in the volume (e.g. higher LHC luminosity) and variety (e.g. new data-transfer protocols and new resource types, such as cloud computing) of WLCG monitoring events.
The goal of this project is to build a new scalable data store and analytics platform, in collaboration with the Support for Distributed Computing (SDC) group at the CERN IT department, which leverages a stack of technologies, each targeting a specific aspect of large-scale distributed data processing (an approach commonly referred to as the lambda architecture).
The project can be decomposed into three main objectives and areas of work. The first objective is the batch layer, which stores a constantly growing dataset and provides the ability to compute arbitrary functions on it. The second objective is the serving layer, which stores the batch-processed views and uses indexing techniques to make them efficiently queryable. The third objective is the real-time processing layer, which performs analytics on fresh data with incremental algorithms to compensate for batch-processing latency. Moreover, the real-time analytics layer can be used as input for active reaction, adopting a classical pattern-matching approach to promptly detect errors and failures in the stream of monitoring events.
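To make the interplay of the three layers concrete, the following is a minimal, hypothetical Python sketch of the lambda-architecture pattern described above. It is not the project's actual implementation (which would sit on distributed storage and processing engines); the event shape (a dict with `site` and `status` fields) and the example view (failure counts per site) are illustrative assumptions.

```python
from collections import defaultdict

class BatchLayer:
    """Batch layer: an append-only master dataset over which arbitrary
    functions can be recomputed from scratch (illustrative in-memory stand-in)."""
    def __init__(self):
        self.master = []  # constantly growing, immutable dataset

    def append(self, event):
        self.master.append(event)

    def compute_view(self):
        # Arbitrary function over the full dataset: failures per site.
        view = defaultdict(int)
        for e in self.master:
            if e["status"] == "FAILED":
                view[e["site"]] += 1
        return dict(view)

class ServingLayer:
    """Serving layer: holds the batch-computed view, indexed (here, a plain
    dict keyed by site) so it can be queried efficiently."""
    def __init__(self):
        self.view = {}

    def load(self, view):
        self.view = view

    def query(self, site):
        return self.view.get(site, 0)

class SpeedLayer:
    """Real-time layer: incrementally folds in events that arrived after
    the last batch run, compensating for batch-processing latency."""
    def __init__(self):
        self.delta = defaultdict(int)

    def update(self, event):
        if event["status"] == "FAILED":
            self.delta[event["site"]] += 1

    def query(self, site):
        return self.delta[site]

def failures(serving, speed, site):
    # A query merges the (slightly stale) batch view with the real-time delta.
    return serving.query(site) + speed.query(site)
```

A query for a site's failure count thus combines two failures already absorbed by a batch run with one fresh failure seen only by the speed layer, returning three in total; after the next batch recomputation the speed layer's delta for that period can be discarded.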