Project name

Analysis of the I/O performance of LHC computing jobs at the CERN computing centre

Project description

The LHC experiments execute a significant fraction of their data reconstruction, simulation and analysis on the CERN batch computing resources. One of the most important characteristics of these data processing jobs is their I/O pattern when accessing the local storage system, EOS, which is based on the XRootD protocol. The way experiment applications access the data can have a considerable impact on how efficiently the computing, storage and network resources are used, and has important implications for the optimisation and sizing of these resources.

A promising approach is to study the logs of the storage system to identify and characterise the job I/O, which depends strongly on the job type (simulation, digitisation, reconstruction, etc.). A direct link between the information in the storage logs and the information in the experiments' monitoring systems (which contain detailed information about the jobs) is possible: it can be derived from a cross-analysis of these data sources together with information from the CERN batch systems. The goals of this project are to study this connection, use it to relate storage I/O patterns to experiment job types, look for significant variations within a given job type, identify important sources of inefficiency, and describe a simple model of the computer centre (batch nodes, network, disk servers) that would increase the efficiency of resource utilisation.
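The cross-analysis described above could be sketched in pandas roughly as follows; all host names, column names and values here are hypothetical, intended only to illustrate how storage-log records might be joined to batch-job records by worker node and time window:

```python
import pandas as pd

# Hypothetical, simplified input: in a real analysis these records would be
# parsed from the EOS/XRootD access logs and from the batch system / experiment
# monitoring. All column names and values below are illustrative assumptions.
eos_log = pd.DataFrame({
    "client_host": ["b631.cern.ch", "b631.cern.ch", "b702.cern.ch"],
    "open_time": pd.to_datetime(
        ["2017-02-01 10:02", "2017-02-01 10:05", "2017-02-01 11:00"]),
    "bytes_read": [2_000_000_000, 50_000_000, 750_000_000],
    "read_calls": [400, 12_000, 300],
})

batch_jobs = pd.DataFrame({
    "job_id": ["j1", "j2"],
    "exec_host": ["b631.cern.ch", "b702.cern.ch"],
    "start_time": pd.to_datetime(["2017-02-01 10:00", "2017-02-01 10:55"]),
    "end_time": pd.to_datetime(["2017-02-01 12:00", "2017-02-01 13:00"]),
    "job_type": ["reconstruction", "simulation"],
})

# Associate each file access with the batch job that was running on the same
# worker node at the time the file was opened.
merged = eos_log.merge(batch_jobs, left_on="client_host", right_on="exec_host")
merged = merged[(merged.open_time >= merged.start_time)
                & (merged.open_time <= merged.end_time)]

# Characterise the I/O per job type: average bytes per read call is a crude
# proxy for the access pattern (large sequential reads vs. sparse reads).
grouped = merged.groupby("job_type")
bytes_per_call = grouped["bytes_read"].sum() / grouped["read_calls"].sum()
print(bytes_per_call)
```

At the real scale such a join would need more care (time-indexed merges, chunked processing of the logs), but the principle of correlating the three data sources is the same.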

If inefficiencies are detected that could be alleviated by changes in the way the experiments run their jobs, this information should be fed back to the experiments.

The analysis can initially be based on the jobs of a single large LHC experiment (ATLAS or CMS) and be extended to other experiments if time allows.

Required skills

Python programming. Familiarity with data analytics techniques and tools is desirable.

Learning experience

Large-scale data analytics with real-world data. The Python data analysis ecosystem (NumPy, pandas, SciPy, matplotlib, Jupyter). Direct interaction with members of the LHC collaborations and insight into their computing systems. Complex storage systems in a large data centre environment.

Project duration

3 to 6 months

Project area

Data Analytics

Contact for further details

Andrea.Sciaba@cern.ch

Status

Submitted

Submitted by sciaba on Friday, February 24, 2017 - 14:57.