The LHC experiments execute a significant fraction of their data reconstruction, simulation and analysis on the CERN computing batch resources. One of the most important features of these data processing jobs is their I/O pattern in accessing the local storage system, EOS, which is based on the xrootd protocol. The way experiment applications access the data can have a considerable impact on how efficiently the computing, storage and network resources are used, and has important implications for the optimisation and sizing of these resources.
A promising approach is to study the logs of the storage system to identify and characterise the job I/O, which depends strongly on the type of job (simulation, digitisation, reconstruction, etc.). A direct link between the information in the storage logs and the information in the experiments' monitoring systems (which contain detailed information about the jobs) can be derived from a cross analysis of these data sources together with information from the CERN batch systems. The goal of this project is to study this connection, use it to relate storage I/O patterns to experiment job types, look for significant variations within a given job type, identify important sources of inefficiency, and describe a simple model of the computer centre (batch nodes, network, disk servers) that would increase the efficiency of resource utilisation.
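As an illustration of the cross analysis described above, the following is a minimal sketch in Python. It assumes that a storage log record can be matched to a batch job through the client host name and the time at which the file was opened, and that the experiment monitoring maps batch job identifiers to job types. All record types, field names and the matching rule (FileAccess, BatchJob, match_access_to_job, bytes_read_per_job_type) are hypothetical and do not reflect the actual EOS, batch or experiment monitoring schemas.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAccess:
    """One record from the storage logs (hypothetical fields)."""
    client_host: str
    open_time: float  # seconds since epoch
    bytes_read: int


@dataclass(frozen=True)
class BatchJob:
    """One record from the batch system (hypothetical fields)."""
    batch_id: str
    host: str
    start: float
    end: float


def match_access_to_job(access, jobs_by_host):
    """Return the batch job that ran on the client host when the file was opened."""
    candidates = [
        job for job in jobs_by_host.get(access.client_host, [])
        if job.start <= access.open_time <= job.end
    ]
    # Several jobs may share a node; keep only unambiguous matches.
    return candidates[0] if len(candidates) == 1 else None


def bytes_read_per_job_type(accesses, jobs, job_type_by_batch_id):
    """Sum the bytes read from storage for each experiment job type."""
    jobs_by_host = defaultdict(list)
    for job in jobs:
        jobs_by_host[job.host].append(job)

    totals = defaultdict(int)
    for access in accesses:
        job = match_access_to_job(access, jobs_by_host)
        if job is None:
            continue  # no batch job, or an ambiguous one, for this access
        job_type = job_type_by_batch_id.get(job.batch_id, "unknown")
        totals[job_type] += access.bytes_read
    return dict(totals)


# Toy input standing in for the three data sources.
jobs = [
    BatchJob("b1", "node01", 0.0, 100.0),
    BatchJob("b2", "node02", 0.0, 100.0),
]
accesses = [
    FileAccess("node01", 10.0, 500),
    FileAccess("node02", 20.0, 300),
    FileAccess("node03", 5.0, 999),  # no batch job known on this host
]
job_types = {"b1": "simulation", "b2": "reconstruction"}

print(bytes_read_per_job_type(accesses, jobs, job_types))
```

Matching on host and time window is only one possible choice. If the storage logs carry a more direct identifier of the client process, that would be a more reliable join key, and accesses on nodes shared by several jobs would no longer need to be discarded.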
If inefficiencies are detected that could be alleviated by changing the way experiments run their jobs, this information should be passed on to the experiments.
The analysis can initially be based on the jobs of a single large LHC experiment (ATLAS or CMS) and extended to other experiments if time allows.