Integration of network and transfer metrics to optimize experiment workflows

A common use case reported by the experiments is enabling network-aware tools, driven mainly by the need to optimize transfers and experiment workflows. This requires a uniform way to access and integrate existing measurements, and the ability to define a so-called “distance” metric between storage elements (and/or sites). Such a metric would combine a range of inputs, such as link status, utilization, functional tests and occupancy, into a cost matrix that can be used to decide on job placement, find the closest replicas, determine the closest storage to which data can be uploaded, etc.

The aim of this project is to contribute to the ongoing developments in this area by developing a set of libraries and components that compute the cost matrix using different algorithms and different sets of network inputs.
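As an illustration only, the idea of a cost matrix built from several network metrics could be sketched as follows. This is a minimal sketch, not the project's actual algorithm: the site names, metric names, sample values and weights are all hypothetical, and a simple weighted sum of min-max normalized metrics stands in for whichever algorithms the project would eventually evaluate.

```python
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]  # (source site, destination site)

# Hypothetical weights: how much each metric contributes to the "distance".
WEIGHTS = {"latency_ms": 0.5, "packet_loss": 0.3, "utilization": 0.2}


def normalize(values: Dict[Link, float]) -> Dict[Link, float]:
    """Min-max scale one metric to [0, 1] so that metrics are comparable."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def cost_matrix(metrics: Dict[str, Dict[Link, float]],
                weights: Dict[str, float] = WEIGHTS) -> Dict[Link, float]:
    """Combine per-link metrics into a single cost per link (lower is closer)."""
    normalized = {name: normalize(per_link) for name, per_link in metrics.items()}
    # Only score links for which every metric has a measurement.
    links = set.intersection(*(set(per_link) for per_link in metrics.values()))
    return {
        link: sum(weights[name] * normalized[name][link] for name in weights)
        for link in links
    }


def closest_replica(cost: Dict[Link, float], destination: str,
                    replicas: Iterable[str]) -> str:
    """Pick the replica site with the lowest cost towards the destination."""
    return min(replicas, key=lambda src: cost.get((src, destination), float("inf")))


# Made-up sample measurements for two source sites and one destination.
metrics = {
    "latency_ms": {("SITE_A", "SITE_C"): 10.0, ("SITE_B", "SITE_C"): 50.0},
    "packet_loss": {("SITE_A", "SITE_C"): 0.02, ("SITE_B", "SITE_C"): 0.0},
    "utilization": {("SITE_A", "SITE_C"): 0.1, ("SITE_B", "SITE_C"): 0.9},
}

cost = cost_matrix(metrics)
best = closest_replica(cost, "SITE_C", ["SITE_A", "SITE_B"])
print(best)
```

In this sketch lower cost means "closer", so the same matrix could in principle serve job placement, replica selection and upload target selection by changing which links are compared. Links with no measurement are treated as infinitely distant, which is one possible design choice among several.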


Machine Learning algorithms, ElasticSearch, Spark/Hadoop, ML in Spark

The student will acquire practical experience in data aggregation and time series prediction, and will get hands-on experience with very rich datasets such as network latencies, paths and throughput.

12 months

Data Analytics
Monitoring of the distributed infrastructure

Hendrik Borras
University of Heidelberg
Marian Babik
18 Sep 2016
not scheduled yet

