Advanced Notifications for WAN Incidents

Project name

Advanced Notifications for Network Incidents (ANNI)

Project description

One of the main challenges in WLCG WAN networking is diagnosing network problems and issuing advanced notifications about them. LHCOPN and LHCONE, the core global networks in WLCG, have more than 5000 active links between 120 sites. Currently, most issues are visible only to the applications and have to be debugged after the incident and the resulting performance degradation have already occurred. This is primarily due to the underlying complexity of the multi-domain WLCG network and the difficulty of understanding the state of the network and how it changes over time. The project aims to use existing open-source event processing systems to automate the detection and localization of network problems, using the data already collected by the perfSONAR network infrastructure. The project will be carried out in collaboration with the University of Chicago and the University of Michigan.

The project will build on the standard WLCG perfSONAR network measurement infrastructure. It will gather and analyze complex real-world network topologies and their corresponding network metrics to identify possible signatures of network problems. It will provide the experiments' operations teams with a real-time view of the currently diagnosed issues, together with a list of current downtimes from the network providers.
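As a rough illustration of what "identifying signatures of network problems" could look like, the following minimal Python sketch flags site-to-site paths with high packet loss and then ranks the hops shared by those paths. It is not the project's method: the data layout, the function names (flag_degraded_paths, rank_suspect_hops), the site and router names, and the 2% loss threshold are all assumptions made for this example, and the measurements are invented rather than read from perfSONAR.

```python
from collections import Counter

def flag_degraded_paths(loss_by_path, threshold=0.02):
    """Return the set of paths whose mean packet loss exceeds the threshold.

    loss_by_path maps a (source, destination) pair to a list of loss
    fractions. The 2% default threshold is an assumption for illustration.
    """
    return {
        path
        for path, samples in loss_by_path.items()
        if samples and sum(samples) / len(samples) > threshold
    }

def rank_suspect_hops(routes, degraded):
    """Score each hop by the fraction of paths through it that are degraded.

    routes maps a (source, destination) pair to the list of hops on that
    path. Hops shared mostly by degraded paths are ranked first.
    """
    total = Counter()
    bad = Counter()
    for path, hops in routes.items():
        for hop in set(hops):
            total[hop] += 1
            if path in degraded:
                bad[hop] += 1
    scores = {hop: bad[hop] / total[hop] for hop in total}
    # Highest score first; ties broken by degraded-path count, then by name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], -bad[kv[0]], kv[0]))

# Hypothetical topology and loss measurements (invented for this example).
routes = {
    ("siteA", "siteB"): ["r1", "r2", "r3"],
    ("siteA", "siteC"): ["r1", "r4"],
    ("siteD", "siteB"): ["r5", "r2", "r3"],
    ("siteD", "siteC"): ["r5", "r4"],
}
loss_by_path = {
    ("siteA", "siteB"): [0.05, 0.04, 0.06],
    ("siteA", "siteC"): [0.0, 0.001, 0.0],
    ("siteD", "siteB"): [0.03, 0.05, 0.04],
    ("siteD", "siteC"): [0.0, 0.0, 0.0],
}

degraded = flag_degraded_paths(loss_by_path)
ranking = rank_suspect_hops(routes, degraded)
for hop, score in ranking:
    print(f"{hop}: {score:.2f}")
```

In this toy data both degraded paths cross r2 and r3, so those hops rank first. A real system would need to handle route changes over time, asymmetric paths, and noisy measurements, which is where the event processing and machine learning aspects of the project come in.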

Required skills

TCP/IP networking, Python, Machine learning

Learning experience

The student will acquire practical experience in machine learning and event stream processing, as well as in software engineering and container-based deployment and operations.

Project duration

12 months

Project area

Data Analytics, Monitoring of the distributed infrastructure

Contact for further details

Marian.Babik@cern.ch

References

http://www.nsf.gov/awardsearch/showAward?AWD_ID=1440571

CERN group

IT/CM

Status

Submitted

Submitted by mbabik on Tuesday, March 29, 2016 - 16:53.