Advanced Notifications for WAN Incidents
Project name
Advanced Notifications for Network Incidents (ANNI)
Project description
One of the main challenges in WLCG WAN networking is diagnosing network problems and providing advanced notifications of the issues seen in the network. LHCOPN and LHCONE, the core global networks in WLCG, have more than 5000 active links between 120 sites. Currently, most issues are visible only to the applications and have to be debugged after the incident, once performance degradation has already occurred. This is primarily due to the underlying complexity of the multi-domain WLCG network and the difficulty of understanding the state of the network and how it changes over time. The project aims to use current open-source event processing systems to automate the detection and localization of network problems using existing data from the perfSONAR network infrastructure. The project will be carried out in collaboration with the University of Chicago and the University of Michigan.
The project will build on the standard WLCG perfSONAR network measurement infrastructure and aims to gather and analyze complex real-world network topologies and their corresponding network metrics in order to identify possible signatures of network problems. It will provide the experiments' operations teams with a real-time view of the diagnosed issues, together with a list of existing downtimes from the network providers.
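As a rough illustration of what identifying a signature of a network problem could look like, the following minimal sketch flags samples on a single link that deviate strongly from a trailing baseline. It is not the project's method: the function name `flag_anomalies`, the window size, the threshold, and the sample latency values are all hypothetical, and the sketch assumes the metric has already been extracted into a plain list of numbers rather than read from perfSONAR directly.

```python
from statistics import mean, stdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from a trailing-window baseline.

    Hypothetical helper: `samples` is a list of metric values for one link
    (for example one-way latency in ms), ordered by time.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        # Flag the sample when it lies more than `threshold` standard
        # deviations away from the baseline mean.
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up latency series (ms) for one link, with a single spike at index 6.
latency_ms = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0]
print(flag_anomalies(latency_ms))
```

A real detector would additionally need to correlate flagged links across the topology to localize the fault, which this sketch does not attempt.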
Required skills
TCP/IP networking, Python, machine learning
Learning experience
The student will acquire practical experience in machine learning and event stream processing, as well as in software engineering and container-based deployment and operations.
Project duration
12 months
Project area
Data Analytics Monitoring of the distributed infrastructure
Contact for further details
Marian.Babik@cern.ch
References
http://www.nsf.gov/awardsearch/showAward?AWD_ID=1440571