One of the main challenges in LHCOPN/LHCONE networking is diagnosing network problems and providing advance notification of issues seen in the network. Currently, most issues are visible only to the applications and have to be debugged after the incident, once performance has already degraded. This is primarily due to the underlying complexity of the multi-domain WLCG network and the difficulty of understanding the state of the network and how it changes over time. This project aims to use current open-source event processing systems (such as Spark/Hadoop) to automate the detection and location of network problems using the existing streams. The project will be carried out in collaboration with the NSF-funded PUNDIT project.
The project will build on the standard WLCG perfSONAR network measurement infrastructure and will gather and analyze complex real-world network topologies and their corresponding network metrics to identify possible signatures of network problems. It will provide the experiments' operations teams with a real-time view of the currently diagnosed issues, together with a list of current downtimes reported by the network providers.
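As a rough illustration of what combining topology and metrics to detect and locate a problem could look like, the Python sketch below flags source-destination paths whose median packet loss exceeds a threshold and then counts how many flagged paths traverse each hop. Everything in it is an assumption made for illustration: the function names, the 1% loss threshold, the record layout, and the site and router names are hypothetical and do not come from perfSONAR, PUNDIT, or the project itself.

```python
from collections import Counter, defaultdict
from statistics import median


def flag_lossy_paths(samples, threshold=0.01, min_samples=3):
    """Return {(src, dst): median_loss} for paths with high median loss.

    `samples` is an iterable of (src, dst, packet_loss_fraction) tuples.
    The threshold and minimum sample count are illustrative assumptions.
    """
    by_path = defaultdict(list)
    for src, dst, loss in samples:
        by_path[(src, dst)].append(loss)

    flagged = {}
    for path, losses in by_path.items():
        if len(losses) < min_samples:
            continue  # too few measurements to judge this path
        typical_loss = median(losses)
        if typical_loss > threshold:
            flagged[path] = typical_loss
    return flagged


def rank_suspect_hops(flagged, routes):
    """Rank hops by the number of flagged paths that traverse them.

    `routes` maps (src, dst) to the list of hops seen on that path,
    e.g. as obtained from traceroute-style measurements.
    """
    counts = Counter()
    for path in flagged:
        # set() so a hop repeated within one route is counted once
        counts.update(set(routes.get(path, [])))
    return counts.most_common()


# Hypothetical measurements: (source, destination, packet loss fraction).
samples = [
    ("siteA", "siteB", 0.05), ("siteA", "siteB", 0.04), ("siteA", "siteB", 0.06),
    ("siteC", "siteB", 0.03), ("siteC", "siteB", 0.02), ("siteC", "siteB", 0.04),
    ("siteA", "siteC", 0.0), ("siteA", "siteC", 0.0), ("siteA", "siteC", 0.001),
]

# Hypothetical hop lists for each measured path.
routes = {
    ("siteA", "siteB"): ["r1", "r2"],
    ("siteC", "siteB"): ["r3", "r2"],
    ("siteA", "siteC"): ["r1", "r3"],
}

flagged = flag_lossy_paths(samples)
suspects = rank_suspect_hops(flagged, routes)
```

In this toy data both lossy paths share the hop `r2`, so it ranks first among the suspects. A real analysis would have to cope with asymmetric and changing routes, noisy measurements, and much larger volumes of data, which is where a distributed processing framework would come in.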