The Internet routing system is vulnerable to attacks and that is a problem. But what is the scale of the problem and how does it evolve over time?
When it comes to MANRS, these are not rhetorical questions.
Mutually Agreed Norms for Routing Security (MANRS) is a community-driven initiative coordinated by the Internet Society that provides a minimum set of low-cost and low-risk actions that, taken together, can help improve the resiliency and security of routing infrastructure.
One of the key elements of MANRS is the measurable commitments made by its members. Network operators demonstrate their commitment to improving routing security by implementing four MANRS actions: Filtering, Anti-spoofing, Coordination and Global Validation (for a description of these actions, please see the MANRS website).
While checks are made at the time an operator joins the initiative, continuous measurement of the ‘MANRS readiness’ (MR) and increased transparency are essential for the reputation of the effort.
We need to be able to better understand the state of routing insecurity and how it evolves. This is needed to support the problem statement, demonstrate the effect our efforts have and see where problem areas are. This is the idea of the MANRS observatory - a tool where one can see how routing security, expressed in terms of MANRS readiness, evolves for a specific network, country or region.
So what can we observe, given that our tool relies on passive measurements?
Unfortunately, we cannot measure MANRS readiness with 100% assurance - this would require on-site inspection of router configs - but we can get a good indication by looking at publicly available data relevant to the four MANRS actions:
- Filtering requires a network operator to ensure correctness of their own announcements and announcements from their customers to adjacent networks. For an indication of how well this action is implemented we can look at routes leaked or hijacked by a network, or its customers. There are several tools that provide such information, such as https://bgpstream.com/ (which is based on the RIPE Routing Information Service (RIS) and BGPlay).
- Anti-spoofing requires that a network operator implements a system that blocks packets with spoofed source IP addresses for their customers and own infrastructure. One can look at CAIDA’s Spoofer database and look for tests that indicate spoofable IP blocks of the network and its clients.
- Coordination requires a network operator to maintain globally accessible up-to-date contact information, which we can check by looking at contact registration in RIR, IRR and/or PeeringDB.
- Global validation requires a network operator to publicly document routing policy, ASNs and prefixes that are intended to be advertised to external parties. To assess this, we can measure the percentage of announcements with corresponding routes registered in an IRR, or valid ROAs registered in the RPKI repository. We can also look at whether an operator documented their routing policy in an IRR.
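The RPKI part of the Global validation measurement can be sketched in a few lines of Python. This is only an illustration, not the observatory's actual code: the `Roa` type, the function names and the sample data (documentation prefixes and ASNs) are assumptions made for the example. An announcement is counted as valid when at least one ROA covers its prefix, authorises its origin AS and allows its prefix length, following the usual route origin validation rules.

```python
from ipaddress import ip_network
from typing import NamedTuple


class Roa(NamedTuple):
    """Hypothetical, simplified ROA record used only for this sketch."""
    prefix: str      # e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ROA authorises
    asn: int         # authorised origin AS


def is_rpki_valid(prefix: str, origin_asn: int, roas: list[Roa]) -> bool:
    """True if some ROA covers the prefix, matches the origin AS and allows its length."""
    net = ip_network(prefix)
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        if (
            net.version == roa_net.version      # never compare IPv4 with IPv6
            and net.subnet_of(roa_net)          # the ROA prefix covers the announcement
            and net.prefixlen <= roa.max_length
            and origin_asn == roa.asn
        ):
            return True
    return False


def valid_share(announcements: list[tuple[str, int]], roas: list[Roa]) -> float:
    """Percentage of (prefix, origin AS) announcements that are RPKI-valid."""
    if not announcements:
        return 0.0
    valid = sum(is_rpki_valid(prefix, asn, roas) for prefix, asn in announcements)
    return 100.0 * valid / len(announcements)


# Made-up sample data for illustration only.
roas = [
    Roa("192.0.2.0/24", 24, 64496),
    Roa("198.51.100.0/24", 24, 64497),
]
announcements = [
    ("192.0.2.0/24", 64496),     # covered, right origin, right length
    ("198.51.100.0/25", 64497),  # more specific than the ROA allows
    ("203.0.113.0/24", 64498),   # no covering ROA
    ("192.0.2.0/24", 64499),     # covered, but wrong origin AS
]
share = valid_share(announcements, roas)
print(f"{share:.1f}% of announcements are RPKI-valid")
```

The same counting approach would apply to IRR route objects, with the lookup function swapped for a check against registered routes.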
For some of these actions, devising and calculating metrics is straightforward. For others, less so.
For example, with filtering, do we calculate a metric based on the impact of an incident such as a route hijack (how much damage it caused), or on conformity (whether the incident demonstrated that the action was not implemented properly)? In the latter case, it doesn't matter whether 1 or 1,000 prefixes were hijacked; either way it indicates a lack of filtering. Since the objective of this project is to evaluate readiness, we take the second approach.
In our model, routing incidents are weighted depending on the distance from the culprit. This means that a hijack originating several hops away from a network is considered a less severe mistake than one caused by its direct neighbour.
Non-action is penalized: the longer an incident lasts, the more heavily it is weighted. For example:
< 30min -> 0.5 * weight
< 24hour -> 1.0 * weight
< 48hour -> 2.0 * weight
Also, multiple routing changes may be part of the same configuration mistake. For this reason, events with the same weight that share the same time span are merged into an incident. This is shown in Figure 1.
Figure 1: Routing changes, or events (in pink), may be part of the same incident (violet). In this case an operator experienced 3 incidents lasting 29 minutes, 13 hours and 25 hours respectively. The resulting metric is M = 0.5 + 1 + 2 = 3.5.
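The duration weighting and the merging of events into incidents can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names are made up, "share the same time span" is interpreted as overlapping in time, all events passed in are assumed to carry the same distance weight, and incidents of 48 hours or more raise an error because no multiplier is given for them here.

```python
from datetime import datetime, timedelta

Event = tuple[datetime, datetime]  # (start, end) of one routing change


def duration_factor(duration: timedelta) -> float:
    """Multiplier for an incident, following the three duration bands listed above."""
    if duration < timedelta(minutes=30):
        return 0.5
    if duration < timedelta(hours=24):
        return 1.0
    if duration < timedelta(hours=48):
        return 2.0
    # Assumption: longer incidents are left undefined rather than guessed.
    raise ValueError("no multiplier defined for incidents of 48 hours or more")


def merge_events(events: list[Event]) -> list[Event]:
    """Merge events that overlap in time into incidents (assumed interpretation)."""
    incidents: list[Event] = []
    for start, end in sorted(events):
        if incidents and start <= incidents[-1][1]:
            last_start, last_end = incidents[-1]
            incidents[-1] = (last_start, max(last_end, end))
        else:
            incidents.append((start, end))
    return incidents


def filtering_metric(events: list[Event], weight: float = 1.0) -> float:
    """Sum of duration factor * weight over the merged incidents."""
    return sum(
        duration_factor(end - start) * weight
        for start, end in merge_events(events)
    )


# Made-up events that reproduce the three incidents of Figure 1.
t0 = datetime(2018, 5, 1)
events = [
    (t0, t0 + timedelta(minutes=20)),
    (t0 + timedelta(minutes=10), t0 + timedelta(minutes=29)),  # incident 1: 29 min
    (t0 + timedelta(hours=2), t0 + timedelta(hours=10)),
    (t0 + timedelta(hours=8), t0 + timedelta(hours=15)),       # incident 2: 13 h
    (t0 + timedelta(days=3), t0 + timedelta(days=3, hours=25)),  # incident 3: 25 h
]
print(filtering_metric(events))  # 3.5
```

With events of different distance weights, one would group them by weight first and call `filtering_metric` once per group.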
Based on this approach, for each of the MANRS actions, we can devise a composite MR-index and define thresholds for acceptable, tolerable and unacceptable – informing the members of their security posture related to MANRS.
And because we are using passive measurements, we can calculate metrics for all 60K+ ASes on the Internet and track them over time.
This approach was recently presented at RIPE 76. This project is now in a prototype phase and we hope to present some results online later this year.
Let us know your thoughts on the above method, and other ways you are measuring routing insecurity, in the comments section below.
Sylvain BAYA •
Hi Andrei, It's always a good idea to measure actions, on both sides, to see whether there is progress in real time through numbers/DataViz:graphs/stats... To my understanding, it's *usually* a lack of competencies when you see bad practices in network management/operations. If all of us were doing the right things, as the related RFCs indicate, there would not be a need to launch a MANRS program. Now that it's known that routing in|Security is increasingly seen as a matter of skills and awareness of Best Practices, it's good to address the problem at its roots... I see that you have launched a pilot online training program for trainers on MANRS aspects. I strongly encourage such an initiative, and I want to suggest something about collaboration with NOGs: please make sure to contact NOG organizers, then work closely with them on this issue, prior to engaging their local communities. This implies building a well-trained local MANRS team in every NOG. What you can also do while empowering country|regional|NOGs is to sustain the expansion of the RING (NLNOG project). I think that a RING approach could be a better solution than Atlas until we also see probes in container (VM) versions, as is now the case for anchors in test... So, back to the article, thanks for sharing. Hope we will see more interesting results measured on both sides. Regards, --sb.
Jan Zorz •
The RIPE BCOP Task Force just published in cooperation with RIPE Routing WG a nice Best Current Operational Practice document about MANRS that I think you should read if you are interested in this topic. https://www.ripe.net/publications/docs/ripe-706 Cheers, Jan Žorž
Sylvain BAYA •
Hi Jan, Thanks for your comment and advice :-) By the way, I've been following the MANRS program since its inception. You can see it by tracking the tweets and retweets... I'm *roughly* sure that I have already read that piece at the time of publishing. Perhaps not entirely, because of the repetitive aspect due to my continuing follow-up; but I'm going to read it again as it sounds like I have missed something :'-( Hope all goes well for you :-D Blessings, --sb.