In this post (originally published on the APNIC blog), Cengiz Alaettinoglou gives a brief overview and comparison of the IRR and SIDR security models and shares his thoughts about the chances for these models to succeed.
Securing Internet routing was a highly debated topic during the APRICOT 2015 meeting in February. This is a subject dear to my heart as well. I was one of the co-inventors of the Internet Engineering Task Force’s (IETF) Routing Policy System (RPS) Working Group’s security model, more widely known as the Internet Routing Registry (IRR) security model. I thought that sharing some of our experiences with deploying the IRR model could help others, including those working on the IETF’s newer Secure Inter-Domain Routing (SIDR) Working Group’s model.
This is a complicated subject. Throwing more technology at it is not going to be sufficient, as successfully securing the Internet’s routing requires addressing social and economic challenges as well. We found out that the latter two were harder to address than the technology. Unfortunately, most of the advances in this area have been on the technology front; hence the SIDR model’s adoption has been low. How do we get service providers and enterprises to use one of these solutions? How do we pass the point where they stop seeing this as a cost burden?
I will give a brief overview and comparison of the IRR and SIDR security models and then will dive into our experiences with these models.
At the core of the IRR model is the object-oriented Routing Policy Specification Language (RPSL). “Route” objects specify that the route object’s prefix can and will be announced in the Border Gateway Protocol (BGP) by the route object’s Autonomous System (AS). Route objects need to be signed by both the prefix holder and the AS number holder (see http://www.rfc-editor.org/info/rfc2725) according to the addressing and numbering registries (such as Regional Internet Registries like APNIC and the RIPE NCC).
RPSL aut-num objects contain the import and export policies of their AS; that is, they contain what routes are imported from and exported to each of their neighboring ASes. Aut-num objects only need to be signed by the AS number holder. The language contains other objects in order to simplify specification of the routing policies, but they are not important in this discussion.
These objects are registered at one of the registries participating in IRR. The verification of the objects, such as authorisation and authentication checking, is done at registration time. IRR registries are distributed (see http://www.rfc-editor.org/info/rfc2769) and as the data is replicated across registries, subsequent registries can re-verify the objects. A service provider can also run a routing registry where it stores its local objects as well as replicates remote objects (after re-verifying their authenticity). From these locally stored objects, a service provider uses a tool (for example, see http://irrtoolset.isc.org) to generate router configurations, mainly route filters that implement its policies. This is similar to, and indeed is, Software Defined Networking, except that instead of a controller we have a compiler that generates router configurations from high-level network-wide policy specifications. The ability to automate lengthy and error-prone BGP router configurations was one of the primary goals of the effort.
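To make the "compiler" idea concrete, here is a minimal Python sketch of turning registered route objects into a route filter for one neighbour. It is only an illustration under stated assumptions: the `RouteObject` class, the function names, and the vendor-neutral `prefix-filter` output syntax are all made up for this example and are not the IRRToolSet's format or any router's configuration language. The prefixes and AS numbers come from the ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteObject:
    """Hypothetical in-memory form of an RPSL route object."""
    prefix: str  # e.g. "192.0.2.0/24"
    origin: int  # origin AS number


def build_prefix_filter(route_objects, neighbor_as):
    """Collect the prefixes registered with origin == neighbor_as, sorted."""
    nets = {
        ipaddress.ip_network(obj.prefix)
        for obj in route_objects
        if obj.origin == neighbor_as
    }
    # Sort by IP version first so IPv4 and IPv6 networks are never compared.
    return sorted(nets, key=lambda n: (n.version, n.network_address, n.prefixlen))


def render_filter(name, prefixes):
    """Render an invented, vendor-neutral filter syntax (assumption)."""
    lines = [f"prefix-filter {name}"]
    lines += [f"  permit {p}" for p in prefixes]
    lines.append("  deny any")
    return "\n".join(lines)


# Locally replicated (and already verified) registry contents.
registry = [
    RouteObject("192.0.2.0/24", 64500),
    RouteObject("198.51.100.0/24", 64500),
    RouteObject("203.0.113.0/24", 64501),
]

# Policy: "from neighbour AS64500 import all prefixes registered with origin AS64500".
config = render_filter("from-AS64500", build_prefix_filter(registry, 64500))
print(config)
```

The point of the sketch is the direction of the data flow: the filter is derived from the registry before any BGP message arrives, so its quality is only as good as the registered objects.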
With route and aut-num objects registered and verified, and routers configured with the correct filters, it is possible to determine which route announcements in BGP are valid and which are invalid and should not be accepted. Invalid announcements could include both accidental mistakes and malicious attacks such as prefix hijacking (for example, Pakistan hijacks YouTube) and man-in-the-middle attacks (for example, the bitcoin heist).
The SIDR model has two components. The first component, route origin authorisation (ROA), is similar to RPSL’s route objects. It states that a prefix owner authorises an AS to announce its prefix in BGP. ROA objects only need to be signed by the prefix owners. RPSL route objects, on the other hand, had to be signed by both the prefix holder and the AS owner. This makes ROA objects easier to manage. However, if one were to create a filter for a typical import policy “from neighbour AS x import all prefixes registered with origin AS x,” the SIDR model does not ensure that AS x is in charge of what these prefixes are. Indeed, AS x has no obligation to announce any of the prefixes that have valid ROA objects naming AS x. This is not as serious a problem for the SIDR model as it would be for the IRR model, because of the SIDR model’s second component.
The second component of the SIDR model is protocol extensions to BGP called BGPSec. BGPSec amends the BGP AS_Path attribute with the BGPSec_Path attribute using a series of signatures from all the ASes in the AS path. The nth signature in this attribute affirms that the nth AS in the AS_Path has passed the route it received from the previous AS in the AS_Path to the next AS in the AS_Path. In addition, the first signature affirms that the first AS originates the prefix.
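The chaining described above can be sketched in a few lines of Python. This is a toy model, not BGPSec itself: the message layout, the function names, and the `KEYS` table are assumptions made for illustration, and HMAC with per-AS secrets stands in for the asymmetric signatures a real deployment would use, purely so the sketch needs nothing beyond the standard library. The path is stored origin-first to match the description above.

```python
import hashlib
import hmac

# Toy per-AS secrets (assumption). HMAC is only a stand-in for real
# asymmetric signatures; with HMAC the verifier must know the signer's key.
KEYS = {64500: b"key-64500", 64501: b"key-64501", 64502: b"key-64502"}


def _sign(signer_as, prefix, target_as, previous_sig):
    """Signature over the prefix, the signer, the next AS, and the prior signature."""
    msg = f"{prefix}|{signer_as}|{target_as}|".encode() + previous_sig
    return hmac.new(KEYS[signer_as], msg, hashlib.sha256).digest()


def originate(prefix, origin_as, target_as):
    """The first signature affirms that origin_as originates the prefix."""
    sig = _sign(origin_as, prefix, target_as, b"")
    return {"prefix": prefix, "path": [origin_as], "sigs": [sig]}


def forward(update, signer_as, target_as):
    """signer_as affirms it passes the route it received on to target_as."""
    sig = _sign(signer_as, update["prefix"], target_as, update["sigs"][-1])
    return {
        "prefix": update["prefix"],
        "path": update["path"] + [signer_as],
        "sigs": update["sigs"] + [sig],
    }


def verify(update, receiver_as):
    """Check every signature in the chain, ending at the receiving AS."""
    path, sigs = update["path"], update["sigs"]
    if not path or len(path) != len(sigs):
        return False
    previous = b""
    for i, signer in enumerate(path):
        if signer not in KEYS:
            return False
        # Each AS signed towards the next AS in the path; the last one
        # signed towards the receiver.
        target = path[i + 1] if i + 1 < len(path) else receiver_as
        expected = _sign(signer, update["prefix"], target, previous)
        if not hmac.compare_digest(expected, sigs[i]):
            return False
        previous = sigs[i]
    return True


update = forward(originate("192.0.2.0/24", 64500, 64501), 64501, 64502)
print(verify(update, 64502))
```

Because each signature covers the previous one and names the next AS, dropping an AS from the path, changing the prefix, or replaying the update to a different neighbour makes verification fail in this model.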
In the SIDR model, verifying authentication and authorisation mainly happens in the BGP speaking routers. Verification of ROA objects can be made offline in a server and a list of valid prefix/AS origin pairs can be downloaded to the routers as a filter, similar to the IRR model. However, validating the signatures in the BGPSec_Path attribute has to happen in the routers as the routes are being announced. This places a computational burden on the routers, and it is questionable whether today’s routers have sufficient computational capacity.
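As a rough illustration of checking an announcement against a downloaded list of validated ROA data, here is a small Python sketch. It goes beyond what this post describes: the three outcomes (valid, invalid, not-found) and the maximum-length field follow the route origin validation procedure of RFC 6811, while the `ROA` class and `validate_origin` function are names invented for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Hypothetical record for one validated ROA entry."""
    prefix: str      # prefix the holder authorised
    max_length: int  # longest prefix length the origin AS may announce
    asn: int         # authorised origin AS


def validate_origin(prefix, origin_as, roas):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA is relevant only if its prefix covers the announced prefix.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: treat as invalid.
    return "invalid" if covered else "not-found"


roas = [ROA("192.0.2.0/24", 24, 64500)]
for announced, origin in [
    ("192.0.2.0/24", 64500),     # authorised origin
    ("192.0.2.0/24", 64666),     # wrong origin AS
    ("192.0.2.0/25", 64500),     # more specific than max_length allows
    ("198.51.100.0/24", 64500),  # no covering ROA at all
]:
    print(announced, origin, validate_origin(announced, origin, roas))
```

The "not-found" outcome matters for deployment: while ROA coverage is low, most announcements fall into it, and an operator has to decide separately how to treat them.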
This on-the-fly authentication and authorisation checking before accepting a route eliminates the IRR model’s need for publishing policies and is very attractive. Note that the policy filters are still in place at AS boundaries as they implement business relationships; however they don’t need to be publicly published anymore. Publishing policies publicly was one of the criticisms of the IRR model. However, because publishing policies is replaced by protocol machinery, BGPSec may not protect from all attacks if the machinery was not sophisticated enough to foresee a new type of attack. In the IRR model, this is not an issue as long as an AS filters all of its peers, not just its customers, including large peers with very large route announcements.
The IRR security model has been available since the mid-90s. However, we are nowhere near securing the Internet’s routing system, because its adoption has been low. To ensure the SIDR model’s successful adoption, we need to understand where the IRR model failed and make sure the SIDR model addresses these shortcomings.
Tragedy of the Commons
First, let’s look at the economics of deploying one of these solutions. Early deployments incur most of the cost, and benefits are not achieved until a critical mass is reached. For example, YouTube may register its RPSL route objects or SIDR ROA objects, but the necessary filters may not be in place at the upstream provider of the Pakistani service providers, and hence YouTube routes may still be hijacked.
Unfortunately, both the IRR and the SIDR models suffer from this phenomenon. Geoff Huston, in his APRICOT 2015 presentation, identified this as the “tragedy of the commons.” That is, if every service provider optimises the outcome for itself, we cannot reach the globally optimal outcome. Typical solutions to the tragedy of the commons involve “regulation.” Since regulation is often undesired, what can we do to reach critical mass without it?
In IRR model deployments, we wanted an evolutionary path, not a revolutionary path, to reach critical mass. On this front, we wanted to take advantage of RPSL’s heritage. RPSL is based on earlier work known as RIPE-81. RIPE-81 had route objects; however, it lacked a security model and expressive policy representation. RIPE, as one of the early IRRs, had many of these objects already registered. In the United States, the Routing Arbiter team (a collaboration between University of Michigan’s Merit Networks and University of Southern California’s Information Sciences Institute), which I was a part of, converted similar policy objects found in the NSFNet backbone network’s policy database into route objects and stored them in a new IRR called RADB.
Meanwhile, the Internet was going through a big commercialisation transformation; from NSFNet being the single Internet backbone network, we moved to multiple commercial backbones and regional networks. As a result, many network operators changed their upstream service providers. Since these new service providers did not use or require registration of these objects, the objects in the IRR became out-of-date very quickly.
Ultimately, the data became stale because it was not used operationally. And conversely, it was kept up-to-date where it was used operationally. I was disappointed to learn that the SIDR model is already suffering from this stale data phenomenon. In his APRICOT 2015 talk, Fakrul Alam reported that more than 50% of new ROA objects registered in the APNIC database are already invalid. Some regions are doing better than others, but invalid data is present in all regions. We have to find a way to reverse this trend.
Figure: RPKI Dashboard provided by SURFnet
I think the only way out of this is the operational use of the data. If we turned on BGPSec today, we would break the reachability of the invalid prefixes. I am not advocating breaking anybody’s reachability, especially not that of the early adopters. This can be avoided with sufficient monitoring of, and warning about, these invalid announcements before turning on such a switch.
If we don’t turn the switch on, the amount of stale data will increase. If we are ever going to turn it on, it is best to do it while the amount of stale data is small.
Weak Security Model
RIPE-81 used two weak authentication methods: mail-from and unix-crypt. One could register objects by sending an email to a well-known registry mailbox. Mail-from, which is now deprecated, simply checked the sender’s email address against an allowed list of email addresses, and unix-crypt required sending the user’s password in the clear in the body of the email. We strengthened the security model by adding a public and private key pair method. However, we did not deprecate the old methods; we simply discouraged them and provided a transition path to the more secure method. After all, if an operator did not care to protect himself, it was his prerogative. Mail-from and unix-crypt were still useful against accidental misconfigurations. There was a social aspect of this choice that we did not anticipate; it gave the IRR model a bad security reputation and was used as an excuse against updating the stale data.
The SIDR effort definitely comes down on the security side of this balance. However, as a result, it needs a database that starts from scratch. Fakrul also reported that ROA adoption has been less than 1% in most regions. LACNIC is an exception to this, with an adoption rate of almost 25% and less than 4% invalid data.
The IRR model uses Pretty Good Privacy, which is based on cryptographic signatures. These are based on a web of trust among service providers (where the public key of a service provider is signed by other service providers). The SIDR model uses X.509 based certificates, which are hierarchically assigned. In the SIDR model, it is possible to shut down a misbehaving service provider by revoking its certificates. However, some service providers worry that this feature might be abused. Use of certificates makes registries a relying party, which is an uncomfortable change to some registries.
Out-of-band verification and need for publishing policies
The IRR model uses out-of-band verification. That is, it relies on the IRR containing route and aut-num objects with accurate policies. This data is then analysed and compiled into router configurations. All of this happens before any BGP message is received. When BGP messages are received, appropriate filters are in place to accept only valid announcements. That is, the announcements that would cause prefix hijacking or man-in-the-middle attacks can be filtered out. The system, however, requires registering accurate policies, such as who the peers of each AS are and what routes are being exported to and imported from them. Some service providers have privacy concerns about revealing this information. In reality, most of this information is already in the BGP routing tables.
The SIDR model, on the other hand, uses hybrid out-of-band and in-band verification. For ROA objects, it can use either in-band or out-of-band validation. For verifying what BGP AS paths are valid, it uses in-band validation using BGPSec. This replaces the need for registering policies with new validation machinery that is now part of exchanging BGP routes. This is a great benefit. However, it has a serious drawback. This in-band machinery needs to be updated each time a new kind of attack is discovered. For example, when man-in-the-middle attacks surfaced, it was realised that BGPSec did not protect against them while the IRR model did. BGPSec is now being further extended to protect against some classes of man-in-the-middle attacks. We are looking at a standardisation-implementation-deployment cycle of roughly two or more years. We will pay this penalty each time we face an attack we have not dealt with before.
When we worked on the IRR model, most invalid announcements, despite the great harm they caused, were accidental mistakes. Security incidents are becoming more frequent and definitely much more malicious. However, it is easy to look at these incidents as “somebody else’s problem.” It is easy to do that until you are the one being attacked, and it is too late at that moment to secure it. Hence, we must act and secure the Internet’s routing now.
I am disappointed to say I don’t have a recipe for success. I can only provide our insight and several trade-offs. As I said in the beginning of this article, we are not simply dealing with a technical challenge but with economic and social challenges as well. I hope this post helps address some of the social challenges by raising awareness of the importance of securing the Internet’s routing.
Andrei Robachevsky at ISOC and others are taking on the social challenge big time. He is bringing operators together and asking them to sign a routing manifesto known as The Mutually Agreed Norms for Routing Security (MANRS). Andrei, in his APRICOT 2015 talk, states that by participating in MANRS, a service provider commits to best practices such as preventing propagation of incorrect BGP routing information, preventing traffic with spoofed source IP addresses, and agreeing to coordination and collaboration among participants by keeping their contact information and policy objects accurate in registries. He has already signed up many service providers around the world to participate and is looking for more service providers. Both he and I hope that the more service providers sign up, the more adoption will accelerate.
Regional and local registries have been big advocates of deploying security models as well. They provide tutorials on the subject during network operators meetings such as APRICOT, APNIC, RIPE and NANOG. These materials are also available online.
On the economic front, if we are dealing with the tragedy of the commons phenomenon, do we need regulation? Or can we reach critical mass with social advocacy and arm-twisting? The Internet does not have a central governing body that regulates (though some international organizations are seeking this authority). Regulation often slows down innovation, and because of this I’d rather avoid it. However, either we reach critical deployment of a security model, or we reach a critical number of malicious attacks. If we reach the latter first, I suspect regulation might be on our horizon!
Personally, I would like to see the SIDR model succeed. The IRR model is 20 years old now. It is older than the World Wide Web. It has not been adopted well and is full of stale data. However, the SIDR model’s success relies on the feasibility of running BGPSec in routers. Some worry about the cryptographic computational needs of running BGPSec and still consider the IRR model the viable alternative. I would like to see if the issues around BGPSec can be fixed before we do that. If we cannot fix them, we need to see if we can perhaps build a hybrid model, or, if we need to enhance the IRR model, bring it into the 21st century.
Cengiz Alaettinoglou is currently the CTO at Packet Design and is working on SDN analytics and developing a prototype of a Network Access Broker. He is a widely published author and popular lecturer and was co-chair of the Internet Engineering Task Force (IETF) Routing Policy System Working Group.