The BGP communities attribute, which usefully enables network operators to signal specific requests or information to nearby ASNs, also lets them signal RPKI status. But should they? Max Stucchi investigates the propagation of RPKI information in BGP communities.
The Internet uses BGP - the Border Gateway Protocol - to distribute routing information. As we usually say, BGP is like "the Internet's telephone book", where prefixes are registered along with "instructions" in the form of AS Path and other corollary information.
BGP was standardised a long time ago, and has been extended over the years with more capabilities. One of these is the communities attribute, a transitive attribute that lets data travel across autonomous systems, so that in many cases it can be seen around the world.
Communities are a way to signal specific requests or information to nearby ASNs. Some communities tell the downstreams of a specific ASN where the "tagged" routes were learned: in which country, and at which Internet Exchange Point. Others are used to request that a specific announcement be filtered, or not be propagated to other ASNs.
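To make the notation concrete, here is a minimal sketch of how a standard community such as 3356:901 maps to the 32-bit value carried in the attribute: the high 16 bits hold the ASN part and the low 16 bits hold the operator-defined value. The function names are my own and purely illustrative.

```python
def encode_community(asn: int, value: int) -> int:
    """Pack an ASN:value pair into the 32-bit standard community value."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves of a standard community must fit in 16 bits")
    return (asn << 16) | value


def decode_community(raw: int) -> tuple:
    """Split a 32-bit community value back into (asn, value)."""
    return raw >> 16, raw & 0xFFFF


def format_community(raw: int) -> str:
    """Render a 32-bit community value in the usual ASN:value notation."""
    asn, value = decode_community(raw)
    return f"{asn}:{value}"


raw = encode_community(3356, 901)
print(raw, format_community(raw))
```

The 16-bit limit on each half is why operators with 32-bit ASNs need large communities, which use three 32-bit fields instead.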
Another "extension" built on top of BGP is RPKI. RPKI enables network operators to verify that an autonomous system is authorised to originate a prefix. Validation is an external process that uses cryptography and feeds its results to the BGP process in the form of ROAs (Route Origin Authorisations) or VRPs (Validated ROA Payloads). With this data, the BGP process can make better-informed decisions on whether to accept BGP updates, based on the RPKI Status of the prefix being announced.
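The decision described above can be sketched in a few lines. This is a simplified illustration of route origin validation against a list of VRPs, not the code of any particular router or validator; the `VRP` tuple and `rov_state` function are names I made up, and the prefixes and ASNs are documentation values.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    prefix: str      # prefix covered by the ROA
    max_length: int  # longest prefix length the ROA authorises
    asn: int         # ASN authorised to originate it


def rov_state(route_prefix: str, origin_asn: int, vrps) -> str:
    """Classify an announcement as Valid, Invalid or NotFound."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True  # at least one VRP covers this route
        if route.prefixlen <= vrp.max_length and vrp.asn == origin_asn:
            return "Valid"
    # Covered but never matched means Invalid; no covering VRP means NotFound.
    return "Invalid" if covered else "NotFound"


vrps = [VRP("2001:db8::/32", 48, 64496)]
print(rov_state("2001:db8:1::/48", 64496, vrps))
print(rov_state("2001:db8:1::/48", 64497, vrps))
print(rov_state("192.0.2.0/24", 64496, vrps))
```

The three calls exercise the three possible states: a matching origin within maxLength, a covered prefix announced by a different origin, and a prefix with no covering VRP at all.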
The problem statement
In a discussion at the last DKNOG meeting with Job Snijders, we were wondering if there were any operators signalling RPKI Status with communities, and if this was widespread or not.
Right there at the meeting I performed a quick survey using bgpstream, and soon discovered one of the major operators to be propagating RPKI Status - of course only "Not Found" and "Valid" - and it was clearly visible.
More recently, while debugging a network issue, I noticed on the looking glass I was using (https://lg.twelve99.net/) that some prefixes were tagged with information related to RPKI Status, as shown in the following image. I decided to go back and investigate further.
The problem is that the validation state of a prefix is information that should stay inside the network that ran the validation process. Carrying it across the Internet may cause additional updates whenever the state changes in RPKI. If the state is propagated in BGP communities, then every time ROAs are added or removed the change is reflected as otherwise unnecessary updates in BGP, creating a lot of noise.
First pass
After the initial investigation performed using bgpstream, I decided to use BGPKit this time, with data taken from Routeviews.
I started by verifying how many prefixes in the RIB available from Routeviews contain the following series of communities from two operators, Arelion and Lumen, picking a few random days in 2023.
I am checking if the entries in the RIB contain any reference to any of these 5 communities:
1299:430 (RPKI state Valid)
1299:431 (RPKI state Unknown)
3356:901 (RPKI Valid)
3356:902 (RPKI Invalid)
3356:903 (RPKI Not Found)
I am in reality hoping not to see 3356:902 anywhere, as that would mean there are invalid RPKI announcements being propagated on the Internet. AS1299 does not seem to have a community for Invalids, or at least that was not visible anywhere.
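The counting itself is simple once the RIB entries have been parsed. The sketch below is not the script I ran and deliberately does not call BGPKit: it assumes the parser has already produced (prefix, communities) pairs, with communities as "ASN:value" strings, which is a record shape I chose for illustration.

```python
from collections import Counter

# The five communities listed above.
TARGETS = {"1299:430", "1299:431", "3356:901", "3356:902", "3356:903"}


def count_targets(entries):
    """Count RIB entries per address family and per target community.

    entries: iterable of (prefix, communities) pairs (hypothetical shape).
    Returns (totals, hits), where hits is keyed by (community, family).
    """
    totals = Counter()
    hits = Counter()
    for prefix, communities in entries:
        family = "v6" if ":" in prefix else "v4"
        totals[family] += 1
        # set() so a community repeated within one entry is counted once
        for community in set(communities) & TARGETS:
            hits[(community, family)] += 1
    return totals, hits


sample = [
    ("192.0.2.0/24", ["3356:901", "3356:2"]),
    ("198.51.100.0/24", ["3356:903"]),
    ("2001:db8::/32", ["1299:430"]),
    ("203.0.113.0/24", []),
]
totals, hits = count_targets(sample)
for community in sorted(TARGETS):
    print(f"Occurrences of {community} v4: {hits[(community, 'v4')]} "
          f"- v6: {hits[(community, 'v6')]}")
```

The sample entries use documentation prefixes and exist only to show the output format.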
This is an example taken from routeviews3 for a full RIB dump at 22:00 on 27 October 2023:
Collecting data from routeviews3:
Total entries: v4 26076029 - v6 2263807
Occurrences of 3356:901 v4: 1686136 - v6: 1693
Occurrences of 3356:902 v4: 0 - v6: 0
Occurrences of 3356:903 v4: 2241052 - v6: 1199
Occurrences of 1299:430 v4: 0 - v6: 0
Occurrences of 1299:431 v4: 0 - v6: 0
The good news is that no 3356:902 was seen in the wild. There are a number of routes carrying our "target" communities. None, though, were seen coming from 1299. That changes if we move to routeviews6, which, per its name, focuses on IPv6:
Collecting data from routeviews6
Total entries: v4 0 - v6 4655520
Occurrences of 3356:901 v4: 0 - v6: 312089
Occurrences of 3356:902 v4: 0 - v6: 0
Occurrences of 3356:903 v4: 0 - v6: 279334
Occurrences of 1299:430 v4: 0 - v6: 16154
Occurrences of 1299:431 v4: 0 - v6: 16028
In general, all the 3356-related communities are much more visible than the ones from 1299. However, we can say there is a good number of combined routes with the communities we are looking for, accounting for about 13.3% of the total entries, if we sum all of them up.
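For readers who want to redo the arithmetic, the share can be recomputed directly from the two listings above. This is only a sketch: it sums the occurrence counts, so an entry carrying more than one target community would be counted more than once.

```python
def share(occurrences, total_entries):
    """Percentage of RIB entries accounted for by the summed occurrences."""
    return 100.0 * sum(occurrences) / total_entries


# routeviews3: v4 and v6 occurrences over v4 + v6 total entries
rv3 = share([1686136, 1693, 2241052, 1199], 26076029 + 2263807)

# routeviews6: IPv6 only
rv6 = share([312089, 279334, 16154, 16028], 4655520)

print(f"routeviews3: {rv3:.2f}%  routeviews6: {rv6:.2f}%")
```

With these inputs both collectors land between 13% and 14%, in line with the figure quoted above.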
Counting updates
The next step, then, is to verify how much these communities are seen in BGP updates. In order to do this, we can use RIPE RIS and its RIS Live service.
I have set up a process to monitor, for 24 hours, all the updates coming from different Route Collectors. I have chosen the following RRCs:
- RRC00: Amsterdam Multihop;
- RRC01: London - LINX and LONAP;
- RRC03: Amsterdam - AMS-IX;
- RRC06: Otemachi, Japan - DIX-IE and JPIX;
- RRC14: Palo Alto, California - PAIX;
- RRC15: São Paulo, Brazil - PTTMetro-SP;
- RRC19: Johannesburg, South Africa - NAP Africa JB;
- RRC20: Zurich, Switzerland - SwissIX;
- RRC24: Montevideo, Uruguay - Multihop for the LACNIC Region;
- RRC25: Dubai, United Arab Emirates - UAE-IX.
This mix provided good coverage for each region. The Multihop collectors carry the highest number of updates and information, while Zurich gave me specific information that I will discuss in more detail later in the article.
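As a rough sketch of the collection side, the snippet below builds one RIS Live subscription per collector and checks a received message for the target communities. It reflects my understanding of the RIS Live JSON format (a "ris_subscribe" request, and "ris_message" replies carrying a "community" list of [ASN, value] pairs); the WebSocket connection itself is left out, and the helper names are illustrative.

```python
import json

COLLECTORS = ["rrc00", "rrc01", "rrc03", "rrc06", "rrc14",
              "rrc15", "rrc19", "rrc20", "rrc24", "rrc25"]

TARGETS = {(1299, 430), (1299, 431), (3356, 901), (3356, 902), (3356, 903)}


def subscribe_message(host: str) -> str:
    """JSON text asking RIS Live for UPDATE messages from one collector."""
    return json.dumps({
        "type": "ris_subscribe",
        "data": {"host": host, "type": "UPDATE"},
    })


def has_target(raw_message: str) -> bool:
    """True if a RIS Live message carries any of the target communities."""
    message = json.loads(raw_message)
    if message.get("type") != "ris_message":
        return False
    communities = message.get("data", {}).get("community", [])
    return any(tuple(c) in TARGETS for c in communities)


subscriptions = [subscribe_message(host) for host in COLLECTORS]
example = json.dumps({
    "type": "ris_message",
    "data": {"host": "rrc00", "type": "UPDATE",
             "community": [[3356, 901], [3356, 2]]},
})
print(len(subscriptions), has_target(example))
```

Each subscription string would be sent over the WebSocket once the connection is open, and every incoming frame passed to `has_target`.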
In this brief analysis, I will focus on data coming from RRC00 as an example, since I noticed that almost all of the collectors have similar data with the same ratio.
This graph shows the number of BGP update messages coming through from RRC00, aggregated hourly. In green you can see the updates where the target communities could be seen, while in red the updates where they were not present.
While the updates not including the target communities fluctuate, the updates including the communities mostly stay around a baseline number, and at their peak represent more than 16% of the total updates seen in a given 60-minute period. This amount is due in part to the nature of the two ASNs I have focused on, as they are considered “Tier1”, connecting a considerable part of the Internet.
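The hourly split behind a graph like this can be sketched as follows. The input shape, a stream of (Unix timestamp, tagged) pairs, is an assumption made for illustration; it is not the format of my collection system.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_share(updates):
    """Bucket updates per UTC hour.

    updates: iterable of (unix_timestamp, tagged) pairs, where tagged is
    True when the update carried a target community (hypothetical shape).
    Returns {hour: (with_target, without_target, percent_with_target)}.
    """
    buckets = defaultdict(lambda: [0, 0])
    for timestamp, tagged in updates:
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0)
        buckets[hour][0 if tagged else 1] += 1
    return {
        hour: (with_t, without_t, 100.0 * with_t / (with_t + without_t))
        for hour, (with_t, without_t) in sorted(buckets.items())
    }


base = 1706572800  # 2024-01-30 00:00:00 UTC
sample = [(base + 10, True), (base + 20, False), (base + 30, False),
          (base + 40, False), (base + 3605, False)]
for hour, (with_t, without_t, pct) in hourly_share(sample).items():
    print(hour.isoformat(), with_t, without_t, f"{pct:.1f}%")
```

Plotting the first two values of each bucket as stacked bars gives the green and red series described above.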
There is a similar trend for IPv6, as you can see in the following graph:
We see the same baseline given by the updates carrying the target communities, and a similar trend to the one seen in IPv4.
A test prefix
As I operate my own autonomous system and have my own LIR, I am in the privileged position of having address space I can use and announce at will. For this experiment I decided to use 2a0f:fd00::/29, which is a large prefix for such an experiment, but it was ready to go with its route6 objects, meaning there was no need to wait for filters to be corrected by anyone, and I knew visibility was going to be good. The prefix is announced by AS58280.
I have set up two separate processes for this:
- A function to update the ROA for 2a0f:fd00::/29 with maxLength /48, adding and removing it at each iteration, every 90 minutes; and
- A specific filter to check any incoming update about that prefix, as seen on RIS Live from the same Route Collectors running the previous test.
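The ROA side of the experiment amounts to a simple alternating schedule. The sketch below only generates that schedule; it does not talk to any RPKI publication API. It assumes the ROA is toggled once per 90-minute step, that it is present at the start, and that no other ROA covers the prefix, so the expected state alternates between Valid and NotFound.

```python
from datetime import datetime, timedelta, timezone


def roa_schedule(start, hours=24, step_minutes=90, present_first=True):
    """List of (time, roa_present, expected_rov_state) for the test window."""
    schedule = []
    present = present_first
    current = start
    end = start + timedelta(hours=hours)
    while current < end:
        # With the ROA published the announcement should validate;
        # without it (and no other covering ROA) it should be NotFound.
        schedule.append((current, present, "Valid" if present else "NotFound"))
        present = not present
        current += timedelta(minutes=step_minutes)
    return schedule


start = datetime(2024, 1, 30, 0, 0, tzinfo=timezone.utc)  # illustrative start
for when, present, state in roa_schedule(start)[:4]:
    print(when.isoformat(), "ROA present" if present else "ROA removed", state)
```

Over 24 hours this yields 16 steps, so any network that exports the validation state in a community has 16 opportunities to generate an extra update for the prefix. Propagation delays between ROA publication and validators picking it up are ignored here.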
The goal is to see how changes in ROV States for a prefix affect the number of updates, and to see if there are other networks behaving the same way as AS1299 and AS3356.
During the 24 hours the test ran, I wasn't able to catch an update from the two aforementioned networks, but I did see some interesting behaviour from other entities.
First of all, I wasn’t expecting to see the following from ColoClue, which has specific communities to signal ROV State and RIR-based filtering state. Here’s the update I saw:
Tue, Jan 30, 2024 6:05 PM - New update on RRC03 for 2a0f:fd00::/29 with communities
[[8283, 1], [8283, 101], [8283, 102], [65101, 33152], [65102, 33000], [65103, 756], [65104, 150]]
and checking the aut-num object for ColoClue, AS8283, I could find this:
remarks: ----------+-------------+-------------------------------------
remarks: 8283:101 | 8283:5:1 | Accepted from peer because of valid IRR entry
remarks: 8283:102 | 8283:5:2 | Accepted from peer because of valid ROA
remarks: 8283:104 | 8283:5:4 | Accepted while RPKI invalid because it is added to our whitelist
remarks: ----------+-------------+-------------------------------------
This means that whenever the state of validation for my prefix changed, I would witness a new update in BGP for it, as community 8283:102 would either be added or removed. In fact, I could find an update such as:
Tue, Jan 30, 2024 11:01 AM - New update on RRC03 for 2a0f:fd00::/29 with communities
[[8283, 1], [8283, 101], [65101, 33152], [65102, 33000], [65103, 756], [65104, 150]]
where 8283:102 was missing, due to the fact that the ROA covering the measurement prefix had been removed.
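Spotting this kind of change automatically comes down to comparing the community sets of consecutive updates for the same prefix. Here is a minimal sketch using the two updates shown above; `community_diff` is an illustrative helper, not part of any library.

```python
def community_diff(old, new):
    """Return (added, removed) communities between two updates."""
    old_set = {tuple(c) for c in old}
    new_set = {tuple(c) for c in new}
    return sorted(new_set - old_set), sorted(old_set - new_set)


# Communities from the 11:01 AM update (ROA removed)
without_roa = [[8283, 1], [8283, 101], [65101, 33152],
               [65102, 33000], [65103, 756], [65104, 150]]

# Communities from the 6:05 PM update (ROA present)
with_roa = [[8283, 1], [8283, 101], [8283, 102], [65101, 33152],
            [65102, 33000], [65103, 756], [65104, 150]]

added, removed = community_diff(without_roa, with_roa)
print("added:", added, "removed:", removed)
# added: [(8283, 102)] removed: []
```

An update whose only difference from the previous one is a validation-state community is exactly the kind of noise this article is concerned with.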
I could find some other updates that looked similar in nature, but from networks that don't have clear, publicly available documentation, so I can't say for sure that they were related. This is part of future work I plan on performing. Other networks, similar to ColoClue, have clear documentation. Anexia, for example, publicise their communities on a dedicated website, including RPKI-related ones: https://isp.anexia-it.net/communities/#rpki-communities.
Best Current Practice
There is no Best Current Practice (BCP) at the moment about propagating RPKI information in BGP communities. With this data in hand, there is now work ongoing by Job Snijders, Tobias Fiebig and me to propose one. You can find the work in progress at https://github.com/job/draft-rpki-communities-harmful.
The goal of the work is to help operators understand that there is no gain in propagating ROV Status information in BGP. While the intent is good, the only effect is an increase in noise around BGP Updates, putting more work on routers around the world.
Future work
I intend to complement this analysis with a more detailed focus on the correlation between changes in RPKI and the subsequent impact on BGP Updates. This requires more work coordinating an RTR process with a RIS Live "collector", and it will take some time.
Another point to consider is that, as it is now, RIS Live does not support large communities. I am planning on implementing support on my collection system to explore if there is more data to be found if we look at large communities as well.