In this article we look into the uptake of BGP Large Communities using the RIPE Routing Information Service (RIS).

BGP communities

BGP communities are a useful tool for network operators. They allow networks to communicate extra bits of information about a network path to their neighbours. This extra information (for instance 'I received this route in the Middle East', or 'I would like you to prefer this route') allows operators to better tune their networks for where to send and receive traffic.

Many networks that use BGP communities put either their own Autonomous System Number (ASN) or a private-range ASN in the first half of each 4-byte community value (for an interesting collection of documented BGP communities see https://onestep.net/communities/ ). And here lies the problem: a 4-byte ASN (these have been in use since 2007) does not fit in that 2-byte first half, so networks with a 4-byte ASN can't use this convention to tag routes the way networks with a 2-byte ASN can under the original BGP community specification. As shown in Figure 1 below, 4-byte ASNs are steadily on the rise, and as of June 2018 make up roughly a third of the total number of ASNs we see in the global BGP routing table.
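To make the size problem concrete, here is a minimal Python sketch of the convention described above: the ASN goes in the upper 16 bits of the 32-bit community value and an operator-chosen number in the lower 16 bits. The function names are ours, chosen for illustration, and 200000 is just an arbitrary number above 65535 standing in for a 4-byte ASN.

```python
def encode_classic_community(asn: int, value: int) -> int:
    """Pack a classic 'asn:value' community into its 32-bit form."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError(f"ASN {asn} does not fit in 16 bits")
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value {value} does not fit in 16 bits")
    return (asn << 16) | value


def is_four_byte_asn(asn: int) -> bool:
    """True for ASNs that need more than 16 bits."""
    return 0xFFFF < asn <= 0xFFFFFFFF


# A 2-byte ASN fits in the first half of the community value.
print(hex(encode_classic_community(8283, 2)))

# An ASN above 65535 (arbitrary illustrative number) does not fit.
try:
    encode_classic_community(200000, 2)
except ValueError as err:
    print(err)
```

The second call fails on purpose: there is simply no room for the larger number in the first half of a classic community.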

Figure 1: 2-byte ASNs (purple) and 4-byte ASNs (green) over time

BGP Large Communities

The introduction of BGP Large Communities (BGP LCs) solves this problem. BGP LCs are composed of three 4-byte integers, separated by colons (such as 8283:0:2). This accommodates advanced routing policies for both 2-byte and 4-byte ASNs. RFC 8092 calls these three 4-byte parts 'Global Administrator' (for the remainder of this article we'll use the term Global Administrator ASN or gaASN), 'Local Data Part 1' and 'Local Data Part 2'.
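As an illustration of this structure, the following Python sketch parses the textual form into its three parts and packs them into the 12-byte value (three 4-byte integers in network byte order). The class and function names are our own and only meant as an example.

```python
import struct
from typing import NamedTuple


class LargeCommunity(NamedTuple):
    global_admin: int   # the gaASN
    local_data_1: int
    local_data_2: int

    def to_bytes(self) -> bytes:
        """Pack the three parts as 4-byte unsigned integers, network order."""
        return struct.pack("!III", *self)


def parse_large_community(text: str) -> LargeCommunity:
    """Parse 'ga:ld1:ld2' and check that each part fits in 4 bytes."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts: {text!r}")
    values = [int(part) for part in parts]
    for value in values:
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{value} does not fit in 4 bytes")
    return LargeCommunity(*values)


lc = parse_large_community("8283:0:2")
print(lc.global_admin, lc.local_data_1, lc.local_data_2)
print(lc.to_bytes().hex())
```

Because every part is 4 bytes wide, the same code handles 2-byte and 4-byte ASNs in the first field without any special casing.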

Figure 2: BGP Large Community Specification (credit: Job Snijders, IPJ20-1)

For a detailed description of the development and usage of BGP Large Communities, see this excellent article by Job Snijders in the Internet Protocol Journal, Volume 20, Issue 1.

The first of the three integers specifies the ASN of the network that defines what the other two 4-byte values mean. In our example case (8283:0:2), we can find the definitions of these values in the RIPE DB record for AS8283:

remarks: INFORMATIONAL COMMUNITIES:
remarks: ==========================
remarks: RFC 1997  | Large      | Meaning (Informational)
remarks: ----------|------------|-------------------------------------
remarks: 8283:1    | 8283:0:1   | peering routes
remarks: 8283:2    | 8283:0:2   | downstream routes
remarks: ----------+------------+-------------------------------------
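A documented table like this can be turned into a simple lookup. The sketch below hard-codes only the two rows quoted above; the dictionary and function names are hypothetical, and a real tool would need the full record for each gaASN.

```python
# Meanings copied from the AS8283 remarks quoted above.
AS8283_INFORMATIONAL = {
    "8283:1": "peering routes",        # classic (RFC 1997) form
    "8283:2": "downstream routes",
    "8283:0:1": "peering routes",      # large community form
    "8283:0:2": "downstream routes",
}


def describe(community: str) -> str:
    """Return the documented meaning, or 'unknown' if it is not in the table."""
    return AS8283_INFORMATIONAL.get(community, "unknown")


print(describe("8283:0:2"))
print(describe("8283:0:99"))
```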

Using RIS to measure deployment of BGP LCs

The RIPE RIS route collector system provides us with a peek into BGP. The RIS system has recently undergone an upgrade. As a consequence, we can now better look at technology changes like the adoption of BGP LCs, because the state of all BGP attributes (including those that were recently defined) is stored every 8 hours, in so-called RIB dump files.

Estimating BGP LC deployment comes with some difficulties, though. Specifically, BGP communities can be stripped along the way, and we only see a subset of all routes, biased by who RIS peers with. We found that looking at the distinct BGP LC values that we see in our route collector system is a good way to track deployment, as explained below.

Methodology

We take data from three route collectors in RIPE RIS that have been running since 2016:

RRC19, Johannesburg, ZA
RRC20, Zurich, CH
RRC21, Paris/Marseilles, FR

Because they run the new version of the route collector system (based on exabgp), we can use them to track the deployment of BGP LCs. As mentioned above, every 8 hours we have a dump of the full collection of paths they see, including newer BGP attributes like BGP LCs.

Since early 2018 all route collectors have been upgraded, so if you see 'ALL' in the graphs below, that means that the numbers displayed were collected from all the RIS route collectors (not just 19, 20 and 21).

Results

Distinct BGP LCs over time

First we look at the number of distinct BGP LC values that we see, i.e. the full 3x4-byte values. This number tells us something about the number of distinct things that are being signalled between networks using BGP LCs.

Figure 3: Distinct BGP LCs seen in RIS over time

As you can see from Figure 3 above, the number of distinct BGP LC values has an upward trend, with the latest data (June 2018) showing 156 distinct BGP LC values defined and collected for the full set of RIS route collectors. For classic BGP communities we see ~47k distinct values in RIS, but this is arguably an apples-to-oranges comparison.

Next we look at the number of distinct values of the first 4 bytes of the BGP LCs, i.e. the gaASNs. The number of distinct gaASN values tells us something about how many ASNs are actually implementing BGP LCs.

Figure 4: Distinct gaASN values seen in RIS over time

As you can see from Figure 4, the number of gaASNs also shows this upward trend, and in the latest data we see 48 distinct values. Classic BGP communities have 4.7k distinct values in their first 2 bytes for the same date.
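The two counts used above (distinct full BGP LC values and distinct gaASNs) could be computed along these lines. This is a sketch, not the code we used: it assumes the RIB dump has already been parsed into one list of (gaASN, local data 1, local data 2) tuples per route, and the sample values are made up using ASNs reserved for documentation.

```python
from typing import Iterable, List, Set, Tuple

LargeCommunity = Tuple[int, int, int]


def distinct_lc_stats(routes: Iterable[List[LargeCommunity]]) -> Tuple[int, int]:
    """Return (number of distinct LC values, number of distinct gaASNs)."""
    distinct_values: Set[LargeCommunity] = set()
    for large_communities in routes:
        distinct_values.update(large_communities)
    distinct_ga_asns = {ga_asn for ga_asn, _, _ in distinct_values}
    return len(distinct_values), len(distinct_ga_asns)


# Made-up sample: each inner list is the set of LCs on one route.
sample_routes = [
    [(64496, 0, 1), (64496, 0, 2)],
    [(64496, 0, 1)],
    [(65536, 1, 100)],
    [],  # a route without any large communities
]
print(distinct_lc_stats(sample_routes))  # (3, 2)
```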

Measuring different types of BGP LC usage

There are no hard rules enforcing that the gaASN specified in a BGP LC value is the ASN of the network that defines the meaning of the values in the local parts of the attribute. So we have no guarantees that the gaASN is in fact the network that sets the value or is meant to be receiving a route tagged with that value.

What we can do to get a better sense of actual deployment is to look at the gaASN value and compare it to the AS PATH of the route that we receive in our RIS route collector systems. If the gaASN value matches an ASN in the AS PATH we have a strong signal that the gaASN is a network that is actually using BGP LCs.
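A minimal sketch of this check is shown below. It assumes each route is available as a pair of an AS PATH (a flat list of ASNs, ignoring AS sets) and a list of BGP LC tuples; the function name and the sample data, which uses documentation ASNs, are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

LargeCommunity = Tuple[int, int, int]
Route = Tuple[List[int], List[LargeCommunity]]  # (AS PATH, large communities)


def ga_asns_seen_in_path(routes: Iterable[Route]) -> Set[int]:
    """Collect gaASNs that match an ASN in the AS PATH of at least one route."""
    confirmed: Set[int] = set()
    for as_path, large_communities in routes:
        on_path = set(as_path)
        for ga_asn, _, _ in large_communities:
            if ga_asn in on_path:
                confirmed.add(ga_asn)
    return confirmed


sample_routes = [
    ([64500, 64496], [(64496, 0, 2)]),  # gaASN 64496 is in the path
    ([64500, 64497], [(65536, 0, 1)]),  # gaASN 65536 is not in the path
]
print(ga_asns_seen_in_path(sample_routes))  # {64496}
```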

Type 1: Originating a route

It's also interesting to distinguish between networks that set BGP LCs when they originate a route, and those that set BGP LCs on routes they receive and/or pass on.

For networks that set BGP LCs for prefixes they originate, and that are intended to be globally visible and reachable on the Internet, the BGP Large Community values should, in theory, be visible from everywhere on the Internet, unless the BGP LCs are stripped.

Figure 5 below shows the number of distinct gaASNs seen as origin in the routes they were seen on. We see that the counts for all three route collectors are indeed the same most of the time. Even when calculating this value for the full set of route collectors (in orange), the value does not increase relative to the number we see for individual route collectors. Additional data analysis for June 2018 data reveals that 22 of the 33 gaASNs are 4-byte ASNs.

Figure 5: Distinct gaASN values seen as origin in the AS PATH over time as seen in RIS

Type 2: Routes received and passed on

Networks that use BGP LCs for routes they receive and pass on can only be measured if our collector infrastructure sees routes passing through these networks.

Figure 6 below shows the distinct gaASN values that are seen in the AS PATH, but which are not the origin in the route they were seen on. Since these values are meant for networks on the path, we will only see them if the routes traverse these networks on their way from the origin to our route collector system. Because RRC19, RRC20 and RRC21 are in different places in the network topology, we expect some variation in the counts. Calculating this for the full set of RRCs (orange line), the total count of gaASNs of this type of network is a bit higher than the counts for the individual RRCs. This could mean that we (slightly?) under-estimate the number of networks of this type. Diving deeper into the June 2017 data reveals that 7 out of these 13 ASNs are 4-byte ASNs.
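The relation between the per-collector counts and the 'ALL' count can be sketched as a set union. The code below is illustrative: the function name is ours and the sets of gaASNs per collector are invented (documentation ASNs), not measured values.

```python
from typing import Dict, Set


def counts_per_collector_and_all(seen: Dict[str, Set[int]]) -> Dict[str, int]:
    """Count distinct gaASNs per collector and for the union of all collectors."""
    counts = {collector: len(ga_asns) for collector, ga_asns in seen.items()}
    counts["ALL"] = len(set().union(*seen.values()))
    return counts


# Invented example: each collector sees a slightly different set of gaASNs.
seen_by_collector = {
    "RRC19": {64496, 64497},
    "RRC20": {64496, 65536},
    "RRC21": {64496, 64497, 65537},
}
print(counts_per_collector_and_all(seen_by_collector))
```

The 'ALL' count can never be lower than the largest per-collector count. When it is clearly higher, as in this invented example, no single collector sees the complete set.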

Figure 6: Distinct gaASN values seen in AS PATH but not as origin (over time as seen in RIS)

Type 3: gaASNs not seen in the AS PATH

Another interesting class of gaASN values are those that are not seen in the AS PATH at all. One explanation for this is that they are set by, or meant for, entities that are on the path but invisible in the AS PATH, for instance route servers at an IXP. Another explanation could be (mis)configurations that violate the RFC, for instance a typo in a router configuration. Figure 7 shows the number of gaASNs that are not seen in the AS PATH of the route they were seen on. Manually going through them, we found that all but five of them specify gaASN values that point to IXP route servers. We contacted the operators of the five remaining gaASN values. The responses we received confirmed that these ASN values belonged to networks not seen in the path. This resulted in updated configs where the gaASN was changed.

Figure 7: Distinct gaASN values not seen in the AS PATH (over time as seen in RIS)
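The three types described above can be summarised in a short classification sketch. It assumes the AS PATH is a flat list of ASNs with the origin as its last element. The type labels, function names and sample routes (documentation ASNs) are our own choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

LargeCommunity = Tuple[int, int, int]
Route = Tuple[List[int], List[LargeCommunity]]  # (AS PATH, large communities)


def classify(ga_asn: int, as_path: List[int]) -> str:
    """Classify a gaASN relative to the AS PATH of the route it was seen on."""
    if as_path and as_path[-1] == ga_asn:
        return "origin"              # Type 1
    if ga_asn in as_path:
        return "in_path_not_origin"  # Type 2
    return "not_in_path"             # Type 3


def distinct_ga_asns_by_type(routes: Iterable[Route]) -> Dict[str, Set[int]]:
    """Group distinct gaASNs by type; one gaASN may land in several types."""
    by_type: Dict[str, Set[int]] = defaultdict(set)
    for as_path, large_communities in routes:
        for ga_asn, _, _ in large_communities:
            by_type[classify(ga_asn, as_path)].add(ga_asn)
    return dict(by_type)


sample_routes = [
    ([64500, 64501, 64496], [(64496, 0, 2)]),   # set by the origin
    ([64500, 65536, 64497], [(65536, 1, 10)]),  # set by a network on the path
    ([64500, 64497], [(65550, 0, 1)]),          # e.g. an IXP route server
]
for kind, ga_asns in sorted(distinct_ga_asns_by_type(sample_routes).items()):
    print(kind, sorted(ga_asns))
```

Checking for the origin first means that a gaASN which is the origin and also appears earlier in the path (for instance because of prepending) is counted as Type 1.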

Conclusion

We've looked into how to track the deployment of BGP Large Communities, and conclude that looking at the gaASN values is a viable approach. We see that the gaASN values often correspond to ASNs in the path. In cases where they don't, we can explain them either in terms of 'invisible elements in the path' (like an IXP route server), or misconfigurations. We were able to notify some networks about possible misconfigurations, which were fixed. We will keep track of this trend and keep you informed on interesting changes we see.