Is the current set of the RIPE Routing Information Service (RIS) peers providing a good level of diverse data? Let's look at some measurement results.
The RIPE NCC collects a lot of data about the Internet and makes it freely available to researchers, network operators and other interested parties. We operate two unique data collection platforms: RIPE Atlas and the RIPE Routing Information Service (RIS).
In this article, I'm focusing on RIPE RIS, which collects data on the Internet control plane, sometimes also referred to as the GPS navigation system of the Internet.
RIS collects routing data from hundreds of Internet routers that speak the BGP routing protocol. Since we cannot collect BGP data from all routers on the Internet, RIS provides a sample. But is it a representative and/or diverse sample?
Analogous to our diversity efforts in the RIPE community and attempts to measure diversity of its participants (see Shane Kerr's ripemtggender), it is interesting to look at how diverse the data from our RIS peers is.
First, let's take a look at sampling. RIS consists of two types of collection points (also called RIS route collectors or RRCs):
- The first type are route collectors that are placed at an Internet Exchange Point (IXP). These RRCs collect data from routers (i.e. RIS peers) that are located at that IXP. Because these RRCs are usually physically close to the RIS peers, they provide a stable and reliable data set, but at the same time what they collect is limited to the views of the Internet that are available from peers at this IXP. Most of the RIS RRCs are of this type. A few of them collect data from peers at two IXPs simultaneously. For instance, RRC01 collects at LINX and LONAP, and RRC03 collects at AMS-IX and NL-IX.
- The second type of route collectors are not located at IXPs. Unlike the type described above, these RRCs are not limited to collecting data from the local network they are connected to. They can collect routing data from peers all over the Internet, sometimes from far-away places. This allows them to connect to a more diverse set of peers. But the data collection might be less stable, reliable and timely. These RRCs are called 'multi-hop' collectors. There are only two multi-hop RRCs: RRC00 at the RIPE NCC and RRC24 at LACNIC.
You can find a map and a list of all RIS route collectors on the RIS web page.
RIS peerings are established by volunteers (similar to RIPE Atlas probes, which are also hosted by volunteers). Any operator can offer to peer with a RIS route collector. This is sometimes referred to as convenience sampling. Convenience sampling is cost effective but includes an unknown bias. Intuitively, RIS (and RIPE Atlas) are biased towards a small subset of the wider Internet community - e.g. network operators who have a good degree of technical knowledge and who care about projects that are "good for the Internet".
But how do you measure diversity? In the case of RIS, one way is to look at how different the topology is between two RIS peers. If two peers see the same set of ASes and the same links between them in the AS path, they must be similar. We tried to capture this similarity of RIS peers in a single score between 0 and 100%. For more details see footnote 2.
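To make the score concrete, here is a minimal Python sketch of the Jaccard similarity between the sets of directed AS-links seen by two peers, as described in the footnote. It is an illustration, not the code used to produce the figures: the function names, the toy AS paths (using AS numbers reserved for documentation) and the choice to collapse AS-path prepending are all assumptions of this sketch.

```python
from typing import Iterable, List, Set, Tuple

AsLink = Tuple[int, int]  # directed link: (AS on the left, AS on the right)


def as_links(as_paths: Iterable[List[int]]) -> Set[AsLink]:
    """Collect the directed AS-links (adjacent AS pairs) seen in a set of AS paths."""
    links: Set[AsLink] = set()
    for path in as_paths:
        for left, right in zip(path, path[1:]):
            # Assumption of this sketch: prepending (the same AS repeated)
            # is not counted as a link.
            if left != right:
                links.add((left, right))
    return links


def jaccard_similarity(a: Set[AsLink], b: Set[AsLink]) -> float:
    """Size of the intersection divided by size of the union (0.0 to 1.0)."""
    if not a and not b:
        return 1.0  # two empty views are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical AS paths as seen from two full-feed peers.
peer_a_paths = [[64500, 64510, 64520], [64500, 64511, 64521]]
peer_b_paths = [[64501, 64510, 64520], [64501, 64512, 64521]]

similarity = jaccard_similarity(as_links(peer_a_paths), as_links(peer_b_paths))
print(f"similarity: {similarity:.1%}")
```

In this toy example the two peers share only one link out of seven distinct links, so the score is low. With real full routing tables, the score would be computed over all AS paths a peer announces to the route collector.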
Figure 1 shows a matrix of the similarity between each pair of RIS peers that provide us with full IPv4 routing tables. Peers are grouped by RRC, so each square around the diagonal shows the intra-RRC diversity. Except for RRC00 and RRC24, all RRCs in this figure capture information via one or two IXP peering LANs, so one can expect to see higher concentrations of similar peers (darker blues) in these intra-RRC squares. Lighter lines indicate peers that are different from the rest.
The darkest blue cells indicate peers with a similarity of close to 100%. This means that, from where they are located in the Internet, they see a very similar Internet topology. For example, if you look at the RIS peers connected to RRC12 (located at DE-CIX, Frankfurt), you see a dark pattern across the board. This means that adding more RIS peers there might not result in a more diverse data set. If you look at RRC06 (located at DIXIE, Tokyo), the opposite happens. The few peers we have in this location are dissimilar from each other and from the rest of the RIS peers.
On the other hand, the yellow cells show RIS peers with a high dissimilarity. We would expect that of RRC00 and RRC24, because they are multi-hop route collectors. And indeed, RRC00 has a pattern of yellow stripes that shows that it has peers that are relatively different from anything else. RRC15 (located at PTTMetro-SP, São Paulo) shows another interesting pattern, where intra-RRC similarity is high, while those peers are relatively dissimilar to peers outside of the RRC.
Figure 1: Matrix of all RIS route collectors and their RIS peers for IPv4
Figure 2 shows the same matrix for IPv6. We observe more diversity in our sample for IPv6 than for IPv4. Note that the scale starts at around 45% compared to 55% in the IPv4 matrix. However, we can also see a larger group of RIS peers that is very similar. We have not analysed this further, but this could be due to the "Hurricane Electric effect": large parts of the IPv6 Internet interconnect via HE, or at least the parts we see in RIS do.
Figure 2: Matrix of all RIS route collectors and their RIS peers for IPv6
Why does diversity matter?
If we can use the methodology above to define diversity, maybe we can use the results to make better informed decisions about the selection of the type and location of new RRCs and RIS peers. Can we use it to find new RIS peers that would make RIS more diverse? Would it allow us to see a wider set of phenomena in Internet routing and therefore increase the value of the RIS data?
When we cannot use the full set of RIS data (only a sample), having more diversity in the RIS data set could be beneficial. For instance, if you want timely data in RIS Live and you are bandwidth-limited, the diversity scores could allow you to sample a smaller set of RIS peers that have the most diversity and leave out data that is likely to be redundant, because the peers you left out are very similar to the peers you selected.
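As a sketch of how such a selection could work, the Python below greedily picks a small set of peers from a table of pairwise similarity scores. This is just one possible heuristic (a farthest-first style selection), not something RIS or RIS Live implements; the function name, the peer names and the scores are hypothetical.

```python
from typing import Dict, FrozenSet, List


def select_diverse_peers(
    peers: List[str],
    similarity: Dict[FrozenSet[str], float],
    k: int,
) -> List[str]:
    """Greedily pick up to k peers that are as dissimilar from each other as possible."""
    if k <= 0 or not peers:
        return []

    def sim(a: str, b: str) -> float:
        return similarity[frozenset((a, b))]

    # Seed with the peer that is, in total, least similar to all others.
    seed = min(peers, key=lambda p: sum(sim(p, q) for q in peers if q != p))
    selected = [seed]

    while len(selected) < min(k, len(peers)):
        remaining = [p for p in peers if p not in selected]
        # Add the peer whose most similar already-selected peer is least similar.
        best = min(remaining, key=lambda p: max(sim(p, s) for s in selected))
        selected.append(best)
    return selected


# Hypothetical pairwise similarity scores (0.0 to 1.0) between four peers.
peers = ["peer_a", "peer_b", "peer_c", "peer_d"]
similarity = {
    frozenset(("peer_a", "peer_b")): 0.95,
    frozenset(("peer_a", "peer_c")): 0.60,
    frozenset(("peer_a", "peer_d")): 0.70,
    frozenset(("peer_b", "peer_c")): 0.62,
    frozenset(("peer_b", "peer_d")): 0.72,
    frozenset(("peer_c", "peer_d")): 0.58,
}

subset = select_diverse_peers(peers, similarity, k=3)
print(subset)
```

In this toy data, peer_a and peer_b are 95% similar, so the selection keeps only one of them: the data from the other is likely to be largely redundant.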
Chris Morrow recently made an interesting comment on the NANOG mailing list (Thu, 15 Aug 2019 11:38:17 -0400) about this topic:
[...] RIPE probes, how do we (the internet) get more deployed (or better interconnection to the current sets)? and maybe even more importantly... what's the right spread/location/interconnectivity map for these probes?
In this context RIPE probes refer to RIS peers.
Could we use a diversity analysis like the one above to assess how much value (in terms of diversity) an additional RIS peer adds to the total pool and therefore to the value of RIS? Could we increase the chances of identifying hijacks that we would not detect with a less diverse RIS infrastructure?
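One simple way to approach the first question, sketched in Python below, is to measure what fraction of the AS-links seen by a candidate peer are not yet seen by any existing peer. This is an assumption about how "added value" might be quantified, not an established RIS metric; the function name and the AS-links (built from AS numbers reserved for documentation) are hypothetical.

```python
from typing import Iterable, Set, Tuple

AsLink = Tuple[int, int]  # directed link between two adjacent ASes in an AS path


def novel_link_fraction(
    candidate_links: Set[AsLink],
    existing_peer_links: Iterable[Set[AsLink]],
) -> float:
    """Fraction of the candidate's AS-links that no existing peer already sees."""
    if not candidate_links:
        return 0.0
    already_seen: Set[AsLink] = set().union(*existing_peer_links)
    novel = candidate_links - already_seen
    return len(novel) / len(candidate_links)


# Hypothetical AS-link sets for two existing peers and one candidate peer.
existing = [
    {(64496, 64510), (64510, 64511)},
    {(64497, 64510), (64510, 64509)},
]
candidate = {(64498, 64510), (64510, 64511), (64498, 64508), (64508, 64507)}

gain = novel_link_fraction(candidate, existing)
print(f"new AS-links contributed by candidate: {gain:.0%}")
```

A candidate with a value close to zero would mostly duplicate what is already visible, while a high value suggests it sees parts of the topology that the current set of peers does not.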
This related effort is also interesting in this context: How complete is our view of the Internet? We know our views of Internet routing are incomplete, but can we try to make things better? I hope to be able to contribute to this effort by measuring diversity in the systems that we build to capture the state of Internet routing.
1. Yes, we know about remote peering!
2. I implemented this similarity as the Jaccard similarity coefficient between the sets of directed AS-links seen in the AS paths of two full feed vantage points. At the upper bound, if these two vantage points see exactly the same routes, they would see the same AS-links and the metric would show 100% similarity. At the lower bound two vantage points that see a lot of diversity in routing would only see the same AS-links near the end of the path. Credits to Franziska Lichtblau for suggesting to use AS-links in this metric.