To expand the reach of F-root, one of the 13 root servers, we at ISC looked at where queries to our F-root servers come from and where it would make the most sense to place new nodes. As a first step, we examined the existing nodes to see how they behave and whether there was anything we could improve. We used RIPE Atlas to do this.
As a root server operator, we find it very useful that all RIPE Atlas probes automatically run measurements towards all the root servers. Every four minutes, each probe sends DNS queries that return the site ID of the instance that answered, and it measures the latency. So, for every RIPE Atlas probe, we can see which F-root instance it reached and what the latency to that site is.
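For readers who want to reproduce this kind of analysis, here is a minimal sketch of pulling the site ID and latency out of a single RIPE Atlas DNS result. It assumes the result is a JSON object with the probe ID in `prb_id` and, under `result`, the round-trip time in milliseconds in `rt` and the raw DNS response, base64-encoded, in `abuf`. It also assumes the query was a CHAOS-class `hostname.bind` TXT lookup whose first TXT string is the site ID. The helper names are hypothetical, and the parser handles only what this case needs.

```python
import base64
import struct
from typing import Optional, Tuple

TXT = 16  # DNS resource record type for TXT


def _skip_name(buf: bytes, off: int) -> int:
    """Return the offset just past a domain name starting at `off`."""
    while True:
        length = buf[off]
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return off + 2
        if length == 0:  # zero-length root label ends the name
            return off + 1
        off += 1 + length  # skip the length byte plus the label itself


def site_id_from_abuf(abuf_b64: str) -> Optional[str]:
    """Decode a base64 DNS response and return its first TXT string, if any."""
    buf = base64.b64decode(abuf_b64)
    qdcount, ancount = struct.unpack("!HH", buf[4:8])  # after ID and flags
    off = 12  # the DNS header is 12 bytes long
    for _ in range(qdcount):
        off = _skip_name(buf, off) + 4  # skip QNAME, QTYPE and QCLASS
    for _ in range(ancount):
        off = _skip_name(buf, off)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", buf[off:off + 10])
        off += 10
        if rtype == TXT:
            strlen = buf[off]  # TXT RDATA holds length-prefixed strings
            return buf[off + 1:off + 1 + strlen].decode("ascii", "replace")
        off += rdlength
    return None


def probe_site_rtt(result: dict) -> Tuple[int, Optional[str], float]:
    """Reduce one (assumed-format) DNS result to (probe ID, site ID, RTT in ms)."""
    answer = result["result"]
    return result["prb_id"], site_id_from_abuf(answer["abuf"]), answer["rt"]
```

A hand-written parser keeps the sketch dependency-free. In practice, a DNS library or a RIPE Atlas result-parsing library would be the more robust choice.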
We looked at the visualisations the RIPE NCC provides based on the RIPE Atlas infrastructure, but they weren't flexible enough for our purposes, so we decided to build our own on top of the RIPE Atlas API and OpenStreetMap. That allowed us to look at historical data via the REST API and at live updates via the real-time WebSocket streaming interface.
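Fetching historical results over the REST API might look like the sketch below. It uses the RIPE Atlas v2 results endpoint, `/api/v2/measurements/<id>/results/`, with Unix-timestamp `start` and `stop` parameters. The measurement ID is a placeholder: we do not reproduce the IDs of the built-in F-root measurements here. The live WebSocket stream is not covered.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone  # callers should pass tz-aware datetimes
from typing import List

ATLAS_API = "https://atlas.ripe.net/api/v2"


def results_url(msm_id: int, start: datetime, stop: datetime) -> str:
    """Build the results URL for one measurement over a time window."""
    params = {"start": int(start.timestamp()), "stop": int(stop.timestamp())}
    return f"{ATLAS_API}/measurements/{msm_id}/results/?{urllib.parse.urlencode(params)}"


def fetch_results(msm_id: int, start: datetime, stop: datetime) -> List[dict]:
    """Download results as a list of JSON objects (performs a network request)."""
    with urllib.request.urlopen(results_url(msm_id, start, stop), timeout=60) as resp:
        return json.load(resp)
```

For example, `fetch_results(12345, datetime(2015, 1, 1, tzinfo=timezone.utc), datetime(2015, 1, 2, tzinfo=timezone.utc))` would request one day of results for the hypothetical measurement 12345. Naive datetimes would be interpreted in local time, so the sketch expects timezone-aware ones.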
The image below shows the latency between RIPE Atlas probes and the 59 F-root instances distributed around the globe. The red dots indicate latency of more than 200 ms.
Figure 1: Latency from RIPE Atlas probes towards F-root (red = 200+ ms)
This showed us that the round-trip time (RTT) to F-root from some probes, especially in western Europe, was quite high, so we wanted to investigate why and, if possible, improve it.
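To get a colouring like the one in Figure 1, each probe's samples can be reduced to a single figure and compared against the 200 ms threshold. The sketch below uses the median per probe. Whether the map used the median, the latest sample or something else is not stated, so treat that choice and the function name as hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

HIGH_RTT_MS = 200.0  # probes above this are drawn as red dots


def flag_high_latency(samples: Iterable[Tuple[int, float]]) -> Dict[int, bool]:
    """Map probe ID -> True if the probe's median RTT exceeds HIGH_RTT_MS."""
    per_probe: Dict[int, List[float]] = defaultdict(list)
    for probe_id, rtt in samples:
        per_probe[probe_id].append(rtt)
    return {probe: median(rtts) > HIGH_RTT_MS for probe, rtts in per_probe.items()}
```

Using a median rather than the latest sample keeps a single slow reply from turning a probe red.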
US transit misconfiguration
We drilled down further and noticed high latency between much of Europe and the F-root instance in Atlanta (see Figure 2).
Figure 2: Latency from RIPE Atlas probes towards the F-root instance in Atlanta, US
This showed that, for some reason, a huge number of RIPE Atlas probes in Europe were querying the F-root instance in Atlanta instead of one closer by. Before investigating this with RIPE Atlas, we had no idea this was the case. It turned out that we had inadvertently announced the F-root anycast prefix over transit links in the US to one of our providers. We peer with this provider in Europe and also buy transit from it in the US, mostly for management traffic. Because the provider treated these as customer transit links, the route learned over them got a higher BGP preference than the routes learned over the peering links in Europe. The easy solution was to withdraw the announcement of the F-root anycast prefix from the US-based sites. This immediately solved the problem, as you can see in Figure 3 below.
Figure 3: Latency from RIPE Atlas probes towards the F-root instance in Atlanta, US, after withdrawing the US announcement
Now the RIPE Atlas probes in Europe reach a more local F-root instance, and the latency dropped from around 800 ms to 10 ms.
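The order of the BGP decision process explains how a single US customer link could pull European traffic across the Atlantic. LOCAL_PREF is compared before AS path length, and providers commonly assign customer-learned routes a higher LOCAL_PREF than peer-learned ones. The sketch below models only those two steps. The LOCAL_PREF values and the AS numbers (from the documentation range) are hypothetical, not the provider's actual configuration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical LOCAL_PREF policy: customer-learned routes beat peer-learned ones.
LOCAL_PREF = {"customer": 200, "peer": 100}


@dataclass
class Route:
    via: str            # where the provider learned the route
    relationship: str   # "customer" or "peer"
    as_path: List[int]  # AS path as received


def best_route(routes: List[Route]) -> Route:
    """Pick a route using only the first two BGP tie-breakers:
    highest LOCAL_PREF, then shortest AS path."""
    return max(routes, key=lambda r: (LOCAL_PREF[r.relationship], -len(r.as_path)))


routes = [
    Route("peering in Europe", "peer", [64500]),
    Route("US transit", "customer", [64500, 64500]),
]
print(best_route(routes).via)  # the customer route wins despite the longer path
```

Withdrawing the prefix from the US sites removes the customer route from the candidate set, so the peering route in Europe is selected again.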
Another interesting issue we were able to detect is a route leak. Of the 59 F-root instances, five are global nodes whose routes are deliberately exported to the global routing table. The remaining sites are local sites: they are all announced with the NO_EXPORT community, so we don't expect any of these routes to go out beyond our peers and our peers' customers, and certainly not to other networks. Unfortunately, due to a misconfiguration, one of our peers temporarily ignored the NO_EXPORT community, and a number of routes leaked to networks around the Adriatic. Again, it was just a matter of phoning them up and asking them to fix it.
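The well-known NO_EXPORT community (RFC 1997, value 65535:65281) tells a receiving router not to advertise the route to external BGP neighbours. The sketch below shows the export check a peer's router is expected to apply before sending a route onwards. Skipping that check, as happened here, lets a local node's route escape. The function and its arguments are hypothetical, and BGP confederations are ignored for simplicity.

```python
from typing import Set, Tuple

Community = Tuple[int, int]  # (high 16 bits, low 16 bits), e.g. 65535:65281

NO_EXPORT: Community = (65535, 65281)     # 0xFFFFFF01, RFC 1997
NO_ADVERTISE: Community = (65535, 65282)  # 0xFFFFFF02, RFC 1997


def may_advertise(communities: Set[Community], neighbor_is_ebgp: bool) -> bool:
    """Return whether a route carrying these communities may be sent to a neighbour.

    Simplified: without confederations, NO_EXPORT blocks every eBGP session.
    """
    if NO_ADVERTISE in communities:
        return False  # must not be advertised to any neighbour
    if NO_EXPORT in communities and neighbor_is_ebgp:
        return False  # must stay inside the receiving AS
    return True
```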
There are many more examples. Even though most of these issues are fairly easy to fix, they degrade the service until they are. So we now look at these maps every day to see whether anything unusual shows up.
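Part of this daily check could be automated. One approach is to compare each probe's instance today against a baseline, such as yesterday's, and list the probes that switched instance along with their latency change. That would catch both a shift like the Atlanta case and a leak pulling probes onto a distant local node. The sketch below assumes per-probe (site ID, median RTT) pairs as input. The function name, input shape, site labels and 50 ms threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

SiteRtt = Tuple[str, float]  # (site ID, median RTT in ms)


def changed_probes(baseline: Dict[int, SiteRtt], today: Dict[int, SiteRtt],
                   min_delta_ms: float = 50.0) -> List[Tuple[int, str, str, float]]:
    """Return (probe, old site, new site, RTT delta) for probes that switched
    instance and whose RTT changed by at least min_delta_ms."""
    changes = []
    for probe, (new_site, new_rtt) in sorted(today.items()):
        if probe not in baseline:
            continue  # no history for this probe yet
        old_site, old_rtt = baseline[probe]
        delta = new_rtt - old_rtt
        if new_site != old_site and abs(delta) >= min_delta_ms:
            changes.append((probe, old_site, new_site, delta))
    return changes
```

Requiring both a site switch and a noticeable RTT change filters out harmless moves between nearby instances.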
Other noticeable behaviour
These visualisations show us not only errors and misconfigurations but also other behaviour on the network. When we looked at the map of Europe, for instance, we noticed a hole: no probes in the Czech Republic were routed to our F-root node in Amsterdam (see Figure 4).
Figure 4: No routes from the Czech Republic to the F-root node in Amsterdam?
Why is that? Are there no RIPE Atlas probes in the Czech Republic? On the contrary: it turns out that connectivity inside the Czech Republic is so good that almost everyone peers with the node inside the country (see Figure 5).
Figure 5: Czech Republic routes stay inside the country
Similar behaviour can be observed in Ukraine, though likely for other reasons. As you can see in Figure 6, almost all probes there reach the F-root node in Ukraine, and none are routed to the node in Russia.
Figure 6: Networks in Ukraine don't connect with F-root in Russia
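The per-country patterns in Figures 4 to 6 come down to counting, for each country, which instance its probes reach. Here is a minimal sketch of that tally. It assumes each probe has already been paired with a country code (for example from its RIPE Atlas probe metadata) and with the site ID it reached. The function name and site labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def site_share_by_country(rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Given one (country code, site ID) pair per probe, return each site's
    share of that country's probes."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for country, site in rows:
        counts[country][site] += 1
    shares = {}
    for country, per_site in counts.items():
        total = sum(per_site.values())
        shares[country] = {site: n / total for site, n in per_site.items()}
    return shares
```

A country where one in-country site holds nearly the whole share, and a neighbouring country's site holds none, is the kind of pattern visible in Figures 5 and 6.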
The visualisations we built based on the measurement data provided by RIPE Atlas allow us to monitor connectivity to our F-root nodes and to spot unusual behaviour. The RIPE Atlas probe network is an invaluable tool for these kinds of analyses.