Giovane Moura

RootViz: New Root DNS Reachability Dashboard

In this guest post, the team at SIDN Labs introduce a new dashboard built on RIPE Atlas data that lets the community visualise real-time measurement results and explore how probes reach the Internet’s root DNS servers.


Over at SIDN Labs, we have recently built an open dashboard called RootViz, which visualises in real time the measurement data produced by all RIPE Atlas probes. It shows reachability and latency between the probes and each root server, for both IPv4 and IPv6.

RootViz complements DNSMON in two ways:

  • It uses the industry’s default time series visualisation tool (Grafana) and leverages a different dataset from DNSMON.
  • It uses data from all RIPE Atlas probes, not only the robust anchors. It uses the same dataset that we previously used in a study on anycast vs DDoS on the Root DNS system.

Datasets

RIPE Atlas measures every root server every four minutes, asking all of its 14,000+ probes to send DNS TXT CHAOS queries to each root server letter. Below are the RIPE Atlas measurement IDs for each letter:

Root server   IPv4    IPv6
A             10309   11309
B             10310   11310
C             10311   11311
D             10312   11312
E             10313   11313
F             10304   11304
G             10314   11314
H             10315   11315
I             10305   11305
J             10316   11316
K             10301   11301
L             10308   11308
M             10306   11306
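As a rough sketch of how these IDs could be used, the snippet below maps each root server letter to its measurement ID and builds a results URL for a given time range. The helper names (`measurement_id`, `results_url`) are hypothetical and not part of RootViz, and the URL pattern is our reading of the public RIPE Atlas v2 results endpoint, so it should be checked against the current API documentation.

```python
from urllib.parse import urlencode

# IPv4 measurement IDs per root server letter, copied from the table above.
# In that table, each IPv6 ID equals the IPv4 ID plus 1000.
IPV4_IDS = {
    "A": 10309, "B": 10310, "C": 10311, "D": 10312, "E": 10313,
    "F": 10304, "G": 10314, "H": 10315, "I": 10305, "J": 10316,
    "K": 10301, "L": 10308, "M": 10306,
}


def measurement_id(letter: str, ip_version: int) -> int:
    """Return the measurement ID for a root letter and IP version (4 or 6)."""
    base = IPV4_IDS[letter.upper()]
    if ip_version == 4:
        return base
    if ip_version == 6:
        return base + 1000
    raise ValueError("ip_version must be 4 or 6")


def results_url(letter: str, ip_version: int, start: int, stop: int) -> str:
    """Build a results URL for a time range given in Unix seconds.

    The endpoint layout is an assumption based on the RIPE Atlas v2 API.
    """
    msm = measurement_id(letter, ip_version)
    query = urlencode({"start": start, "stop": stop, "format": "json"})
    return f"https://atlas.ripe.net/api/v2/measurements/{msm}/results/?{query}"
```

For example, `measurement_id("K", 6)` gives the IPv6 measurement for K-Root listed in the table.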

How RootViz works

Every 30 minutes, the tool downloads the measurement datasets for each root server letter in the table above, covering the interval [t-60, t-30] minutes. That involves a very large volume of data. To keep things scalable, RootViz only computes and stores aggregated metrics for the interval, such as the number of timed-out probes and the median RTT.
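To make the windowing and aggregation step concrete, here is a minimal sketch. It is not the RootViz code: the function names and the simplified record shape (`prb_id`, plus `rtt_ms` set to `None` for a timeout) are assumptions chosen for illustration.

```python
from statistics import median

WINDOW_MINUTES = 30


def window_bounds(now_epoch: int) -> tuple[int, int]:
    """Return (start, stop) in Unix seconds for the [t-60, t-30] minute window."""
    start = now_epoch - 2 * WINDOW_MINUTES * 60
    stop = now_epoch - WINDOW_MINUTES * 60
    return start, stop


def aggregate(records: list[dict]) -> dict:
    """Collapse per-probe records into a few interval-level metrics.

    Each record uses a simplified, hypothetical shape:
    {"prb_id": int, "rtt_ms": float or None}, where None marks a timeout.
    A probe counts as timed out if any of its records in the window timed out.
    """
    rtts = [r["rtt_ms"] for r in records if r["rtt_ms"] is not None]
    probes = {r["prb_id"] for r in records}
    timed_out = {r["prb_id"] for r in records if r["rtt_ms"] is None}
    return {
        "probes": len(probes),
        "timed_out_probes": len(timed_out),
        "median_rtt_ms": median(rtts) if rtts else None,
    }
```

Storing only this small dictionary per root letter and interval, rather than every raw result, is what keeps the stored volume manageable.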

We also use it to monitor one of our .nl authoritative servers, and we are looking into making the code open source. However, all the heavy lifting is done by RIPE Atlas probes, which make the measurements: RootViz only aggregates and visualises them.

Dashboards

On the landing page of the dashboards, we show the percentage of RIPE Atlas probes that time out while trying to reach each root server identifier (RSI, or root server letters).

For instance, below we show one week of timeouts for each RSI, for IPv4. We see oscillations during this period, which we will explore further later. For now, we are just visualising the results.

In this process, we disregard by default the probes whose IPv4 or IPv6 connectivity does not work. RIPE Atlas tags probes accordingly.
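The timeout percentage with this filtering could be computed roughly as follows. This is a sketch under assumptions: the tag names `system-ipv4-works` and `system-ipv6-works` are our understanding of the RIPE Atlas system tags, and the function name and input shapes are hypothetical.

```python
# Assumed RIPE Atlas system tags that mark working connectivity per IP version.
WORKS_TAG = {4: "system-ipv4-works", 6: "system-ipv6-works"}


def timeout_percentage(probe_tags: dict, timed_out: set, ip_version: int) -> float:
    """Percentage of working probes that timed out.

    probe_tags maps a probe ID to its set of tags; timed_out holds the IDs of
    probes that timed out in the interval. Probes without the "works" tag for
    the given IP version are excluded from numerator and denominator alike.
    """
    tag = WORKS_TAG[ip_version]
    working = {prb for prb, tags in probe_tags.items() if tag in tags}
    if not working:
        return 0.0
    return 100.0 * len(working & timed_out) / len(working)
```

Excluding untagged probes from the denominator as well avoids counting a probe with broken IPv6 as a root server reachability problem.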

Why timeouts? For a root server operator, the most important metric is reachability: being able to serve clients. Timeouts may suggest reachability issues between client and server, in the network, or on the server itself. (RTT is secondary, given that root DNS responses are expected to be cached for 2 days.)

The basic idea is crowdsourcing: having a few probes (or a few hundred) timing out constantly indicates a persistent error, while having spikes suggests something else. We will be exploring the implications of the spikes later.
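One simple way to separate the two cases is sketched below. The approach is an illustration, not how RootViz analyses events: probes that time out in every interval are treated as persistent, and an interval counts as a spike when its number of timed-out probes exceeds a multiple of the median count. The function names and the default `factor` of 2.0 are arbitrary assumptions.

```python
from statistics import median


def persistent_probes(intervals: list[set[int]]) -> set[int]:
    """Probe IDs that timed out in every interval."""
    if not intervals:
        return set()
    return set.intersection(*intervals)


def spike_intervals(intervals: list[set[int]], factor: float = 2.0) -> list[int]:
    """Indexes of intervals whose timeout count exceeds factor * median count."""
    counts = [len(timed_out) for timed_out in intervals]
    if not counts:
        return []
    baseline = median(counts)
    return [i for i, count in enumerate(counts) if count > factor * baseline]
```

Here each element of `intervals` is the set of probe IDs that timed out during one aggregation interval.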

In addition to the combined dashboards described above, we also include the following metrics:

Dashboard per RSI

We have also generated one dashboard per root server identifier, on which we show nine graphs for each root server. These can be used by the operators or other interested folk.

A ROOT | B ROOT | C ROOT | D ROOT | E ROOT | F ROOT | G ROOT

H ROOT | I ROOT | J ROOT | K ROOT | L ROOT | M ROOT

For instance, for L-ROOT, we show below the latency for IPv6, including the median, 75th percentile (p75) and 99th percentile (p99):
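The median, p75 and p99 of a set of RTT samples can be computed as in the sketch below. How the dashboard calculates its percentiles is not stated here, so the use of Python's `statistics.quantiles` with the inclusive interpolation method, and the function name `latency_percentiles`, are assumptions.

```python
from statistics import quantiles


def latency_percentiles(rtts_ms: list[float]) -> dict:
    """Return the median, p75 and p99 of RTT samples in milliseconds.

    quantiles(..., n=100) yields 99 cut points, so the k-th percentile sits
    at index k - 1. At least two samples are needed.
    """
    cuts = quantiles(rtts_ms, n=100, method="inclusive")
    return {"median": cuts[49], "p75": cuts[74], "p99": cuts[98]}
```

Showing p99 next to the median helps expose a slow tail of probes that the median alone would hide.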

What’s next

RootViz is currently mainly useful for visually inspecting events - a way to spot things in real time that might otherwise go unnoticed.

Our goal is to add automatic anomaly detection and event analysis. We also plan to make the datasets and metrics publicly available to the community, as they can also be used by other DNS operators to monitor their own services.

We will offer these extensions as a project to students on the TU Delft Computer Science Bachelor programme, where teams build software as part of their coursework. Previously, for example, students developed NTPinfo, an NTP measurement tool that we now provide as a public service (see this announcement).

If you have ideas, feedback or suggestions, we would love to hear from you.

About the author

Giovane Moura Based in Arnhem, The Netherlands

Giovane is a Data Scientist with SIDN Labs (.nl registry) and an Assistant Professor at TU Delft, in the Netherlands. He works on security and Internet measurements research projects. You can reach him at http://giovane-moura.nl/
