RIPE Atlas collects a lot of measurements. But how much of the Internet are we actually measuring? We had a sense that with a limited amount of extra load on the system, we could dramatically increase the number of router IPs seen on a given day in RIPE Atlas - and that means measuring more of the Internet.
Goals and Motivation
Projects like OpenIPMap or ASNtryst (developed at the recent RIPE Atlas hackathon) benefit from having as-wide-as-possible coverage of traceroutes in RIPE Atlas. Wide coverage of the Internet infrastructure seen in traceroutes lets you use already collected data to target a specific phenomenon, or to serve as a baseline against which you compare new measurement data in order to detect changes. To make an analogy with the highway system, if you only watch traffic conditions on 30% of the roads, you're not going to directly observe traffic jams on the roads that you are not monitoring.
Traditionally, the number of destinations for traceroutes in RIPE Atlas has been limited, because users can only specify a single destination per measurement. But what if we could get rid of this limitation?
The way around this limitation that we explore in this article is using DNS to serve a list of destinations for RIPE Atlas probes to measure towards. Here is the trick: create a RIPE Atlas measurement to a specific hostname, and specify that this hostname must be resolved on the probe (see Figure 1). The specific hostname (in this case, ip-list.emileaben.com) points to a host that runs a special DNS server. Every time this DNS server gets asked for an A (or AAAA) record, it returns a single IPv4 (or IPv6) address taken from a list of IP addresses that I configured on this server. With this trick, we can make RIPE Atlas probes act as a team (see footnote 1) to measure towards a (potentially long) list of destinations, without having to submit large numbers of measurement specifications to the RIPE Atlas system.
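To make the server-side behaviour concrete, here is a minimal sketch of the "one address per query" selection logic. It is not the scapy-dns-ninja code: the class name `TargetList`, the method `answer`, and the strict round-robin order are assumptions for illustration, and the DNS packet handling itself is left out.

```python
import itertools
from ipaddress import ip_address


class TargetList:
    """Hands out one address per query, cycling through a fixed list.

    Hypothetical helper: round-robin order is an assumption, the real
    server may pick addresses from its list differently.
    """

    def __init__(self, addresses):
        parsed = [ip_address(a) for a in addresses]
        # Keep the families apart: A queries get IPv4, AAAA queries get IPv6.
        self._cycles = {
            "A": itertools.cycle([a for a in parsed if a.version == 4]),
            "AAAA": itertools.cycle([a for a in parsed if a.version == 6]),
        }

    def answer(self, qtype):
        """Return the next address (as a string) for an A or AAAA query."""
        return str(next(self._cycles[qtype]))


# Example addresses come from documentation ranges, not from the real lists.
targets = TargetList(["192.0.2.1", "198.51.100.1", "2001:db8::1"])
print([targets.answer("A") for _ in range(3)])
# prints ['192.0.2.1', '198.51.100.1', '192.0.2.1']
```

Because every query advances the cycle, consecutive probes that resolve the hostname are handed different destinations, which is what lets a group of probes work through the list together.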
Figure 1: RIPE Atlas measurement specification. Fields that are special for DNS-based target list measurements are marked with red circles.
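For readers who prefer the API over the web form, the same kind of specification could be expressed as a JSON body roughly like the one built below. This is a sketch under assumptions: the field names follow the RIPE Atlas v2 measurement-creation format as I understand it, the function name `ninja_measurement_spec` is made up, the probe IDs are placeholders, and the ICMP protocol choice is a guess that the article does not confirm. The part that matters for the trick is `resolve_on_probe`.

```python
import json


def ninja_measurement_spec(hostname, af, probe_ids, interval=900):
    """Build a traceroute spec whose target is resolved on each probe."""
    return {
        "definitions": [{
            "type": "traceroute",
            "af": af,                      # 4 for IPv4, 6 for IPv6
            "target": hostname,            # name served by the special DNS server
            "resolve_on_probe": True,      # the key setting for this trick
            "interval": interval,          # seconds between measurements
            "protocol": "ICMP",            # assumption, not stated in the article
            "description": f"DNS-based target list (IPv{af})",
        }],
        "probes": [{
            "type": "probes",
            "value": ",".join(str(p) for p in probe_ids),
            "requested": len(probe_ids),
        }],
        "is_oneoff": False,
    }


# Probe IDs 1001 and 1002 are placeholders, not the probes used in the article.
spec = ninja_measurement_spec("ip-list.emileaben.com", 4, [1001, 1002])
print(json.dumps(spec, indent=2))
```

Nothing is submitted here. Sending the body would additionally require an API key and a POST to the measurement-creation endpoint.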
Because none of the standard DNS server software (that I was familiar with) supports this, I wrote my own hack in Python/scapy: scapy-dns-ninja. During the first RIPE Atlas hackathon, this inspired the creation of the dns zeerover, which includes a PowerDNS-based version of this idea.
We've been running the scapy-dns-ninja since November 2014 for large lists of IPv4 and IPv6 addresses, specifically all .1 addresses of routed IPv4 and IPv6 prefixes (see footnote 2 ). These are lists of ~500k and ~25k IP addresses respectively.
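As an illustration of how such a destination list can be derived, the sketch below applies the "routed" threshold from footnote 2 (seen by 10 or more RIS peers) to a prefix-to-peer-count mapping. The function name `dot_one_targets`, the input format, and the reading of ".1" as "network address plus one" are assumptions. The real lists were built from RIS data and updated by hand.

```python
from ipaddress import ip_network

MIN_RIS_PEERS = 10  # 'routed' threshold taken from footnote 2


def dot_one_targets(prefix_visibility, min_peers=MIN_RIS_PEERS):
    """Return the first address after the network address of each routed prefix.

    prefix_visibility maps a prefix string to the number of RIS peers
    that see it (a hypothetical input format).
    """
    targets = []
    for prefix, peers in prefix_visibility.items():
        net = ip_network(prefix)
        # Skip poorly visible prefixes and single-address prefixes.
        if peers >= min_peers and net.num_addresses > 1:
            targets.append(str(net.network_address + 1))
    return targets


# Documentation prefixes with made-up peer counts.
example = {"192.0.2.0/24": 12, "198.51.100.0/24": 3, "2001:db8::/32": 10}
print(dot_one_targets(example))
# prints ['192.0.2.1', '2001:db8::1']
```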
These lists were used in a number of measurements we set up. The table below shows the measurement IDs and characteristics of these measurements.
| Measurement ID | IPv4/IPv6 | Set of Probes | Measurement Interval (s) |
|---|---|---|---|
| 2444159 | IPv4 | Selected probes in RIS peer ASNs | 900 |
| 2444158 | IPv6 | Selected probes in RIS peer ASNs | 900 |
Table 1: Selected ninja measurements for the results shown later. All these measurements measure towards a list of .1 addresses for all routed prefixes in IPv4 and IPv6.
I tried to select RIPE Atlas probes based on diversity (all RIR regions are covered, for example) and on minimising risk to the probe host. We think the risk of running these types of measurements from probes where probe hosts may be at risk of legal consequences is fairly low, but we still wanted to avoid any risk while testing this.
Here are some results comparing the "ninja" measurements listed in the table above against all other traceroute measurements in RIPE Atlas for two days in 2015:
Figure 2: Metrics for 1 January 2015
Figure 3: Metrics for 25 December 2015
As you can see from these Venn diagrams, with only a small fraction of the probes (left side of Figures 2 and 3), and a large list of destinations that are not targeted by the rest of RIPE Atlas (middle of Figures 2 and 3), you can substantially increase the coverage in terms of the router IP addresses seen on a given day in RIPE Atlas (right side of Figures 2 and 3).
Some 40%-55% of router IP addresses seen on the days displayed were unique to the ninja measurements (light blue on the right side of Figures 2 and 3). Note that, in terms of counting router IP addresses in a traceroute, I didn't count the destination addresses, although they could have been router IP addresses.
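The counting described above can be sketched as follows. The functions `router_ips` and `unique_share` are hypothetical names, and the field names (`dst_addr`, `result`, `from`) follow the RIPE Atlas traceroute result format as I understand it. This is an approximation of the method, not the analysis code behind the figures: it drops replies that come from the traceroute's own destination and then computes the share of all router IPs that only one set of measurements saw.

```python
def router_ips(traceroute_results):
    """Collect responding hop addresses, excluding each trace's destination."""
    ips = set()
    for res in traceroute_results:
        dst = res.get("dst_addr")
        for hop in res.get("result", []):
            # Hops without replies may carry an error instead of a result list.
            for reply in hop.get("result", []):
                src = reply.get("from")  # absent for timeouts such as {"x": "*"}
                if src and src != dst:
                    ips.add(src)
    return ips


def unique_share(ninja_ips, other_ips):
    """Fraction of all router IPs seen that only the ninja measurements saw."""
    union = ninja_ips | other_ips
    return len(ninja_ips - other_ips) / len(union) if union else 0.0


# Tiny made-up example using documentation addresses.
ninja = router_ips([{
    "dst_addr": "192.0.2.1",
    "result": [
        {"hop": 1, "result": [{"from": "198.51.100.1", "rtt": 1.2}]},
        {"hop": 2, "result": [{"x": "*"}]},
        {"hop": 3, "result": [{"from": "192.0.2.1", "rtt": 9.8}]},
    ],
}])
other = {"203.0.113.1"}
print(unique_share(ninja, other))
# prints 0.5
```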
The number of destinations in the IPv6 list is roughly the same as the number of IPs that are actually targeted (20k vs. 19k). Further analysis shows that we cycle through the IPv6 destination list in under a day. For the IPv4 destination list (500k addresses), it takes multiple days to cycle through with the number of probes that are currently performing these measurements. As can be seen in Figure 3, we currently see about 218k addresses a day. This could of course be increased by assigning more probes to these measurements, and/or performing the measurements more frequently.
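A back-of-the-envelope version of this reasoning, using only figures from the text, can be written down as below. The helper names are hypothetical, and the estimate assumes that every day covers previously unmeasured destinations, so it is a lower bound on the real cycle time.

```python
def measurements_per_probe_per_day(interval_s):
    """How many measurements one probe performs per day at a given interval."""
    return 86400 // interval_s


def days_to_cycle(list_size, distinct_targets_per_day):
    """Lower bound on the days needed to touch every destination once."""
    return list_size / distinct_targets_per_day


print(measurements_per_probe_per_day(900))        # prints 96
print(round(days_to_cycle(500_000, 218_000), 1))  # prints 2.3
```

Shortening the interval or adding probes raises the number of destinations covered per day, which is why both options reduce the cycle time.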
A short test (see footnote 3) with all probes in the five countries with the most deployed probes (US, DE, FR, GB and NL, with a total of a little more than 4,000 probes) with a 60s measurement interval revealed that the scapy implementation and small-scale set-up that we use doesn't scale to high-frequency, large-probe-load type experiments. But since this is DNS-based, we don't expect scaling up to be a big issue.
One of the downsides of this type of measurement collection is that there is no guarantee that you'll measure a target on your destination list. Some probes don't do DNS resolution reliably (some don't even do DNS resolution at all). The DNS server that distributes the destinations can go down or have network connectivity issues. Or the DNS answers can be cached longer than we want, so a probe will measure the same destination repeatedly.
Other large-scale active measurement systems have done or are still doing measurements towards a large number of destinations. Examples are CAIDA Ark and iPlane. With an order of magnitude more sources available for measurements, RIPE Atlas would be an interesting addition to projects involved with Internet topology discovery and monitoring.
With a simple DNS trick, it is possible for RIPE Atlas to measure more of the Internet. The data collected is publicly available through RIPE Atlas APIs using the measurement IDs mentioned in Table 1.
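As a starting point for fetching that data, the sketch below builds a results URL for one measurement ID and one UTC day. The helper name `results_url` is hypothetical, and the endpoint path and the `start`/`stop`/`format` parameters reflect the RIPE Atlas v2 REST API as I understand it, so check them against the current API documentation before relying on them.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode


def results_url(msm_id, day):
    """URL for all results of one measurement on a single UTC day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    start = int(midnight.timestamp())
    stop = start + 86400 - 1  # last second of the same day
    query = urlencode({"start": start, "stop": stop, "format": "json"})
    return f"https://atlas.ripe.net/api/v2/measurements/{msm_id}/results/?{query}"


# IPv4 measurement from Table 1, for the day shown in Figure 2.
print(results_url(2444159, date(2015, 1, 1)))
```

The URL can then be fetched with any HTTP client. A full day of results for these measurements may be large, so a narrower time window is a sensible first try.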
We used this particular methodology (i.e. personal domains, small number of probes, low frequency) because we believe in prototyping, and we wanted to see what we'd get out of measurements towards a large number of destinations. Now that we have a better understanding of the results, we believe we should scale this up, and make it an official feature of RIPE Atlas.
We plan to gradually introduce these measurements towards an IP address in all globally routed prefixes from more and more probes. We'll try to optimise the results collected in terms of resources used, usefulness and variety of results. If you have different ideas, concerns or feedback on this approach, please let us know by leaving a comment below or on the RIPE Atlas mailing list.
The software to measure lists of destinations is available on GitHub, so anyone interested can perform measurements towards other long lists of destinations in a similar fashion. If there is enough interest in these types of measurements, this type of facility (measurements towards long lists of destinations) could be provided by the RIPE Atlas system itself, rather than as a separate piece of software. If you are interested in this, please let us know.
Footnote 1: The CAIDA Archipelago project pioneered the concept of team-probing for Internet topology discovery.
Footnote 2: This method of creating a destination-list is taken from the iPlane project. A prefix is considered 'routed' if 10 or more RIS peers see it. The prefix lists have been manually updated a few times during the year.
Footnote 3: Measurement ID: 3312212