
Announcing Daily RIPE Atlas Data Archives

Petros Gigis — 20 Jun 2017
In this article we present a new way to access the data collected by the RIPE Atlas platform. Instead of fetching individual measurement results through the available APIs, you can now download files containing all public measurement results for a given day, split by measurement type and IP version.

The RIPE Atlas platform is one of the largest Internet measurement platforms in the world. At the time of writing, more than 9,700 probes and 260+ anchors are connected to the network. More than 20,000 measurements run continuously, together delivering more than 4,700 results per second.

Until now, the only way to analyse all measurement results of a specific type and IP version (IPv4/IPv6) was to use the REST API or the streaming service. With the REST API, this meant iterating through the API and fetching results request by request, which was expensive in both time and network traffic.

The option to offer downloadable files with all measurements [1] for a given day was discussed in the MAT WG session at RIPE 72, and we are now in a position to make an initial version of this available.

 

 

The daily RIPE Atlas data results

On a daily basis, we will release files containing RIPE Atlas data for each of the measurement types, for each IP version. The files include only the public measurement results that were collected by the RIPE Atlas infrastructure during that day. At any given time, you will find data for the last 30 days (a sliding window).

The datasets are available on the public FTP server at: https://ftp.ripe.net/ripe/atlas/data

The files follow this naming convention:

$TYPE-$IPV-$SUBTYPE-$DATE.bz2, where:

  • $TYPE can be {traceroute, ping, dns, ntp, http, sslcert}
  • $IPV is either v4 or v6, according to the IP version specified in the measurement
  • $SUBTYPE is either builtin or udm
  • $DATE is in the format YEAR-MONTH-DAY (e.g. 2017-06-13)
  • DNS results have one additional $IPV value, proberesolved

In total, 26 files are generated per day (6 types × 2 IP versions × 2 subtypes, plus the 2 proberesolved DNS files), with a combined size of about 25 GB.
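The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function name `daily_file_names` and the `suffix` parameter are our own, and the default suffix simply follows the pattern as written above.

```python
from itertools import product

TYPES = ["traceroute", "ping", "dns", "ntp", "http", "sslcert"]
IP_VERSIONS = ["v4", "v6"]
SUBTYPES = ["builtin", "udm"]


def daily_file_names(date, suffix=".bz2"):
    """Return the archive file names expected for one day (date as YYYY-MM-DD)."""
    names = [
        f"{mtype}-{ipv}-{subtype}-{date}{suffix}"
        for mtype, ipv, subtype in product(TYPES, IP_VERSIONS, SUBTYPES)
    ]
    # DNS has one extra $IPV value, proberesolved, for both subtypes.
    names += [f"dns-proberesolved-{subtype}-{date}{suffix}" for subtype in SUBTYPES]
    return names


names = daily_file_names("2017-06-13")
print(len(names))  # 6 * 2 * 2 + 2 = 26
```

Enumerating the combinations this way gives the same count of 26 files per day.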

Using the following script you can fetch all files for a specific date. 

DATE="YEAR-MONTH-DAY" # e.g., 2017-06-13

TYPES_ARRAY=(
    'dns-v6-builtin'            'dns-v4-builtin'
    'dns-v6-udm'                'dns-v4-udm'
    'dns-proberesolved-builtin' 'dns-proberesolved-udm'
    'http-v6-builtin'           'http-v4-builtin'
    'http-v6-udm'               'http-v4-udm'
    'ntp-v6-builtin'            'ntp-v4-builtin'
    'ntp-v6-udm'                'ntp-v4-udm'
    'ping-v6-builtin'           'ping-v4-builtin'
    'ping-v6-udm'               'ping-v4-udm'
    'sslcert-v6-builtin'        'sslcert-v4-builtin'
    'sslcert-v6-udm'            'sslcert-v4-udm'
    'traceroute-v6-builtin'     'traceroute-v4-builtin'
    'traceroute-v6-udm'         'traceroute-v4-udm'
    )

# Download one archive per type/IP version/subtype combination
for type in "${TYPES_ARRAY[@]}"; do
    wget "https://ftp.ripe.net/ripe/atlas/data/$type-$DATE.txt.bz2"
done
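If you prefer to stay in Python, the same download can be sketched with the standard library only. The helper names (`archive_url`, `download`, `download_all`) are hypothetical, and the `.txt.bz2` suffix is taken from the script above, so treat it as an assumption about the actual file names on the server.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "https://ftp.ripe.net/ripe/atlas/data"


def archive_url(prefix, date):
    """Build the URL of one archive, e.g. prefix='ping-v4-builtin'."""
    return f"{BASE_URL}/{prefix}-{date}.txt.bz2"


def download(url, dest_dir="."):
    """Stream one archive to disk without holding it in memory."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def download_all(prefixes, date, dest_dir="."):
    """Download every archive in `prefixes` for the given date."""
    return [download(archive_url(prefix, date), dest_dir) for prefix in prefixes]
```

For example, `download_all(["ping-v4-builtin", "ping-v6-builtin"], "2017-06-13")` would fetch the two built-in ping archives for that day, provided the date is still inside the 30-day window. Streaming with `shutil.copyfileobj` matters here because individual files can be several gigabytes.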

 

File format

Inside every .bz2 file there is a single .txt file that includes all the results. Each line of the .txt file is one measurement result, encoded as a JSON object.

The following Python code is an example of how you can parse the data once a file has been decompressed.

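import json

# Each line of the decompressed file holds one JSON-encoded result.
with open("filename.txt", "r") as atlas_data:
    for line in atlas_data:
        decoded = json.loads(line)  # dict with the fields of one result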
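Given the size of the archives, it may be more practical to read the compressed file directly instead of decompressing it to disk first. The following is a minimal sketch using Python's standard `bz2` module; the function names are our own, and the `msm_id` field used for counting is an assumption about the result format that this article does not spell out.

```python
import bz2
import json
from collections import Counter


def iter_results(path):
    """Yield one decoded result per line of a bz2-compressed archive."""
    with bz2.open(path, "rt", encoding="utf-8") as archive:
        for line in archive:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_measurement(path):
    """Count results per measurement ID ('msm_id' is an assumed field name)."""
    return Counter(result.get("msm_id") for result in iter_results(path))
```

Because `iter_results` is a generator, only one result is held in memory at a time, so the same loop works for small and very large files alike.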

 

Use cases

In previous years, many researchers and operators have used RIPE Atlas results to perform various analyses. One example is the Remote Peering Jedi tool, developed during the RIPE NCC IXP Tools Hackathon in Madrid. The tool continuously parses traceroute paths generated by public RIPE Atlas measurements and identifies remote peers at IXPs.

Another example of a researcher going through huge amounts of RIPE Atlas data is the work done by Romain Fontugne on Pinpointing Delay and Forwarding Anomalies in RIPE Atlas Built-in Measurements. For this work, we initially had to ship a disk of traceroutes. With the new RIPE Atlas data archive, this would no longer be necessary.

Conclusion

We believe that providing an easy and fast way to access the RIPE Atlas results per day will significantly help researchers, operators and others interested in bulk data analysis develop new methods and tools.

This is a prototype, so we can still change the formats and the way we split or merge the various files we create, if we hear strong preferences on this. If you have feedback on this prototype service, now is the time to speak up! Leave your comments at the end of this article. Our plan is to make this into a production service once we've taken all your comments into account.

 

[1]: Some results from some RIPE Atlas probes are delivered with a delay. Therefore they might not be included in the daily archives.
