
Processing RIPE Atlas Results with jq

Stéphane Bortzmeyer — 21 Aug 2017
A short tutorial on the use of the jq processor and programming language to analyse RIPE Atlas JSON files.

Introduction

The results of RIPE Atlas probe measurements are distributed as JSON files. One of the advantages of JSON is that there are many tools and libraries to process it. You can write an analysis programme in the programming language of your choice, or you can use the tool developed by the RIPE NCC, Sagan.

Writing your own analysis programme may be overkill for simple analysis and, anyway, some people may prefer higher-level tools like jq. jq processes JSON data. It is often used on the command line to extract small pieces of information from a JSON file (it is sometimes described as "sed for JSON"). But it also includes a full programming language for more complex processing. Let's see how to use it on RIPE Atlas results, distributed as JSON.

Ping measurements

First, let's create a one-off "ping" measurement, #9211624. We can retrieve its results from https://atlas.ripe.net/api/v2/measurements/9211624/results/ (saved here as 9211624.json). The original JSON is very compact and not suitable for human reading, but jq can pretty-print it:

% jq . 9211624.json
[
{
"af": 6,
"avg": 159.53339,
"dst_addr": "2001:500:2e::1",
"dst_name": "2001:500:2e::1",
"dup": 0,
"from": "2001:980:93db:1:fad1:11ff:fea9:ecbe",
"fw": 4780,
"group_id": 9211624,
"lts": 65,
"max": 159.53339,
"min": 159.53339,
"msm_id": 9211624,
"msm_name": "Ping",
"prb_id": 10015,
"proto": "ICMP",
"rcvd": 1,
"result": [
{
"rtt": 159.53339
}
],
"sent": 1,
"size": 64,
"src_addr": "2001:980:93db:1:fad1:11ff:fea9:ecbe",
"step": 240,
"timestamp": 1501086978,
"ttl": 53,
"type": "ping"
},
...

A result file is an array of results. Next, jq can extract data from it:

% jq '.[].from' 9211624.json
"2001:980:93db:1:fad1:11ff:fea9:ecbe"
"2001:44b8:31bd:d600:220:4aff:fec7:b002"
"2001:4870:400e:404:a2f3:c1ff:fec4:4bc7"
...

Here, jq prints the member from of every element of the array (the [] notation means "iterate over the array"). from is the IP address of the RIPE Atlas probe that performed this specific test. We surround the jq program with quotes because brackets are special characters for the Unix shell.

You don't like the quotes around the IP addresses? They are there because jq, by default, produces regular JSON texts, in which character strings must be between quotes. If you don't like that, use jq's --raw-output (or -r) option.
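To make the difference between JSON output and raw output concrete, here is a rough Python equivalent of .[].from with and without --raw-output. It is only a sketch: the two-element results list is a hypothetical excerpt reusing the first two addresses printed above, not a real download.

```python
import json

# Hypothetical excerpt of a result file: only the member we need is kept.
results = [
    {"from": "2001:980:93db:1:fad1:11ff:fea9:ecbe"},
    {"from": "2001:44b8:31bd:d600:220:4aff:fec7:b002"},
]

# Like jq '.[].from': iterate over the array, emit each member as a JSON text.
as_json = [json.dumps(r["from"]) for r in results]

# Like jq --raw-output '.[].from': strings are emitted without the quotes.
as_raw = [r["from"] for r in results]

for line in as_json + as_raw:
    print(line)
```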

Now, let's turn to the RTTs in the file. First, which one is the largest?

% jq 'map(.result[0].rtt) | max' 9211624.json 
505.52918

We got half a second for the probe that is farthest from the target. How this value was obtained from the JSON file requires some explanation. The array of results contains JSON objects (a JSON object is a dictionary of "key: value" pairs; see RFC 7159, section 4). Each object has a member result, which is an array. In this specific measurement, we asked for only one test per RIPE Atlas probe, so we can simply take the first element of that array: that's the .result[0] part of the jq program. What we get is an object, and we just extract its member rtt, which is the round-trip time in milliseconds (see the RIPE Atlas documentation).

This operation (taking the first, and only, element of the result array and getting its rtt member) has to be done for every element of the top-level array, which has one element per RIPE Atlas probe. That's the purpose of the map function, which applies a function to every element of an array. (jq will feel familiar to programmers who use the functional style.)

So, map(.result[0].rtt) will produce a new array made only of the RTTs of the results. We then send this array to another jq filter, max, a predefined function that yields the largest element of its input array.

We now know how to get the maximum RTT. So, the minimum is probably simple:

% jq 'map(.result[0].rtt) | min' 9211624.json 
null

Hmmmm, no, there is a problem. Some probes failed to get a result, so the result array contains no RTT:

"result": [
{
"x": "*"
}
],

When jq is asked to retrieve non-existing data (here, the member rtt), it produces null. We must exclude these failed tests from the data:

% jq 'map(.result[0].rtt) | map(select(. != null)) | min' 9211624.json 
1.227755

(We get about one millisecond for the closest probe.) We've already seen map. What is select? It keeps only the elements that pass some test. Here, the test is "different from null" (. != null). So, to summarize: the first filter in the jq programme produces an array of RTTs, the second one removes the null ones, and the third one reduces the array to a scalar, the minimum RTT.
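Why did max work while min returned null? jq defines a total order in which null sorts before every number, so null can never be the maximum but is always the minimum. The Python sketch below imitates that ordering with a key function (the helper jq_order is hypothetical, and the three results are a tiny made-up sample), then applies the same fix as select.

```python
# Made-up ping results: two successes and one failure without an "rtt" member.
results = [
    {"result": [{"rtt": 159.53339}]},
    {"result": [{"x": "*"}]},
    {"result": [{"rtt": 1.227755}]},
]

# map(.result[0].rtt): a missing member gives None, as jq gives null.
rtts = [r["result"][0].get("rtt") for r in results]

def jq_order(value):
    """Sort key imitating jq's ordering for null and numbers: null comes first."""
    return (value is not None, value if value is not None else 0)

# Like jq's max and min on the raw array: null never wins max, always wins min.
raw_max = max(rtts, key=jq_order)
raw_min = min(rtts, key=jq_order)

# map(select(. != null)) | min: drop the failures, then take the minimum.
valid = [x for x in rtts if x is not None]
fastest = min(valid)

print(raw_max, raw_min, fastest)
```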

Speaking of min, here is an example of its use which is unrelated to RTTs, but which shows what can be done with the command line:

% date --date=@$(jq 'map(.timestamp) | min' 1669362.json)
Tue Jun 3 10:06:32 CEST 2014

(Using the smallest timestamp, we find out when the first tests were done for this measurement. We use GNU date to convert a number of seconds since the Unix epoch, the timestamp, into a human-readable date.)
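The same conversion can be sketched in Python with the standard datetime module. The timestamp below is not read from the file: it is simply the value corresponding to the date printed above (10:06:32 CEST, that is 08:06:32 UTC, since CEST is UTC+2).

```python
from datetime import datetime, timezone

def to_utc(timestamp):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

# Value matching "Tue Jun 3 10:06:32 CEST 2014".
first = to_utc(1401782792)
print(first.isoformat())
```

Unlike date without options, this prints UTC rather than local time, which avoids surprises when comparing probes in different time zones.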

OK, back to RTTs. What about the average RTT? Two other predefined jq functions are useful here, add and length. The average being the total divided by the length, we may simply use the division operator:

% jq 'map(.result[0].rtt) | add / length' 9211624.json 
76.49538228

Although it is not immediately visible, there is a bug in this programme: length takes into account the null elements, while add ignores them. We have to exclude the failures:

% jq 'map(.result[0].rtt) | map(select(. != null)) | add / length' 9211624.json 
77.26806290909092

By the way, how many of these failures are in this JSON file?

% jq 'map(.result[0].rtt) | map(select(. == null)) | length' 9211624.json 
1
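The length bug is easy to reproduce in Python. In this sketch on made-up values, the sum skips None the way jq's add skips null (adding null to a number leaves the number unchanged), while len, like jq's length, counts every element.

```python
# Made-up RTTs with one failure (None), as produced by map(.result[0].rtt).
rtts = [10.0, None, 20.0, 30.0]

# jq's add ignores null, so the total is unaffected by the failure...
total = sum(x for x in rtts if x is not None)

# ...but length counts every element, including the failure.
wrong_average = total / len(rtts)

# Removing failures first, as map(select(. != null)) does, fixes the average.
valid = [x for x in rtts if x is not None]
right_average = sum(valid) / len(valid)

# Number of failures, as in map(select(. == null)) | length.
failures = len(rtts) - len(valid)
print(wrong_average, right_average, failures)
```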

The average is a very common metric in Internet measurements but, most of the time, it is a bad one. It is brittle, since a few outliers (data well outside the "normal" range) can change it a lot. In most cases, using the median is better. The median being the value such that half of the tests are lower and half are higher, this is simple if you can sort the data: the median will be the middle of the array. Great, jq has a predefined function, sort:

% jq 'map(.result[0].rtt) | sort | .[length/2]' 9211624.json 
43.853845

Again, there is a bug: dividing length by two assumes that the length is even (if it is odd, length/2 is not an integer, hence not a proper array index). We have to be more subtle, and introduce a test (if), plus the modulo operator (%):

% jq 'map(.result[0].rtt) | sort | if length % 2 == 0 then .[length/2] else .[(length-1)/2] end' 9211624.json
43.853845

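As you can see, the median is lower than the average. This is because the average was driven up by a few outliers (such as the maximum at 505 ms). Note also that, unlike the average, this median does not exclude the failure, and jq's sort places null before every number. With the single failure among the 100 results of this measurement, the middle element happens to be the right one anyway, but with more failures the median would be shifted downwards, so it is safer to add map(select(. != null)) before sort.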
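For comparison, here is a Python sketch of a median that removes failures (None) before sorting and, for an even number of values, averages the two middle elements. That is the usual textbook convention; the jq programme above takes the upper of the two instead. The standard library's statistics.median follows the same convention and serves as a cross-check. The RTT values are made up.

```python
import statistics

def median(values):
    """Median of the non-None values, averaging the two middles if the count is even."""
    data = sorted(v for v in values if v is not None)
    if not data:
        raise ValueError("no valid values")
    mid = len(data) // 2
    if len(data) % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

# Made-up RTTs: one failure and one large outlier.
rtts = [12.0, None, 15.0, 505.5, 14.0]
print(median(rtts))
print(statistics.median([x for x in rtts if x is not None]))
```

Note how the outlier at 505.5 ms barely matters here, whereas it would dominate an average. Raising an error on an empty list is a choice of this sketch; the jq version would simply return null.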

What if we want to display all these computed values? jq can run several filters on the same data if you separate them with commas. And you can format messages with the concatenation operator, +. Note that jq does not automatically convert numbers to strings; you have to do it yourself with the function tostring. Also, the source code of the jq program is starting to get a bit long, so it is time to store it in a file (edited with Emacs, of course, with the jq mode). If the file contains:

# Aggregates on RTT in results of RIPE Atlas probes
map(.result[0].rtt) |
"Median: " + (sort | if length % 2 == 0 then .[length/2] else .[(length-1)/2] end | tostring),
"Average: " + (map(select(. != null)) | add/length | tostring),
"Min: " + (map(select(. != null)) | min | tostring),
"Max: " + (max | tostring)

You can invoke it with:

% jq --raw-output --from-file atlas-rtt.jq 9211624.json
Median: 43.853845
Average: 77.26806290909092
Min: 1.227755
Max: 505.52918

The only measurement we've used so far, #9211624, used only one ping test per probe, so the array result has only one item (this is why we use [0], retrieving the one and only test). What if there were several tests per probe? This is the case, for instance, in measurement #9205363. If you examine its JSON file, you'll see that the result arrays have more than one element. We modify the programme to:

map(.result) | flatten(1) | map(.rtt) | 
"Median: " + (sort | if length % 2 == 0 then .[length/2] else .[(length-1)/2] end | tostring),
"Average: " + (map(select(. != null)) | add/length | tostring),
"Min: " + (map(select(. != null)) | min | tostring),
"Max: " + (max | tostring)

And it now works for ping measurements using more than one test. flatten, as its name suggests, flattens an array: with the argument 1, it removes one level of nesting, producing a flat array of tests from which map(.rtt) extracts the RTTs. (We may prefer subtler treatments, such as keeping only the minimum RTT for each probe, then taking the median/average/min/max of these minima. This is left as an exercise for the reader.)
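Here is a Python sketch of map(.result) | flatten(1) | map(.rtt), where itertools.chain.from_iterable plays the role of flatten(1), together with one possible approach to the per-probe variant (each probe's minimum RTT). The nested sample is invented; in a real file, each element of the top-level array would be one probe's result object.

```python
from itertools import chain

# Invented results for two probes, each with several ping tests.
results = [
    {"prb_id": 1, "result": [{"rtt": 20.5}, {"x": "*"}, {"rtt": 18.25}]},
    {"prb_id": 2, "result": [{"rtt": 40.0}, {"rtt": 42.0}]},
]

# map(.result) | flatten(1) | map(.rtt): one flat list of RTTs (None = failure).
all_rtts = [t.get("rtt") for t in chain.from_iterable(r["result"] for r in results)]

# Per-probe variant: minimum successful RTT of each probe, skipping probes with none.
per_probe_min = {}
for r in results:
    ok = [t["rtt"] for t in r["result"] if "rtt" in t]
    if ok:
        per_probe_min[r["prb_id"]] = min(ok)

print(all_rtts)
print(per_probe_min)
```

Taking the minimum per probe first gives each probe one vote, so a probe that ran many tests does not weigh more than the others.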

So far, we've ignored the null results, the failures. But, of course, you sometimes want to use RIPE Atlas to investigate failures. See measurement #9236673, for instance. We want to add the percentage of failures to our display. For that, we will need a new jq feature: variables.

% jq --raw-output --from-file all.jq 9236673.json
Median: 18.77814
Average: 39.36037152777778
Min: 10.689835
Max: 212.47357
Failures: 30.76923076923077 %

% jq --raw-output --from-file all.jq 9236674.json
Median: 52.07402
Average: 63.66095441237115
Min: 16.30873
Max: 243.605395
Failures: 0 %

(We did two measurements because it is good practice, when you suspect a problem somewhere, to compare with another location. Here, we test both from the suspected AS and from another one.)

How did we get the percentage of failures? We first stored the total number of tests in a variable, $total. We then just divide the number of failures by this total:

map(.result) | flatten(1) | map(.rtt) | length as $total | 
"Median: " + (sort |
if length % 2 == 0 then .[length/2] else .[(length-1)/2] end | tostring),
"Average: " + (map(select(. != null)) | add/length | tostring),
"Min: " + (map(select(. != null)) | min | tostring),
"Max: " + (max | tostring),
"Failures: " + (map(select(. == null)) | (length*100/$total) | tostring) + " %"

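The same aggregates can be sketched in Python (the median is left out, see above). The helper summarize and the sample list are hypothetical. As in the jq programme, the total includes the failures, so the percentage is the number of failures divided by the number of all tests.

```python
def summarize(rtts):
    """Aggregates like the jq programme above, median aside; None marks a failure."""
    total = len(rtts)  # like "length as $total": failures included
    valid = [x for x in rtts if x is not None]
    failures = total - len(valid)
    return {
        "Average": sum(valid) / len(valid) if valid else None,
        "Min": min(valid) if valid else None,
        "Max": max(valid) if valid else None,
        "Failures": failures * 100 / total if total else None,
    }

# Invented sample: two successes, two failures.
report = summarize([10.0, None, 30.0, None])
for name, value in report.items():
    suffix = " %" if name == "Failures" else ""
    print(f"{name}: {value}{suffix}")
```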
DNS measurements

Of course, you can use jq for measurement types other than ping. DNS measurements, for instance, have interesting data. What if you manage a DNS zone and want to measure the response time of one of your authoritative name servers? (See, for instance, measurement #9236033.)

% jq --raw-output --from-file dns-auth.jq 9236033.json
Median: 58.654 ms

Where dns-auth.jq contains:

"Median: " +
(map(.result.rt) | sort | if length % 2 == 0 then .[length/2] else .[(length-1)/2] end | tostring) +
" ms"

This measurement directed the probes to query a specific name server (here, d.nic.fr). If, instead, the probes use their own DNS resolvers, the member of interest is resultset (see, for instance, measurement #9205362). The programme dns-resolver.jq then has to contain:

"Median: " +
(map(.resultset) | flatten(1) | map(.result.rt) | map(select(. != null)) | sort |
if length % 2 == 0 then .[length/2] else .[(length-1)/2] end | tostring) + " ms"

We have to flatten the results: first, map(.resultset) produces an array of arrays, then flatten(1) turns it into a non-nested array, then we extract the response time rt, then we remove the failures, and finally we sort.
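The same pipeline can be sketched in Python on an invented excerpt. One assumption to check against real data: a failed query is represented here by an entry carrying an error member instead of result.

```python
# Invented excerpt: each probe has one resultset entry per resolver it queried.
# A failed query is assumed to carry an "error" member instead of "result".
results = [
    {"prb_id": 1, "resultset": [{"result": {"rt": 12.5}}, {"error": {"timeout": 5000}}]},
    {"prb_id": 2, "resultset": [{"result": {"rt": 30.0}}]},
]

# map(.resultset) | flatten(1) | map(.result.rt): None where there is no result.
rts = [entry.get("result", {}).get("rt") for r in results for entry in r["resultset"]]

# map(select(. != null)) | sort
valid = sorted(rt for rt in rts if rt is not None)
print(valid)
```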

Is it possible to write a jq programme that accepts results both with and without resultset? Yes but, again, this is left as an exercise for the reader.

Many DNS fields are found in the JSON file. For instance, we test here that NSCOUNT (name server count) was always five, as it should be for this zone:

% jq -r 'map(select(.result.NSCOUNT != null and .result.NSCOUNT != 5))' 9236033.json
[]

If a probe had found a different value, its result would have been printed. We get only an empty array, so everything is fine.
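A Python version of this check, as a sketch: the helper unexpected_nscount and its expected parameter are hypothetical, and so is the sample. Results without a result member (failed queries) are skipped, just as the != null test skips them in jq.

```python
def unexpected_nscount(results, expected=5):
    """Like map(select(.result.NSCOUNT != null and .result.NSCOUNT != 5))."""
    bad = []
    for r in results:
        nscount = r.get("result", {}).get("NSCOUNT")
        if nscount is not None and nscount != expected:
            bad.append(r)
    return bad

# Invented results: two answers with NSCOUNT 5 and one failed query.
results = [
    {"prb_id": 1, "result": {"NSCOUNT": 5}},
    {"prb_id": 2, "result": {"NSCOUNT": 5}},
    {"prb_id": 3, "error": {"timeout": 5000}},
]
print(unexpected_nscount(results))  # []
```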

Not all DNS fields are parsed and put into the JSON file. You may have to parse the abuf member, which contains the original, entire response in DNS wire format. jq has no built-in way to do that; you would need to extend jq with a custom module (something which is out of scope for this simple introduction).
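If you do need fields that exist only in abuf, a general-purpose language is one way to get them. The Python sketch below assumes, and this should be checked against the RIPE Atlas documentation, that abuf holds the DNS message base64-encoded. It decodes only the fixed 12-byte header defined in RFC 1035 (section 4.1.1), and it builds its own sample message instead of using a real one; the helper parse_dns_header is hypothetical. The other sections need a full DNS parser.

```python
import base64
import struct

def parse_dns_header(abuf):
    """Decode the 12-byte DNS header of a base64-encoded wire-format message."""
    raw = base64.b64decode(abuf)
    if len(raw) < 12:
        raise ValueError("message shorter than a DNS header")
    # Six big-endian 16-bit fields: ID, flags, then the four section counts.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", raw[:12])
    return {
        "ID": ident,
        "RCODE": flags & 0x000F,
        "QDCOUNT": qdcount,
        "ANCOUNT": ancount,
        "NSCOUNT": nscount,
        "ARCOUNT": arcount,
    }

# Hypothetical header: ID 0x1234, flags 0x8400 (authoritative response, RCODE 0),
# 1 question, 0 answers, 5 authority records, 0 additional records.
sample = base64.b64encode(struct.pack("!6H", 0x1234, 0x8400, 1, 0, 5, 0)).decode("ascii")
print(parse_dns_header(sample))
```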

Measurement #9236032 examines the deployment of QNAME minimisation, a privacy-preserving technique described in RFC 7816. We want to see how many probes use a resolver with QNAME minimisation. The programme uses group_by (which groups identical results, making it easy to count the number of occurrences):

map(.resultset[0].result.answers[0]) | group_by(.RDATA[0]) |
# Create an array of objects {result, number}
[.[] | {"result": .[0].RDATA[0], "number": length}] |
# Now sort
sort_by(.number) | reverse |
# Now, display
.[] | "\"" + .result + "\": " + (.number | tostring)

(We took some shortcuts, for instance assuming the answer was always in the first element of the array. Actually, the DNS doesn't guarantee that but, if you did not request DNSSEC records, it will work, in practice.) The final result is:

% jq --raw-output --from-file qnamemin.jq 9236032.json
"NO - QNAME minimisation is NOT enabled on your resolver :(": 941
"": 46
"HOORAY - QNAME minimisation is enabled on your resolver :)!": 7

This shows that, even among RIPE Atlas probes, QNAME minimisation is not common: there are more DNS failures (the empty string) than resolvers with QNAME minimisation.
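The same counting can be sketched in Python, where collections.Counter groups and counts in one step and most_common() sorts by decreasing count. The helper answer_of and the five sample results are hypothetical. answer_of returns an empty string when the answer is missing, mirroring what the jq programme prints for failures (in jq, adding null to a string leaves the string unchanged).

```python
from collections import Counter

NO = "NO - QNAME minimisation is NOT enabled on your resolver :("
YES = "HOORAY - QNAME minimisation is enabled on your resolver :)!"

def answer_of(result):
    """First RDATA string of the first answer, or "" if anything is missing."""
    try:
        return result["resultset"][0]["result"]["answers"][0]["RDATA"][0]
    except (KeyError, IndexError, TypeError):
        return ""

# Invented sample: three resolvers without minimisation, one with, one failure.
results = [
    {"resultset": [{"result": {"answers": [{"RDATA": [NO]}]}}]},
    {"resultset": [{"result": {"answers": [{"RDATA": [NO]}]}}]},
    {"resultset": [{"result": {"answers": [{"RDATA": [YES]}]}}]},
    {"resultset": [{"result": {"answers": [{"RDATA": [NO]}]}}]},
    {"resultset": [{"error": {"timeout": 5000}}]},
]

# group_by + length + sort_by(.number) | reverse, in one step.
counts = Counter(answer_of(r) for r in results).most_common()
for answer, number in counts:
    print(f'"{answer}": {number}')
```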

Analysis of the list of probes

Of course, jq can also be used for files other than measurement results, for instance the list of all RIPE Atlas probes. The file is currently twelve megabytes, which shows that jq can work without any problem on files larger than a typical one-off measurement result file. For instance, how many RIPE Atlas probes are out there?

% jq '.objects | length' list-probes-20170811.json 
22980

Ooops, no wait - some are inactive or dead:

% jq '.objects | map(select(.status_name == "Connected")) | length' list-probes-20170811.json 
9939

jq can produce JSON files from JSON files, so let's make a file of only the connected probes:

% jq '.objects | map(select(.status_name == "Connected"))' list-probes-20170811.json > list-connected-probes-20170811.json

And we'll use it for various statistics. For instance, how many ASes contain a RIPE Atlas probe?

% jq 'map(.asn_v4) | unique | length' list-connected-probes-20170811.json
3519
% jq 'map(.asn_v6) | unique | length' list-connected-probes-20170811.json
1314
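One thing worth checking when counting with unique: a probe without IPv6 connectivity presumably has a null asn_v6 (an assumption about the probe list, not verified here), and unique keeps null as a value of its own, so it would be counted as one extra "AS". Here is a Python sketch of the count with and without that value, on invented probe records; a Python set behaves like unique here.

```python
# Invented probe records; only the members used here are kept.
probes = [
    {"status_name": "Connected", "asn_v4": 3215, "asn_v6": 3215},
    {"status_name": "Connected", "asn_v4": 3320, "asn_v6": None},
    {"status_name": "Disconnected", "asn_v4": 3320, "asn_v6": 3320},
    {"status_name": "Connected", "asn_v4": 3320, "asn_v6": 3320},
]

# .objects | map(select(.status_name == "Connected"))
connected = [p for p in probes if p["status_name"] == "Connected"]

# map(.asn_v6) | unique | length: None is kept as one distinct value.
with_null = len({p["asn_v6"] for p in connected})

# Same count after dropping None, as map(select(. != null)) would do in jq.
without_null = len({p["asn_v6"] for p in connected if p["asn_v6"] is not None})
print(with_null, without_null)
```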

And how many countries contain at least one RIPE Atlas probe?

% jq 'map(.country_code) | unique | length' list-connected-probes-20170811.json
179

We can also use group_by again to find the countries with the most probes. The programme is:

group_by(.country_code) |
# Create an array of objects {country, number}
[.[] | {"country": .[0].country_code, "number": length}] |
# Now sort
sort_by(.number) | reverse |
# Now, display
.[] | .country + ": " + (.number | tostring)

And the output is:

% jq --raw-output --from-file probes-country.jq list-connected-probes-20170811.json 
DE: 1332
US: 1032
FR: 827
GB: 609
NL: 540
...

The end

If you want to learn more about jq, you can check the official site and its excellent jq manual. jq also comes with a very comprehensive man page. There is also an official tutorial but it does not go very far.

People who prefer Python may also be interested in pjy, a tool similar to jq but using Python as its programming language. Other tools resemble jq, such as jp or jshon.

Happy data processing!

