Pings provide a simple measure of how long it takes to get there and back across the Internet. What's not so simple is working out how long the trip there took versus the trip back. In this guest post, Ben Cox introduces a method for splitting the ping.
Ping is one of the fundamental pillars of networking. It’s simple, universally supported, and is normally one of the few things that is shipped with all network stacks. It gives us the ability to confirm that a host is reachable at all, and also a rough estimate on how much latency there is between the system running the utility and the target:
$ ping 184.108.40.206 PING 220.127.116.11 (18.104.22.168) 56(84) bytes of data. 64 bytes from 22.214.171.124: icmp_seq=1 ttl=120 time=3.29 ms 64 bytes from 126.96.36.199: icmp_seq=2 ttl=120 time=2.78 ms 64 bytes from 188.8.131.52: icmp_seq=3 ttl=120 time=3.09 ms 64 bytes from 184.108.40.206: icmp_seq=4 ttl=120 time=3.11 ms ^C --- 220.127.116.11 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3003ms rtt min/avg/max/mdev = 2.789/3.073/3.298/0.186 ms $
Monitoring pings over time is handy too, to spot positive or negative changes in network quality:
The basic mode of operation of a ping is this - an ICMP Type 8 is sent to the target and the host expects a ICMP Type 0 packet to come back. Latency is a measure of the time in between:
A common assumption on this latency number is that you can divide it in half to get the time it takes to send data in one direction (this assumption is sometimes used in “netcode”). While ideally that is true, the internet is full of asymmetric routing, meaning your ping’s path could look something like this:
In this example, it takes 63ms to send a packet, but only 35ms to get it back. To a normal ping, this would just show the latency as 109ms, but if we were to start making one-way latency assumptions based on that 109ms value we would have a decent amount of error (maybe up to 10ms) on how long communications would take to arrive.
So the question is: how do we “split” this ping into its transmit and receive time counterparts?
This whole idea came to me after watching a talk by apenwarr ((pdf)) about Google Fibre WiFi, and in there apenwarr mentions about a tool called isoping. Isoping is a program that does attempt to break up the difference between transmit latency and receive latency.
# ./isoping test.benjojo.co.uk connecting to ::ffff:18.104.22.168... 48.8 ms rx (min=48.8) loss: 0/2 tx 0/1 rx time paradox: backsliding start by -396 usec 48.8 ms tx 48.6 ms rx (min=48.6) loss: 0/3 tx 0/2 rx 48.7 ms tx 49.0 ms rx (min=48.6) loss: 0/4 tx 0/3 rx 48.6 ms tx 48.9 ms rx (min=48.5) loss: 0/5 tx 0/4 rx 48.5 ms tx 49.0 ms rx (min=48.5) loss: 0/6 tx 0/5 rx 48.5 ms tx 48.9 ms rx (min=48.5) loss: 0/7 tx 0/6 rx ^C # ping test.benjojo.co.uk PING test.benjojo.co.uk (22.214.171.124) 56(84) bytes of data. 64 bytes from 126.96.36.199: icmp_seq=1 ttl=54 time=96.6 ms ^C # # 48 + 48 = 96
However, after more testing and talking to apenwarr, I discovered that it’s not capable of detecting long standing asymmetry.
It computes those based on the minimum observed RTT rather than the actual RTT. The minimum RTT is the floor on the precision available. But for detecting why a wifi link occasionally jumps to 500ms, or why your latency spikes when you upload a file, it’s very powerful.
So alas, isoping is well suited for jitter (aka, the issue they were targeting with the Google Fiber WiFi boxes) or detecting directional packet loss. But not against our problem of detecting chronic network asymmetrical routing.
So, again, how do we split the ping?
As we already discussed, an A ICMP ping measures the time between sending and getting a reply. So with no extra meta data, it’s impossible to gleam any asymmetry in the timing between RX and TX.
However, if we were to encode timing information in the packet, like when we sent it and have the other side include when it received it. We can use that to “split” the time it takes for the packet to be sent to the host, and then subtract that time from the total time to get the receive time.
Usefully, ICMP already has this feature in the form of Timestamp Requests.
As you can see in the Wireshark dissection, there is the Originate timestamp for when the host sent the packet, and the receive and transmit timestamps (usually identical) for when the other side transmitted this. I was not aware of any program that could do the half-RTT latency calculation for ICMP Timestamps, so I quickly wrote one:
$ sudo ./icmp-timestap-pinger TS reddit.com (188.8.131.52): Forward: -1ms Back: 3ms RTT(2ms) TS klir.benjojo.co.uk (184.108.40.206): Forward: 7ms Back: 5ms RTT(12ms) TS airmail.benjojo.co.uk (220.127.116.11): Forward: 66ms Back: 65ms RTT(131ms) TS syd-au-ping.vultr.com (18.104.22.168): Forward: 147ms Back: 153ms RTT(300ms)
This presents a problem though, since there is no guarantee that the host you are sending ICMP Timestamp Requests to has an accurate clock, so the value might not be calibrated to the same precision as the machine you are sending from, or even at all. To solve this, we need something that's like this, but with its own time accuracy assurances.
One way to do this would be to assure that all monitored hosts have NTPd/Chrony installed. However, this raises another question - how do you ensure that your IP paths to those NTP servers are not also affected by the asymmetry that you are trying to measure. NTP and Chrony can’t on their own prevent being biased from latency asymmetry. Chrony does have an “Estimation of asymmetric jitter”, but this cannot detect chronic latency asymmetry.
Therefore, to try and reduce the jitter, the idea would be to pick the lowest latency NTP servers, the idea being that there would be less room for asymmetry to play a role in the time estimation.
What are the best NTP servers?
For this I decided to run tests on the popular, and widely available NTP servers:
To benchmark, I set up a uBlox 6M’s Pulse Per Second (PPS) line to be used as a time benchmark, using Linux’s PPS facilities and a Raspberry Pi 4’s (that I got as a speakers gift!) GPIO to trigger the interrupts. We can assume the uBlox’s PPS is nearly perfect, as it’s fed by GNSS sources, meaning it is the most accurate we can easily obtain (uBlox claims that accuracy of the pulse lies somewhere around +/- 100ns).
This means we can log when a pulse happens, and compare it against how close it was to our NTP system time:
root@tempest:~# ppstest /dev/pps0 trying PPS source "/dev/pps0" found PPS source "/dev/pps0" ok, found 1 source(s), now start fetching data... source 0 - assert 1604603306.999999790, sequence: 1403151 source 0 - assert 1604603307.999999554, sequence: 1403152 source 0 - assert 1604603309.000000373, sequence: 1403153 source 0 - assert 1604603310.000000803, sequence: 1403154 source 0 - assert 1604603310.999999807, sequence: 1403155 source 0 - assert 1604603311.999999830, sequence: 1403156
The further away the time stamps are from the exact second, the more inaccurate the NTP source is (likely due to asymmetry or other network factors).
From a VDSL line in the UK on the ISP IDNet , it appears that Cloudflare’s time service provides the lowest error from GNSS calibrated time. But there's another quirk to take into consideration - where Cloudflare uses anycast to route NTP traffic into their network, Apple uses DNS to direct systems to what they think is the nearest time cluster. The upside of this is that all Apple clusters are reachable independently (Singapore, São Paulo Brazil, Hong Kong, San Jose USA, Los Angeles USA, New York USA, Miami USA, Atlanta USA, Sydney Australia, Frankfurt Germany, London UK, Amsterdam Netherlands, Tokyo Japan, Stockholm Sweden, Seoul S.Korea, Taipei Taiwan)
Because of this we can ensure we stay “locked” to an NTP cluster, and so I decided to use this Apple as the gold standard even though it’s error is higher (in my single location test). Facebook and Google time clusters are incompatible with each other in the case of leap seconds due to different implementations of smearing.
Now that we have a synced clock we can get down to finally “splitting the ping”!
First we basically just build a table on both sides, with an incrementing packet ID and a unix timestamp slot, this table holds the ping packets we receive.
Since we swap these tables, we now know the exact time the packet arrived at the other side. Since we have accurate time, we can use these times directly to figure out the directional latency of the packet. On top of that, we can also use it to see what packets didn’t make it due to packet loss. Since we can see what packets we sent, and didn’t get to the other side, we can see packet loss direction too.
Here are some examples I found in the field:
Here I found a “peak hours” latency/jitter episode on a UK residential connection, happening in the inbound direction of the connection. Likely caused by peak hours streaming, etc.
In addition, this (some kind of DSL based) line has quite a significant amount of latency asymmetry:
[xxx.xxx.xxx.xxx] RX: 15.076895ms TX: 5.450551ms [Loss RX: 0/32 | Loss TX 0/32]
Where sending packets to the Internet has a 10ms delay attached compared to receiving them, this is likely some form of DSL Interleaving.
In another experiment I setup sping between my flat and a VM in China, and you can see the infamous daily peaks of packet loss:
Interestingly, this showed that the loss is in the direction of the UK to China, with a small period of China to the UK.
Since this tool is genuinely useful for debugging network latency, and for validating the viability of using NTP for really precise time keeping, I’ve factored it into a generally usable program in the form of sping: https://github.com/benjojo/sping
Usage of sping: -clock-is-perfect Enable userspace calibration against Apple's GPS NTP servers (default true) -debug.showslots Show incoming packet latency slots -debug.showstats Show per ping info, and timestamps -listenAddr string Listening address (default "[::]:6924") -peers string List of IPs that are peers -pps.debug Enable debug output for PPS inputs -pps.path string what PPS device to use (default "/dev/pps0") -udp.pps int max inbound PPS that can be processed at once (default 100) -use.pps If to use a PPS device instead of system clock -web.listen-address string Address on which to expose metrics and web interface (default "[::]:9523") -web.telemetry-path string Path under which to expose metrics. (default "/metrics")
The program can either trust system time, or in environments where it is not possible to set the time (some containers, OpenVZ etc) it can run its own calibration against Apple’s NTP servers, I found those to be the most consistent globally.
Optionally, it can also use a Pulse Per Second device for greater accuracy.
sping uses UDP to exchange ping data between peers, and TCP for the handshake. The IANA port assigned for this is 6924.
Originally posted on Ben's blog March 2021.