Recently, we spotted a network bug that had precisely the opposite of the intended effect.
Jan's measurements in his recent article on NAT64 testing turned up a fun network bug on our web front-end. Our policy for ripe.net is that we drop or disallow echo requests, depending on the target*.
So, a small puzzle: go take a look at the third image in Jan's article. All of those pings should be 'lost', but some aren't. So what's going on? By now, ops at RIPE NCC have modified the network configuration, so you won't be able to reproduce this.
Think about this for a few seconds before reading on. Why would large packets get a response, but not small packets?
Let's take a look at what the network was doing at the time. Here's what a normal ICMP traceroute to the target looks like:
sds@tiree:~$ sudo traceroute6 -Inq 1 -f3 ripe.net
traceroute to ripe.net (2001:67c:2e8:22::c100:68b), 30 hops max, 80 byte packets
3 2001:7f8:13::a500:1103:1 1.436 ms
4 2001:7f8:1::a500:6939:1 34.214 ms
5 2001:7f8:1::a500:3333:1 2.035 ms !X
The "!X" indicates that the target is "administratively prohibited" via this router. So far, so correct.
The unexpected result in Jan's tests should give you the clue that our good friend packet fragmentation is the culprit. So let's see what happened when we sent fragments to the same target. Ping has a '-s' option that allows you to set the number of bytes of payload following the ICMP header. Sending a large payload that triggers fragmentation gets us a response:
sds@tiree:~$ ping6 -nc2 -s1453 ripe.net
PING ripe.net(2001:67c:2e8:22::c100:68b) 1453 data bytes
1461 bytes from 2001:67c:2e8:22::c100:68b: icmp_seq=1 ttl=59 time=1.04 ms
1461 bytes from 2001:67c:2e8:22::c100:68b: icmp_seq=2 ttl=59 time=1.70 ms
--- ripe.net ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.040/1.372/1.704/0.332 ms
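Why 1453 bytes? Here is a minimal sketch of the arithmetic, assuming a path MTU of 1500 bytes (typical for Ethernet, but an assumption; the MTU isn't stated anywhere in the output above). The function names are purely illustrative.

```python
IPV6_HEADER = 40     # fixed IPv6 header size in bytes
ICMPV6_HEADER = 8    # ICMPv6 echo request header size in bytes
ASSUMED_MTU = 1500   # assumption: typical Ethernet MTU, not taken from the article


def packet_size(payload: int) -> int:
    """Total on-the-wire IPv6 packet size for an echo request with this payload."""
    return IPV6_HEADER + ICMPV6_HEADER + payload


def needs_fragmentation(payload: int, mtu: int = ASSUMED_MTU) -> bool:
    """True if the unfragmented packet would not fit in a single MTU."""
    return packet_size(payload) > mtu


for payload in (1452, 1453):
    print(payload, packet_size(payload), needs_fragmentation(payload))
# 1452 1500 False
# 1453 1501 True
```

Under that assumption, 1453 is the smallest '-s' value that forces the sender to fragment, and the resulting 1501 bytes matches the packet size used in the traceroute that follows.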
And traceroute confirms that setting a larger payload lets the traffic right through to the target:
sds@tiree:~$ sudo traceroute6 -Inq 1 -f3 ripe.net 1501
traceroute to ripe.net (2001:67c:2e8:22::c100:68b), 30 hops max, 1501 byte packets
3 2001:7f8:13::a500:1103:1 1.335 ms
4 2001:7f8:1::a500:3333:1 1.352 ms
5 2001:67c:2e8:22::c100:68b 1.869 ms
In this case, the router that previously refused to forward our echo requests now forwards them. Here's the real trick: the fragments slip through because the ICMPv6 header is buried deeper within the packet than the router is inspecting.
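To illustrate, here's the header chain of a "normal" packet:
IPv6 header (Next Header = 58) | ICMPv6 header | payload
And here's the header chain of a fragmented packet:
IPv6 header (Next Header = 44) | Fragment header (Next Header = 58) | ICMPv6 header | payload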
So when the firewall or router inspected the IPv6 header, it didn't see ICMPv6 (Next Header value 58); it saw an IPv6 Fragment header (Next Header value 44) and, in this case, waved it through. The target then received the fragments, reassembled the payload, and responded to the echo request appropriately. The end result? For a while, pings with large packets to ripe.net worked!
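The difference between the two checks can be sketched in a few lines of Python. This is an illustration only, not how any particular router or firewall implements filtering: the helper names are hypothetical, the addresses are zeroed, the ICMPv6 checksum is left at zero, and the fragment identification value is arbitrary. It also only knows about the Fragment header, not other extension headers.

```python
import struct

ICMPV6 = 58     # Next Header value for ICMPv6
FRAGMENT = 44   # Next Header value for the IPv6 Fragment extension header


def ipv6_header(next_header: int, payload_len: int) -> bytes:
    """40-byte IPv6 header with zeroed (placeholder) source and destination."""
    first_word = 6 << 28  # version 6, traffic class 0, flow label 0
    return struct.pack("!IHBB", first_word, payload_len, next_header, 64) + bytes(32)


def fragment_header(next_header: int, offset_units: int, more: bool, ident: int) -> bytes:
    """8-byte Fragment header; the offset is counted in 8-byte units."""
    return struct.pack("!BBHI", next_header, 0, (offset_units << 3) | int(more), ident)


def naive_protocol(packet: bytes) -> int:
    """What a filter sees if it reads only the IPv6 header's Next Header field."""
    return packet[6]


def upper_layer_protocol(packet: bytes) -> int:
    """Look one step further: follow a Fragment header if one is present."""
    next_header = packet[6]
    if next_header == FRAGMENT:
        next_header = packet[40]  # first byte of the Fragment header
    return next_header


# Echo request: type 128, code 0, checksum 0 (placeholder), id 1, seq 1
echo = struct.pack("!BBHHH", 128, 0, 0, 1, 1)

plain = ipv6_header(ICMPV6, len(echo)) + echo
first_fragment = (
    ipv6_header(FRAGMENT, 8 + len(echo))
    + fragment_header(ICMPV6, 0, True, 0x1234)
    + echo
)

print(naive_protocol(plain))                  # 58
print(naive_protocol(first_fragment))         # 44
print(upper_layer_protocol(first_fragment))   # 58
```

A rule keyed on the value returned by `naive_protocol` matches the plain echo request but not the fragment, which is the behaviour we observed. Following the Fragment header, as `upper_layer_protocol` does, recovers the 58 that the rule was written for.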
Fragmentation often catches people out, because it's unexpected or unintuitive to many. Folks may configure their network to drop or accept ICMP, or TCP port 80, or whatever, but they'll completely forget about fragmented traffic. In cases where folks are only serving content over TCP, it's acceptable to simply drop all fragments and avoid the headache. But as we know, there are places such as DNS where we probably have to endure the headache.
* (there is a longer context about load balancers not reliably responding to ICMP echo requests, so rather than provide meaningless results the decision was made to provide none at all...)