As part of the user interface to the RIPE Atlas network, users can see the first and second hops seen by certain probes. By looking for private addresses in these hops, we can learn something about how NATs are used in the Internet – in particular, that there are quite a lot of them!
The RIPE Atlas network is a measurement network operated by the RIPE NCC. It consists of many small physical devices (called "probes") that volunteer "probe hosts" connect to various networks. I've been a proud probe host since I got my first two probes at RIPE 60, and I've been asking the RIPE NCC staff for a while to open up greater access to Atlas data.
A few months ago, the Atlas team took an important step in that direction by allowing probe hosts to designate their probes as "public". Once a probe is marked as public, other probe hosts can see certain information, such as its name, location, uptime and latency graphs. Currently over 480 probes are marked as public.
Information pages for public probes also list the first two IP addresses that show up in a traceroute performed by the probe. One thing that's interesting about these first-hop and second-hop IP addresses is that many of them are private, RFC 1918 addresses. The use of private addresses, of course, implies that there is some sort of Network Address Translation (NAT) device between the probe and the Internet. So we can learn about how people use NATs by looking at the first- and second-hop addresses that probes see.
RFC 1918 defines three different prefixes for private use: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. The first question we can ask of the Atlas data set is how commonly each of these subnets gets used.
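Answering that question amounts to checking each hop address against the three prefixes and counting. The Python sketch below illustrates this with the standard `ipaddress` module; it is not the author's analysis code, and the function names and input format (a plain list of IPv4 address strings, one per probe) are assumptions. It tests the three RFC 1918 prefixes explicitly, because `ipaddress`'s `is_private` also covers other special-purpose ranges such as loopback and link-local.

```python
import ipaddress
from collections import Counter

# The three private-use prefixes defined in RFC 1918.
RFC1918_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def rfc1918_prefix(addr):
    """Return the RFC 1918 prefix (as a string) containing addr, or None if addr is not RFC 1918."""
    ip = ipaddress.ip_address(addr)
    for net in RFC1918_PREFIXES:
        if ip in net:
            return str(net)
    return None


def prefix_shares(hops):
    """Percentage of hop addresses in each RFC 1918 prefix; the key None collects all other hops."""
    if not hops:
        return {}
    counts = Counter(rfc1918_prefix(h) for h in hops)
    return {prefix: 100.0 * n / len(hops) for prefix, n in counts.items()}


# Hypothetical first hops, one per probe; 198.51.100.1 is a documentation
# address standing in for a public (non-RFC 1918) hop.
first_hops = ["192.168.1.1", "10.0.0.1", "192.168.0.1", "198.51.100.1"]
print(prefix_shares(first_hops))
```

Running the same function over the second-hop list would give the second column of the table.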
| Network | % of first hops | % of second hops |
In fact, a single address (192.168.1.1) is the first hop for 10.4% of the public probes. Presumably this is because many home routers are configured to perform NAT by default, using addresses within 192.168.1.0/24, and not because of a single giant router! No second-hop address came anywhere close to this level of popularity.
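A figure like this comes from counting how many probes report each first-hop address. A minimal Python sketch, assuming a hypothetical input of one first-hop address string per probe (the function name is illustrative, not part of any Atlas tooling):

```python
from collections import Counter


def top_first_hop(first_hops):
    """Return (address, percent of probes) for the most common first-hop address, or None if empty.

    Assumes first_hops holds one first-hop address string per probe.
    """
    if not first_hops:
        return None
    addr, n = Counter(first_hops).most_common(1)[0]
    return addr, 100.0 * n / len(first_hops)


# Hypothetical input: the first hops reported by five probes.
sample = ["192.168.1.1", "192.168.1.1", "10.0.0.1", "192.168.0.1", "192.168.1.1"]
print(top_first_hop(sample))
```

Applying `Counter(...).most_common(1)` to the second-hop list instead is how one would check that no second-hop address is comparably popular.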
Taking this a step further, we can look at a "transfer table" describing how often packets go from one class of address on the first hop to another class on the second hop. We can group these transfers into five broad classes, in increasing order of "badness":
1. All public (42.6%): Both first and second hops are public (i.e., not in RFC 1918 space).
2. Private to public (39.2%): The first hop is private, but the second is not.
3. Multiple hops within NAT domain (6.2%): Both hops are private, within the same RFC 1918 prefix.
4. Hops between NAT domains (5.3%): Both hops are private, but in different RFC 1918 prefixes.
5. Public to private (2.8%): First hop is public, but second hop is private.
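The five-way classification above can be expressed as a small Python function. This is a sketch under stated assumptions rather than the author's analysis code: the names `transfer_class` and `class_shares` are hypothetical, and hops are assumed to be IPv4 address strings. As in the list, "same NAT domain" means both hops fall in the same RFC 1918 prefix (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16).

```python
import ipaddress
from collections import Counter

RFC1918_PREFIXES = [
    ipaddress.ip_network(p) for p in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

CLASS_NAMES = {
    1: "All public",
    2: "Private to public",
    3: "Multiple hops within NAT domain",
    4: "Hops between NAT domains",
    5: "Public to private",
}


def rfc1918_prefix(addr):
    """Return the RFC 1918 network containing addr, or None for any other address."""
    ip = ipaddress.ip_address(addr)
    return next((net for net in RFC1918_PREFIXES if ip in net), None)


def transfer_class(first_hop, second_hop):
    """Assign a (first hop, second hop) pair to one of the five classes (1 = least 'bad')."""
    p1, p2 = rfc1918_prefix(first_hop), rfc1918_prefix(second_hop)
    if p1 is None and p2 is None:
        return 1  # both hops outside RFC 1918 space
    if p2 is None:
        return 2  # private first hop, public second hop
    if p1 is None:
        return 5  # public first hop, private second hop
    return 3 if p1 == p2 else 4  # both private: same prefix vs. different prefixes


def class_shares(pairs):
    """Percentage of (first hop, second hop) pairs falling into each class 1..5."""
    counts = Counter(transfer_class(a, b) for a, b in pairs)
    total = len(pairs)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in CLASS_NAMES}


# Hypothetical hop pairs; 198.51.100.0/24 and 203.0.113.0/24 are documentation
# ranges standing in for public addresses.
pairs = [
    ("192.168.1.1", "198.51.100.1"),
    ("10.0.0.1", "10.20.0.1"),
    ("198.51.100.1", "203.0.113.1"),
    ("192.168.1.1", "198.51.100.2"),
]
for cls, pct in class_shares(pairs).items():
    print(f"{cls}. {CLASS_NAMES[cls]}: {pct:.1f}%")
```

Because the "same domain" test works at the granularity of the three RFC 1918 blocks, it cannot tell apart two separate networks that both happen to use, say, 10.0.0.0/8.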
The first two of these classes are the ones we would definitely expect to see. Obviously, having direct access to the Internet is the best case. Connecting through one NAT is second-best, and a very common case for residential users who connect through a "home gateway" such as a cable, DSL, or 3G modem. Having multiple hops within a NAT domain is only a small step from the "private to public" case, and mainly indicates the size of the NAT domain.
Cases (4) and (5) are more unusual. Hopping between NAT domains very likely indicates that the probe has to traverse two NATs before reaching the Internet, which is far from optimal connectivity. Hopping from public to private address space seems strange on the face of it, and its implications are unclear. This situation could arise from double-NATting with public addresses on the inside, but it might also happen where an ISP uses private addresses for some router interfaces.
The main thing to take away from these results is that while the more standard cases (1) and (2) are prevalent, they are not completely dominant; they cover only around 82% of probes. The more striking number is that around 8% of probes are in categories (4) and (5), which both likely involve two NATs between the probe and the Internet. There's obviously some sampling bias in these numbers, since Atlas probes aren't everywhere. But given that probe hosts are by and large technical people, one might suppose that the real Internet is no *better* than these numbers indicate. The Internet is always weirder than you expect it to be!
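For reference, both headline figures are simple sums over the class percentages listed above:

$$
\text{classes 1 + 2: } 42.6\% + 39.2\% = 81.8\% \approx 82\%, \qquad \text{classes 4 + 5: } 5.3\% + 2.8\% = 8.1\% \approx 8\%
$$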
Some providers use RFC1918 addresses in their infrastructure, so seeing an RFC1918 address in the second hop as well doesn't necessarily mean there are two NATs. I'm not sure if there's a way to differentiate them, though.
Vaibhav Bajpai
This is interesting, can you point us to a reference that shows providers use RFC 1918 addresses in their access network?
what does this say about where most of the probes are located? in natted end sites? in ripe ops geeks' homes or natted home offices?
Stuart: Good point. It is possible that part of the hops-between-domains case is actually like the multiple-within-a-domain case, in that there are real routes for RFC1918 space within a network, as opposed to NATs. However, it's also easy to imagine scenarios where probes in home networks end up behind two gateways; that's how I had probe 394 set up for a long time.

Randy: Hard to tell. Looking at the second hops for the 49 probes that have 192.168.1.1 as a first hop, we see that most of them have a public second hop:
7  10.185.0.1
1  192.168.200.1
1  172.17.49.241
40 distinct public addresses
The seven going to 10.185.0.1 are in apparently distinct networks, with registered locations in the US, EU, and Russia. Net of all that, my guess would be that these probes are behind small "home gateway" style NAT devices, either within home networks or other end sites.
Presumably the RIPE NCC has contact information for every Atlas owner... can't you maybe send an e-mail to try to figure out what is going on with public to private hops by, well, asking?

I know that in general on the Internet it's hard to figure out strange network configurations, but this one should be straightforward. And I'm dying to know. :)
Thanks for sharing your analysis, Richard.

Just as interesting as the kind of IPv4 source addresses used in probes, I think it would be worth also taking a look at the v6 counterpart. I wonder which kind of IPv6 addresses are being used in the probes (native or tunneled, and in the latter case, which tunnel broker is most popular).

If you have some time to spare, perhaps you could perform some research on this matter, as you have access to bulk data of public probes :)

Thanks in advance
We could also ask people to tag their probes with the type of connection they use (home, business, academia, etc.) when registering the probe. This could help in further analysis.
Shane: In principle, we could ask people about all the options. However, we would have to get someone from the NCC to do it, since the probe owners' contact info isn't public. See Robert's comment for one suggestion.

Iñigo: Actually, the source addresses of the probes are not made accessible through the public data, either in v4 or v6. I was looking at the first and second traceroute hops, which are presumably not the address of the host itself. I would be glad to do a similar analysis on IPv6 hops, but it looks like the probes only collect 1st/2nd-hop addresses for IPv4 (traceroute) and not IPv6 (traceroute6).
Indeed, that's correct. Public probe data does not include source addresses, and the analysis performed is on 1st- and 2nd-hop addresses.

My intention was to say that the kind of NAT the probe uses, inferred from the 1st- and 2nd-hop addresses, can also provide information about which kind of source address the probe has, so taking a look at the IPv6 equivalent would also be very interesting :-)

The rationale for this look at the data stems from Emile's observations (<a href="http://labs.ripe.net/Members/emileaben/measuring-world-ipv6-day-comparing-ipv4-and-ipv6-performance" rel="nofollow">http://labs.ripe.net/Member[…]g-ipv4-and-ipv6-performance</a>) on performance. I am curious about the impact of v6 tunnels, and slicing the data by this criterion (native vs. tunneled) would provide insight into it.

As for the data availability, I understood that "official" researchers had unrestricted access to the probe data and permission to use it, which would include source addresses. Is this correct?