Find out how we used RIPE Atlas to test and fix reachability to 1.1.1.1
Recently we announced our fast, privacy-centric DNS resolver 1.1.1.1, supported by our global network. As you can see, 1.1.1.1 is very easy to remember, which is both a blessing and a curse. In the time leading up to the announcement of the resolver service we began testing reachability to 1.1.1.1, primarily using the RIPE Atlas probes. The RIPE Atlas project is an extensive collection of small monitoring devices hosted by the public around the world. Currently there are over 10,000 active probes hosted in over 3,000 networks, giving great vantage points for testing. We found large numbers of probes unable to query 1.1.1.1, but able to query 1.0.0.1 in almost all cases. 1.0.0.1 is the secondary address we have assigned for the resolver, so that clients who are unable to reach 1.1.1.1 can still make DNS queries.
This blog focuses on IPv4. We provide four IPs (two for each IP address family) in order to provide a path toward the DNS resolver independent of IPv4 or IPv6 reachability.
1.0.0.0/8 was assigned to APNIC in 2010; before that it was unassigned space.
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
inetnum: 1.0.0.0 - 1.255.255.255
organisation: APNIC
status: ALLOCATED
whois: whois.apnic.net
changed: 2010-01
source: IANA
https://www.iana.org/whois?q=1.0.0.0%2F8
Unassigned, however, is not the same as reserved for private use!
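The distinction can be checked with Python's standard `ipaddress` module. The following is a minimal sketch; the `describe` helper and the sample addresses are illustrative choices, not part of the original analysis. It shows that addresses inside 1.0.0.0/8 are not classified as private, unlike the RFC 1918 ranges.

```python
import ipaddress

# 1.0.0.0/8 is the block IANA allocated to APNIC in 2010.
APNIC_BLOCK = ipaddress.ip_network("1.0.0.0/8")


def describe(address: str) -> dict:
    """Report whether an address sits in 1.0.0.0/8 and whether it is private."""
    ip = ipaddress.ip_address(address)
    return {
        "address": address,
        "in_1_0_0_0_8": ip in APNIC_BLOCK,
        "is_private": ip.is_private,
    }


# Two resolver addresses and two RFC 1918 addresses for comparison.
for candidate in ("1.1.1.1", "1.0.0.1", "10.0.0.1", "192.168.1.1"):
    print(describe(candidate))
```

A device vendor that wants an internal-only address should pick from the ranges where `is_private` is true, not from allocated space that merely looked unused at the time.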
What we found
To put it simply, 1.1.1.1 was BROKEN! The good news is, for most users 1.1.1.1 is now reachable. We’ve worked hard to ensure that issues get resolved and continue to contact operators to resolve issues quickly. We’re confident we can get everything cleaned up, but this is a stark reminder that you shouldn’t hijack IP addresses not assigned to you.
We found that over 1,000 probes out of just over 10,000 were unable to make DNS queries to 1.1.1.1 successfully. Some of this was due to single large networks having reachability issues; for example, a large operator in Germany has nearly 350 probes connected, all of which were failing.
The methodology for testing was very simple:
- Run a DNS lookup measurement towards 1.1.1.1
- Find probes where the lookup fails
- Run a traceroute with affected probes
- Analyse the result
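The first two steps of this methodology can be approximated from any single host. The sketch below is not how RIPE Atlas measurements are implemented; it is a standalone illustration using only the Python standard library, and the function names (`build_query`, `is_successful_reply`, `lookup_succeeds`) and the test name `example.com` are assumptions made for the example. It sends one A-record query over UDP and treats a timeout, a socket error or a non-zero response code as a failed lookup.

```python
import socket
import struct


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def is_successful_reply(query: bytes, reply: bytes) -> bool:
    """True if the reply echoes the query ID, is a response, and has RCODE 0."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    query_id = struct.unpack("!H", query[:2])[0]
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == query_id and is_response and rcode == 0


def lookup_succeeds(resolver: str, name: str = "example.com",
                    timeout: float = 2.0) -> bool:
    """Send one UDP query to the resolver; timeouts and errors count as failure."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket.timeout
            return False
    return is_successful_reply(query, reply)
```

Calling `lookup_succeeds("1.1.1.1")` and `lookup_succeeds("1.0.0.1")` from an affected network would be expected to show the asymmetry described earlier, although a local device that answers DNS on a squatted address can also make the check pass.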
The results were quite mixed, but fell into three main causes:
- Built-in 1.1.1.1: ISP routers using 1.1.1.1 as an internal IP address, preventing queries from reaching the real 1.1.1.1
- Blackholing 1.1.1.1: ISPs statically configuring a route for 1.1.1.1 inside their networks, preventing traffic from leaving their network, either by routing it internally or by sending the packets to null0
- Filtering 1.1.1.1: ISPs dropping packets on ingress to or egress from their network when sourced from or destined to 1.1.1.1
Of these three main causes, the majority of cases were either the first, the second, or both! Several ISPs even had route loops internally, where they were advertising 1.1.1.1 inside their network but had no actual path to it, so packets looped around and around.
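A rough way to triage traceroute output along these lines is sketched below. This is a heuristic written for illustration, not the analysis used for the measurements: the `classify_trace` function, its labels, and the representation of a trace as a list of responding addresses (with `None` for a hop that timed out) are all assumptions. A traceroute alone cannot tell blackholing from filtering, so both fall under one label here.

```python
from typing import List, Optional


def classify_trace(hops: List[Optional[str]], target: str = "1.1.1.1") -> str:
    """Classify a traceroute given the responding address at each TTL."""
    responders = [hop for hop in hops if hop is not None]

    # The target answers at the very first hop: the local router or CPE
    # most likely owns the address itself.
    if hops and hops[0] == target:
        return "built-in"

    # The same intermediate router answers at more than one TTL:
    # packets are going around in a routing loop.
    intermediate = [hop for hop in responders if hop != target]
    if len(intermediate) != len(set(intermediate)):
        return "loop"

    if target in responders:
        return "reachable"

    # The target never answered: traffic is blackholed or filtered somewhere.
    return "dropped"


# Example traces using documentation address ranges for intermediate hops.
print(classify_trace(["1.1.1.1"]))
print(classify_trace(["192.0.2.1", "198.51.100.1", "192.0.2.1", "198.51.100.1"]))
print(classify_trace(["192.0.2.1", None, None]))
```

Traces labelled "dropped" still need a human to look at where the last reply came from, since that hop usually identifies the network to contact.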
Time to get fixing
Once we had narrowed down the cause for each group of probes, we began contacting ISPs for clarification on what was happening. Several networks responded very quickly, reporting that they had removed an internal route to 1.1.1.1, which in most cases was the beginning and end of the matter. Plenty of networks took their time to respond, but in the end did the same, removing internal routes left there for legacy testing reasons.
All of those fixes were great to make, but most were quite uninteresting. More interesting were the cases of the first cause, built-in addresses on CPE (customer premises equipment), also known as home routers, gateways and wireless access points. With help from the folks at Sonic we were quickly able to identify that the Pace 5268, a common xDSL modem deployed primarily in the United States (including wide usage on AT&T), uses 1.1.1.1/29 for internal communication. We requested comment from AT&T’s NOC, but have not heard anything from them. We did however receive a response from them via social media:
Independent investigation confirms the findings:
The same finding was made with the D-Link DMG-6661, which was reported to us by a user from Brazil connected to Vivo FTTH.
Another user in Argentina connected to Telefónica found the issue on the Mitrastar GPT-2541GNAC.
It appears this CPE has been deployed in many of Telefónica's networks internationally.
We noticed similar behaviour on a large portion of probes connected to Orange France. We contacted them and received a swift response that the CPE team was investigating the issue. After we provided more details, they came back to us with a statement:
We have escalated your alert inside ORANGE France.
Our CPE team has analyzed the issue which is now well understood. This problem only impacts a subset of our CPEs. The commitment of the fix is currently on going and a deployment on our CPEs will follow.
In case of complaints notified by our customers in the meantime we have prepared a communication to inform them that we are currently fixing the issue.
The CPE in question is the Livebox. Although it's not clear which versions are affected, the issue should be resolved by Orange across all affected devices. Users in Poland reported the same issues as users in France, likely due to Orange deploying the same models across multiple networks.
By far my favourite response was from the friendly folks at Telenor:
I have corrected all routers in our network now that had an awful old solution that is now obsolete. Thank you Cloudflare for the help to get it done!
Obsolescence is inevitable, but the desire to speedily fix such occurrences is great to see.
These are by no means all of the devices that have issues, but some of the more widely deployed ones. The current list we have of affected devices is:
- Pace (Arris) 5268
- D-Link DMG-6661
- Technicolor C2100 Series
- Mitrastar GPT-2541GNAC
- Askey RTF3507VW-N1
- Calix GigaCenter
- Nomadix (model(s) unknown)
- Xerox Phaser multi-function printer
- See below :)
If you have a device that is affected, please let us know in the comments. A telltale sign of an affected device is very low latency with only one hop to 1.1.1.1:
Traceroute to 1.1.1.1 (1.1.1.1), 48 byte packets
1 1.1.1.1 1dot1dot1dot1.cloudflare-dns.com AS13335 8.301ms 1.879ms 1.836ms
Who else has been mis-using 1.1.1.1 more than others?
Using the RIPE Atlas probes gives excellent visibility into residential and business internet connections. However, the probes are connected via a cable, which rules out another use case: WiFi access points. After very little research we came across Cisco mis-using 1.1.1.1. A quick search for “cisco 1.1.1.1” brought up numerous articles where Cisco are squatting on 1.1.1.1 for their Wireless LAN Controllers (WLC). It’s unclear if Cisco officially regards 1.0.0.0/8 as bogon space, but many examples on their community websites give bogon lists that include the /8. The address mostly seems to be used for the captive portal when authenticating to a wireless access point, often in hotels, cafés and other public WiFi hotspot locations.
Here are some interesting statistics from before we started contacting operators and after we have fixed many issues. It’s staggering to see how fixing some key networks increased availability by almost 20% in Europe & North America!
We began testing the availability of 1.1.1.1 on the 23rd of March; in Europe and North America it was only around 91%.
1.1.1.1 availability from Europe and North America, 23rd of March
By the 3rd of April, our work cleaning up the space had pushed the availability up to 97%.
1.1.1.1 availability from Europe and North America, 3rd of April
For the rest of the world, excluding Europe and North America, availability of 1.1.1.1 was only 73% on the 23rd of March.
1.1.1.1 availability for the World (Europe and North America excluded), 23rd of March
By the 3rd of April we had made a tonne of progress and managed to clean up enough of the bad routing that availability was up to 85%.
1.1.1.1 availability for the World (Europe and North America excluded), 3rd of April
We are continuing to work with ISPs and CPE manufacturers to clean up bad routing globally. Our goal is for 1.1.1.1 to be properly routed and available for 100% of Internet users.
Above images from Catchpoint analytics
The last public analysis was done in 2010 by RIPE and APNIC. At the time, 1.1.1.0/24 received 100 to 200Mb/s of traffic, most of it audio traffic. In March, when Cloudflare announced 1.1.1.0/24 and 1.0.0.0/24, ~10Gbps of unsolicited background traffic appeared on our interfaces.
The most targeted IPs were 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206. When searching these IPs, we notice it is usually misconfiguration or hardcoded IPs.
The destination port is usually 80/443, but also other variants of HTTP ports (8000, 8080, et al.) using UDP and TCP, seemingly trying to set up a proxy connection. Some of the traffic was also DHCP/BOOTP, iperf and syslog.
Some IPs are also the target of DDoS attacks (even before we announced the new service publicly). Analysing by source port we saw NTP and memcached, usually reaching 5Gbps for a few minutes. The short duration of the attacks suggests that the address could be hardcoded in a botnet as a default, used before it starts sending traffic to a specific target.
We also noticed daily patterns where 4 IPs receive the same amount of traffic (±50Mbps).
All of this bad traffic is unrelated to DNS; it is simply unsolicited background traffic.
It was clear from the start that we’d have our work cut out, especially with CPE vendors, where a firmware update would be required. What was impressive was the willingness of operators to collaborate with us to clean up the legacy misconfiguration. It was clear that 1.1.1.1 needed a lot of cleaning in order to be globally accessible. We decided six months before the 1st of April release date we'd commit the network and support resources to that task. Now that 1.1.1.1 is live, we’re thankful to all the networks and hardware companies who have assisted us in this effort. We’re not done, nor are others.
The RIPE Atlas project was immensely useful in testing reachability from as many networks around the world as possible. If you’d like to help the project, please consider hosting a probe. Some networks do not yet have even one probe; you can see if your ISP has a probe here, sorted by country.
Particular thanks to the following operators who were responsive and helped clean up issues quickly.
- LG Telecom
- Liquid Telecom
- Telecom Italia
- Turk Telekom
We still have work to do contacting operators that we see issues with. In the meantime you should be able to use our second IP address, 1.0.0.1, which has far fewer issues. Don't forget our two IPv6 addresses too: 2606:4700:4700::1001 and 2606:4700:4700::1111.
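For a client that wants to pick a working address automatically, one option is to try the four addresses in order and use the first that responds. The sketch below is illustrative only: the `first_reachable` and `tcp_port_53_open` helpers and the preference order are assumptions, and a successful TCP connection to port 53 does not prove that the real resolver answered, since a local device squatting on the address could accept the connection too.

```python
import socket
from typing import Callable, Iterable, Optional

# The resolver's two IPv4 and two IPv6 addresses, in an assumed preference order.
RESOLVERS = (
    "1.1.1.1",
    "1.0.0.1",
    "2606:4700:4700::1111",
    "2606:4700:4700::1001",
)


def tcp_port_53_open(address: str, timeout: float = 2.0) -> bool:
    """Crude reachability probe: can we open a TCP connection to port 53?"""
    try:
        with socket.create_connection((address, 53), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, no IPv6 route, ...
        return False


def first_reachable(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = tcp_port_53_open,
) -> Optional[str]:
    """Return the first address for which the probe succeeds, else None."""
    for address in addresses:
        if probe(address):
            return address
    return None
```

Passing the probe as a parameter keeps the selection logic testable without network access, and lets a stricter check, such as a real DNS query, be swapped in.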
Do you still have reachability issues to 1.1.1.1? You can find more information at our community forum. We’d also recommend reporting such issues to your ISP: they may already be aware of them, or they may need your report to start investigating. Either way, making them aware is especially helpful, as operators are not always receptive to reports from external parties.
This was originally posted on the Cloudflare blog.