In November 2021, Internet users from Mexico lost access to whatsapp.net and facebook.com. By the time the event was reported, the underlying problem had already gone unnoticed for quite some time. Here, we present key takeaways from our analysis of the event - carried out with RIPE Atlas - and we look at the extent to which queries to DNS root servers get manipulated.
DNS is a protocol that translates human-readable domain names (e.g., example.com) into IP addresses (e.g., 18.104.22.168). Unfortunately, DNS is mostly deployed over UDP, which makes it prone to various types of manipulation. In November 2021, users from Mexico could not access whatsapp.net and facebook.com - it turned out those queries were routed towards one of the local instances of the k-root and were intercepted by middleboxes on the way.
In this blog post, we analyse that event from the RIPE Atlas point of view and, more broadly, evaluate the extent of DNS manipulation when sending queries to DNS root servers. We recently presented our findings at the Passive and Active Measurement Conference (PAM-2023).
Two problems: BGP route leaks and DNS manipulation
Any DNS resolution starts at one of the 13 root servers. To cope with huge query loads, each of those announces two prefixes (IPv4 and IPv6) using BGP anycast and distributes the traffic among more than 1.6k individual instances located worldwide. Some of the local instances are only meant to serve a limited number of clients (e.g., a single ISP or a country) and their routes should not propagate to the entire Internet. This is usually achieved with NO_EXPORT or NOPEER BGP community attributes (see RFC 4786 for further details).
Yet, sporadic route leaks may advertise local root server instances worldwide. Such events would stay transparent to end users, unless clients experienced difficulties reaching certain domain names. For example, when the Beijing-located i-root instance leaked in 2010, end users received bogus responses for twitter.com, facebook.com, and youtube.com. Similarly, in 2021, the k-root route leak diverted DNS queries from Mexico to Guangzhou and triggered response injection for whatsapp.net and facebook.com. As root server operators do not serve bogus data, some middleboxes must have intercepted user requests and injected responses.
We wanted to know how many end users were impacted by the November 2021 route leak and for how long it went unnoticed. More broadly, in this blog post, we analyse to what extent queries sent to DNS root servers are manipulated, even when no route leaks or other anomalies occur.
Takeaway #1: Root server route leaks may stay unnoticed
We used one of the built-in RIPE Atlas measurements to verify how far the November 2021 route leak propagated. As it turned out, the Guangzhou-based local k-root instance was reachable outside the country at least 2 months before being reported - 57 RIPE Atlas probes located in 15 countries (AU, UA, CO, HK, LK, CH, FR, US, KR, DK, MX, ZM, BE, GB, NP, KE) had their DNS queries routed towards that local k-root. Even after the leak was fixed, 12 probes (in RU, IL, MX, DK, HK) would occasionally reach the Guangzhou instance over the following 9 months.
Takeaway #2: DNS manipulation is persistent and omnipresent
We set up a series of 312 non-recursive DNS measurements towards all the root server letters with alternating IP versions, transport protocols, query types, and domain names (see Fig. 1 below). These were run twice per day from all the connected RIPE Atlas probes (about 11k).
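As a rough illustration of how such a measurement matrix can be enumerated, here is a minimal Python sketch. The post does not spell out the individual query types and domain names, so `QUERY_TYPES` and `DOMAINS` below are placeholders chosen only so that the product comes to 312; the dictionary keys are likewise hypothetical and do not correspond to any particular API.

```python
from itertools import product

ROOT_LETTERS = "abcdefghijklm"   # the 13 root server letters
IP_VERSIONS = (4, 6)
TRANSPORTS = ("udp", "tcp")
# Assumed placeholders: the post does not list the exact query types or names.
QUERY_TYPES = ("A", "AAAA")
DOMAINS = ("facebook.com", "google.com", "whatsapp.net")


def measurement_matrix():
    """Return one spec per (root letter, IP version, transport, qtype, qname)."""
    return [
        {
            "target": f"{letter}.root-servers.net",
            "af": af,
            "protocol": proto,
            "qtype": qtype,
            "qname": qname,
            "recursion_desired": False,  # the measurements are non-recursive
        }
        for letter, af, proto, qtype, qname in product(
            ROOT_LETTERS, IP_VERSIONS, TRANSPORTS, QUERY_TYPES, DOMAINS
        )
    ]


specs = measurement_matrix()
print(len(specs))  # 13 * 2 * 2 * 2 * 3 = 312
```

Under these assumptions, each root letter receives 24 distinct query variants per run.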
We then divided measurement results into two broad categories - (i) non-injected, if the answer section of the response was empty and (ii) injected, if we got responses to our queries. We recall that DNS root servers are not authoritative for any of the queried domain names, so we only expect to see referrals to .com/.net TLD nameservers and the corresponding glue records.
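The classification rule above can be sketched in a few lines of Python. This is a simplified illustration, not the code used in the study: the `response` dictionary layout is hypothetical, and the IP addresses are documentation placeholders.

```python
def classify(response):
    """Label a parsed response received when querying a root server.

    `response` is a hypothetical dict whose "answers", "authorities" and
    "additionals" values are lists of (name, rtype, rdata) tuples.
    A root server is not authoritative for names under .com/.net, so any
    record in the answer section is treated as injected.
    """
    return "injected" if response.get("answers") else "non-injected"


# Expected behaviour: a referral to the TLD nameservers with a glue record.
referral = {
    "answers": [],
    "authorities": [("com.", "NS", "a.gtld-servers.net.")],
    "additionals": [("a.gtld-servers.net.", "A", "192.0.2.53")],  # placeholder
}

# Manipulated behaviour: an address record appears in the answer section.
manipulated = {
    "answers": [("facebook.com.", "A", "192.0.2.1")],  # placeholder address
    "authorities": [],
    "additionals": [],
}

print(classify(referral))     # non-injected
print(classify(manipulated))  # injected
```

Note that this rule says nothing about whether an injected answer is correct; it only records that something other than a root server produced it.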
As shown in Figure 2, between 3% and 4% of RIPE Atlas probes per week receive injected DNS responses when communicating with root servers. The overall ratio of manipulated measurements does not exceed 1%. At the same time, roughly 20% of participating RIPE Atlas probes experienced response manipulation during all 9 months of the experiment (see Fig. 3).
Takeaway #3: Inserted responses are not always bogus
We received more than 11 million individual injected responses of 5 different resource record types: A (7 million), AAAA (4 million), URI (43k), SOA (7k), and CNAME (5k). Interestingly, those were not always bogus. For example, 49% of facebook.com and 89.6% of google.com responses contained valid A records belonging to the requested domain names. The ratio of valid AAAA responses was even higher - 64.4% for facebook.com and 98.3% for google.com. All the CNAMEs were pointing google.com to forcesafesearch.google.com - Google's service to remove explicit content from search results. Only URI and SOA responses would completely prevent access to the requested domains, as those did not contain any valid address.
Several kinds of entities are behind those injected responses - apart from national censors, we might have encountered DNS filtering services, transparent forwarders that relay DNS requests to alternate resolvers, or DNS servers that serve the root zone locally. Some of these are deployed deliberately by the network operator: for example, corporate policies may not allow end users to contact arbitrary DNS servers.
Overall, the response injection affected 7% of all the 14.3k RIPE Atlas probes tested between February and October 2022. The problem could be further exacerbated in case of route leaks, as clients from the outside could also be affected. Some of the countermeasures below will help minimise the risks or avoid the manipulation altogether:
- BGP communities - one can uniquely identify anycast instances using BGP communities, e.g., by encoding geographical coordinates in BGP announcements. Routers at the destination networks would then choose the closest instance.
- QNAME minimisation - it is not necessary to reveal the full domain name when querying DNS root servers, especially since it might trigger middleboxes.
- Encrypted DNS - prevents sniffing of plaintext DNS traffic, but needs to be deployed on the whole resolution chain (end clients to resolvers and resolvers to authoritative nameservers).
- DNSSEC - validating resolvers will reject bogus responses, provided the domain names in question are also signed.
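To make the QNAME minimisation item more concrete, the following Python sketch lists the names a minimising resolver would reveal at each step of the delegation chain. It is a simplification under stated assumptions: the function name is hypothetical, and real resolvers following RFC 9156 also choose the query type carefully and may add several labels at once.

```python
def minimised_qnames(fqdn):
    """Return the names queried at each level, adding one label per step.

    The first entry is what a root server would see, the second is what
    the TLD nameservers would see, and so on down to the full name.
    """
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


steps = minimised_qnames("www.facebook.com")
print(steps)  # ['com.', 'facebook.com.', 'www.facebook.com.']
```

With minimisation, a middlebox on the path to a root server only sees a query for `com.`, which gives it no domain-specific name to match against.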