How much of a problem are large DNS responses over UDP in the wild?
The Domain Name System (DNS) provides one of the core services of the Internet. DNS employs both UDP and TCP as transport protocols, but most responses are sent over UDP because it is fast, needing only one round-trip time (RTT). However, UDP is not always suitable for delivering large DNS responses, as packets can be fragmented or dropped. There is a risk that clients will not receive the answers, which can lead to unreachability.
To determine how serious the problem of large messages in DNS can be, we analysed 164 billion DNS queries and responses, collecting three full months of data (July 2019, July 2020, and October 2020) at the authoritative servers of the Netherlands’ .nl ccTLD. This article presents the main results of the paper we published at the Passive and Active Measurements Conference 2021 (PAM 2021).
Nobody really likes to wait for a page to load on the Internet, and the DNS can be one of the causes of slow load times. Domain names need to be resolved before pages can be loaded. The fastest responses are obtained with DNS over UDP (DNS/UDP), which requires one round-trip time (RTT). However, given that UDP provides no delivery guarantees by design, DNS can also be used with TCP (DNS/TCP), which takes two RTTs to retrieve the same responses (TCP requires an extra RTT due to its handshake).
DNS/UDP is faster than DNS/TCP, but it has a tough time handling large messages. The original DNS specification limited UDP messages to 512 bytes, which proved insufficient in many cases, so in 1999 EDNS0 was proposed, allowing the extension of UDP message sizes up to 64k bytes.
With EDNS0, DNS clients (resolvers) can advertise their UDP buffer size to the authoritative servers, which use that value as an upper limit when sending responses. If a response is larger than the EDNS0 buffer size advertised by the client, the authoritative server truncates it and sets the TC (truncated) bit in the header. The resolver uses that signal to send the query again, this time over DNS/TCP.
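The advertisement and the truncation signal can be sketched at the wire level. The example below is a minimal illustration using only the Python standard library, not the code running on the .nl servers. The helper names (`build_query`, `advertised_bufsize`, `must_truncate`, `is_truncated`) are hypothetical, and the parser assumes a plain query with uncompressed names and the OPT record as the first additional record.

```python
import struct

TC_FLAG = 0x0200   # "truncated" bit in the DNS header flags
OPT_TYPE = 41      # RR type of the EDNS0 OPT pseudo-record


def build_query(qname, edns_bufsize=None, qtype=1, msg_id=0x1234):
    """Build a minimal DNS query, optionally carrying an EDNS0 OPT record."""
    arcount = 1 if edns_bufsize is not None else 0
    # Header: ID, flags (RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, arcount)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # QCLASS = IN
    opt = b""
    if edns_bufsize is not None:
        # OPT RR: root name, TYPE 41, CLASS holds the UDP payload size,
        # TTL 0 (extended flags), RDLENGTH 0 (no options)
        opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, edns_bufsize, 0, 0)
    return header + question + opt


def advertised_bufsize(query):
    """Return the UDP payload size a query advertises (512 without EDNS0)."""
    _, _, qdcount, ancount, nscount, arcount = struct.unpack("!HHHHHH", query[:12])
    pos = 12
    for _ in range(qdcount):
        while query[pos] != 0:      # walk the uncompressed labels
            pos += 1 + query[pos]
        pos += 1 + 4                # root label, QTYPE, QCLASS
    # Sketch assumption: no answer/authority records, OPT is the first additional RR.
    if arcount == 0 or ancount or nscount or query[pos] != 0:
        return 512
    rrtype, rrclass = struct.unpack("!HH", query[pos + 1:pos + 5])
    # Advertised values below 512 are treated as 512.
    return max(rrclass, 512) if rrtype == OPT_TYPE else 512


def must_truncate(response_size, query):
    """Server side: truncate when the response exceeds the advertised size."""
    return response_size > advertised_bufsize(query)


def is_truncated(message):
    """Resolver side: check the TC bit of a DNS message."""
    flags = struct.unpack("!H", message[2:4])[0]
    return bool(flags & TC_FLAG)
```

For instance, a query built with `build_query("example.nl", edns_bufsize=1232)` advertises 1,232 bytes, so a 1,500-byte response to it would have to be truncated, while the same query without EDNS0 falls back to the original 512-byte limit.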
The issue is that all of this is done at the application layer, which is agnostic to the network layer. In other words, these buffer negotiations do not consider the maximum transmission unit (MTU) of the path between the client and the authoritative server, and the most common MTU at the core of the Internet is 1,500 bytes. If a DNS response is larger than the path MTU, its packets are either fragmented or discarded along the way. IPv4 fragmentation has enough design problems that it is now considered fragile and should be avoided. The worst case is when responses are silently discarded and clients never receive a DNS response, which effectively blocks them from reaching their desired URL.
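The arithmetic behind this mismatch is simple enough to sketch. The snippet below assumes standard header sizes (20 bytes for IPv4 without options, 40 bytes for the fixed IPv6 header, 8 bytes for UDP); the function names are illustrative only.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header without extension headers
UDP_HEADER = 8    # bytes


def max_dns_payload(mtu, ipv6=False):
    """Largest DNS message that fits in a single unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def fits_in_one_packet(dns_size, mtu=1500, ipv6=False):
    """True if a DNS response of dns_size bytes needs no fragmentation."""
    return dns_size <= max_dns_payload(mtu, ipv6)


print(max_dns_payload(1500))             # IPv4 over a 1,500-byte MTU
print(max_dns_payload(1500, ipv6=True))  # IPv6 over a 1,500-byte MTU
print(fits_in_one_packet(4096))          # a response filling a 4,096-byte buffer
```

Under these assumptions, a resolver that advertises a 4,096-byte buffer invites responses that cannot cross a 1,500-byte path in a single packet, even though the application-layer negotiation succeeded.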
While other work has investigated this issue, we look at it from a different vantage point: two anycast authoritative servers of the Netherlands's ccTLD (.nl). We analyse 164 billion queries, collected with our DNS big data analysis tool ENTRADA, as shown in the table below:
|UDP TC off||27.80B||7.24B||42.06B|
|UDP TC on||0.87B||0.31B||1.69B|
|UDP TC off||3.09M||0.35M||2.99M|
|UDP TC on||0.61M||0.08M||0.85M|
|UDP TC off||44.8k||8.3k||45.6k|
|UDP TC on||23.3k||4.5k||27.6k|
We collected data from two .nl anycast authoritative servers, NS1 and NS3. Although each is run by a different anycast provider, their data is combined in Table 1. We took snapshots in July 2019, July 2020, and October 2020, the first month after DNS Flag Day 2020.
From our vantage points, we see that a small fraction of responses is truncated: 2.93 percent to 7.15 percent, depending on the year and IP version. This is the starting point of our analysis.
Large Responses are Rare
The first analysis we did was to calculate the distribution of the response sizes the servers sent. As indicated by the vertical dashed line in Figure 1, 99.99% of the responses from the .nl servers are smaller than 1,232 bytes, which is the size proposed by DNS Flag Day 2020. One could argue that this result is only valid for the .nl zone, but Google Public DNS, the largest public resolver service on the Internet, reports that 99.7% of its traffic is also smaller than 1,232 bytes.
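As a rough illustration of this kind of measurement, the sketch below computes the share of responses that fit within a size limit. The sample sizes are made up for the example and are not taken from the .nl dataset, and the function name is hypothetical. The 1,232-byte value itself corresponds to the IPv6 minimum MTU of 1,280 bytes minus the 40-byte IPv6 header and the 8-byte UDP header.

```python
FLAG_DAY_LIMIT = 1232  # bytes: 1280 (IPv6 minimum MTU) - 40 (IPv6) - 8 (UDP)


def share_within_limit(sizes, limit=FLAG_DAY_LIMIT):
    """Fraction of response sizes that are at most `limit` bytes."""
    if not sizes:
        raise ValueError("no response sizes given")
    return sum(1 for size in sizes if size <= limit) / len(sizes)


# Hypothetical response sizes in bytes, for illustration only.
sample_sizes = [90, 120, 300, 450, 512, 800, 1232, 1400, 2000, 150]
print(f"{share_within_limit(sample_sizes):.0%} of the sample fits in {FLAG_DAY_LIMIT} bytes")
```

On real traffic, the same ratio evaluated over a range of limits gives the cumulative distribution that a plot such as Figure 1 visualises.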
Contrary to what we expected, the largest responses were for A and AAAA records of the .nl authoritative servers, not DNSSEC records. The size of the responses also differed per server: NS1 is configured to return minimal responses, while NS3 is not. Minimal responses prevent extra records from being added to the additional section, which reduces the response size.
Fragmentation Rarely Occurs on the Server Side
IP fragmentation can happen on the server side for both IPv4 and IPv6, or in the network, but the latter only for IPv4, as IPv6 forbids in-network fragmentation. For each server and IP version, we analysed the number of fragmented responses sent by our servers. As shown in Figure 2, very few responses were fragmented: fewer than 10,000 per day, compared with the 1-2 billion daily queries behind Table 1. In the paper, we also discuss an active measurement with RIPE Atlas to measure in-network fragmentation, which found that 4.4% of queries are fragmented at the network level in the wild over IPv4.
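For readers who want to spot fragments in their own packet captures, one common approach is to inspect the flags and fragment offset fields of the IPv4 header. The sketch below shows that check on raw header bytes. It is an illustration, not necessarily the method used for the measurements above, and the function name is hypothetical.

```python
import struct

MORE_FRAGMENTS = 0x2000  # MF flag in the IPv4 flags/fragment-offset field
OFFSET_MASK = 0x1FFF     # lower 13 bits: fragment offset in 8-byte units


def is_ipv4_fragment(ip_header):
    """True if the IPv4 header belongs to a fragment of a larger datagram."""
    flags_and_offset = struct.unpack("!H", ip_header[6:8])[0]
    more_fragments = bool(flags_and_offset & MORE_FRAGMENTS)
    offset = flags_and_offset & OFFSET_MASK
    # The first fragment has MF set; later ones have a non-zero offset.
    return more_fragments or offset > 0


def example_header(flags_and_offset):
    """Build a 20-byte IPv4/UDP header with documentation addresses."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 1500, 1, flags_and_offset, 64, 17, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )


print(is_ipv4_fragment(example_header(0x2000)))  # first fragment (MF set)
print(is_ipv4_fragment(example_header(0x4000)))  # DF set, not a fragment
```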
Small EDNS0 Buffers Lead to Truncation, Larger Ones Don't Prevent It
The data in Table 1 shows that 2.93% to 7.15% of the UDP responses were truncated, so we investigated why. Figure 3 shows the Cumulative Distribution Function (CDF) of both response sizes and EDNS0 buffer sizes for NS1. We see that most truncated DNS/UDP responses are cut to sizes under 512 bytes, independent of the IP version.
In Figure 3, the left dashed vertical line also shows us that most buffer sizes are equal to 512 bytes, which is rather small. Oddly, the purple line for IPv4 shows that NS1 receives 13% of its queries without the EDNS0 extension. We found that these queries came from two ASes that behave oddly and only query NS1 (sticky resolvers).
When a resolver receives a truncated response, it should send the same query again using DNS/TCP. We found that this happens in 80% of the cases, as shown in Figure 4.
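The expected resolver behaviour can be sketched as follows. To keep the example self-contained and free of network access, the transports are passed in as functions: `send_udp` and `send_tcp` are hypothetical callables that take a wire-format query and return a wire-format reply (a real TCP transport would also handle the two-byte length prefix that DNS/TCP uses).

```python
from typing import Callable, Tuple

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags


def resolve(
    query: bytes,
    send_udp: Callable[[bytes], bytes],
    send_tcp: Callable[[bytes], bytes],
) -> Tuple[bytes, str]:
    """Try UDP first and repeat the query over TCP if the reply is truncated."""
    reply = send_udp(query)
    flags = int.from_bytes(reply[2:4], "big")
    if flags & TC_FLAG:
        # The server signalled truncation: ask the same question over TCP.
        return send_tcp(query), "tcp"
    return reply, "udp"


# Fake transports for illustration: the UDP reply has QR and TC set (0x8200),
# the TCP reply has only QR set (0x8000).
def fake_udp(query: bytes) -> bytes:
    return query[:2] + (0x8200).to_bytes(2, "big") + bytes(8)


def fake_tcp(query: bytes) -> bytes:
    return query[:2] + (0x8000).to_bytes(2, "big") + bytes(8) + b"full answer"


reply, transport = resolve(b"\x12\x34" + bytes(10), fake_udp, fake_tcp)
print(transport)
```

A resolver that skips the TCP step, as happens in the remaining cases, is left with a truncated answer it cannot use.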
Direct DNS Flag Day 2020 Uptake was Rather Small, but Operators Adapt Slowly
DNS Flag Day 2020 asked resolver operators to configure their EDNS0 buffer sizes to 1,232 bytes. That, in turn, would reduce the large buffer sizes we see in Figure 5 and avoid both fragmentation and truncation. To measure the uptake of DNS Flag Day, we compared the October 2020 dataset against the July 2020 one: we considered only resolvers seen in both datasets and counted how many had migrated to an EDNS0 buffer size of 1,232 bytes.
Of 1.85 million resolvers (unique IP addresses), we saw only 11,338 that had adopted 1,232 bytes since July 2020, suggesting that Flag Day didn’t cause operators to change their settings immediately.
We also investigated the daily distribution over a 1.5-year period, as shown in Figure 5. By the end of May 2021, we saw 9% of the resolvers announcing 1,232 bytes, twice as many as one year earlier. However, the majority still announce either 4,096 bytes or other values.
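The comparison between two snapshots can be sketched as a simple set operation. The example below assumes each snapshot is a mapping from resolver IP address to the EDNS0 buffer size it announced. The addresses and values are invented for illustration, and the function name is hypothetical.

```python
def flag_day_uptake(before, after, target=1232):
    """Count resolvers seen in both snapshots that switched to `target`.

    Returns (number of adopters, number of resolvers seen in both snapshots).
    """
    common = before.keys() & after.keys()
    adopters = {
        ip for ip in common
        if after[ip] == target and before[ip] != target
    }
    return len(adopters), len(common)


# Hypothetical snapshots (documentation addresses, made-up buffer sizes).
july_2020 = {"192.0.2.1": 4096, "192.0.2.2": 1232, "192.0.2.3": 512, "192.0.2.4": 4096}
october_2020 = {"192.0.2.1": 1232, "192.0.2.2": 1232, "192.0.2.3": 512, "192.0.2.5": 1232}

adopters, common = flag_day_uptake(july_2020, october_2020)
print(f"{adopters} of {common} resolvers moved to 1232 bytes")
```

Applied to the figures reported above, 11,338 adopters out of 1.85 million resolvers is roughly 0.6%.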
Falling to Bits?
This study complements previous ones on fragmentation and truncation in DNS. Large responses are rare, but they do exist, and the problems they cause can be reduced by wider adoption of smaller buffer sizes. Server-side fragmentation is very rare for both IPv4 and IPv6, but in-network fragmentation is still present, at 4.4% for IPv4.
The DNS Flag Day 2020 had some impact, but DNS operators adopted its recommendations only slowly.
This blog is based on a peer-reviewed paper, which can be downloaded here.