Analysing the Costs (and Benefits) of DNS, DoT, and DoH for the Modern Web

Austin Hounsel — 28 Oct 2019
All Internet communication relies on the Domain Name System (DNS), which maps a human-readable name to an IP address before two endpoints establish a connection to exchange data.

Most DNS queries and responses are transmitted in cleartext through conventional DNS (also referred to as Do53), making them vulnerable to eavesdroppers and traffic analysis. Past work has demonstrated that DNS queries can reveal sensitive information, such as a user's activity in a smart home.

To mitigate some of these privacy risks, two protocols have been proposed: DNS-over-TLS (DoT, RFC7858) and, more recently, DNS-over-HTTPS (DoH, RFC8484). Rather than sending queries and responses as cleartext, these protocols establish encrypted tunnels between clients and resolvers. This fundamental change has implications for the performance of DNS, as well as for content delivery.
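To make the difference between the three transports concrete, the following minimal sketch frames one DNS query for each of them. It assumes the framing rules from the RFCs: DoT reuses the two-octet length prefix of DNS over TCP (RFC7858), and a DoH GET request carries the message base64url-encoded without padding in a `dns` parameter (RFC8484). The function names and the endpoint `https://resolver.example/dns-query` are hypothetical, and nothing is sent over the network.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (default QTYPE 1 = A record)."""
    # Header: ID=0, flags=0x0100 (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_do53(message: bytes) -> bytes:
    """Do53 over UDP: the message is the datagram payload, sent in cleartext."""
    return message


def frame_for_dot(message: bytes) -> bytes:
    """DoT: two-octet length prefix, then the message, inside a TLS session."""
    return struct.pack("!H", len(message)) + message


def doh_get_url(message: bytes,
                endpoint: str = "https://resolver.example/dns-query") -> str:
    """DoH (GET): base64url without padding in the `dns` query parameter."""
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


query = build_query("www.example.com")
print(len(frame_for_do53(query)), len(frame_for_dot(query)))
print(doh_get_url(query))
```

The DNS message itself is identical in all three cases. Only the wrapping differs, and with it the connection setup that a resolver has to pay for before the first query.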

Methodology

Our team of researchers at Princeton University sought to measure how encrypted transports for DNS affect end-user experience in web browsers. Specifically, we measured DNS response times and page-load times across different resolvers (Cloudflare, Google, Quad9, and local resolvers provided over DHCP), websites, and network conditions for two weeks.

Our measurements were performed from five global vantage points on Amazon EC2: Ohio and California (United States of America), Frankfurt (Germany), Seoul (South Korea), and Sydney (Australia). For brevity, we focus on our measurements conducted from Frankfurt.

From our measurements, we found that TCP enables DoH and DoT to perform comparably to and sometimes better than Do53 in page-load times, despite higher response times. This is particularly the case when there is some small amount of packet loss.

Measuring Response Times

To measure DNS response times, we controlled Mozilla Firefox through Selenium to visit websites from the Tranco top-list. We visited the top 1,000 websites and the websites ranked 99,000 to 100,000, which represent, respectively, highly optimised websites that are likely to be hosted on global content delivery networks and less optimised but still relevant websites.

We then inspect HTTP Archive objects (HARs) after each page load, which we extracted from Mozilla Firefox through a custom extension. These HARs contain all of the unique domain names for resources on a web page. Finally, we measure DNS resolution times for each domain name through a tool we built, which uses getdns and libcurl.
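As an illustration of the extraction step, the sketch below collects the unique domain names from a HAR. It is not our measurement tool. It only assumes the standard HAR layout, in which each entry under `log.entries` carries a `request.url`, and the function name and sample data are hypothetical.

```python
from urllib.parse import urlsplit


def unique_domains(har: dict) -> list:
    """Return the sorted, de-duplicated hostnames requested in a HAR."""
    seen = set()
    for entry in har.get("log", {}).get("entries", []):
        host = urlsplit(entry["request"]["url"]).hostname
        if host:  # skip data: URLs and other entries without a hostname
            seen.add(host.lower())
    return sorted(seen)


# Hypothetical HAR fragment, for illustration only.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://www.example.com/index.html"}},
            {"request": {"url": "https://cdn.example.net/app.js"}},
            {"request": {"url": "https://www.example.com/style.css"}},
        ]
    }
}

print(unique_domains(sample_har))
```

Each name in the resulting list would then be resolved once per protocol and resolver.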

For DoT requests, we enabled connection reuse with an idle timeout of 10 seconds in order to amortize the TCP handshake and TLS connection setup. Although Firefox does not currently support DoT within the browser, we believe this is a realistic setting, as it is the default timeout used by DoT stub resolvers, like Stubby. For DoH requests, we also enabled connection reuse, and we sent requests over HTTP/2, which is the recommended minimum HTTP version for DoH, and which Firefox uses.

Figure 1: DNS Response Times for Cloudflare, Google, and Quad9 from Frankfurt

Figure 1 shows CDFs for DNS resolution times from Frankfurt for the top 1,000 websites and the websites ranked 99,000 to 100,000 combined. We find that Do53 performs better than DoT and DoH for most lookups across all resolvers. The additional overhead introduced by encrypted transports for DoT and DoH leads to an increase in resolution time. Interestingly, we find that each provider’s DoH resolver is slightly faster than their Do53 resolver for the slowest queries. We believe this can be explained by differences in caching.

Measuring Page-load Times with Cloudflare’s Resolver

To understand how different DNS protocols affect users’ experience, we measured page-load times. We focus on Cloudflare’s resolver because it performed best among the resolvers we measured. We also emulated several different network conditions to observe how these protocols perform.

We obtained page-load times by inspecting HARs after visiting each website. Importantly, HARs include timings for the onLoad event, which triggers when a browser finishes loading and rendering a page, as well as timings for individual components for each request that the browser made.
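A minimal sketch of reading these timings, assuming the standard HAR fields: `pageTimings.onLoad` on each page and `timings.dns` on each entry, both in milliseconds, with -1 meaning "not available". The function names and the sample values are hypothetical.

```python
from typing import Optional


def on_load_ms(har: dict) -> Optional[float]:
    """Return the onLoad time of the first page in ms, or None if unavailable."""
    pages = har.get("log", {}).get("pages", [])
    if not pages:
        return None
    value = pages[0].get("pageTimings", {}).get("onLoad")
    if value is None or value < 0:  # HAR uses -1 for "not available"
        return None
    return float(value)


def dns_ms_by_url(har: dict) -> dict:
    """Map each request URL to its DNS timing component, where one was recorded."""
    result = {}
    for entry in har.get("log", {}).get("entries", []):
        dns = entry.get("timings", {}).get("dns", -1)
        if dns is not None and dns >= 0:
            result[entry["request"]["url"]] = float(dns)
    return result


# Hypothetical values, for illustration only.
sample_har = {
    "log": {
        "pages": [{"pageTimings": {"onLoad": 1840}}],
        "entries": [
            {"request": {"url": "https://www.example.com/"}, "timings": {"dns": 23}},
            {"request": {"url": "https://www.example.com/a.css"}, "timings": {"dns": -1}},
        ],
    }
}

print(on_load_ms(sample_har), dns_ms_by_url(sample_har))
```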

Figure 2: Comparison of page load times between protocols using Cloudflare’s resolver on Amazon’s EC2 network in Frankfurt. Measurements were performed with default network settings and with emulated 4G settings.

Figure 2 compares page-load times between Amazon’s EC2 network with default network settings and with emulated 4G settings. Each plot shows a CDF for the difference in page-load times between the same domain for two protocols on a given network.

The vertical line on each plot indicates the median of the CDF. A median that is less than 0 sec on the x-axis means that the configuration (recursive resolver, protocol) specified by the first half of the graphs is faster than the configuration specified by the second half (indicated in blue hues in Figure 3). Correspondingly, a median that is greater than 0 sec on the x-axis means that the configuration specified by the first half of the graphs is slower than the configuration specified by the second half (indicated in red hues).
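The comparison behind these plots can be sketched as follows: pair the page-load times of the same website under two configurations, take the difference, and read off the median and the empirical CDF. This is only an illustration. The function names are hypothetical, the numbers are made up, and the values are in milliseconds rather than the seconds used on the plots' x-axis.

```python
from statistics import median


def paired_differences(first: dict, second: dict) -> list:
    """Per-website difference (first minus second) for sites measured in both runs."""
    shared = sorted(first.keys() & second.keys())
    return [first[site] - second[site] for site in shared]


def empirical_cdf(values: list) -> list:
    """Return (value, cumulative fraction) pairs for the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


# Made-up page-load times in ms, keyed by website.
doh_ms = {"a.example": 1250, "b.example": 900, "c.example": 2100}
do53_ms = {"a.example": 1200, "b.example": 950, "c.example": 2000}

diffs = paired_differences(doh_ms, do53_ms)
print(median(diffs))         # a positive median: the first configuration is slower
print(empirical_cdf(diffs))
```

Pairing by website keeps differences between pages from dominating the comparison, since every difference is taken between two loads of the same page.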

On the default network, the median page load with Cloudflare DoT is less than 1ms slower than with Do53. Furthermore, the median page load with Cloudflare DoH is 16ms slower than with Do53. These results stand in contrast to our naïve expectation that page-load times for DoT and DoH would be much slower than Do53 due to additional latency for individual requests. We believe this can be attributed to Firefox using asynchronous queries for DoH, and to Stubby re-using TLS connections for DoT.

We found that on the emulated 4G network, the median page load with Cloudflare DoT is 6ms faster than Do53. Cloudflare DoH still performs worse than Do53, with a median page load that is 66ms slower.

Figure 3: Comparison of page-load times between protocols using Cloudflare’s resolver on an emulated, lossy 4G network and a 3G network

Figure 3 compares page load times across an emulated, lossy 4G network (on the left) and a 3G network (on the right). On the emulated, lossy 4G network, the median page load with Cloudflare DoT performs 101ms faster than Do53. Similarly, the median page load with Cloudflare DoH performs 45ms faster than Do53.

However, as throughput decreases and loss increases on the emulated 3G network, Cloudflare DoT and DoH are no longer able to perform as well. The median page load with Cloudflare DoT performs 137ms slower than Do53. Even worse, the median page load with Cloudflare DoH performs 328ms slower than Do53.

Transport Protocols Greatly Affect Performance

We believe that TCP enables DoH and DoT to perform comparably to and sometimes better than Do53 in page-load times, despite higher response times. This is particularly the case when there is some small amount of packet loss.

For example, the default timeout for Do53 requests on Linux is 5 seconds, a value that can be changed through the timeout option in /etc/resolv.conf. This is the earliest time after which a failed DNS query will be retransmitted. However, depending on the TCP configuration, DoT and DoH packets may be automatically retransmitted after 2x the round-trip-time latency to the recursive resolver. Thus, DoT and DoH may be able to recover lost DNS queries that block page rendering more quickly than Do53 can.
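To put rough numbers on this, the sketch below compares how long a single lost query stalls under each model. It follows the simplification in the paragraph above (a 5-second stub timeout for Do53, retransmission after 2x the round-trip time for TCP) and is not a model of any particular kernel's retransmission timer. The 20ms round-trip time is an assumed value, and the function names are hypothetical.

```python
DO53_DEFAULT_TIMEOUT_MS = 5000  # 5-second default stub resolver timeout


def do53_retry_delay_ms(timeout_ms: int = DO53_DEFAULT_TIMEOUT_MS) -> int:
    """Earliest time at which a lost Do53 query is sent again."""
    return timeout_ms


def tcp_retransmit_delay_ms(rtt_ms: int, multiplier: int = 2) -> int:
    """Simplified model: TCP retransmits a lost segment after multiplier * RTT."""
    return multiplier * rtt_ms


rtt_ms = 20  # assumed round-trip time to the recursive resolver
do53_stall = do53_retry_delay_ms()
tcp_stall = tcp_retransmit_delay_ms(rtt_ms)
print(do53_stall, tcp_stall, do53_stall / tcp_stall)
```

Under these assumptions a lost query stalls for 5,000ms over Do53 but only 40ms over DoT or DoH, which is consistent with the encrypted transports pulling ahead once a small amount of loss is introduced.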

Considering the ubiquity of the web and the multitude of DNS queries that a single website may require, we need to further explore how different transport protocols affect DNS performance, and, in turn, page-load times and user experience.

Of course, sending DNS packets over TCP is not a new concept, as RFC1035 described it in 1987. However, if major browser vendors enable DoH or DoT by default, then all DNS traffic for hundreds of millions of users will be sent over TCP.

This may have profound and unforeseen performance impacts. As such, we have an enormous opportunity to improve user experience by studying how DNS performs over different transport protocols at scale.

Next steps

In future work, we plan to scale up our measurements across a diverse set of networks in the United States. Specifically, we want to measure how Do53, DoT, and DoH perform in numerous residential ISPs. We also plan to measure how these protocols perform on networks that are further away from CDNs, such as in Africa. Lastly, we plan to measure how Do53 performs over TCP, rather than UDP.

This research was presented at RIPE 79 during the DNS WG session.

1 Comment

DNS-optimizer says:
28 Oct, 2019 09:52 PM
Hi,
I can hardly believe that an UDP-based anycast service could be slower than any 3-way handshake TCP connection + TLS/SSL encryption layer ontop.