By analysing how resolvers select authoritative name servers in the wild, we investigated how DNS operators can reduce DNS response times.
The Domain Name System (DNS) is a critical part of the Internet infrastructure and maps domain names to IP addresses in a distributed way. As shown in a recent blog post on RIPE Labs and in previous research, DNS queries can form a noticeable part of web latency, which is why we investigated how DNS operators can reduce DNS response times.
We currently run 8 separate name servers for .nl, of which 5 are unicast and 3 use anycast across more than 80 sites. Recursive resolvers can send their queries to any of these 8 servers. Previous research has shown that recursive resolvers use different strategies for selecting a name server: some take the round-trip time (RTT) of a server into account, while others choose a server randomly. However, that research did not estimate how prevalent these strategies are on the Internet.
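To make the difference between the two kinds of strategy concrete, here is a minimal sketch. It is an illustrative simplification and not the algorithm of any real resolver: the function names, the RTT values and the 1/RTT weighting are all assumptions made for this example.

```python
import random
from collections import Counter

# Hypothetical RTTs in milliseconds from one resolver to two authoritatives.
RTT_MS = {"FRA": 20.0, "SYD": 300.0}


def pick_random(rtts, rng):
    """Choose an authoritative uniformly at random, ignoring RTT."""
    return rng.choice(sorted(rtts))


def pick_rtt_weighted(rtts, rng):
    """Choose an authoritative with probability proportional to 1/RTT.

    This weighting is an assumption for illustration only.
    """
    names = sorted(rtts)
    weights = [1.0 / rtts[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(strategy, rtts, queries=1000, seed=1):
    """Return how many of `queries` lookups each authoritative receives."""
    rng = random.Random(seed)
    return Counter(strategy(rtts, rng) for _ in range(queries))


if __name__ == "__main__":
    for strategy in (pick_random, pick_rtt_weighted):
        print(strategy.__name__, dict(simulate(strategy, RTT_MS)))
```

With these made-up RTTs, the random strategy splits queries roughly evenly, while the RTT-weighted one sends most queries to the nearby server but still sends some to the distant one.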
Therefore, we ran our own measurements with 9,000 RIPE Atlas probes that query a test domain. We used seven different name server setups with up to four unicast servers spread across the world. For each setup, we instructed the probes to resolve the test domain every two minutes for around one hour.
In Figure 1, we can see the distribution of queries of 9,000 RIPE Atlas probes and their locally configured recursives to two authoritative name servers – one in Frankfurt (FRA) and one in Sydney (SYD).
We split the figure into 6 subplots, one for each continent on which the probes are located. We can see that the majority of recursives of probes located in Europe (EU) sent most of their queries to the name server in Frankfurt. In contrast, the recursives of probes in Oceania (OC) sent most of their queries to the name server in Sydney.
In the second figure, we replaced the name server in Sydney with a name server in Dublin. As a consequence, the distribution of queries becomes more balanced between the two servers.
Figure 1: Distribution of recursive queries for authoritatives in Sydney (SYD) and Frankfurt (FRA). In each subplot, recursives on the left sent all queries to Frankfurt, and recursives on the right sent all queries to Sydney.
Figure 2: Recursive queries distribution for authoritatives in Dublin (DUB) and Frankfurt (FRA).
In fact, we discovered that up to 69% of recursive resolvers send the majority of their queries to the fastest-responding name server. However, some queries are still sent to the slower-responding authoritative. In some scenarios, 41% of recursive resolvers send most of their queries to the slower-responding authoritative.
This can increase reliability and security, but it also means that many queries are still not answered as fast as they could be.
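Per-resolver shares like those reported above could be computed along the following lines. This is a sketch under stated assumptions rather than the analysis code behind the report: the log format (pairs of resolver and authoritative), the function names and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def query_shares(log):
    """Map each resolver to the fraction of its queries per authoritative.

    `log` is an iterable of (resolver, authoritative) pairs, one per query.
    """
    counts = defaultdict(Counter)
    for resolver, authoritative in log:
        counts[resolver][authoritative] += 1
    return {
        resolver: {auth: n / sum(c.values()) for auth, n in c.items()}
        for resolver, c in counts.items()
    }


def fraction_preferring_fastest(shares, fastest_for):
    """Fraction of resolvers sending more than half of their queries
    to the authoritative that responds fastest for them."""
    if not shares:
        return 0.0
    hits = sum(
        1
        for resolver, share in shares.items()
        if share.get(fastest_for[resolver], 0.0) > 0.5
    )
    return hits / len(shares)


if __name__ == "__main__":
    # Made-up sample data, not taken from the measurements.
    log = (
        [("r1", "FRA")] * 3 + [("r1", "SYD")]
        + [("r2", "FRA")] * 2 + [("r2", "SYD")] * 2
        + [("r3", "SYD")] * 4
    )
    fastest_for = {"r1": "FRA", "r2": "FRA", "r3": "SYD"}
    shares = query_shares(log)
    print(shares)
    print(fraction_preferring_fastest(shares, fastest_for))
```

Which authoritative counts as fastest differs per resolver, so the sketch takes that mapping as an input (for example, derived from measured RTTs) instead of assuming a single fastest server.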
In practice, a request from a recursive resolver in the U.S. to a unicast name server located in the Netherlands will always take at least 70 ms to be answered, due to the sheer distance between the two continents. However, based on our measurements, we now know that recursive resolvers in the U.S. will still send a significant share of their queries to this authoritative, even though there are authoritatives closer by. This observation led us to the conclusion that DNS operators should not rely on the selection strategies of recursive resolvers, but should actively optimize their own setup if they want to decrease response times.
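A rough back-of-the-envelope check shows how distance alone imposes a floor of this order on the round-trip time. The sketch assumes a signal speed in optical fibre of about 200,000 km/s (roughly two thirds of the speed of light in vacuum) and a hypothetical one-way path length of 7,000 km; the path length is an illustrative figure, not a value from our measurements.

```python
# Approximate signal speed in optical fibre: about two thirds of c.
FIBRE_KM_PER_S = 200_000.0


def min_rtt_ms(path_km, speed_km_per_s=FIBRE_KM_PER_S):
    """Lower bound on round-trip time in ms for a one-way path of `path_km`.

    Only propagation delay is counted; queueing, processing and
    routing detours would add to this.
    """
    return 2.0 * path_km / speed_km_per_s * 1000.0


if __name__ == "__main__":
    # 7,000 km is a hypothetical path length, not a measured value.
    print(round(min_rtt_ms(7_000), 1))
```

Because the bound depends only on path length, no resolver-side or server-side tuning can bring the response time below it; only moving the answering site closer can.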
Thus, we recommend that DNS operators deploy all of their name servers as an anycast service, with sites spread evenly across the world. Then it does not matter which name server a recursive resolver selects: the routing protocol BGP will (hopefully) direct the query to a nearby name server site, which can answer it as fast as possible.
Use for .nl
We discussed our findings with our operations team and recommended phasing out our unicast name servers and replacing them with one or more well-connected anycast name servers. We will keep our readers posted about further developments.
We have released a technical report with our detailed findings. The report is publicly available here.
This technical report is joint work of Moritz Müller (SIDN Labs), Giovane C. M. Moura (SIDN Labs), Ricardo de O. Schmidt (University of Twente), and John Heidemann (USC/ISI). The datasets in this report were collected with RIPE Atlas and are available at http://traces.simpleweb.org/ and at https://ant.isi.edu/datasets/.
The article was originally published on the SIDN Labs blog.