Persistent DNS Connections for Reliability and Performance
• 10 min read
For decades, the Domain Name System (DNS) has relied on UDP as its transport protocol of choice, mostly because of its simplicity. New transports such as DNS-over-TLS and DNS-over-HTTPS are now gaining popularity: they offer increased privacy while preventing the use of the DNS as a DDoS attack vec…
“Super interesting. I have a couple of questions; Could you share more about the query traffic sample you used to generate traffic to unbound from the stubs? Also, have you spent any time looking at the real world impact of using TCP for recursive to authoritative? If one of the big sells for using TCP is better RTT measurements and retransmit logic, then this might help for unicast authoritatives hosted in bad networks, or geographically far away from their customer’s resolver. Are you aware of any stubs or resolvers that have options to prefer TCP configuration like this? Do you think it would be safe to enable TCP-first in stubs at scale, without causing resourcing issues on existing resolvers? Same question goes for resolver to auth. Thanks for a well written article!”
Thanks for your comments! The traffic is generated according to a Poisson process, with all queries being identical (A query for example.com, with a static answer configured on the resolver). Not very realistic, but it's simple and I couldn't find real-world data on temporal distribution of queries. The issue with recursive-to-authoritative is the generally lower amount of connection reuse: you would pay the overhead of opening a persistent connection (high latency, high CPU cost for TLS) for just a few queries. Then, either you leave the connection open with almost no traffic, which would be costly for the authoritative side, or you close it, and it will be costly for you to open it again later. That being said, QUIC may make this easier with its 0-RTT connection resumption. Regarding support in stub resolvers, TLS stubs like stubby already use persistent connections because TLS is costly to setup :) If I remember well, they generally use a configurable timeout (for instance, if no queries are made for 5 seconds, they close the connection to the recursive) I think that's the way to go, because TLS adds privacy and does not seem much more costly than TCP once established (this is a preliminary result). In any case, recursive resolvers need to be configured accordingly: unbound for instance only accepts 10 simultaneous TCP/TLS connections by default... And you also need to raise various file descriptor limits if you want to handle lots of simultaneous connections. There is more details in the RIPE76 slides: https://ripe76.ripe.net/presentations/95-jonglez-dns-tcp-ripe76.pdf
Showing 1 comment(s)