Measuring domain usage on centralised public DNS resolvers can be very useful, but it's also pretty hard to do. Trufflehunter is a new open source tool that puts snooping techniques to good use in order to accurately estimate the popularity of domains.
The Domain Name System (DNS) is ubiquitous. Nearly every application, product, or service that exists on the Internet today has to make DNS requests, whether it is a website accessed from a desktop browser, a mobile app on a smartphone, or a virus covertly running on a server.
These DNS queries have traditionally been handled by many distributed DNS resolvers run by ISPs. But that paradigm is shifting as more and more Internet users rely on centralised public DNS resolvers.
Public resolvers have rapidly grown in popularity because they are configured as the default in various products and networks. These include home routers (Google’s home routers use Google Public DNS at 8.8.8.8), browsers (Firefox uses Cloudflare DNS by default for US users), and even entire public Wi-Fi networks (New York City’s public Wi-Fi uses Quad9).
This switch is driven partly by services that go beyond simply resolving DNS requests, such as malware filtering and privacy protections like DNS-over-HTTPS, which ISP resolvers often don’t offer. And as it turns out, DNS centralisation is good news for Internet measurement research.
The Benefits of a Centralised DNS
It used to be that only the security-conscious or tech-savvy deliberately switched to a public DNS resolver. One security reason for doing so was to mitigate privacy attacks such as DNS cache snooping. In this attack, a snooper sends a DNS request for a domain of interest and examines the response to see whether that domain is already in the cache of the resolver that answered. If it is, the snooper can infer that one of that resolver’s users visited the domain recently.
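To make the probe concrete, here is a minimal sketch of a single cache-snooping query in Python using only the standard library. It assumes a common snooping approach: clearing the Recursion Desired (RD) bit so that the resolver answers only from its cache. This is not necessarily what Trufflehunter does, and not every resolver honours RD=0. The function names `encode_name`, `build_query`, `answer_count`, and `snoop` are hypothetical.

```python
import random
import socket
import struct

QTYPE_A = 1       # A (IPv4 address) record
QCLASS_IN = 1     # Internet class
FLAG_QR = 0x8000  # set by the server in responses
FLAG_RD = 0x0100  # Recursion Desired


def encode_name(domain: str) -> bytes:
    """Encode a domain as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label in {domain!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(domain: str, qid: int, recursion_desired: bool = False) -> bytes:
    """Build an A-record query. Leaving RD clear asks the resolver to answer
    from its cache only, rather than recursing to authoritative servers."""
    flags = FLAG_RD if recursion_desired else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    question = encode_name(domain) + struct.pack("!HH", QTYPE_A, QCLASS_IN)
    return header + question


def answer_count(response: bytes, qid: int) -> int:
    """Check that `response` answers query `qid` and return its ANCOUNT."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    if rid != qid or not flags & FLAG_QR:
        raise ValueError("not a response to this query")
    return ancount


def snoop(domain: str, resolver: str, timeout: float = 2.0) -> bool:
    """Send one non-recursive probe over UDP. True means the resolver returned
    answer records, which (if it honours RD=0) implies the name was cached."""
    qid = random.randrange(1 << 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(domain, qid), (resolver, 53))
        response, _ = sock.recvfrom(512)
    return answer_count(response, qid) > 0
```

Parsing the answer records themselves, for instance to read their remaining TTL, and checking the response code are left out to keep the sketch short. A resolver that ignores RD=0 and recurses anyway would make every probe look like a cache hit. That is exactly the kind of behaviour that has to be established before snooping results can be trusted.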
Cache snooping against a user’s misconfigured home router makes it all too easy to tell whether that user accessed a given domain. Public DNS resolvers, by contrast, have so many users that it is nearly impossible to de-anonymise anyone with cache snooping. Directed at them, cache snooping becomes privacy-preserving, which turns it from an attack into a measurement tool.
Cache snooping on public DNS resolvers offers a new opportunity to measure the real-time usage of Internet phenomena that are hard to study by any other means. For example, mobile stalkerware has so far been studied mostly in clinical settings, because its relative newness and the risk to victims have precluded large-scale studies. But stalkerware makes DNS requests, so its use and prevalence are visible in DNS traffic.
Another phenomenon we can study is contract cheating: services that, for a fee, complete students’ homework assignments, projects, or even entire classes. Few students are willing to admit to using such a service when questioned, even anonymously, but the thriving traffic to these websites is revealed by DNS requests.
Measuring Public DNS Resolvers
The only catch is that public resolvers are much larger and more complex than small home routers, so anyone trying to measure domain usage on them has to reverse engineer their caching behaviour first.
Using a technique described in detail in our recent IMC paper, we at the University of California San Diego, in collaboration with CAIDA, discovered that some resolvers, like Cloudflare’s, have an architecture that makes measuring domain usage very challenging. Others, like Google’s, have a structure that allows us to make much more accurate measurements of a domain’s popularity than we ever expected.
Armed with this knowledge, we designed a tool called Trufflehunter to measure the popularity of various domains across the United States. Using stalkerware, contract cheating, and typosquatting as case studies, we showed that Trufflehunter can make non-trivial lower-bound estimates of service prevalence. These measurements are particularly useful for services whose prevalence was previously completely unknown, as was the case, for example, with the more dangerous types of stalkerware.
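A public resolver may have many independent caches, so repeated probes can see the same record cached at different times. The following is a minimal sketch of one way to turn such sightings into a lower bound. It rests on several assumptions: snoop probes do not themselves fill the cache, the resolver neither caps nor rewrites TTLs, and the record’s authoritative TTL (`max_ttl`) is known. `Observation` and `count_cache_fills` are hypothetical names, and this is a simplified illustration rather than Trufflehunter’s actual algorithm, which is described in the IMC paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    """One cache hit seen by a snooping probe (hypothetical record format)."""
    timestamp: float    # when the answer arrived, in seconds
    remaining_ttl: int  # TTL left on the cached record, in seconds


def count_cache_fills(
    observations: Iterable[Observation], max_ttl: int, tolerance: float = 1.0
) -> int:
    """Return a lower bound on how many times clients caused the record to be cached.

    Assumes every cache stores the record with the full `max_ttl` when a client
    request fills it, so the fill happened at timestamp - (max_ttl - remaining_ttl).
    Sightings whose inferred fill times lie within `tolerance` seconds of the
    start of a cluster are treated as the same fill.
    """
    fill_times = sorted(
        o.timestamp - (max_ttl - o.remaining_ttl) for o in observations
    )
    fills = 0
    cluster_start = None
    for t in fill_times:
        if cluster_start is None or t - cluster_start > tolerance:
            fills += 1  # a fill time not explained by any earlier cluster
            cluster_start = t
    return fills
```

Merging sightings with nearby inferred fill times is deliberately conservative. Further client requests that hit an already-filled cache leave no trace at all. Both effects mean the count can only undershoot the true number of requests.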
Trufflehunter does have limitations. For one, cache snooping as a measurement technique can only measure a lower bound on the number of users who have accessed a domain. For another, even the largest public resolver we studied (Google Public DNS) handles only about 10% of the world’s DNS traffic. But as public DNS resolvers grow in popularity, we hope this measurement technique will become increasingly useful and make many more measurement studies possible.
Trufflehunter is open source, so if you’d like to measure domain popularity on your own, you can get started today.
This article was originally posted on the APNIC blog.