Much has been said about how Google uses the services they provide, including their mail service, their office productivity tools, file storage and similar services, as a means of gathering an accurate profile of each individual user of their services. The company has made a very successful business out of measuring users and selling those metrics to advertisers. But can we measure Google as they undertake this activity? How many users avail themselves of their services? Perhaps that’s a little ambitious at this stage, so let’s start at a smaller scale and look at just one Google service. What I would like to describe here are the results of an extended effort to measure what proportion of the world’s Internet user population uses Google’s Public DNS Service.
How do folk get to use Google’s Public DNS service? One way is for end users to configure their systems to use Google’s Public DNS service by following the configuration instructions at https://developers.google.com/speed/public-dns/docs/using. Yes, it’s as simple as placing 8.8.8.8 into the locally configured list of DNS resolvers. Most of the time this should Just Work. Of course there are times, perhaps where DNS traffic interception is going on, when your efforts to use a particular DNS resolver may well be thwarted by middleware that intercepts your DNS query packets, answers them via a local DNS resolver, and then spoofs the identity of Google’s DNS resolvers in the response back to you. However, for many users it does work exactly as intended. And it’s not just individuals who have taken advantage of this service. It’s also evident that larger networks and ISP platforms have availed themselves of this Google service, and they appear to use Google’s DNS resolvers as DNS forwarders from their own DNS resolver services.
Can we actually measure the extent to which end users and networks make use of Google’s DNS services?
Why is this question of interest?
It’s worth noting that almost everything we do on the Internet starts with a domain name. Whether it’s browsing the web, Twitter, Facebook, videos, talking, or almost any other form of application, the universal way of connecting to a service is by resolving the service’s domain name to an IP address, then starting a communications session with the identified remote service point. If one could see the entire panoply of DNS queries in one single view, then one would have a comprehensive picture of what everybody is doing on the Internet, in real time. But it’s not really necessary to have such a comprehensive view. As any statistician could tell you, it is possible to construct a comprehensive picture of the Internet from a far smaller sample set. Google’s Public DNS service is apparently very successful. Many folk direct their queries to these Google servers. Sometimes it’s faster, sometimes it’s more complete, but whatever the reason, many users have gone down this path. So I’d like to look at just how much of the Internet’s user population has their DNS queries answered by Google. And from that measurement data maybe we could make a guess as to just how complete Google’s DNS-derived picture of the Internet might be.
We started along this path while looking for something entirely different. We were interested in measuring the extent to which users pass their DNS queries to DNS resolvers that perform DNSSEC validation. During this investigation, at the start of 2013, Google announced that they would be turning on DNSSEC validation for their resolvers. At that point we wanted to understand to what extent this announcement would alter the overall landscape of DNSSEC validation.
How can we measure what end users do? Broadly speaking, there are two forms of approach. The first is to instrument a service that is very widely used and conduct the measurement exercise from that platform. Of course the precondition here is to have access to a widely used service point and be permitted to add various forms of action scripts into that service environment. The second approach is to inject the measurement code into the user’s environment, and have the user perform the measurement test directly. We have been using this latter approach for some years now, based on using the online advertisement network as a means of gaining access to the user environment, and then using a simple script embedded in the ad to request the user to perform a fetch of a small set of URLs.
If we carefully construct the URLs that are presented to end users to fetch, then it is possible to expose a number of aspects of the user’s environment. The basic approach is to use three URLs, whose DNS names are respectively DNSSEC-signed, DNSSEC-signed with invalid signatures, and not DNSSEC-signed at all. Each invocation of the experiment uses a unique set of URLs. This is critical: we need to intuit end user behaviour from the interactions we see at the authoritative DNS and web servers for the experiment’s URLs, so these name resolution and fetch tasks must not be intercepted and answered by intermediate DNS and web caches. That means that we use an environment that would be a little unwise in a normal context, in that the DNS is served from a single authoritative name server, rather than the more typical configuration of 2 or perhaps 3 name servers. Also, we use a DNS name where we have quite deliberately broken the DNSSEC signature. And of course every name contains unique components, and we apply the DNSSEC signatures across these unique name components.
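A minimal sketch of how each invocation might be handed its own set of names is shown below. The zone names (`signed.example.net`, `badsig.example.net`, `unsigned.example.net`), the use of a random UUID as the unique label, and the `/1x1.png` path are all illustrative assumptions, not the experiment’s actual naming scheme. The point is only that all three names share one per-client unique component, so no cache can already hold an answer for them.

```python
import uuid

# Hypothetical zones: one validly signed, one with deliberately broken
# signatures, one unsigned. These are placeholders, not the real experiment zones.
ZONES = {
    "signed": "signed.example.net",
    "badsig": "badsig.example.net",
    "unsigned": "unsigned.example.net",
}


def experiment_names(unique_label=None):
    """Return (label, names): one fresh DNS name per zone, all sharing one unique label.

    The unique label is the leftmost label, so neither DNS nor web caches
    can have an answer for it from an earlier invocation.
    """
    # uuid4().hex is 32 lowercase hex characters, well inside the 63-octet label limit.
    label = unique_label or uuid.uuid4().hex
    return label, {kind: f"{label}.{zone}" for kind, zone in ZONES.items()}


def experiment_urls(names, path="/1x1.png"):
    """Build the URLs the ad script would be asked to fetch (the path is a placeholder)."""
    return {kind: f"http://{name}{path}" for kind, name in names.items()}
```

Recording the returned label against the client that received it is what later allows queries seen at the authoritative server to be tied back to that client.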
Analysing the Experimental Technique
The DNS is both simple and incredibly complicated. It’s simple insofar as it’s a protocol where a client generates a query as a DNS packet and sends it to a DNS resolver, and the DNS resolver returns the packet as a response. If the queried name exists, the resolver is expected to have placed the details of the answer into the answer section of the DNS packet. Otherwise, the response is expected to have an appropriate diagnostic code set. Question. Answer. Simple.
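The “question” half of this exchange is compact enough to build by hand. The sketch below encodes a query in the RFC 1035 wire format (a 12-octet header followed by a single question) and reads the response code from the low four bits of a reply’s flags word, where SERVFAIL is RCODE 2. The fixed query ID and the choice to set the Recursion Desired bit are illustrative defaults; a real stub resolver would randomise the ID and go on to parse the answer section.

```python
import struct

QTYPE_A = 1
QCLASS_IN = 1
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels terminated by a zero octet."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def build_query(name: str, qtype: int = QTYPE_A, query_id: int = 0x1234, rd: bool = True) -> bytes:
    """Header (ID, flags, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0) followed by one question."""
    flags = 0x0100 if rd else 0x0000  # 0x0100 is the Recursion Desired (RD) bit
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    return header + encode_qname(name) + struct.pack("!HH", qtype, QCLASS_IN)


def response_code(message: bytes) -> str:
    """Return the symbolic RCODE carried in the low four bits of the flags word."""
    (flags,) = struct.unpack("!H", message[2:4])
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")
```

Sent over UDP port 53 to a validating resolver, a query for a name with deliberately broken signatures would be expected to come back with `response_code(...) == "SERVFAIL"`.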
And the model of resolution can equally be presented in extremely simple terms. To answer the question, the resolver asks the DNS name server that is “authoritative” for the zone being queried, and passes that response back to the client. Figure 1 shows the DNS in this naïve manner.
But of course this naïve view conveniently covers up a massive amount of detail and complexity in the DNS. Hidden behind this seemingly simple query/response interface is a distributed database of hundreds of millions of individual entries, served from a set of some tens of millions of DNS resolvers. Their interconnection is highly varied, and the resultant system behaviour is not only diverse, but at times completely perverse as well! A small snapshot of the internal structure of DNS resolvers is shown in Figure 2.
However, this level of internal structure of the DNS is not directly visible. DNS queries have no “trail” information. When resolver A forwards a DNS query to resolver B, it has no ability to describe its motives in so doing. It cannot identify the client that triggered the original query, nor expose the internal logic that led the resolver to generate this query, nor the logic that selected this particular resolver as the one to receive it.
So when we want to ask: “How much of the Internet’s end user base uses Google’s Public DNS Service?” it probably makes some sense to explain how we will go about answering that question. The simplification we use here is to look at the DNS world from the perspective of the authoritative name server, which tends to cut out all the intermediate DNS resolvers. What we are left with, from the perspective of the authoritative name server, is a “visible” DNS resolver (Figure 3). By itself, this simplification would still not really help. However, if we pass every client a completely unique DNS name to resolve, then as long as we keep track of the association of clients to unique DNS names, we can associate each client with the visible resolver(s) that they use.
Of course the client may have selected this resolver themselves, in which case they may be directly aware of which resolvers they use. Or they may be using a local access network that has a local resolver that passes all its requests to a recursive forwarder which, in turn,… and so on. In this case the selection of the visible resolver may well be a choice that is completely opaque to the end user. So when we say that a client is “using a Google DNS service”, what we mean in this context is that the visible resolver that ultimately passes the DNS query to the authoritative name server is part of the Google Public DNS resolver collection.
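The bookkeeping that links each client to its visible resolvers amounts to a join between two logs. The following is a sketch under assumed record shapes: the experiment side records which client was handed which unique label, and the authoritative server’s query log yields (query name, source address) pairs. It also assumes that the unique component is the leftmost label of the query name. Because some resolvers randomise the letter case of query names (the so-called 0x20 technique), names are lowercased before lookup.

```python
from collections import defaultdict


def visible_resolvers(assignments, auth_log):
    """Map each client to the set of resolver addresses seen querying its names.

    assignments: dict of unique label -> client address (hypothetical experiment log)
    auth_log:    iterable of (query_name, resolver_address) pairs recorded by
                 the authoritative name server (hypothetical log format)
    """
    seen = defaultdict(set)
    for qname, resolver in auth_log:
        # Undo any case randomisation, strip a trailing root dot, take the leftmost label.
        label = qname.lower().rstrip(".").split(".", 1)[0]
        client = assignments.get(label)
        if client is not None:  # ignore queries for names that were never handed out
            seen[client].add(resolver)
    return dict(seen)
```

A client whose names were queried from more than one source address ends up with several visible resolvers, which is exactly the signal used to separate exclusive from shared use.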
We ran an online advertisement with these URLs as fetch targets from the 9th to the 26th of May 2013. The experiment was invoked by 2,498,497 clients over this period. Of these experiments, 92.8% used visible resolvers that were not operated by Google, while in the other 7.2% the clients’ queries were ultimately passed to the experiment’s authoritative name server from Google’s DNS servers.
That’s a very large number for a relatively recent service offering. The uptake of use of this service is certainly very impressive.
And if the logs of these DNS resolvers provide a useful insight into the real-time online activities of the end user population, then having a clear view of the activities of some 7% of the entire end user population of the Internet is a particularly valuable observational vantage point!
We can drill down into these numbers to get a bit more detail. The URL that is invalidly DNSSEC-signed returns a somewhat unhelpful error code back to the client, namely a SERVFAIL error, indicating some unspecified error on the part of the DNS server. All of Google’s DNS servers perform DNSSEC validation, so they return this code to the DNS client that posed the query. If the client has alternate resolvers configured, it will interpret this response as grounds to repeat the query to those alternate resolvers. From this behaviour we can distinguish the clients who exclusively use Google’s Public DNS service from those who also have alternate servers. We observed that 5.3% of users exclusively used Google’s DNS servers, while 1.9% used Google in conjunction with other DNS resolvers.
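The split between exclusive and shared use can be expressed as a small classifier over each client’s set of visible resolvers. This is a simplified sketch: it treats any non-Google resolver seen for a client as evidence of an alternate resolver, rather than keying specifically on the retry that follows a SERVFAIL for the mis-signed name. `GOOGLE_PREFIXES` holds documentation placeholder ranges, not Google’s real addresses; a real analysis would need the list of egress address ranges that Google publishes for its Public DNS service.

```python
import ipaddress
from collections import Counter

# PLACEHOLDER prefixes from the documentation ranges (RFC 5737 and RFC 3849).
# These are NOT Google's egress ranges; substitute the published list.
GOOGLE_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]
CATEGORIES = ("google_only", "google_mixed", "non_google")


def is_google(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(ip in net for net in GOOGLE_PREFIXES)


def classify(resolvers) -> str:
    """Label one client by the visible resolvers seen for its unique names."""
    flags = [is_google(r) for r in resolvers]
    if flags and all(flags):
        return "google_only"   # every query arrived from Google
    if any(flags):
        return "google_mixed"  # Google plus at least one other resolver
    return "non_google"


def shares(per_client) -> dict:
    """Percentage of clients per category; per_client maps client -> set of resolvers."""
    counts = Counter(classify(r) for r in per_client.values())
    total = sum(counts.values())
    return {k: 100.0 * counts[k] / total for k in CATEGORIES}
```

In these terms, the May 2013 figures above correspond to 5.3% `google_only`, 1.9% `google_mixed` and 92.8% `non_google`.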
Given that the end client’s IP address can also be geo-mapped to a country of origin with a reasonable level of certainty, it’s also possible to see whether particular countries make significant use of Google’s service.
|6||LA||300||18.7%||7.3%||74.0%||Lao People's Democratic Republic|
|10||TZ||297||16.2%||23.6%||60.3%||United Republic of Tanzania|
|15||PS||2,779||14.2%||38.9%||46.8%||Occupied Palestinian Territory|
The table shows the adoption of Google’s Public DNS by country. In terms of relative penetration within each country, if Google’s aim with this facility was to provide services to the developing world, then this list is consistent with that objective, in so far as developing economies are well represented here.
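The per-country breakdown is the same tally grouped by a key. The sketch below assumes that each client has already been classified and geolocated; the geolocation lookup itself is not shown, and the group key could equally be an origin AS number for the per-network view. It computes the three shares per group and ranks groups by total Google use (exclusive plus shared). The `min_samples` threshold is an illustrative guard against ranking groups with very few samples, not a figure taken from the experiment.

```python
from collections import Counter, defaultdict

CATEGORIES = ("google_only", "google_mixed", "non_google")


def shares_by_group(client_class, client_group, min_samples=1):
    """Percentage of clients in each category, per group.

    client_class: client -> one of CATEGORIES
    client_group: client -> group key, e.g. a country code or origin AS
                  (hypothetical output of a geolocation or routing lookup)
    """
    counts = defaultdict(Counter)
    for client, category in client_class.items():
        group = client_group.get(client)
        if group is not None:  # skip clients that could not be mapped
            counts[group][category] += 1
    table = {}
    for group, c in counts.items():
        n = sum(c.values())
        if n >= min_samples:
            row = {k: 100.0 * c[k] / n for k in CATEGORIES}
            row["samples"] = n
            table[group] = row
    return table


def rank_by_google_use(table):
    """Group keys ordered by exclusive plus shared Google use, highest first."""
    return sorted(
        table,
        key=lambda g: table[g]["google_only"] + table[g]["google_mixed"],
        reverse=True,
    )
```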
Of course, in June of this year Edward Snowden fled the United States and released material relating to the until-then covert eavesdropping activities of the National Security Agency of the United States. There has been some resulting public concern about the extent to which our online activities generate a rich plume of digital exhaust, and the extent to which others have been sniffing these fumes and generating accurate profiles of ourselves, not only as online users, but as consumers and as individuals. There is no published material whatsoever that would lead one to assume that Google’s Public DNS service has been compromised in any way by such agency activity, but at the same time there are no undertakings by Google as to what use it makes of the DNS data generated by this free service, nor any undertakings as to whether others may have had access to such data.
As Renesys reported at the end of October, Google’s DNS service has left Brazil, and the report suggests that this action by Google is in response to forthcoming Brazilian legislation that will require Internet companies operating in Brazil to store data about Brazilian users within Brazil.
Did the level of public use of Google’s Public DNS services change in response to these events?
We have re-run the same experiment in the ensuing months, and the picture is certainly not one of steadily increasing, “up and to the right” adoption of Google’s Public DNS service. Numbers were at their lowest in August, when the stories of the Snowden revelations and their consequences appeared to be well covered throughout the world’s press. Since then the adoption rate has resumed its increase, and by November it appears that the level of use is back to where it was in May.
If we compare the September ’13 numbers against the May ‘13 numbers we can derive a national table of those countries where the level of use of Google’s DNS service fell over that period, and those countries where it rose. Table 3 shows the top 20 list of countries where use fell over that period, and Table 4 shows a comparable list where this use increased.
|Rank||CC||Delta OFF||MAY%||SEP%||Country|
|2||PS||22.7%||53.1%||30.4%||Occupied Palestinian Territory|
|7||LA||6.3%||26.0%||19.6%||Lao People's Democratic Republic|
|13||TZ||4.9%||39.7%||34.7%||United Republic of Tanzania|
|15||BA||3.1%||9.0%||5.8%||Bosnia and Herzegovina|
And the list where use increased over the same period:
A similar picture can be drawn at the level of networks whose clients have their DNS queries directed to Google’s Public DNS service. Table 5 shows this for the top 20 such networks, using the originating AS as the network indicator, for September 2013.
|1||45899||4,449||51.4%||2.0%||46.4%||VNPT-AS-VN VNPT Corp,VN,Vietnam|
|2||7552||1,597||38.6%||1.8%||59.5%||VIETEL-AS-AP Vietel Corporation,VN,Vietnam|
|3||18403||2,560||35.9%||1.1%||62.8%||FPT-AS-AP T, Technology,VN,Vietnam|
|4||4230||505||29.1%||11.4%||59.4%||EMBRATEL-EMPRESA BRASILEIRA DE TELECOMUNIC, Brazil|
|6||17762||315||26.0%||22.8%||51.1%||HTIL-TTML-IN-AP Tata Teleservices Maharashtra Ltd,IN,India|
|7||17974||7,162||25.1%||5.4%||69.4%||TELKOMNET-AS2-AP PT Telekomunikasi Indonesia,ID,Indonesia|
|8||3549||529||22.5%||6.4%||71.0%||GBLX Global Crossing Ltd.,US,United States of America|
|10||131222||577||19.7%||24.9%||55.2%||MTS-INDIA-IN 334,Udyog Vihar,IN,India|
|12||8517||452||19.2%||8.8%||71.9%||ULAKNET National Academic Network,TR, Turkey|
|13||174||596||19.1%||3.8%||77.0%||COGENT Cogent/PSI, US,United States of America|
|14||55824||558||18.6%||14.8%||66.4%||RSMANI-NKN-AS-AP National Knowledge Network,IN,India|
|15||17451||455||18.4%||4.8%||76.7%||BIZNET-AS-AP BIZNET NETWORKS,ID,Indonesia|
|19||18101||1,009||16.9%||6.5%||76.5%||Reliance Communications.DAKC MUMBAI,IN,India|
|20||20960||269||16.3%||6.6%||76.9%||TKTELEKOM-AS TK Telekom sp. z o.o.,PL,Poland|
And again it’s possible to look at those networks where use changed between May and September. The following two tables show the top 20 networks with falling and rising use.
|Rank||AS||Delta OFF||May||Sep||AS Name|
|2||15975||46.9%||57.9%||10.9%||Hadara,PS,Occupied Palestinian Territory|
|5||10620||13.2%||16.2%||3.0%||Telmex Colombia S.A.,CO,Colombia|
|6||45609||10.7%||12.4%||1.7%||BHARTI-AS Bharti Airtel,IN,India|
|8||45595||7.4%||14.2%||6.8%||Pakistan Telecom Company,PK,Pakistan|
|9||34984||6.9%||20.4%||13.4%||Tellcom Iletisim Hizmetleri,TR,Turkey|
|12||4780||6.3%||21.5%||15.1%||SEEDNET Digital United Inc.,TW,Taiwan|
|14||44957||5.5%||15.9%||10.4%||OPITEL Vodafone Omnitel N.V.,IT,Italy|
|16||47331||5.3%||18.6%||13.3%||TTNET TTNet A.S.,TR,Turkey|
|17||8612||5.1%||14.2%||9.1%||TISCALI-IT Tiscali Italia S.P.A.,IT,Italy|
|18||9498||4.7%||24.7%||20.0%||BBIL-AP BHARTI Airtel Ltd.,IN,India|
|19||9121||4.7%||19.2%||14.5%||TTNET Turk Telekomunikasyon,TR,Turkey|
|Rank||AS||Delta ON||May||Sep||AS Name|
|2||2609||24.7%||5.4%||30.1%||Tunisia BackBone AS,TN,Tunisia|
|5||131222||13.8%||30.9%||44.7%||MTS-INDIA, Udyog Vihar,IN,India|
|11||10139||6.0%||1.4%||7.4%||Smart Broadband, Inc.,PH,Philippines|
|12||6939||5.8%||5.6%||11.5%||Hurricane Electric, Inc.,US,United States|
|15||55824||5.0%||28.4%||33.5%||RSMANI, National Knowledge Net,IN,India|
|16||45899||4.5%||49.0%||53.5%||VNPT-AS-VN VNPT Corp,VN,Vietnam|
|17||6503||4.1%||4.4%||8.6%||Axtel, S.A.B. de C.V.,MX,Mexico|
|19||7303||4.0%||5.5%||9.5%||Telecom Argentina S.A.,AR,Argentina|
|20||9829||3.9%||4.4%||8.4%||BSNL National Internet Backbone,IN,India|
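The Delta columns in these tables are differences, in percentage points, between the May and September shares. A sketch of how the falling and rising lists could be produced from per-group percentages follows; the function name and the `top=20` default are illustrative choices. The sample input used to check it reuses four rows of the country table (May% and SEP% for PS, LA, TZ and BA). Because those published percentages are themselves rounded, recomputing the difference from them can be 0.1 of a point away from the printed Delta (LA, for example, gives 6.4 rather than 6.3).

```python
def rank_changes(may, sep, falling=True, top=20):
    """Rank groups by the change in Google use between two measurement rounds.

    may, sep: dict of group key -> percentage of clients using Google's service.
    Only groups present in both rounds are compared. Returns (group, delta)
    pairs, delta in percentage points, largest change first.
    """
    common = may.keys() & sep.keys()
    deltas = {}
    for group in common:
        change = may[group] - sep[group] if falling else sep[group] - may[group]
        if change > 0:  # keep only groups that moved in the requested direction
            deltas[group] = round(change, 1)
    # Sort by descending delta, breaking ties by group key for a stable listing.
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Calling it with `falling=False` produces the “Delta ON” view for groups whose use rose.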
There is no doubt about the value of Google’s Public DNS service.
It’s a welcome step to see a DNS resolution service take DNS security seriously, and validate the responses that they pass back to their clients. It’s also a welcome step to see a very large scale DNS service operate using dual stack capabilities. The Google service operates with integrity and does not appear to filter the DNS in arbitrary ways. And it’s well engineered, so it’s fast and reliable. And it’s free. So these are all good reasons to use the service.
But of course TNSTAAFL (there’s no such thing as a free lunch), and while there is no specific information from Google as to how the p-DNS data might be used by the company, there is no doubt that a real-time feed of the online activities of some 7% of the entire Internet user base is a rich vein of information, and this data stream could be added to the existing corporate information sets to add to the accuracy of the individual profiles that fuel their advertising business. Whether the same information is accessible to various US government agencies, and under what terms, is not something that appears to have been mentioned in the recent disclosures.
For some, this may be an acceptable tradeoff of some level of information about their online use in exchange for service. For others such an exchange may be a step too far. And for others the decision has been placed out of their hands, as their service provider may have decided to use Google’s service in any case. But in the morass of other issues with the DNS, including the various forms of exploitation and attack, the ongoing perversion of the DNS to perform massive DoS attacks, the various uses of DNS-like names in differing contexts, new and old TLDs, colliding names, IDNs and every other topic that forms the universe of DNS discourse, it’s still really encouraging for me to see that there are some folk who are taking high-quality DNS resolution performance seriously!
Geoff Huston presented these measurements at the recent IEPG meeting in Vancouver.