There have been growing concerns over the last few years about the excessive concentration of control over the Internet's markets and infrastructure — what is commonly referred to as Internet centralisation. In this article, Giovane Moura talks about how he and his colleagues have been measuring centralisation and its effects.
- Five cloud providers are responsible for a third of the DNS queries to two ccTLDs (.nl and .nz).
- The providers differ, in some cases significantly, in terms of their adoption and use of DNSSEC, IPv6 and TCP.
- The positive side of centralisation is that, when a provider adopts a privacy-friendly technology, it benefits a large user base.
Perhaps the most public of those concerns have come from governments, particularly in the US, where many of the main players are based. Just this October, the US Department of Justice (DoJ) formally accused Google of illegally protecting its monopoly, following its earlier antitrust review of the big tech companies. US House lawmakers have also condemned ‘Big Tech’s monopoly power’. Similarly, in the European Union, regulators have long been trying to curb Big Tech’s power.
However, governments are not the only ones paying attention to this issue. There has been an IETF draft covering the issue of centralised infrastructures for the Internet, followed by a journal article. The Internet Society, in turn, released a report addressing how consolidation will impact the internet’s technical evolution and use. Furthermore, some Big Tech companies currently rely on surveillance as their business model, raising serious privacy concerns.
From a technical point of view, centralisation increases the risk of creating a single point of failure. In 2016, when Dyn, a large Domain Name System (DNS) provider, was hit by a major DDoS attack, some of its US-based clients could not reach prominent websites such as Twitter, Netflix, Spotify, Airbnb, Reddit, Etsy, SoundCloud and The New York Times.
Making claims about Internet centralisation is one thing; measuring it and its effects, particularly from a technical standpoint, is altogether harder. One method that my colleagues and I recently experimented with is to analyse the DNS traffic from cloud and content providers to see how market dominance is reflected in traffic dominance. We presented our findings at the ACM IMC 2020 conference and the RIPE 81 meeting.
Over the last three years, we have analysed the DNS traffic generated by Amazon, Google, Microsoft, Cloudflare and Facebook, as seen at authoritative DNS servers for three different zones: two country-code top-level domains (ccTLDs), namely the Netherlands' .nl (6M domain names) and New Zealand’s .nz (710k domain names), and B-Root, one of the thirteen root servers.
For our study we analysed 55.7 billion queries (~30B from .nl, 12B from .nz, and 14B from B-Root), covering snapshots from 2018 through 2020, as shown in Table 1 below. For example, in the snapshot week in 2020, we see that .nl received 13.75B queries, from 1.99M unique addresses (resolvers, both IPv4 and IPv6), from 41,717 autonomous systems (ASes).
|Week||Queries (total)||Queries (valid)||Resolvers||ASes|
|Week||Queries (total)||Queries (valid)||Resolvers||ASes|
|Date||Queries (total)||Queries (valid)||Resolvers||ASes|
Table 1 — Evaluated datasets for .nl, .nz and B-Root DNS servers (2018 to 2020)
How much traffic comes from the Cloud?
To determine the percentage of traffic originating from each cloud provider (CP), we mapped each query's source IP address to its respective AS. We then aggregated the results per AS and used each provider’s ASes to attribute traffic.
|Amazon||7224, 8987, 9059, 14168, 16509||No|
|Microsoft||3598, 6584, 8068–8075, 12076, 23468||No|
Table 2 — Cloud/content providers and their ASes
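The attribution step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the AS numbers are the Amazon and Microsoft rows from Table 2, while the prefix-to-AS table, the sample addresses and all function names are hypothetical (the prefixes are documentation ranges, not real provider address space). A real pipeline would derive the prefix table from BGP routing data.

```python
import ipaddress
from collections import Counter

# AS numbers from Table 2 (only the rows shown in this article).
PROVIDER_ASES = {
    "Amazon": {7224, 8987, 9059, 14168, 16509},
    "Microsoft": {3598, 6584, *range(8068, 8076), 12076, 23468},
}

# Hypothetical prefix-to-AS table using documentation prefixes.
PREFIX_TO_AS = {
    ipaddress.ip_network("192.0.2.0/24"): 16509,
    ipaddress.ip_network("198.51.100.0/24"): 8075,
    ipaddress.ip_network("2001:db8::/32"): 64500,  # documentation ASN, counted as "other"
}


def origin_as(address):
    """Return the AS of the longest matching prefix, or None if nothing matches."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in PREFIX_TO_AS if net.version == ip.version and ip in net]
    if not matches:
        return None
    return PREFIX_TO_AS[max(matches, key=lambda net: net.prefixlen)]


def provider_of(asn):
    """Map an AS number to a provider name, or "other" if it is not listed."""
    for provider, ases in PROVIDER_ASES.items():
        if asn in ases:
            return provider
    return "other"


def query_share(source_addresses):
    """Fraction of queries per provider, given one source address per query."""
    counts = Counter(provider_of(origin_as(addr)) for addr in source_addresses)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Made-up sample: one source address per observed query.
sample_queries = ["192.0.2.10", "192.0.2.11", "198.51.100.7", "2001:db8::1"]
shares = query_share(sample_queries)
print(shares)
```

Keeping the AS sets per provider separate from the prefix table mirrors the two steps in the text: first map address to AS, then map AS to provider.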
5 clouds: ⅓ of the traffic
The figure below shows the results of query volume per CP. For the Netherlands and New Zealand, we see that the five CPs are responsible for around 1/3 of all DNS queries. That is a significant concentration, especially considering that together the five CPs account for only 20 ASes, whereas .nl and .nz receive queries from roughly 40,000 ASes per week.
Figure 1 — Query ratio of clouds per ccTLD and B-Root
The traffic to B-Root is less concentrated: less than 10% of the traffic comes from these clouds. One reason may be that B-Root receives many queries generated by Chromium-based browsers.
Oddly, Google is more dominant in .nl traffic than in .nz traffic, which shows that cloud market penetration differs by vantage point. Its Google Public DNS service is responsible for the bulk of its queries: almost 90% of Google's queries to both .nz and .nl.
What records do cloud providers ask for?
As a distributed database, the DNS can be used to store different types of records. Figure 2 below compares the popularity of record types for NL in 2018 and 2020 (NZ and B-Root follow similar patterns). From it, we can see that A records (IPv4 addresses) are the most popular, followed by AAAA (IPv6 addresses).
For Google, we can also see a considerable increase in queries for NS records (records that store authoritative name servers) in 2020. Why did this change take place?
Figure 2 — Resource Records per cloud provider
Analysing the query names, we found that Google deployed QNAME Minimisation (RFC 7816) in December 2019. Resolvers that conform to this RFC send only the minimum information required about a domain name to each authoritative server, to protect users’ privacy. RFC 7816 also specifies that NS queries should be asked first, which explains the increase in the proportion of NS queries.
To confirm this, we analysed Google's NS queries month by month; Figure 3 shows the distinct change.
Figure 3 — Google’s queries distribution per month for .nl
This finding shows an advantage of centralisation: when one privacy/security feature is deployed, it protects many users at once.
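To make the effect of QNAME minimisation on a TLD server more concrete, here is a simplified Python sketch of which query each server in the delegation chain would see, with and without minimisation. It is an illustration under stated assumptions, not a resolver implementation: it assumes a zone cut at every label, ignores caching, and the function name `lookup_steps` and the example name `www.example.nl` are hypothetical.

```python
def lookup_steps(qname, qtype, minimise):
    """Return (zone asked, name sent, type sent) for each step of a lookup.

    Simplification: assumes one zone cut per label and no cached answers.
    """
    labels = qname.rstrip(".").split(".")
    n = len(labels)
    full_name = ".".join(labels) + "."
    steps = []
    for depth in range(n):
        # Zone whose servers are asked: "." first, then "nl.", "example.nl.", ...
        zone = ".".join(labels[n - depth:]) + "."
        if minimise and depth < n - 1:
            # Reveal only one label more than the zone already knows, as an NS query.
            sent_name = ".".join(labels[n - depth - 1:]) + "."
            steps.append((zone, sent_name, "NS"))
        else:
            # Full query name and original type.
            steps.append((zone, full_name, qtype))
    return steps


for mode in (False, True):
    print("minimise =", mode)
    for zone, name, rtype in lookup_steps("www.example.nl", "A", mode):
        print(f"  to {zone:<12} ask {name} {rtype}")
```

In this sketch, the servers for `nl.` see an A query for the full name without minimisation, but only an NS query for `example.nl.` with it, which is the shift from A to NS queries that a ccTLD operator would observe.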
Because our vantage points (NZ, NL and B-Root) all receive queries from the clouds, we can compare the CPs in terms of technology adoption: DNSSEC usage, IPv4 versus IPv6 traffic, and UDP versus TCP usage.
DNSSEC provides authenticity and integrity for the DNS. Given that these five CPs are so large, one would expect them to be equally up to date with DNSSEC adoption.
We can measure DNSSEC adoption by analysing the volume of queries for DNSKEY records, one of the record types in the DNS. From Figure 4 we can see that for NL in 2020 (NZ has similar figures; please refer to the paper):
- Microsoft has sent 0.02M DNSSEC queries, out of 1.1B in the same period (0.001% of total)
- Cloudflare has sent 11M, out of 460M queries (3% of total)
Figure 4 — Resource Records per CP in 2020, for .nl
The significant difference in the proportion of DNSSEC-related queries shows how dissimilar technology adoption is per CP. Thus, we cannot assume all CPs are similarly up to date.
As with DNSSEC, we can also measure IPv6 usage by the CPs. We initially expected similar results, but it turned out that there is a large variation, as seen in Table 3.
Table 3 — Query distribution per CP for ccTLDs
Again we found dissimilar patterns among the CPs for both NZ and NL:
- Good balance of IPv4/IPv6: Google, Facebook, Cloudflare
- More IPv6 from 2020: Facebook
- Very little IPv6: Amazon and Microsoft
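The kind of IPv4/IPv6 split summarised above can be computed directly from the source address of each query. The following Python sketch is illustrative only: the function name `family_split` is hypothetical and the sample addresses are made-up documentation addresses, not data from the study.

```python
import ipaddress
from collections import Counter


def family_split(source_addresses):
    """Fraction of queries per address family, given one source address per query."""
    families = Counter(
        f"IPv{ipaddress.ip_address(addr).version}" for addr in source_addresses
    )
    total = sum(families.values())
    return {family: n / total for family, n in families.items()} if total else {}


# Made-up sample of query sources for a single provider.
sample_sources = ["192.0.2.1", "192.0.2.2", "198.51.100.9", "2001:db8::53"]
split = family_split(sample_sources)
print(split)
```

Running this per provider (after attributing each address to its AS) gives one row of a table like Table 3.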
Previous research has shown that resolvers, when presented with multiple authoritative servers, tend to send more traffic towards the one with the shortest latency values. But they still use all of the authoritative servers.
So what else could be the reason for these differences? Well, maybe it is the size of the resolver infrastructure. We see in Table 4 that Amazon and Microsoft have far fewer IP addresses reaching the NZ and NL authoritative servers.
|IPv4||37,640 (98.2%)||33,908 (97.9%)|
|IPv6||677 (1.8%)||737 (2.1%)|
|IPv4||14,069 (97%)||9,738 (95.4%)|
|IPv6||425 (3.0%)||468 (4.6%)|
Table 4 — Amazon and Microsoft DNS resolvers (week during 2020)
To understand why Facebook has been sending more IPv6 queries in 2020, we mapped each Facebook IP address to a name using reverse DNS. In this way, we could map the IP addresses to names and locations, because Facebook uses airport codes for its servers.
As per Figure 5, we see 13 locations with location 1 sending the bulk of the queries. We then analysed the RTT of their TCP queries and saw that most locations confirm the resolvers’ preference:
- Location 8, 9 and 10 send mostly IPv4 queries because the median RTT of IPv4 is shorter than IPv6
- The remaining locations have a more balanced distribution, because their RTTs for IPv4 and IPv6 are more similar
- For location 1 alone, we cannot say anything about the DNS TCP RTT, because it sends very few TCP queries. Unfortunately for us, this is the location that sends most of the queries.
Figure 5 — Facebook Resolver’s location and IPv4 and IPv6 usage when querying .nl’s Server A (w2020)
Finally, we analysed the CPs’ transport protocol of choice. UDP dominates: it delivers a response within one RTT, whereas TCP requires an extra RTT for its handshake.
Table 5 — UDP and TCP query proportion for .nl and .nz
But as with the other technologies, not all clouds are the same. Facebook, for one, sends proportionally far more TCP queries than the others. We wondered why that is. Typically TCP is used as a fallback mechanism for large responses — if the authoritative server cannot fit a response in a single UDP response, it should truncate it, and resolvers should query again with TCP.
The size limit for UDP responses is partially controlled by the resolvers, which can advertise their EDNS0 buffer size. This tells the authoritative server that they can process UDP responses up to that size.
We looked into the EDNS0 buffer sizes advertised by Facebook's resolvers, and roughly ⅓ of them are 512 bytes, which is indeed very small. For comparison, DNS Flag Day 2020 recommends a buffer size of 1,232 bytes. That would explain more truncation and more TCP queries.
Figure 6 — CDF of EDNS(0) UDP message size for .nl (w2020)
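The link between a small advertised buffer and extra TCP queries can be shown with a short Python sketch. It models only the rule described above (a response larger than the advertised size is truncated and retried over TCP) and ignores any limit the authoritative server itself may impose. The function name `transports_used` and the 900-byte response size are hypothetical; the 512-byte and 1,232-byte values come from the text.

```python
CLASSIC_UDP_LIMIT = 512  # DNS-over-UDP limit when no EDNS0 size is advertised


def transports_used(response_size, advertised_bufsize=None):
    """Return the transports needed to deliver a response of the given size."""
    limit = CLASSIC_UDP_LIMIT if advertised_bufsize is None else advertised_bufsize
    # RFC 6891: advertised values below 512 are treated as 512.
    limit = max(limit, CLASSIC_UDP_LIMIT)
    if response_size <= limit:
        return ["UDP"]
    # The UDP reply is truncated (TC bit set), so the resolver retries over TCP.
    return ["UDP", "TCP"]


# Hypothetical 900-byte response, for example one carrying DNSSEC records.
for bufsize in (512, 1232):
    print(bufsize, transports_used(900, bufsize))
```

With a 512-byte buffer the same response costs an extra TCP exchange, while a 1,232-byte buffer lets it through in a single UDP round trip.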