In this article, we present the first generic methodology to measure the expansion of hypergiants into other networks, that is, hypergiant off-nets. Our findings show that the number of networks hosting hypergiant off-nets tripled from 2013 to 2021, reaching 4.5k networks. Our analysis also shows that the four largest hypergiants have off-nets within networks that provide access to a significant fraction of the end-user population.
Over the last decade, two trends have transformed Internet traffic: the growth of streaming video and other content-rich applications; and the consolidation of content, such that most traffic comes from a few big companies. In 2007, it took thousands of Autonomous System Numbers (ASNs) to account for 50% of global Internet traffic. By 2009, it was only 150, and in 2019 half of global Internet traffic originated from five hypergiants.
Hypergiants are large content providers, cloud providers and Content Delivery Networks (CDNs) that are typically responsible for distributing the majority of traffic to end users; for example, Google, Netflix, Facebook and Akamai.
- Hypergiants that deploy servers inside other networks help address the fundamental challenges of the Internet (capacity, latency and congestion) and reduce the amount of traffic traversing peering interconnections.
- Researchers have developed new techniques, which rely on TLS certificates and HTTP(S) headers collected from 2013 to 2021, to discover these server deployments.
- Many hypergiants have installed substantial serving infrastructure in other networks, with Google, Facebook and Netflix all increasing their off-net footprint significantly in the past seven years, in some cases by more than 2,000%, in terms of hosting Autonomous Systems (ASes).
- The share of the global Internet user population that can be served by Facebook’s off-nets within their providers’ networks increased by 46% between 2017 and 2021.
Since 2000, many of these hypergiants have been deploying servers deep inside other networks, aiming to optimise content delivery and improve user experience by bringing content closer to the end users. We refer to these server installations as ‘off-nets’, as these installations are outside of the hypergiant’s own network (Figure 1). Off-nets are hard to detect as they are installed in thousands of networks and use the address space of the hosting network.
Although hypergiants play a dominant role in today’s Internet traffic, the networking community currently lacks a good understanding of their global expansion strategies. Understanding their deployment strategies over time is important, as off-nets challenge the networking community’s mental model for the value of peering and how traffic flows in the Internet. As the content is now often served from off-nets within the user network, less traffic crosses network boundaries.
To shed light on how hypergiants have expanded their off-net deployments, we at University College London, Columbia University, Microsoft, FORTH-ICS, Lancaster University, University of Crete, and TU Delft developed a methodology capable of uncovering off-net deployments. However, we do not make a head-to-head comparison between hypergiants, as off-nets are only part of a larger business strategy that also depends on hidden parameters, for example, traffic levels and costs.
Challenges in identifying off-net server deployments
Off-net servers typically use IP address space announced by the hosting network, making it impossible to employ traditional approaches (for example, IP-to-AS mapping) to detect them as belonging to hypergiants. Alternative approaches used in the past (Böttger et al. and Calder et al.) either have limited coverage, as they require vantage points in tens of thousands of networks, or are tailored to a particular hypergiant, so they lack generality and are fragile to future hypergiant changes.
In our approach, we leveraged the fact that, in today’s Internet, most of the traffic is encrypted (Google reports that 95% of traffic across its services is encrypted). To encrypt their traffic, hypergiants deploy their TLS certificates to the off-net servers. TLS certificates provide a trustworthy source of identity for services running on a server. Thus, a server possessing a hypergiant’s certificate indicates that it is a server related to the hypergiant.
At first glance, it may seem that looking for hypergiant TLS certificates immediately solves the problem of locating all off-net servers. However, this is not the case due to the complex and heterogeneous deployment strategies of different hypergiants.
Firstly, it is not trivial to determine which certificates to look for, as there is not necessarily one certificate that definitively identifies each hypergiant. Through our study, we found that hypergiants have different management strategies on how they deploy their certificates. Hypergiants often host a range of services, each with varying regulatory requirements (for example, whether the services can only be hosted in particular regions or only in hypergiant-owned and operated facilities), which can restrict which certificates are available from which servers.
Secondly, diverse deployment practices may lead to one hypergiant’s certificate being deployed on the server of another hypergiant:
- Some hypergiants use their own infrastructure for several services and third-party CDNs for others (for example, Twitter images come from Akamai and Verizon, but some other content comes from their infrastructure). Hypergiants may use third-party CDN servers for resilience, capacity, or to extend their deployment footprint.
- A certificate may exist on a server that is not related to content distribution for the hypergiant. Some cloud providers offer on-premise versions of products, such as Google GKE and Azure Stack, which are managed by the cloud provider but do not run any public services. Management interfaces of such deployments may return a certificate of the cloud provider.
To tackle all these challenges, we developed a five-step methodology that uses corpuses of TLS certificates and HTTP(S) headers derived from publicly available scanning campaigns (Rapid7 and Censys) as a building block.
Step 1: We cleaned the TLS certificates dataset by removing self-signed and expired certificates as well as certificates with a non-verified chain, as we expect Hypergiants to use only valid and publicly trustworthy certificates for their services.
Step 2: Next, because servers inside a Hypergiant’s own network are known to be Hypergiant servers, we used the valid certificates they serve to identify which TLS certificates should be present. To do this, we extracted and constructed a set of on-net TLS fingerprints.
Step 3: We used the on-net TLS fingerprints [step 2] to find matching certificates on servers outside the Hypergiant’s own network. This produces a set of candidate off-net server deployments.
At this point, we cannot yet reach a conclusion, as we have only identified servers on which the Hypergiant may run a service. To validate ownership of the underlying hardware, we use two additional steps.
Step 4: We analysed the HTTP(S) headers of all related IP addresses from step 3. We assume that a server hosting Hypergiant content runs a Hypergiant software stack and therefore adds Hypergiant-specific headers to the content it serves. If we observed the Hypergiant headers, we inferred that the server is a Hypergiant off-net; if the headers did not match, we excluded it. Based on this, we constructed two different datasets, the HTTP and HTTPS header fingerprints.
Step 5: Finally, we applied the HTTP(S) fingerprints [step 4] to the candidate off-net server deployment inferences from the TLS certificates [step 3] and extracted the final off-net inferences.
To summarise: We consider a server to be a Hypergiant off-net if its TLS certificate and HTTP(S) headers match the Hypergiant’s fingerprints and its IP address is outside the Hypergiant’s own network. (Refer to our paper for a complete methodology description.)
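The five steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the `ScanRecord` schema, the fictitious hypergiant "ExampleCDN Inc", the header name `x-examplecdn-id`, and the choice to pass the header fingerprint in as a parameter are all hypothetical, and the real Rapid7 and Censys datasets are structured differently. Certificate DNS names stand in for the TLS fingerprint here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScanRecord:
    """One scanned server (hypothetical schema, not a real dataset format)."""
    ip: str
    asn: int                    # AS that announces the server's IP address
    cert_org: str               # organisation named in the certificate subject
    cert_dns_names: frozenset   # DNS names covered by the certificate
    header_names: frozenset     # lower-cased HTTP(S) response header names
    cert_not_after: date = date(2030, 1, 1)
    cert_self_signed: bool = False
    cert_chain_verified: bool = True


def is_valid_cert(rec, scan_date):
    # Step 1: drop self-signed, expired, and unverifiable certificates.
    return (not rec.cert_self_signed
            and rec.cert_chain_verified
            and rec.cert_not_after >= scan_date)


def infer_off_nets(records, hg_asns, hg_org, hg_headers, scan_date):
    """Return the IPs inferred to be off-nets of one hypergiant."""
    valid = [r for r in records if is_valid_cert(r, scan_date)]

    # Step 2: on-net TLS fingerprint, built from servers inside the
    # hypergiant's own ASes (assumed here to be the set of DNS names).
    on_net_names = set()
    for r in valid:
        if r.asn in hg_asns and r.cert_org == hg_org:
            on_net_names |= r.cert_dns_names

    # Step 3: candidates are servers outside the hypergiant's ASes whose
    # certificate matches the on-net fingerprint.
    candidates = [r for r in valid
                  if r.asn not in hg_asns
                  and r.cert_org == hg_org
                  and r.cert_dns_names & on_net_names]

    # Steps 4-5: keep only candidates that also return hypergiant headers.
    return {r.ip for r in candidates if r.header_names & hg_headers}


# Toy data using documentation-reserved IP ranges and AS numbers.
NAMES = frozenset({"*.examplecdn.test"})
HG_HEADERS = frozenset({"x-examplecdn-id"})
ORG = "ExampleCDN Inc"
SAMPLE = [
    ScanRecord("192.0.2.1", 64500, ORG, NAMES, HG_HEADERS),      # on-net
    ScanRecord("198.51.100.7", 64501, ORG, NAMES, HG_HEADERS),   # off-net
    ScanRecord("203.0.113.9", 64502, ORG, NAMES,
               frozenset({"server"})),                           # cert only
    ScanRecord("203.0.113.10", 64503, ORG, NAMES, HG_HEADERS,
               cert_not_after=date(2020, 1, 1)),                 # expired
]
off_nets = infer_off_nets(SAMPLE, {64500}, ORG, HG_HEADERS, date(2021, 4, 1))
```

In this toy input, the server at 203.0.113.9 holds the hypergiant's certificate but returns none of its headers, as a third-party CDN node might, so it is dropped in the final step.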
Longitudinal expansion of off-nets by major hypergiants
Figure 2 depicts the evolution of the number of ASes that host off-nets for Google, Facebook, Akamai and Netflix during the last seven years. Some trends worth noting include:
- In 2013, Google operated off-nets in 1,044 ASes. By 2021, it had off-net servers in more than 3,810 ASes.
- Our study captured the birth and rapid expansion of off-net deployments by Netflix and Facebook. Facebook had zero off-net presence in 2013, and by April 2021, it had off-nets in 2,214 ASes (see Figures 3 and 4).
- For two years, Netflix off-nets served content using only HTTP, which explains the dip in its curve. Despite this, the Netflix footprint continued to grow, and in April 2021 it reached 2,115 ASes.
Overall, Facebook and Netflix seem to mirror Google, lagging by a few years. Akamai, on the other hand, seems to be pursuing a different strategy, as its off-net footprint reached a plateau in 2017 and has since started to decrease.
Our analysis also revealed that the number of ASes that host at least one of the top-four hypergiants (Google, Facebook, Netflix and Akamai) has almost tripled between 2013 and 2021.
What fraction of the Internet user population is ‘covered’ by off-nets?
We also found that the user population that can potentially be served by hypergiants’ off-nets has significantly increased over the last few years. To assess the percentage of the population per country we relied on APNIC estimates.
Figures 3 and 4 illustrate, country by country, the growth in ASes hosting Facebook off-net servers between 2017 and 2021.
The colour of a country represents the percentage of Internet users in the country that have an off-net within their ISP. For instance, in 2017 Facebook had servers in ASes estimated to host 26.3% of Australian Internet users, and in 2021 it expanded to 86.8%.
Facebook’s expansion across those four years increased the percentage of users globally with Facebook off-nets in their ASes from 34.2% to 49.9%. The figures also indicate the locations of Facebook servers, as inferred from their hostnames with heuristics developed in APNIC Hackathons in 2017 and 2021.
A hypergiant can potentially serve more of the Internet population by using an off-net to serve not just the users within the hosting network, but also the users within its customer cone, that is, the networks that use this network as a provider. In the case of Facebook, 49.9% of Internet users are in ASes that host off-nets, whereas 62.3% are in those ASes or their customer cones (Figure 5).
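The two coverage figures differ only in which ASes are counted as reachable. The sketch below shows that computation. It assumes a hypothetical per-AS user estimate and a precomputed customer-cone mapping; the function name, the AS numbers and the user counts are made up for illustration and are not APNIC data.

```python
def coverage(users_per_as, hosting_ases, customer_cone):
    """Return (direct, with_cone) fractions of all users.

    users_per_as:  {asn: estimated number of users}   (hypothetical input)
    hosting_ases:  ASes inferred to host an off-net
    customer_cone: {asn: set of ASes that use it as a provider,
                    directly or indirectly}            (assumed precomputed)
    """
    total = sum(users_per_as.values())
    direct = set(hosting_ases)

    # Extend the hosting ASes with every AS in their customer cones.
    reachable = set(direct)
    for asn in direct:
        reachable |= customer_cone.get(asn, set())

    direct_users = sum(users_per_as.get(a, 0) for a in direct)
    cone_users = sum(users_per_as.get(a, 0) for a in reachable)
    return direct_users / total, cone_users / total


# Toy example: AS 64500 hosts an off-net and is the provider of AS 64502.
users = {64500: 500, 64501: 300, 64502: 150, 64503: 50}
direct_share, cone_share = coverage(users, {64500}, {64500: {64502}})
```

With these made-up numbers, half of the users sit in the hosting AS itself, and counting its customer cone adds the users of AS 64502 on top.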
TLS adoption is widespread
An interesting aspect that our work revealed is how the widespread adoption of TLS, intended to protect users’ privacy, had the unintended consequence of providing the basis for revealing hypergiants’ off-net footprints using publicly available data. Knowledge of hypergiant server locations could make attackers’ lives easier, and competitors could also use it to inform investment decisions. We recognise that some hypergiants may in the future hide their off-net deployments from our detection methodology for confidentiality and security reasons.
In our paper, we also describe a set of techniques that could be used towards achieving this goal (for example, employing TLS-SNI). However, we believe that such approaches only raise the bar for server identification and do not completely hide off-nets. The central idea behind our methodology will continue to work, as for security reasons hypergiants will always include their company information in their TLS certificates to prove their identity.
To learn more about our study and results, please read our paper published at SIGCOMM 2021 and watch our presentation, or leave us a comment below. You can also check out our project website to explore our findings and datasets as well as the software we used.
Acknowledgments: We are grateful to Rapid7 and Censys for providing us with research access to their datasets. We would also like to thank Olivier Bonaventure and the anonymous reviewers for their valuable feedback. This work has been funded by the European Research Council (ERC) Starting Grant ResolutioNet (ERC-StG-679158) and by NSF awards CNS-1836872 and CNS-2028550.