The last two years have witnessed multiple Internet outages that caused massive disruptions across the globe. Among the views on how to improve the situation, a prevalent theory is that multi-cloud and multi-CDN (Content Delivery Network) architectures are inevitable and are the holy grail of the Internet. This article puts this theory to a litmus test and tries to expose the chinks in its armour.
2021 was a year of outages for the Internet. We witnessed three massive AWS outages in a single month, Google’s outage in November, and outages at Fastly, Cloudflare, Azure, Facebook, and the list goes on. These companies carry much of today’s Internet traffic, so their failures raise significant concerns over the stability of the Internet itself. Multi-cloud/multi-CDN architectures are touted as the solution to all such stability issues and are hailed as the future of the Internet. This article critically examines the efficacy of these architectures.
What are multi-cloud or multi-CDN architectures?
When a particular CDN or cloud provider goes down, the services associated with it also suffer an outage. Logically, it makes sense not to put all the eggs in a single basket, but rather to distribute them across multiple vendors. There are two ways to distribute the load:
- Vertical Stacking
- Horizontal Scaling
Let us examine vertical stacking first. In vertical stacking, two or more vendors are stacked one on top of the other, as shown in the diagram:
In such a setup, the uptime of every vendor in play is critical, as the whole chain collapses if any one of them goes down. Evidently, this setup is not built to ensure availability but rather to reduce the load on the origin server.
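To put the "whole chain collapses" point in numbers: if every layer of a vertical stack must be up for a request to succeed, the stack behaves like components in series, and its availability is the product of the individual availabilities. The minimal Python sketch below assumes that layer failures are independent, and its 99.9% figures are hypothetical, not measured vendor numbers.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every layer must be up.

    Assumes that failures of the individual layers are independent.
    """
    return prod(availabilities)


# Hypothetical figures: two stacked CDN layers, each 99.9% available.
stacked = series_availability([0.999, 0.999])
print(f"availability: {stacked:.6f}")  # availability: 0.998001
# Convert unavailability into expected downtime per (non-leap) year.
print(f"downtime: {(1 - stacked) * 365 * 24:.1f} hours/year")  # downtime: 17.5 hours/year
```

A single 99.9% layer on its own would allow about 8.8 hours of downtime a year; stacking a second one roughly doubles that, which is why vertical stacking is about offloading the origin rather than about resilience.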
Horizontal scaling is purported to ensure availability by distributing the load across many vendors as shown:
By and large, this method does seem reassuring, but it deserves closer inspection.
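Horizontal scaling is the parallel case: the service stays up as long as at least one vendor is up, so its availability is 1 minus the probability that all vendors are down at once. The Python sketch below, again with hypothetical 99.9% figures, shows the theoretical gain. It rests on generous assumptions that real deployments rarely meet: vendor failures are independent, and failover is instant and flawless. The second function relaxes the last assumption with a hypothetical `failover_success` probability for an active-passive pair.

```python
from math import prod


def parallel_availability(availabilities):
    """Availability when the service survives as long as any one vendor is up.

    Assumes independent failures and instant, flawless failover.
    """
    # The service is down only if every vendor is down at the same time.
    return 1 - prod(1 - a for a in availabilities)


def active_passive_availability(primary, secondary, failover_success):
    """Two vendors where traffic moves to the secondary only when the primary fails.

    `failover_success` is a hypothetical probability that the switchover works.
    """
    return primary + (1 - primary) * failover_success * secondary


print(f"{parallel_availability([0.999, 0.999]):.6f}")           # 0.999999
print(f"{active_passive_availability(0.999, 0.999, 0.9):.6f}")  # 0.999899
```

A shared dependency, such as the DNS or traffic-steering layer that decides which vendor serves a request, breaks the independence assumption entirely, so the real-world figure sits somewhere below these idealised numbers.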
What are the three essentials of the Internet?
As it stands, there are three essential requirements for the Internet:
- Speed (or Performance)
- Availability
- Security
Let us illustrate with an example. To be a successful eCommerce company, I’d certainly need to ensure that my website is always up. It also needs to be fast, and secure enough to defend against bad actors. These are the three legs of the milking stool on which the Internet rests today. Take one away, and we are looking at a failed enterprise. Calls for a multi-CDN or multi-cloud setup focus mainly on the availability issue, but how would such approaches affect the other two aspects? Let’s take a closer look.
1. Speed/Performance
The Internet is not what it was two decades ago. Websites keep getting heavier and more complex by the day. In the last ten years alone, the median desktop page has tripled in size, while the median mobile page has grown by a factor of seven! (Note: the images here are also from this source.)
The heavier pages get, the more pressing the need for speed. Faster pages mean a better user experience. Speed is a critical feature of successful websites because it translates into a higher-quality experience and contributes significantly to a site’s conversion rates and revenue. Akamai Technologies, Inc., a leading content delivery network (CDN) provider, conducted a research study in 2017 that established the importance of performance in the online retail sector:
- A 100-millisecond delay in website load time can hurt conversion rates by 7%
- A two-second delay in web page load time increased bounce rates by 103%
- 53% of mobile site visitors will leave a page that takes longer than three seconds to load
Tech giants such as Walmart and Amazon have conducted similar studies in the past, and they all arrived at the same conclusion: ‘Milliseconds cost Mega-Dollars’.
Following the multi-cloud/CDN model, if we split our traffic between two vendors, would we get the same performance? The answer is an unequivocal NO!
Let’s take two popular CDN vendors as a case study: Akamai and Fastly.
On the face of it, Akamai’s network consists of approximately 365,000 servers distributed worldwide, whereas Fastly operates 60+ POPs (points of presence). The two figures count different things, servers versus locations, but they point to very different network footprints. The two vendors also differ in their peering arrangements, content optimisation techniques, and intelligent features, all of which compound the performance differences. So if we were to distribute traffic between these two vendors, a significant performance impact is to be expected. The same reasoning holds for any two vendors, for that matter.
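One simple way to see the cost of a traffic split is to treat it as a weighted average: if a fraction w of requests goes to vendor A and the rest to vendor B, the mean latency is w * L_A + (1 - w) * L_B, which is worse than the faster vendor alone whenever the slower one receives any traffic. The Python sketch below uses made-up latencies purely for illustration; they are not benchmarks of Akamai, Fastly, or any real vendor.

```python
def blended_latency_ms(weight_a, latency_a_ms, latency_b_ms):
    """Mean latency when a fraction `weight_a` of traffic goes to vendor A."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    return weight_a * latency_a_ms + (1.0 - weight_a) * latency_b_ms


# Hypothetical median latencies for two vendors serving the same region.
fast_ms, slow_ms = 40.0, 70.0
for w in (1.0, 0.75, 0.5):
    mean_ms = blended_latency_ms(w, fast_ms, slow_ms)
    print(f"{w:.0%} on the faster vendor -> {mean_ms:.1f} ms")
```

The average also understates the problem for the users who land on the slower vendor: they experience its full latency, not the blend.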
So here comes the big question: ‘Would we be willing to accept a perpetual impact on our site revenue in anticipation of an outage?’
2. Security
Now that we have covered the performance implications, let us extend the same argument to security. The undeniable truth of recent years is that cybersecurity remains the biggest concern for tech firms. We now live in a time when companies are hit every day by cyber threats such as ransomware, botnets, DoS (Denial of Service) attacks, and much more. Security vendors also differ significantly in their ability to thwart attacks. So widespread is the menace that even reputed security companies are now falling victim. Take the example of a reputed cybersecurity company, FireEye. Back in 2020, it admitted in a press release that some of its tools had been compromised in a cyberattack.
Countless companies and research papers have repeatedly stated that these attacks will not only grow exponentially in the coming years, but will also become smarter and larger in scale. Take the recent example of the Log4J vulnerability that came to light in December 2021. The vulnerability was so widely exploited by bad actors that it has been called one of the worst in the history of cybersecurity. So much so that Jen Easterly, Director of the Cybersecurity and Infrastructure Security Agency (CISA), was quoted on record as saying:
“The Log4J vulnerability is the most serious vulnerability that I’ve seen in my decades-long career. Everyone should assume that they are exposed and vulnerable and to check that they are not vulnerable.”
Although we may never know the full extent, Log4J is said to have affected a large part of the Internet and is expected to haunt us for the next few years as well. The damage caused by such security breaches can be enormous and can destabilise any company or system. Take the example of the Equifax data breach in 2017. The breach is estimated to have exposed the confidential information of 147 million people, and it cost the company at least 575 million USD in compensation and fines.
No two security vendors offer exactly the same level of protection, and when traffic is split between two of them, the share routed through the weaker vendor is only as well protected as that vendor allows. With that in mind, let us return to our earlier conundrum: ‘Would we be willing to compromise on our site security in anticipation of an outage?’ The answer is a resounding NO.
3. Availability
One would think that multi-cloud/CDN would at least solve the availability issue, but the truth is far more complicated than it seems. An Internet outage can be caused by a multitude of factors, such as DNS failures, misconfigured routing, cloud compute issues, and so on. Websites share this composite nature: a site may have its DNS with one vendor, its origin server hosted with another, and a complementary CDN from yet another. Failure can occur in any of these components, so a proper continuity plan needs a failover vendor for each of them. If we were to adopt a true ‘multi-cloud’ architecture, we would be looking at a hefty bill packaged with an extremely complex architecture and many moving parts. The sheer complexity is a ticking time bomb and can inadvertently end up causing outages if it is not maintained vigilantly.
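The composite nature of a website can be modelled as a series of components (DNS, origin, CDN), each of which may be made redundant with a failover vendor. The Python sketch below, with hypothetical per-vendor availabilities, computes the site's overall availability and counts how many vendor relationships each design needs. It assumes independent failures and perfect failover, so it shows the best case; the component names and figures are illustrative assumptions, not measurements.

```python
from math import prod


def component_availability(vendor_availabilities):
    """A component is up if at least one of its vendors is up (perfect failover assumed)."""
    return 1 - prod(1 - a for a in vendor_availabilities)


def site_availability(components):
    """The site needs every component (DNS, origin, CDN, ...) to be up at once."""
    return prod(component_availability(v) for v in components.values())


# Hypothetical availabilities; not measurements of any real vendor.
single_vendor = {"dns": [0.9999], "origin": [0.999], "cdn": [0.999]}
full_failover = {
    "dns": [0.9999, 0.9999],
    "origin": [0.999, 0.999],
    "cdn": [0.999, 0.999],
}

for name, design in (("single vendor", single_vendor), ("full failover", full_failover)):
    vendors = sum(len(v) for v in design.values())
    print(f"{name}: availability {site_availability(design):.6f}, vendors {vendors}")
```

On paper the failover design looks far better, but it doubles the vendor count, and the model ignores the failover machinery (health checks, traffic steering, configuration kept in sync across vendors) that must itself stay healthy for the numbers to hold.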
Conclusion
The theory that multi-cloud/multi-CDN architectures are the future of a stable Internet requires closer scrutiny. Distributing the risk across multiple vendors may not in itself be enough to ensure the future stability of the Internet. A single vendor is definitely a single point of failure unless we ask the right questions and assess its architecture and risk-mitigation techniques. Doing so will also push vendors to build more robust technologies.