Following the Facebook outage that took place on 4 October, we saw people looking to BGPlay to get a better view of what went on. Here's a look at what the RIPEstat visualisation has to show us about the event in question.
As you'll have heard from various sources reporting from just about everywhere, yesterday Facebook went down at 15:42 (UTC). The outage went on for several hours, with the first signs of recovery showing at around 21:00 and another half hour or so passing before things started to look close to stable again.
In the resulting whirlwind of online reports, commentaries, gags, and speculation around the cause and impact of the outage, we saw a number of people turning to BGPlay - one of the visualisations available from RIPEstat - to get a better view of what happened as it happened.
Going Down in BGPlay
Here's a high-speed version of how the initial outage played out in BGPlay (you can also go look at it directly in BGPlay of course!):
What you see here is a representation of a set of AS paths as seen in BGP between a select set of vantage points (RIS peers - blue nodes in the diagram) and a target prefix (in this case, 129.134.30.0/24 originating from AS32934 - the red node in the middle). To cut down on noise, we reduce the number of vantage points here to just two sets of peers (i.e. those of two RIS collectors). Each line is a link between two neighbouring ASes that appears in one or more of the displayed paths.
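To make the idea of links concrete, here is a minimal sketch of how a set of AS paths can be turned into the AS-to-AS links drawn in such a diagram. It is an illustration only, not BGPlay's actual code: the function name `links_from_paths` and the sample paths are hypothetical, and apart from AS32934 the AS numbers come from the range reserved for documentation.

```python
from collections import Counter


def links_from_paths(as_paths):
    """Count how many of the given AS paths traverse each AS-to-AS link."""
    links = Counter()
    for path in as_paths:
        # Collapse AS-path prepending (consecutive repeats of the same AS),
        # since a repeated AS does not add a new link to the picture.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        # Each pair of neighbouring ASes in the path is one link.
        for left, right in zip(hops, hops[1:]):
            links[(left, right)] += 1
    return links


# Hypothetical paths from two vantage points towards AS32934.
# The first AS in each path is the peer, the last is the origin.
paths = [
    [64500, 64510, 32934],
    [64501, 64510, 64510, 32934],
]
links = links_from_paths(paths)
```

In this made-up example both paths share the final link into AS32934, so that link would be drawn once in the diagram while carrying two paths.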
The great thing about BGPlay is that it gives a clear view of how paths change over time, step by step across BGP updates. And that makes for particularly interesting visualisations of the kind of outage we saw yesterday. As you can see in the video, things look pretty normal just before 15:42, with plenty of links showing between the red node representing AS32934 and the other ASes.
Then over the next ten minutes or so, we see lots of activity until finally, at 15:53:47 (UTC), all the links are gone. At that time, the Internet no longer had any routes to the prefix 129.134.30.0/24, a prefix containing an important piece of the Facebook network (more specifically, a.ns.facebook.com, one of the authoritative DNS name servers for the facebook.com domain).
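The step-by-step replay described above can be sketched in a few lines: start from the peers that hold a route, apply announcements and withdrawals in time order, and note the moment no peer has a route left. This is a simplified illustration under stated assumptions, not how BGPlay is implemented. The function name `first_unreachable`, the tuple layout of the events, and the peer names are all hypothetical, and the sample timestamps are made up, apart from the last one, which echoes the time quoted above.

```python
def first_unreachable(initial_peers, events):
    """Return the timestamp at which no peer has a route left, or None.

    initial_peers: peers holding a route before the first event.
    events: (timestamp, peer, kind) tuples, where kind is "A" for an
            announcement and "W" for a withdrawal. Timestamps are ISO 8601
            strings in UTC, so sorting them as text sorts them by time.
    """
    with_route = set(initial_peers)
    for timestamp, peer, kind in sorted(events):
        if kind == "A":
            with_route.add(peer)
        elif kind == "W":
            with_route.discard(peer)
            if not with_route:
                # The last remaining route has just been withdrawn.
                return timestamp
    return None


# Hypothetical event stream for a single prefix seen by two peers.
initial = {"peer-a", "peer-b"}
events = [
    ("2021-10-04T15:42:30", "peer-a", "W"),
    ("2021-10-04T15:44:05", "peer-a", "A"),
    ("2021-10-04T15:50:12", "peer-b", "W"),
    ("2021-10-04T15:53:47", "peer-a", "W"),
]
gone_at = first_unreachable(initial, events)
```

With this sample data, the prefix stays reachable through the early churn and only disappears when the final withdrawal arrives.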
It's worth noting that the same situation can be seen whether we look at things for IPv4 or IPv6, as we see in this snapshot of the state of affairs at around 16:30 as shown in BGPlay for IPv6 address 2a03:2880:f0fc:c:face:b00c:0:35.
Whatever caused the outage, it was another several hours before things started to recover. And even then, as we also see in BGPlay, there were still a couple of bumps in the road. Note the hiccup here at around 21:11 before things finally return to a more stable state at around 21:30 (again, here's the direct link to this in BGPlay itself):
Tools like BGPlay help us piece together exactly when and where (topologically speaking) particular Internet events occurred. The visualisations shared here have a role to play in getting a clear first impression of what happened so we can get on with the task of understanding what caused the event in question, how big an impact it had, and what other parts of the Internet might have been affected.
We'll be diving into our other tools and services in the coming days to see what answers they have to provide on these further questions. And as always, we invite you to do the same. Look out for further analysis here on RIPE Labs as and when we have it!