We found a bug in a dataset that we've been producing for years.
There are many IPv6 deployment statistics out there. The ones people probably refer to most are end-user statistics (e.g. those produced by APNIC and Google), which put IPv6 deployment at between 8% and 14% of Internet users. The RIPE NCC produces various statistics on a per-member (LIR) basis, for instance on IPv6 RIPEness (also note the most recent IPv6 RIPEness update on RIPE Labs). In our IPv6 ASNs prototype, we also show the number of routed networks (Autonomous Systems, or ASes) per country that announce one or more IPv6 prefixes. Announcing an IPv6 prefix is one of the necessary steps for an Autonomous System to exchange IPv6 traffic with the rest of the Internet. Looking at the percentage of networks that announce IPv6 gives us an idea of how far IPv6 deployment has progressed in a given country. For this statistic it doesn't matter whether a network is big or small, because all are counted equally. A nice feature of IPv6 ASNs is that it allows you to compare countries, so you can brag about your country if it is doing well or apply some peer pressure if it isn't.
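As a rough illustration of the statistic described above, the per-country share could be computed along the following lines. This is a minimal sketch and not the code behind IPv6 ASNs: the function name, the input format, the AS numbers (taken from the range reserved for documentation) and the country labels "XX" and "YY" are all assumptions made for the example.

```python
from collections import defaultdict


def ipv6_as_share(routed_ases, ipv6_prefix_counts):
    """Return {country: percentage of routed ASes announcing >= 1 IPv6 prefix}.

    routed_ases:        dict mapping ASN -> country label (hypothetical format)
    ipv6_prefix_counts: dict mapping ASN -> number of announced IPv6 prefixes
    """
    total = defaultdict(int)    # routed ASes per country
    enabled = defaultdict(int)  # routed ASes with at least one IPv6 prefix
    for asn, country in routed_ases.items():
        total[country] += 1
        if ipv6_prefix_counts.get(asn, 0) >= 1:
            enabled[country] += 1
    # Every AS counts once, regardless of how big the network is.
    return {cc: 100.0 * enabled[cc] / total[cc] for cc in total}


# Made-up example data, for illustration only.
routed = {
    64496: "XX", 64497: "XX", 64498: "XX", 64499: "XX",
    64500: "YY", 64501: "YY",
}
v6_prefixes = {64496: 3, 64500: 12}

shares = ipv6_as_share(routed, v6_prefixes)
for country, pct in sorted(shares.items()):
    print(f"{country}: {pct:.1f}% of routed ASes announce IPv6")
```

Because each AS is counted once, an AS announcing twelve IPv6 prefixes weighs exactly as much as one announcing a single prefix.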
A day in the life of a data analyst
Earlier this month, I noticed something odd in new data we produced for IPv6 ASNs. When I checked the data, it looked like IPv6 deployment had dropped by 6 percentage points on 1 January 2017.
Figure 1: Data-oops screenshot. The interval in which the percentage of IPv6-enabled ASNs was misreported is shown in orange.
Figure 1 shows what I saw. The apparent drop turned out to be caused by a bug fix that was applied in early 2017 to the back-end database that the IPv6 ASNs graphs depend on. After reanalysing the data, it became clear that from June 2015 until the end of 2016 we had overreported the percentage of ASNs that announced IPv6 address space by up to 6 percentage points (28% reported vs. a real value of 22%). Aggregates with more ASes were more affected by the bug than aggregates with fewer ASes. In general, though, the comparisons between countries were still in the right ballpark.
We have since put the corrected data online at IPv6 ASNs. A snapshot of it is shown in Figure 2.
Figure 2: Corrected data, as available at https://v6asns.ripe.net at the time this RIPE Labs article was published.
With the corrected data you can see that the percentage of IPv6-enabled ASes has been increasing roughly linearly, at around 5 percentage points per year, since 2011. This means the number of IPv6-enabled ASes increased faster than the total number of routed ASes. At the same time, if this trend continues, it will take another 15 years for all ASes on the Internet to be IPv6-enabled. And note that this is only one of the steps needed for IPv6 deployment!
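The "another 15 years" estimate is a simple linear extrapolation, which can be sketched as follows. The function name is hypothetical, and the starting share of 22% is an assumption for illustration (it is the corrected figure quoted earlier in this article); only the growth rate of roughly 5 percentage points per year comes from the trend described above.

```python
def years_until_target(current_pct, growth_pp_per_year, target_pct=100.0):
    """Years needed to reach target_pct, assuming strictly linear growth."""
    if growth_pp_per_year <= 0:
        raise ValueError("growth must be positive for the target to be reached")
    remaining = max(0.0, target_pct - current_pct)
    return remaining / growth_pp_per_year


# Assumed inputs: 22% of routed ASes IPv6-enabled now, growing by
# about 5 percentage points per year.
years = years_until_target(current_pct=22.0, growth_pp_per_year=5.0)
print(f"About {years:.1f} years until all ASes announce IPv6")
```

With these inputs the result is a little over 15 years, in line with the estimate in the text. A linear model is a simplification, so the number is an order-of-magnitude indication rather than a forecast.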
... And while we have your attention
Having to fix something embarrassing like this also makes you reflect on how to do data analysis and reporting better. We publish the methodology for IPv6 ASNs, but I've learned that open methodology and open data alone won't prevent programming errors. For future analyses and prototypes like this, we want to do better by publishing the code more often, which should make bugs easier to detect and produce code bases that others can reuse. Let us know what you think of this!
And for those thinking about deploying IPv6: just do it!