With an increasing number of route collectors and the resulting increase in available routing data, how do we analyse such a large amount of data in a reasonable time? BGP Scanner was developed to overcome the limitations of existing tools for large-scale BGP data analysis.
Route collectors, including the Route Views project of the University of Oregon and the Routing Information Service (RIS) maintained by the RIPE NCC, have been an invaluable source of information about the Internet inter-domain ecosystem over the last 20 years.
They typically collect routing data by establishing BGP sessions with organisations that voluntarily agree to participate, and then regularly publish it as RIB snapshots in MRT format. The sequence of BGP packets received is also collected, in files covering a predetermined time interval chosen by the collector.
Over the years, many things have changed that have had an impact on this system. Most notable is the size of the Internet, which now encompasses more than 60K public ASes, with a full IPv4 RIB including more than 700K different routes. BGP itself has also changed, with new extensions such as the possibility to route IPv6 (RFC 4760) and to advertise multiple paths for the same prefix (RFC 7911).
All these changes, together with an increasing number of route collector participants and new collector services emerging (such as Isolario, PCH and BGPmon), have resulted in a significant increase in available routing data (Figure 1). This has introduced a new problem: how do we analyse such a large amount of data in a reasonable time?
Figure 1: Cumulative amount of data provided by Isolario, Route Views and RIS (2000-2018)
MRT/BGP data reader state of the art
There are a number of existing tools/libraries to analyse BGP data in MRT format (Table 1).
Most of the tools listed in Table 1 have not been updated for quite some time, and some of them cannot handle basic extensions. It is important to note that none of them was written with a focus on performance.
While these limitations may be acceptable for simple utilities and analysers, they are less so for systems with performance constraints, or for systems that need to exploit all available data sources without a Hadoop cluster to run the computation and lower the running time. Furthermore, most of the available tools can only dump all the data within an MRT file, leaving it to users to select the BGP data they actually need to analyse.
Table 1: MRT/BGP reader tools and libraries, with each one’s capabilities to handle BGP extensions that have been introduced over time (as of 1 July 2018).
BGP Scanner and the filtering capability
BGP Scanner was developed to overcome the limitations of existing tools for large-scale BGP data analysis.
The library is multi-threaded and throughput-oriented, and focuses on avoiding superfluous copies and memory allocations to reduce the overhead of decoding MRT data. To this end, it has been written in C, which enables low-level access and direct control over the machine. It exploits some of the features introduced in the ISO C99 and ISO C11 standards, such as dynamic stack allocations and thread-local variables. These choices also allow the library to be easily wrapped in higher-level languages, e.g. Python or Lua, so that it can be used by a larger community of users.
What sets BGP Scanner apart from most existing software is its ability to filter BGP packets by attributes, routes and announcements. This is especially useful for network administrators wanting to troubleshoot a particular routing event involving a well-defined subset of routes. For example, BGP Scanner can identify packets with particular patterns in the AS path attribute and/or select every packet containing routing information concerning a given subnet/supernet.
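To illustrate what decoding without copies or allocations can look like, here is a minimal sketch that reads the MRT common header defined in RFC 6396 (a 32-bit timestamp, 16-bit type, 16-bit subtype and 32-bit length, all in network byte order) directly from a caller-supplied buffer. The names mrt_record_view and mrt_next_record are hypothetical and chosen for this example only; they are not BGP Scanner's actual API, and the real library may be organised quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the MRT common header (RFC 6396): 4 + 2 + 2 + 4 bytes. */
enum { MRT_HEADER_SIZE = 12 };

/* Hypothetical "view" over one MRT record; illustrative only. */
typedef struct {
    uint32_t timestamp;     /* seconds since the Unix epoch */
    uint16_t type;          /* e.g. 13 = TABLE_DUMP_V2, 16 = BGP4MP */
    uint16_t subtype;
    uint32_t length;        /* payload length, header excluded */
    const uint8_t *payload; /* points into the caller's buffer: no copy */
} mrt_record_view;

/* Read big-endian integers byte by byte (no alignment assumptions). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Decode the record starting at buf without allocating or copying.
 * Returns the number of bytes consumed, or 0 if the buffer does not
 * hold a complete record.
 */
size_t mrt_next_record(const uint8_t *buf, size_t avail, mrt_record_view *out)
{
    assert(buf != NULL && out != NULL);

    if (avail < MRT_HEADER_SIZE)
        return 0;

    uint32_t len = read_be32(buf + 8);
    if (len > avail - MRT_HEADER_SIZE)
        return 0; /* truncated payload */

    out->timestamp = read_be32(buf);
    out->type      = read_be16(buf + 4);
    out->subtype   = read_be16(buf + 6);
    out->length    = len;
    out->payload   = buf + MRT_HEADER_SIZE;
    return MRT_HEADER_SIZE + (size_t)len;
}
```

A caller could map a decompressed MRT file into memory and call mrt_next_record in a loop, advancing by the returned byte count each time. Since the view only points into the original buffer, each record costs a handful of byte reads and no heap allocation.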
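The subnet/supernet selection mentioned above comes down to a prefix-containment test. The sketch below shows one way to express it for IPv4, assuming addresses are held in host byte order. The ipv4_prefix type and the function names are hypothetical and do not reflect how BGP Scanner implements its filters. AS path pattern matching is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical prefix type for this sketch; addr is in host byte order. */
typedef struct {
    uint32_t addr;
    uint8_t  len;  /* prefix length, 0..32 */
} ipv4_prefix;

/* Network mask for a prefix length; /0 is handled separately because
 * shifting a 32-bit value by 32 bits is undefined in C. */
static uint32_t prefix_mask(uint8_t len)
{
    assert(len <= 32);
    return len == 0 ? 0u : (uint32_t)(UINT32_MAX << (32 - len));
}

/* True if `inner` is equal to, or more specific than, `outer`. */
bool prefix_is_subnet_of(ipv4_prefix inner, ipv4_prefix outer)
{
    if (inner.len < outer.len)
        return false;

    uint32_t mask = prefix_mask(outer.len);
    return (inner.addr & mask) == (outer.addr & mask);
}

/* True if `route` is a subnet or a supernet of `target` (or equal to it). */
bool prefix_is_related(ipv4_prefix route, ipv4_prefix target)
{
    return prefix_is_subnet_of(route, target) ||
           prefix_is_subnet_of(target, route);
}
```

For instance, with a target of 192.168.0.0/16, a route for 192.168.10.0/24 matches as a subnet and a route for 192.0.0.0/8 matches as a supernet, while 10.0.0.0/8 matches neither. A filter built on such a test can discard unrelated packets as soon as their prefixes are decoded.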
BGP Scanner outperforms other tools six-fold
BGP Scanner has been tested against all the tools listed in Table 1, comparing the elapsed time and memory consumption required to analyse the first RIB available for a given collector in July 2018, followed by the sequence of update messages collected during that month.
Each file was decompressed beforehand to remove the overhead caused by the compression algorithm. The following results are averages over 10 runs, in order to avoid spurious effects caused by external factors. Each test was performed on a machine equipped with an Intel(R) Core(TM) i7-4790K 4.00GHz, 16GB RAM and running Debian Stretch.
Figure 2: Tool performances during the analysis of data collected by route-views6 (Route Views)
The first test (Figure 2) was run on the collector route-views6 of Route Views. This collector historically collects only IPv6 routes, and the amount of data collected during the month is not very large. In this scenario the RIB file size was 99MB and the sum of update files was 25.65GB, with 24 full routing tables (out of 26 tables collected).
BGP Scanner completed the analysis in about three minutes while keeping memory consumption under 3MB, whereas every other tool required at least 20 minutes, with a peak of more than 5 hours.
Figure 3: Tool performances during the analysis of data collected by Korriban (Isolario)
The second test (Figure 3) was run on the collector Korriban of the Isolario project. This collector hosts feeders with ADD-PATH capability in both IPv4 and IPv6, which dramatically increases the amount of data collected. In this scenario, the RIB file size was 5.7GB and the sum of update files was 810.64GB, with 112 IPv4 full routing tables (out of 512 IPv4 tables collected) and 126 IPv6 full routing tables (out of 407 IPv6 tables collected). This test was run only for BGP Scanner and bgpdump, the only two tools currently supporting the ADD-PATH capability.
BGP Scanner carried out the analysis in less than two hours, while bgpdump required more than 11 hours. This gap is likely to widen in the future as more feeder information becomes available and there is more data to be analysed.
How to get BGP Scanner
BGP Scanner and its basic building block, the BGP/MRT C library, are developed within the Isolario project and released as open-source under the BSD license.
The source code is available via GitLab or the Isolario webpage and can be installed on any POSIX-compliant platform. The utility comes with a detailed man page containing an in-depth description of its features.