The Internet Number Resource Database (INRDB) is ready to become the new storage back-end for the Routing Information Service (RIS). All RIS data ever collected will soon be available and searchable online from the same interface.
We have been working to evolve the Internet Number Resource Database (INRDB) since we first launched it last year. The INRDB prototype, written in C, was developed to demonstrate an efficient, universal storage solution for all of our large Internet resource-related data sets, for example data from the Routing Information Service (RIS) and from the other Regional Internet Registries (RIRs).
Today, the INRDB is ready to become the back-end infrastructure for the RIS. This will make it possible to offer online all the data collected since the service was first launched in 2000.
The INRDB and the RIS
The RIS relies on a network that currently comprises 15 software routers, called Remote Route Collectors (RRCs). The RRCs are spread around the globe across three continents (Europe, America and Asia). From there, they collect Border Gateway Protocol (BGP) routing data from over 600 peers. This data, initially in raw format, is processed, stored and made publicly available to the Internet community.
More than ten years of collected BGP routing data is available in its raw format. However, until recently, only three months of data was searchable using the existing RIS tools, which relied on a MySQL back-end. This was a shortcoming in terms of availability, and it limited the ability to examine historical data, unless one re-processed the raw RIS data. In addition, as the Internet grows, we continuously have to take into account fundamental factors such as scalability and performance.
The INRDB is providing significant improvements to the RIS. All of the collected RIS data will now be available online in a sophisticated database with many interesting capabilities.
NetSense is one of the RIS client applications that provide access to the RIS data. It is the first to be ported to the INRDB.
The NetSense Interface
The NetSense interface shows where the RRCs are located. Its primary function is to provide network operators with valuable information about the stability, visibility, and routing consistency of their ASes and announced prefixes. Without the INRDB, NetSense only gives users access to the most recent three months of RIS data. With the deployment of the new INRDB, users will be able to examine up to ten years of historical RIS data immediately online.
The INRDB and the RIS API
All of our client applications request RIS data via an internal API which provides us with standardised access to the RIS back-end. The RIS API has been updated and optimised to work with the INRDB.
We had two primary goals when updating the RIS API:
- Enable communication with the INRDB back-end; and
- Maintain access to the RIS data without breaking existing client applications.
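One common way to meet both goals is to keep the client-facing call stable and swap the storage implementation behind an interface. The following is only a minimal sketch of that idea: `RisBackend`, `MySqlBackend`, `InrdbBackend`, `RisApi` and the returned strings are hypothetical names invented for illustration, not the actual RIS API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage abstraction; not the real RIS API.
interface RisBackend {
    List<String> routesForPrefix(String prefix, int monthsBack);
}

// Stand-in for the older back-end, which only kept three months searchable.
class MySqlBackend implements RisBackend {
    public List<String> routesForPrefix(String prefix, int monthsBack) {
        int window = Math.min(monthsBack, 3); // older store is capped at three months
        List<String> out = new ArrayList<>();
        out.add("mysql:" + prefix + ":" + window + "m");
        return out;
    }
}

// Stand-in for the new back-end, which can serve the full history.
class InrdbBackend implements RisBackend {
    public List<String> routesForPrefix(String prefix, int monthsBack) {
        List<String> out = new ArrayList<>();
        out.add("inrdb:" + prefix + ":" + monthsBack + "m");
        return out;
    }
}

// Client applications only ever talk to this class, so swapping the
// back-end does not change the calls they make.
class RisApi {
    private final RisBackend backend;

    RisApi(RisBackend backend) {
        this.backend = backend;
    }

    List<String> lookup(String prefix, int monthsBack) {
        return backend.routesForPrefix(prefix, monthsBack);
    }

    public static void main(String[] args) {
        RisApi api = new RisApi(new InrdbBackend());
        System.out.println(api.lookup("193.0.0.0/21", 120));
    }
}
```

In this sketch, a client that calls `lookup` keeps working unchanged whether `RisApi` is constructed with the old or the new back-end.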
Technical Information about the INRDB
The INRDB is written in Java version 1.6 as a distributed storage back-end that makes use of the Apache Hadoop framework and its sub-projects. We currently use Cloudera’s distribution of Hadoop (CDH3). Hadoop allows for building reliable, highly scalable, distributed applications.
Applications that make use of Hadoop run on top of the Hadoop Distributed File System (HDFS) that comes with Hadoop. HDFS is highly configurable and enables rapid access to information stored across multiple computing nodes.
Hadoop also includes the powerful MapReduce framework. This mechanism allows for rapid and efficient manipulation of huge data sets across computer clusters. It does so by running distributed, parallel jobs that process data by applying Map and Reduce algorithms.
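To illustrate the Map and Reduce steps, here is a small sketch in plain Java that counts records per prefix. It uses Java parallel streams rather than the Hadoop MapReduce API, and the `prefix|originAS` record format, the class name and the sample values are assumptions made up for the example, not the RIS raw format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapReduceSketch {
    // Map step: extract the key (the prefix) from one hypothetical
    // "prefix|originAS" record.
    static String mapToKey(String record) {
        return record.split("\\|")[0];
    }

    // Reduce step: group identical keys and count how often each occurs.
    // The parallel stream stands in for the distributed workers.
    static Map<String, Long> countByPrefix(List<String> records) {
        return records.parallelStream()
                .map(MapReduceSketch::mapToKey)
                .collect(Collectors.groupingBy(k -> k, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "193.0.0.0/21|AS3333",
                "193.0.0.0/21|AS3333",
                "2001:67c:2e8::/48|AS3333");
        System.out.println(countByPrefix(records));
    }
}
```

In a real cluster the map and reduce phases run on different machines and the framework moves the intermediate keys between them; the sketch only shows the shape of the computation.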
HBase is a non-relational database running on top of Hadoop and is used as the INRDB storage back-end. It supports storing vast amounts of data in very large tables, while simultaneously offering real-time read/write access to them. The initial storage capacity for the INRDB is 48 TB. This much capacity is required because the data is stored with a replication factor, so every block is kept in several copies for reliability.
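The effect of replication on capacity is simple arithmetic: raw capacity divided by the replication factor gives the space left for unique data. The sketch below assumes a replication factor of 3, which is the HDFS default; the factor actually configured for the INRDB is not stated here, so the result is illustrative only.

```java
class ReplicationCapacity {
    // Space available for unique data when every block is stored
    // replicationFactor times.
    static double usableTb(double rawTb, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be at least 1");
        }
        return rawTb / replicationFactor;
    }

    public static void main(String[] args) {
        // 48 TB raw is taken from the text; the factor 3 is an assumption
        // (the HDFS default), not a confirmed INRDB setting.
        System.out.println(usableTb(48.0, 3) + " TB usable");
    }
}
```

Under that assumption, 48 TB of raw disk would hold about 16 TB of unique data, before accounting for any other overhead.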
The last element, Apache ZooKeeper, is a mechanism responsible for synchronising and coordinating worker machines in distributed, multi-node cluster set-ups. It addresses common problems such as worker election, grouping and naming, and it aims to provide high availability, reliability and fault tolerance.
Our custom INRDB importers initiate parallel MapReduce jobs which process and insert RIS data into HBase. The same importers make use of a custom library (libbgpdump) to parse the raw RIS data.
Our experience with the Hadoop framework has been very promising so far. We have observed significant performance improvements in data insertion using MapReduce. Three months of RIS data can now be processed and inserted into the INRDB in 24 hours. This represents a 60-fold improvement over the MySQL solution used before.
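Worker election can be pictured with the well-known ZooKeeper recipe in which each candidate registers under an increasing sequence number and the lowest live number is the leader. The sketch below simulates that rule in memory with a `TreeMap`; it does not use the ZooKeeper client library, and the class and worker names are hypothetical.

```java
import java.util.TreeMap;

class ElectionSketch {
    // Live candidates keyed by the sequence number they received on joining.
    private final TreeMap<Integer, String> candidates = new TreeMap<>();
    private int nextSeq = 0;

    // A worker joins and receives the next sequence number.
    int join(String worker) {
        int seq = nextSeq++;
        candidates.put(seq, worker);
        return seq;
    }

    // A worker fails or disconnects; its entry disappears.
    void leave(int seq) {
        candidates.remove(seq);
    }

    // The live worker with the lowest sequence number leads.
    String leader() {
        return candidates.isEmpty() ? null : candidates.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ElectionSketch election = new ElectionSketch();
        int first = election.join("worker-a");
        election.join("worker-b");
        System.out.println("leader: " + election.leader());
        election.leave(first); // the current leader goes away
        System.out.println("leader: " + election.leader());
    }
}
```

When the leading worker leaves, the next lowest sequence number takes over without any further negotiation, which is what makes the rule attractive for failover.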
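Taking the two figures above at face value, the previous import time for the same three-month batch can be estimated by multiplying the current time by the speed-up. The sketch below does only that arithmetic; the class and method names are hypothetical, and the result is an estimate derived from the stated numbers rather than a separately measured value.

```java
class ImportSpeedup {
    // Estimated earlier duration, given the current duration and the speed-up.
    static long previousHours(long currentHours, long speedup) {
        return currentHours * speedup;
    }

    public static void main(String[] args) {
        long hours = previousHours(24, 60); // 24 hours and 60x are from the text
        System.out.println(hours + " hours, or about " + (hours / 24) + " days");
    }
}
```

By this estimate, the same three months of data would have taken roughly 1,440 hours, or about 60 days, with the earlier MySQL-based import.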
In summary, the key improvements gained from migrating the RIS onto INRDB are:
• Greater availability of historic data
• Ten years of RIS data becomes live and searchable online
• Improved performance with parallel processing and insertion of data
• Highly and transparently scalable
• Easily expandable by adding worker nodes and storage
• Adding new RRCs to the RIS network becomes easier
• Integrity of RIS data is ensured with replication and other HDFS features
Conclusion and the future of INRDB
All collected RIS data will soon be available online, including ten years of historical data. The initial vision of the INRDB was to have a unified Internet Resource Database for all Internet resource-related data maintained by the RIPE NCC. With this new implementation of the INRDB, we are one step closer to this vision. We will continue pursuing this goal until the INRDB becomes the RIPE NCC's central Internet Number Resource Database.