How to Define Address Space as 'Routed'?
When data from RIR allocation statistics are compared with data from route collecting systems such as RIS, the question often arises of when a particular address block can be considered globally routed. A straightforward answer would be to look in RIS for a prefix which either matches the requested address space exactly or covers it (so that the requested space is a more specific of that prefix); any match found would be taken as evidence of (global) visibility in the routing system. However, because RIS has a variety of peers, some providing a full routing table and others passing only internal customer routes, the mere fact that a prefix is seen in RIS does not tell us much about its visibility. For all we know, the matching prefix could be a configuration error, a hijack, or a route that is only used locally (exchange point LANs are good examples). Therefore, we need to look in more detail at how many peers provided RIS with a route for the prefix, and at how many RIS route collectors carried the route, before deciding whether or not to include the prefix in the analysis.
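The straightforward lookup described above can be sketched with Python's standard ipaddress module. This is only an illustration: the function name covering_prefixes and the sample table (built from documentation address ranges) are assumptions made for the example and are not part of any RIS tooling.

```python
import ipaddress


def covering_prefixes(block, ris_prefixes):
    """Return the prefixes that equal or cover `block` (hypothetical helper)."""
    target = ipaddress.ip_network(block)
    matches = []
    for text in ris_prefixes:
        candidate = ipaddress.ip_network(text)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if candidate.version == target.version and target.subnet_of(candidate):
            matches.append(text)
    return matches


# Made-up table contents using documentation ranges, not real RIS data.
table = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]
print(covering_prefixes("192.0.2.128/25", table))
```

A non-empty result only says that some peer announced a covering prefix; it says nothing about how widely that prefix is seen, which is the weakness discussed above.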
The purpose of this document is to come up with a definition of "globally" routed prefixes which can be used in any analysis of RIS data, both recent and historic. The definition does not have to be perfect, nor can it be perfect; we will always have border cases where a prefix is partially visible, i.e. seen by a good number of peers but not by as many peers as one would expect for real global visibility. Whether such prefixes should be considered routed or not is an open question. However, as long as the total fraction of such prefixes is small (less than 1%) it will not matter much to any statistical analysis.
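As a rough illustration of the 1% criterion, the share of partially visible prefixes can be computed from a histogram that maps a peer count to the number of prefixes seen by that many peers. The bounds low and high that delimit the border region, the function name border_fraction and the sample numbers are all assumptions made for this sketch; they are not values taken from RIS.

```python
def border_fraction(histogram, low, high):
    """Fraction of prefixes whose peer count lies in [low, high).

    `histogram` maps a peer count to the number of prefixes seen by
    exactly that many peers. `low` and `high` are hypothetical bounds
    for the partially visible region.
    """
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    border = sum(n for peers, n in histogram.items() if low <= peers < high)
    return border / total


# Made-up histogram: many prefixes seen by a few peers, many seen by
# nearly all peers, and very few in between.
hist = {1: 300, 2: 150, 3: 50, 40: 5, 90: 4000, 95: 5500}
share = border_fraction(hist, low=10, high=80)
print(f"border cases: {share:.4%}")
```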
A detailed look
Using the INRDB as an easy means to obtain the full RIB dumps for any date in RIS history, we derived distributions of number of prefixes and number of peers, both for individual route collectors and aggregated over the entire RIS. Results are shown below for a few selected dates. The first graphs show the frequency distribution of prefixes per number of peers aggregated over all route collectors; i.e. for a given date the graphs show how many prefixes were seen by a particular number of peers.
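The aggregated frequency distribution described above can be sketched as follows. The input format, an iterable of (collector, peer, prefix) tuples, is an assumption made for this example and is not the INRDB interface; peers are assumed to be identified by a single string, and the sample entries are made up.

```python
from collections import Counter, defaultdict


def peers_per_prefix(rib_entries):
    """Map each prefix to the number of distinct peers that announced it."""
    peers = defaultdict(set)
    for collector, peer, prefix in rib_entries:
        # Aggregated over all collectors: only the peer identity counts here.
        peers[prefix].add(peer)
    return {prefix: len(seen) for prefix, seen in peers.items()}


def peer_count_histogram(rib_entries):
    """Map a peer count to the number of prefixes seen by that many peers."""
    return Counter(peers_per_prefix(rib_entries).values())


# Made-up RIB entries in the assumed (collector, peer, prefix) form.
rib = [
    ("rrc00", "peer-a", "192.0.2.0/24"),
    ("rrc00", "peer-b", "192.0.2.0/24"),
    ("rrc01", "peer-c", "192.0.2.0/24"),
    ("rrc01", "peer-c", "198.51.100.0/24"),
]
hist = peer_count_histogram(rib)
print(sorted(hist.items()))
```

Plotting the histogram values against the peer count gives the kind of frequency distribution discussed in this section.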
As expected, the distributions are usually bimodal in nature, something which becomes more prominent as time passes and the number of full-feed RIS collector peers grows. On the left end we have a significant number of prefixes which are seen by only a handful of peers; these are the more specific customer routes which are passed to RIS and some other peering partners, but are not meant to be exported. Typically, RIS receives these from the originator and some additional peers at the same exchange point. On the right side of the distribution we see another sharp peak followed by a longer tail. These are the prefixes which truly are widely seen in RIS. Both full-feed peers and peers with a partial feed contribute to this part of the distribution. Between the left and right distributions we have an area which is usually sparsely populated.
Occasionally, the prefix distribution shows three distinct peaks, as shown below for 21-Sep-2009. We have not yet looked into why this happened; at this stage the important thing to note is that it does happen: distributions do not always follow the expected straightforward model.
To check our interpretation of the reach of the prefixes in the different parts of the distributions, we also created scatter plots of the number of peers vs. the number of route collectors. If the prefixes in the left part of the distribution are indeed customer routes, we expect to see them at only 1 or 2 exchange points. The prefixes in the right part of the distribution, however, should be seen by most if not all route collectors. As we can see below, the graphs confirm this hypothesis.
Prefix scatter plots. On the x-axis we have the number of peers seeing a prefix, on the y-axis the number of route collectors. The histograms on top and to the right collapse the data to one dimension, thus showing frequency distributions of the number of peers and the number of route collectors. The plot for 2009 shows most clearly how non-global prefixes are predominantly seen by only one route collector.
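The data behind such a scatter plot can be sketched by counting, for every prefix, both the distinct peers and the distinct route collectors that saw it. As before, the (collector, peer, prefix) input format and the sample entries are assumptions made for this example.

```python
from collections import Counter, defaultdict


def reach_per_prefix(rib_entries):
    """Map each prefix to (number of peers, number of route collectors)."""
    peers = defaultdict(set)
    collectors = defaultdict(set)
    for collector, peer, prefix in rib_entries:
        peers[prefix].add(peer)
        collectors[prefix].add(collector)
    return {p: (len(peers[p]), len(collectors[p])) for p in peers}


def scatter_counts(rib_entries):
    """Count how many prefixes fall on each (peers, collectors) point."""
    return Counter(reach_per_prefix(rib_entries).values())


# Made-up entries: one prefix seen at three collectors, one seen at a single
# collector only.
rib = [
    ("rrc00", "peer-a", "192.0.2.0/24"),
    ("rrc01", "peer-b", "192.0.2.0/24"),
    ("rrc03", "peer-c", "192.0.2.0/24"),
    ("rrc01", "peer-b", "198.51.100.0/24"),
    ("rrc01", "peer-d", "198.51.100.0/24"),
]
points = scatter_counts(rib)
print(sorted(points.items()))
```

Summing the counts along either axis yields the one-dimensional histograms drawn on top of and to the right of the scatter plots.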
A workable definition
The results above show that a definition of globally routed prefixes as seen by RIS does not have to take into account the number of route collectors (each at a different geographical location) by which a prefix was seen. It is enough to check the number of collector peers.
The results also show that we basically have two choices: either a gentle filter, with a cut-off after the first peak, or an aggressive filter, with the cut-off just before the second peak. The first would filter out the vast majority of non-global prefixes; the second would select only those prefixes which are global beyond doubt. However, the drawback of the second is that its cut-off value in number of peers is highly dynamic: it will change over time and can even differ from one RIB dump to another. For the first filter, on the other hand, the graphs show that a static cut-off value of 10 peers would work for all of RIS history. Since our goal is to come up with a definition which can be applied easily in any RIS analysis, whether it looks at a single point in time or at a larger set of historic data, we favour the first, less aggressive, definition:
a prefix is considered "routed" when seen by at least ten RIS collector peers.
Note that this is the definition used by REX, the Resource Explainer.
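Applied to per-prefix peer counts, the definition reduces to a single threshold test. The sketch below is only an illustration and does not reproduce how REX implements it; the names ROUTED_MIN_PEERS, is_routed and routed_prefixes, as well as the sample counts, are made up for the example.

```python
ROUTED_MIN_PEERS = 10  # static cut-off from the definition above


def is_routed(peer_count, min_peers=ROUTED_MIN_PEERS):
    """True when a prefix is seen by at least `min_peers` collector peers."""
    return peer_count >= min_peers


def routed_prefixes(peer_counts, min_peers=ROUTED_MIN_PEERS):
    """Return the prefixes from `peer_counts` that pass the threshold."""
    return sorted(p for p, n in peer_counts.items() if is_routed(n, min_peers))


# Made-up peer counts per prefix (documentation address ranges).
counts = {
    "192.0.2.0/24": 95,     # widely seen
    "198.51.100.0/24": 3,   # seen by a handful of peers only
    "203.0.113.0/24": 10,   # exactly at the cut-off, so counted as routed
}
print(routed_prefixes(counts))
```

Because the cut-off is static, the same test can be applied unchanged to a single RIB dump or to a long series of historic dumps.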