OpenIPmap, the RIPE NCC’s tool for geolocating core Internet infrastructure, has undergone a full architectural overhaul. With a new set of APIs and interfaces designed to cater to the needs of all users, the system integrates various methods for estimating the geographical locations of IXPs, core routers and other components of Internet infrastructure. In this, the first of two RIPE Labs articles on the topic, we look at the evolution of OpenIPmap from its inception to today.
Please note that the name of this tool has been changed from OpenIPmap to RIPE IPmap.
OpenIPmap has existed as a concept and a prototype for quite some time. The concept and an early exploration of its potential were presented by Emile Aben back in 2013 at RIPE 67 and discussed in further detail on RIPE Labs. The motivation, back then, was that nobody was making any real effort to produce accurate identifications of the geographic locations of Internet routers and other parts of the core Internet infrastructure.
Further investigation into the subject of geolocation for Internet infrastructure was then presented at RIPE 68, where it was first dubbed OpenIPmap. The resemblance to OpenStreetMap is, of course, not coincidental. OpenIPmap was conceived as an open, crowd-sourced database of geographic locations. In the case of OpenStreetMap, these are locations of physical objects; for OpenIPmap, they are locations of core Internet infrastructure. Everybody was invited to contribute to this database by selecting IP addresses from RIPE Atlas traceroutes and submitting actual locations for them. From that point on, an online prototype was available.
Besides relying on user input, OpenIPmap had methods for inferring the geolocation of IP addresses on its own. It featured a method for extracting locations from reverse DNS hostnames and a validation technique based on speed-of-light calculations using the round-trip times of pings towards the geolocated target.
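The speed-of-light validation mentioned above can be illustrated with a minimal sketch. It is not the OpenIPmap implementation: the function names, the assumed propagation speed in fibre (about two thirds of the speed of light in a vacuum) and the example coordinates are all assumptions made for illustration. The idea is that a ping's round-trip time puts an upper bound on how far away the target can be, so a candidate location beyond that bound can be rejected.

```python
import math

SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # speed of light in a vacuum, km per millisecond
FIBRE_FACTOR = 2 / 3                   # assumption: signals in fibre travel at roughly 2/3 c
EARTH_RADIUS_KM = 6371.0               # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on the one-way distance a packet can cover in half the RTT."""
    return (rtt_ms / 2) * SPEED_OF_LIGHT_KM_PER_MS * FIBRE_FACTOR


def is_plausible(probe, candidate, rtt_ms):
    """True if the candidate location is reachable from the probe within the RTT."""
    return great_circle_km(*probe, *candidate) <= max_distance_km(rtt_ms)


# Hypothetical example: a probe in Amsterdam pinging a target claimed to be in London.
amsterdam = (52.37, 4.90)
london = (51.51, -0.13)
print(is_plausible(amsterdam, london, rtt_ms=2.0))   # 2 ms is too short for this distance
print(is_plausible(amsterdam, london, rtt_ms=10.0))  # 10 ms leaves enough room
```

A check like this can only rule locations out. A long round-trip time says little, because queueing and indirect routing add delay that has nothing to do with distance.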
Since the launch of the prototype, the database has been growing steadily and scientific papers have begun mentioning OpenIPmap as a reliable, albeit small, source of geolocations. Several operators and other people from the Internet networking community have contributed to the dataset.
The Road to 1.0
In late 2016, the decision was made to promote OpenIPmap to an official 1.0 release. At the same time, the focus of the project widened to include more methods of acquiring geolocations for Internet infrastructure.
Due to the growth in the number of RIPE Atlas probes, it became feasible to extract plausible locations for a lot of infrastructure with the help of RIPE Atlas traceroutes. This shifted the balance between crowd-sourced input from users and locations inferred by other components of the OpenIPmap system. Now, rather than providing new locations, crowd-sourcing would be relied upon to confirm or correct suggestions made by the OpenIPmap geolocation engine.
As awareness of the prototype gradually spread, the geolocation of Internet infrastructure was all the while becoming a hot topic for networking researchers. A lot of datasets and algorithms popped up that were either aimed at improving general geolocation acquisition or were trying to make headway in geolocating IP addresses that had been difficult to locate earlier on, e.g. anycast addresses.
A New Architecture
With all these new datasets and algorithms available, we made the decision to come up with a totally new architecture for OpenIPmap. We settled on a strict separation of the individual methods for geolocating IP addresses. Crowd-sourcing would be one of these methods, but the broader goal was to be able to accommodate as many geolocation methods as possible in the OpenIPmap system.
In the current system, methods are referred to as geolocation engines. In order to arrive at a geolocation estimate, the outcome from each engine is assigned a weight (on the basis of the engine's expected reliability) and the weighted outcomes are combined into one preferred geolocation for an IP address. This weighting mechanism is called the score reducer.
This architecture has the benefit of allowing datasets and algorithms to be added or removed as geolocation engines, with their results integrated via the score reducer. It also allows the contribution of any geolocation engine to the preferred geolocation to be tuned, and it provides feedback on the performance of an engine relative to other engines. This feedback could be used by the creators of the underlying datasets/algorithms to refine their heuristics, which in turn may lead to better OpenIPmap results, and so on.
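The weighting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual score reducer: the engine names, the weights, the per-result scores and the simple weighted sum are all hypothetical, since the combination rule itself is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineResult:
    engine: str     # name of the geolocation engine (hypothetical names below)
    location: str   # location suggested by the engine
    score: float    # engine's own confidence in the suggestion, 0.0 to 1.0


def reduce_scores(results, weights):
    """Combine engine results into one preferred location via a weighted sum.

    Engines that have no entry in `weights` are ignored, which models
    removing an engine from the system.
    """
    totals = defaultdict(float)
    for result in results:
        if result.engine not in weights:
            continue
        totals[result.location] += weights[result.engine] * result.score
    if not totals:
        return None, {}
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Hypothetical weights reflecting each engine's expected reliability.
weights = {"crowdsourced": 1.0, "latency": 0.8, "rdns": 0.6}
results = [
    EngineResult("rdns", "Frankfurt", 1.0),
    EngineResult("latency", "Amsterdam", 0.5),
    EngineResult("crowdsourced", "Amsterdam", 1.0),
]

best, totals = reduce_scores(results, weights)
print(best, totals)
```

Because each engine only contributes through its weight, tuning or dropping an engine is a change to the `weights` mapping rather than to the reducer. In this example, removing the `crowdsourced` entry flips the preferred location from Amsterdam to Frankfurt.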
Another consequence of the strict division of OpenIPmap into geolocation engines and a score reducer is that it makes all datasets and algorithms use the same data structures. For instance, the system effectively imposes a common format for describing geographical locations. Although OpenIPmap could in principle use any type of geographical area for locations, it currently concentrates on what we deemed the most useful: city or country precision. This means that users and engines can place an IP address in a city or in a country. However, incredible as it may seem, there is no widely adopted way to represent a city. Some datasets use city names from geonames.org (mainly the English language name of the city), others use the geonames.org ‘id’ field (a number), others use location descriptions that rely on third-party services, like Google geolocation, and, lastly, some datasets use latitude and longitude data directly.
OpenIPmap uses city names available from geonames, combined with a region ID, a country ID and a geohash. This combination keeps locations readable, avoids name collisions (e.g., there are at least five Springfields in the USA, two of which are in the same state) and gives information about the precision of the location. You can start making your own contribution right away.
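To make the role of the geohash concrete, here is a minimal sketch of a location record along these lines. The field names, the key layout and the example coordinates are assumptions for illustration and do not reflect the actual OpenIPmap data format. The encoder follows the standard geohash scheme, in which longitude and latitude bits are interleaved and written in base 32.

```python
from dataclasses import dataclass

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


@dataclass(frozen=True)
class Location:
    city_name: str    # human-readable name, e.g. taken from geonames
    region_id: str    # hypothetical region identifier
    country_id: str   # hypothetical country identifier
    lat: float
    lon: float

    def key(self, precision=6):
        """Hypothetical unique key: readable parts plus a geohash."""
        geohash = geohash_encode(self.lat, self.lon, precision)
        return "/".join([self.country_id, self.region_id, self.city_name, geohash])


# Two made-up places that share a name, region and country but not coordinates.
a = Location("Springfield", "XX", "US", 40.0, -90.0)
b = Location("Springfield", "XX", "US", 41.0, -91.0)
print(a.key())
print(b.key())
print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```

The name, region and country keep the key readable, while the geohash separates places that would otherwise collide. A shorter geohash covers a larger cell, so the length of the hash also hints at how precise the location is.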
Interfaces to OpenIPmap
Of course, a tool is only as good as its interfaces to the world, and since OpenIPmap is (also) a crowd-sourced tool, we tried to come up with several ways to interact with the system.
First and foremost, there's the OpenIPmap API that allows for the retrieval of geolocation information for IP addresses and for the submission of new IP address and geolocation sets.
Second, there is the web application on ipmap.ripe.net that allows users to browse the geolocation engines and crowd-source geolocation data.
Third, a user interface is integrated with RIPE Atlas for each traceroute measurement. It allows users to view traceroutes on a map with geographic locations from RIPE Atlas, and to add new locations or submit locations for existing ones. Look at any traceroute measurement in RIPE Atlas (the overwhelming majority are public) and click on the tab OpenIPmap.
How to Contribute to OpenIPmap
You can contribute to OpenIPmap in various ways. Head over to ipmap.ripe.net or a RIPE Atlas measurement to fill in the blanks, confirm existing locations you know are right, or improve ones that you know to be wrong. Bringing the precision of an IP address's location up from country level to city level is also a valuable contribution.
We are looking forward to integrating more engines into our platform. If you are a researcher and you have a dataset, a script, an application or a database that maps IP addresses to geolocations in any form, then send an email to email@example.com.
Look out for the next article on OpenIPmap where we'll be going into more detail on the collaboration involved in OpenIPmap, integration of the various geolocation engines, and the APIs.