Some time ago we introduced RIPE IPmap. Now it’s time to take a look under the hood and see how the geolocation of infrastructure IP addresses is done.
Earlier this year we announced the RIPE IPmap, a tool to geolocate Internet infrastructure. Besides relying on user input, RIPE IPmap uses various methods for inferring the geolocation of IP addresses on its own.
The RIPE IPmap portal is available at https://ipmap.ripe.net.
In this article we will look under the hood and see how this is done.
There are already numerous methods available for geolocating IP addresses, and there are likely many more to come with both business and academia continuing to work hard at finding new and improved solutions to the various problems geolocation poses. In developing our architecture, a key goal was to ensure that the system would be flexible enough to allow for the creation of novel geolocation methods, but also to incorporate geolocation methods created by third parties. In short, we wanted an approach that would be able to accommodate multiple approaches.
We called this the multi-approach - an approach to geolocation capable of drawing on multiple data sources and multiple methods.
Each geolocation approach is encapsulated in a single service we call a "geolocation engine". Each of these is completely isolated and has its own set of techniques for geolocation. Some can use active measurement, some reverse DNS, and some user input. Figure 1 depicts this architecture:
Figure 1: The multi-approach architecture
In the first stage of the process, each engine receives the same two inputs: a database of all the possible cities in the world and an IP address to geolocate. As output, each engine provides a set of possible locations for that IP address. Each location is annotated with a score from 1 to 10 (locations with a score of 0 are removed). The score indicates the engine's confidence that the location is the correct one.
In the second stage, the outputs from all engines are passed to the reducer, which combines the scores for each location it receives. As depicted in Figure 1, the score the reducer assigns to a location is the sum of all scores received for that location from the various engines. In the last step, the reducer returns a final list of locations sorted in descending order by score.
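The two stages above can be illustrated with a minimal Python sketch. It is not the RIPE IPmap implementation: the function name `reduce_scores`, the dictionary-based engine outputs and the example cities and scores are all assumptions made for illustration. Only the rule itself (sum the scores per location, then sort in descending order) comes from the description above.

```python
from collections import defaultdict


def reduce_scores(engine_outputs):
    """Sum the scores given to each location and rank the locations.

    engine_outputs: a list with one dict per engine, mapping a location
    name to that engine's score (hypothetical data layout).
    """
    totals = defaultdict(int)
    for output in engine_outputs:
        for location, score in output.items():
            if score > 0:  # locations with a score of 0 are removed
                totals[location] += score
    # Highest combined score first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up outputs from three engines for the same IP address
single_radius = {"Amsterdam": 8, "Rotterdam": 3}
reverse_dns = {"Amsterdam": 6}
crowdsource = {"Rotterdam": 4, "Utrecht": 2}

ranking = reduce_scores([single_radius, reverse_dns, crowdsource])
print(ranking)  # [('Amsterdam', 14), ('Rotterdam', 7), ('Utrecht', 2)]
```

Because the reducer only sees locations and scores, a new engine can be added without changing the reducer.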
The architecture described above is easily accessible through an API.
To retrieve a list of possible locations for a given IP address, it's enough to append the IP address to the URL (for instance: https://ipmap.ripe.net/api/v1/locate/126.96.36.199/). The first location in the list is the one with the highest score, i.e. the most probable one. If you want just the location with the highest score and not the entire list, you can use the "best" filter (https://ipmap.ripe.net/api/v1/locate/188.8.131.52/best). The locations returned show the country and city according to the format described below.
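As a rough sketch, the two URL patterns above could be wrapped in a small Python helper. The helper names `locate_url` and `locate` are hypothetical, and the sketch assumes the API answers with JSON, which this article does not state. The address 192.0.2.1 is a documentation-only placeholder.

```python
import json
import urllib.request

BASE_URL = "https://ipmap.ripe.net/api/v1/locate"


def locate_url(ip, best=False):
    """Build the request URL for an IP address, optionally with the "best" filter."""
    url = f"{BASE_URL}/{ip}/"
    return url + "best" if best else url


def locate(ip, best=False, timeout=10):
    """Fetch the locations for an IP address (assumes a JSON response body)."""
    with urllib.request.urlopen(locate_url(ip, best), timeout=timeout) as response:
        return json.load(response)


print(locate_url("192.0.2.1"))
print(locate_url("192.0.2.1", best=True))
```

Calling `locate("192.0.2.1")` would perform the actual request. The structure of the returned data is not assumed here beyond it being JSON.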
As will be clear from the above, one of the key requirements for a geolocation service is that it has a complete geographical dataset.
To ensure that our system would meet this requirement, we created a unified format, enriched with geometrical and socio-economic information, used for fast and complex geographical queries. An example is shown in Figure 2. Our goal is to use this dataset internally for all our tools while promoting it externally to facilitate cooperation. The https://ipmap.ripe.net/api/v1/worlds API provides access to it.
Figure 2: Example of the format specifying a city
Examples of Geolocation Engines
There are various geolocation engines currently available in RIPE IPmap. Here are two examples.
The active geolocation engine uses the RIPE Atlas infrastructure for geolocating IP addresses with the help of ping measurements. The platform, with its 10k+ RIPE Atlas probes distributed worldwide, is instrumental in detecting the position of infrastructure IP addresses. This approach consists of converting latencies to distances based on propagation speed on the wires.
When the geolocation of an IP address is requested, we select some probes from the RIPE Atlas network that we believe to be topologically close to the target IP. We perform ping measurements towards the target and keep only the latencies below 10 ms. Of those latencies we select the lowest one, on the assumption that the probe that measured it is the one closest to the target.
Once the closest probe is identified, we convert its latency to kilometres and return the list of cities within that radius of the probe (see Figure 3). If multiple cities fall within the radius, we score each of them based on the presence of IXPs and datacenters, because these indicate a higher density of infrastructure. This engine is called "single-radius".
Figure 3: An example of the active geolocation engine. The RIPE Atlas probe is the source of the ping measurement, and the radius is derived from the RTT.
A recent paper presented at IMC 2018, "Tracing Cross Border Web Tracking" (Costas Iordanou et al.), which received the distinguished paper award, reported a 99.58% accuracy at country level with this particular method.
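The single-radius procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's actual code. The conversion factor of 200 km per millisecond (about two thirds of the speed of light, a common approximation for fibre), halving the RTT to get a one-way delay, and the great-circle distance are all assumptions, and the probe names, coordinates and latencies are made up. The scoring step based on IXPs and datacenters is left out.

```python
import math

KM_PER_MS = 200.0   # assumed propagation speed: ~2/3 of the speed of light
MAX_RTT_MS = 10.0   # latencies at or above this threshold are discarded


def rtt_to_radius_km(rtt_ms):
    """Convert a round-trip time to a maximum one-way distance in km."""
    return (rtt_ms / 2.0) * KM_PER_MS


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def single_radius(probe_rtts, probes, cities):
    """Return the cities within the radius of the lowest-latency probe."""
    usable = {p: rtt for p, rtt in probe_rtts.items() if rtt < MAX_RTT_MS}
    if not usable:
        return []  # no probe is close enough to say anything
    closest = min(usable, key=usable.get)
    radius = rtt_to_radius_km(usable[closest])
    plat, plon = probes[closest]
    return sorted(
        name for name, (lat, lon) in cities.items()
        if haversine_km(plat, plon, lat, lon) <= radius
    )


# Made-up probes (lat, lon), measured RTTs in ms, and candidate cities
probes = {"probe-a": (52.37, 4.90), "probe-b": (50.11, 8.68), "probe-c": (48.86, 2.35)}
probe_rtts = {"probe-a": 1.0, "probe-b": 7.5, "probe-c": 25.0}
cities = {"Amsterdam": (52.37, 4.90), "Rotterdam": (51.92, 4.48), "Brussels": (50.85, 4.35)}

candidates = single_radius(probe_rtts, probes, cities)
print(candidates)  # ['Amsterdam', 'Rotterdam']
```

With a 1 ms RTT the radius is 100 km under these assumptions, so Brussels (roughly 170 km from the probe) is excluded while Amsterdam and Rotterdam remain as candidates.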
The crowdsource geolocation engine is based on IP geolocation provided by our users. Operators and RIPE Atlas users in general also contribute implicitly while they troubleshoot their networks using one of our visualisation tools, such as the IPmap UI and TraceMON. When they discover a host that has been geolocated in the wrong position, they can move it to the right place. The information collected this way is persisted and used for future geolocation inferences.
The returned geolocation takes into consideration how many times a specific location has been crowdsourced, how long ago, and by which user. A user trust mechanism is in place. We are evaluating the accuracy of our algorithm and considering ways to reward users for crowdsourced information.
Please note that the geolocation system described here does not currently handle Anycast, but we are working on improvements that will make this possible in the future.
How to Contribute
We are looking forward to integrating more engines in our platform. If you have worked on IP geolocation or you would like to contribute in any other way, please send an email to email@example.com.