Now There's an Idea - Research Pitches from Our R&D Team

One of the tasks of the R&D team here at the RIPE NCC is to come up with, evaluate, and execute new ideas for research topics and tools that can be of good use to the community. In this article, we're starting an experiment to expose some of these ideas to the community, in order to find interested parties to (help us) develop them further.

The RIPE NCC has a long tradition of providing the community with useful data analyses and tools. That said, as is almost always the case with everyone everywhere, we have more ideas than time and resources to work on them...

So, in this article, we thought we'd get some of our ideas out there in the hope that external parties might find them interesting enough to want to help put them into action. In particular, for research ideas, we're keen to see if there are students out there who'd be interested in developing these into theses or papers. It's also possible that some of the tools and ideas listed below might prove popular enough to warrant us spending energy on them -- if so, let us know!

If you're interested in working on one of the ideas listed here, please let us know by emailing ncc@ripe.net -- and mention the ID and contact person for the item.

RIPE Atlas Anchor Auto-Monitoring

ID: IDEA-2021-01
Type: tool
Difficulty: medium/hard
Contact: Johan ter Beest

RIPE Atlas anchors are hosted by interested network providers. The system measures these anchors automatically: all of them are in a "full measurement mesh", i.e. measurements are scheduled on all of them to target each other, and a number of other probes are also assigned to target them. This yields a continuous flow of ping and traceroute results, which is a good source of real-time reachability information.

A tool could be developed that:

Takes one or more anchor IDs as input
Using the available APIs, self-configures to detect what measurements are relevant for this anchor
Connects to our data provision infrastructure - preferably the result stream - to continuously fetch relevant results
Stores the data as needed locally, in order to power a number of visualisations/notifications:
- Graphs of various kinds (latencies, reachability information, ...) preferably updated in real time
- Maps of various kinds preferably updated in real time
- Configurable notifications to the host if various thresholds are crossed
Is deployable in an easy-to-use format such as a Docker container, or perhaps a VM appliance, embedding all the components needed to make the tool self-contained – i.e. making the installation as easy as possible
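
To make the "configurable notifications" item a bit more concrete, here is a minimal sketch of a threshold check over incoming ping results. It is only an illustration, not an existing tool: the class name `LatencyMonitor` and its parameters are hypothetical, and we assume each result is a dict carrying a probe ID in `prb_id` and an average RTT in `avg` (with a negative value meaning no reply). Fetching results from the stream and delivering the alerts are left out.

```python
from collections import defaultdict, deque
from statistics import median


class LatencyMonitor:
    """Keep a short RTT history per probe and flag unusual results."""

    def __init__(self, window=20, factor=2.0, min_samples=5):
        self.factor = factor            # alert when RTT > factor * baseline
        self.min_samples = min_samples  # results needed before alerting
        self.history = defaultdict(lambda: deque(maxlen=window))

    def feed(self, result):
        """Consume one ping result; return an alert string or None."""
        prb_id = result["prb_id"]       # assumed field name
        rtt = result.get("avg", -1)     # assumed field name
        if rtt is None or rtt < 0:
            return f"probe {prb_id}: target unreachable"
        past = self.history[prb_id]
        alert = None
        if len(past) >= self.min_samples:
            baseline = median(past)
            if rtt > self.factor * baseline:
                alert = (f"probe {prb_id}: RTT {rtt:.1f} ms exceeds "
                         f"{self.factor:g}x baseline {baseline:.1f} ms")
        past.append(rtt)
        return alert
```

A median over a sliding window is just one possible baseline; a real tool would likely let the host choose the thresholds and the notification channel.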

We note that some network operators have already built systems (for themselves perhaps) along these lines, so some cooperation may be possible.

Automatically Detecting Correlated RIPE Atlas Probe Behaviour

ID: IDEA-2021-02
Type: research/prototype
Difficulty: hard
Contact: Emile Aben, Stephen Strowes

In the past, the RIPE NCC analysed and illustrated various outages by using RIPE Atlas probe data, in particular their metadata about connected / disconnected status. Some examples:

The above examples were mostly based on the fact that we knew about the existence of a plausible signal, and indeed found evidence of that in our data. In this proposal, the relation is the reverse: the task is to automatically detect correlated probe behaviour, for example if most (or all) of the probes in a particular AS/prefix or city/country show similar changes in behaviour.

The service could perhaps:

Maintain running graphs (or be able to produce them upon detecting an event)
Keep a log of detected events and their associated confidence levels
Even better: send out notifications when it detects events

It can use connection/disconnection data or – for bonus points – RTT measurement results and should have various knobs to tweak its sensitivity and other parameters.
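
As a starting point for discussion, the simplest form of this detection could look like the sketch below: group disconnect events into time bins per AS and report bins in which a large share of that AS's probes dropped off together. This is a deliberately naive baseline under our own assumptions, not a proposed solution: the function name, the input shapes (a list of `(timestamp, prb_id)` events and a `prb_id` to ASN mapping) and the "knobs" `bin_seconds`, `min_fraction` and `min_probes` are all hypothetical.

```python
from collections import defaultdict


def correlated_disconnects(events, probe_asn, bin_seconds=300,
                           min_fraction=0.5, min_probes=3):
    """Return (bin_start, asn, fraction) for suspicious time bins.

    events: list of (unix_timestamp, prb_id) disconnect events
    probe_asn: dict mapping prb_id to the ASN hosting that probe
    """
    # Number of known probes per ASN, used as the denominator.
    asn_size = defaultdict(int)
    for asn in probe_asn.values():
        asn_size[asn] += 1

    # Distinct probes that disconnected per (time bin, ASN).
    seen = defaultdict(set)
    for ts, prb_id in events:
        asn = probe_asn.get(prb_id)
        if asn is None:
            continue  # no metadata for this probe
        bin_start = ts - ts % bin_seconds
        seen[(bin_start, asn)].add(prb_id)

    detected = []
    for (bin_start, asn), probes in sorted(seen.items()):
        total = asn_size[asn]
        fraction = len(probes) / total
        if total >= min_probes and fraction >= min_fraction:
            detected.append((bin_start, asn, fraction))
    return detected
```

The fraction could serve as a crude confidence level; a serious approach would need to account for the normal churn of each probe, which is where the statistics (or machine learning) come in.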

Is this a question of statistics? Or machine learning? You tell us!

We would also like to point to related previous research, for example "Statistical Characterisation of RTT Series" by Maxime Mouchet.

Correlating RIPE Atlas Data with other Datasets

ID: IDEA-2021-03
Type: research
Difficulty: medium
Contact: Stephen Strowes

RIPE Atlas data is available in Google BigQuery. We're inviting researchers and other interested parties to correlate our data with other data sets to look for evidence of statistical correlation or potential for making stronger connections between measurement systems and their published data.

Other datasets are published on BigQuery, including:

M-Lab network measurement data
Google Public Datasets, including (for example) weather and storm reports, hurricane tracks, tsunami reports
Soon, our own tables with probe metadata (e.g., probe locations, ASNs), and some BGP state to help map IPs to ASNs.

We are sure other datasets exist, public or otherwise, that could correlate with the network measurement data collected by RIPE Atlas. This could be extremely open-ended, but can be summarised as: are you able to identify and correlate events in the RIPE Atlas data with other datasets?

Checking Locality of Well-Known Services

ID: IDEA-2021-04
Type: analysis
Difficulty: medium
Contact: Emile Aben

All RIPE Atlas probes periodically use their DNS resolvers to look up the addresses for DNS names such as google.com or facebook.com. The results vary from location to location, because these services usually try to serve content from a location that is close to the user. The task is to:

Check the correlation between where the probes are located and what the answers are; i.e. whether they correlate to the same country, same AS, same prefix, etc.
Seek out outliers; i.e. where the probe's geolocation (set by the host) is not consistent with the results observed for such queries – indicating a possible probe mis-geolocation case
For bonus points: use active measurements to check if the locations the probes are directed to are the optimal ones; i.e. whether a different service endpoint would be (significantly) better to use for that probe
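
The outlier step above could begin with something as simple as the following sketch. It assumes the hard part has already been done elsewhere: each observation is a dict with the probe's ID, the country set by the host, and the country that the returned address geolocates to (the field names `prb_id`, `probe_country` and `answer_country`, the function name and the `min_share` parameter are all made up for illustration). A probe is flagged when its answer differs from what the clear majority of probes in the same declared country receive.

```python
from collections import Counter, defaultdict


def find_outliers(observations, min_share=0.75):
    """Return IDs of probes whose answers disagree with their peers.

    observations: list of dicts with 'prb_id', 'probe_country' and
    'answer_country' keys (hypothetical field names).
    """
    # Count which answer countries are seen per declared probe country.
    by_country = defaultdict(Counter)
    for obs in observations:
        by_country[obs["probe_country"]][obs["answer_country"]] += 1

    outliers = []
    for obs in observations:
        counts = by_country[obs["probe_country"]]
        dominant, hits = counts.most_common(1)[0]
        share = hits / sum(counts.values())
        # Only judge countries where the answers mostly agree.
        if share >= min_share and obs["answer_country"] != dominant:
            outliers.append(obs["prb_id"])
    return outliers
```

Comparing against the dominant answer rather than the probe's own country avoids flagging every probe in a country where the service simply has no local presence. The same grouping could be repeated per AS or per prefix.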

Note: a similar analysis was done in 2014 by Emile Aben about the methods used then by Wikimedia.

Where are we Missing RIPE Atlas Probes?

ID: IDEA-2021-05
Type: analysis
Difficulty: medium/hard
Contact: Emile Aben

To be able to support as many use-cases as possible, it would be great to have RIPE Atlas probes in as many diverse places as possible. We've made an interface that shows if there are RIPE Atlas probes in "eyeball" networks, but this only covers "eyeball" networks. Can we find the places where RIPE Atlas probes are missing? Do we need probes near places where a lot of interconnection is taking place? Can we use PeeringDB to guide RIPE Atlas probe placement?

Where are We Missing RIS Vantage Points?

ID: IDEA-2021-06
Type: analysis
Difficulty: hard
Contact: Emile Aben

To be able to support as many use-cases as possible, it would be great to have RIS peers in as many diverse places as possible. Can we make priority lists of networks (and places?) that would improve our coverage of the BGP routing system the most? What data sources can help us guide this? PeeringDB?

ASN/IP Dispersion Graphs/Statistics

ID: IDEA-2021-07
Type: analysis + visualisation
Difficulty: medium/hard
Contact: Stephen Strowes

Every probe exists inside a network that has one or more upstream networks; every probe collects topology information from traceroute measurements.

An interesting/useful visualisation for probe hosts would be upstream dispersion graphs: if you go digging, CAIDA generates something for Ark monitors (examples here).

The traceroute data would require some normalisation to make the visualisation meaningful. If 50 traceroutes run to one target and they follow upstream A, and 100 traceroutes run to another target and they follow upstream B, it is probably not the case that the host network prefers upstream B ~66.7% of the time. Important outputs are:

A good visualisation of upstreams reaching further away from the probe
A form that is meaningful and suitable to display on probe pages
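
To illustrate the normalisation problem, the sketch below compares a naive share (count every traceroute equally) with one possible normalisation (give every target equal weight, then average). Per-target weighting is only one option and is our assumption here; the function name and the `(target, upstream)` input shape are hypothetical.

```python
from collections import Counter, defaultdict


def upstream_shares(traceroutes):
    """Return (naive, normalised) upstream shares.

    traceroutes: list of (target, upstream) pairs, one per traceroute.
    """
    # Naive: every traceroute counts once, so busy targets dominate.
    naive_counts = Counter(upstream for _, upstream in traceroutes)
    total = sum(naive_counts.values())
    naive = {up: n / total for up, n in naive_counts.items()}

    # Normalised: compute shares per target, then average over targets.
    per_target = defaultdict(Counter)
    for target, upstream in traceroutes:
        per_target[target][upstream] += 1
    normalised = defaultdict(float)
    for counts in per_target.values():
        n = sum(counts.values())
        for upstream, c in counts.items():
            normalised[upstream] += c / n / len(per_target)
    return naive, dict(normalised)


# The example from the text: 50 traceroutes via A, 100 via B.
sample = [("target-1", "A")] * 50 + [("target-2", "B")] * 100
naive, normalised = upstream_shares(sample)
```

For this input the naive share of upstream B is two thirds, while the per-target view gives A and B one half each, which says more about how the measurements were scheduled than about what the host network prefers.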

Improving IPv6 Topology Measurements

ID: IDEA-2021-08
Type: analysis
Difficulty: medium/hard
Contact: Stephen Strowes

We run low-frequency IPv6 topology measurements as built-in measurements with IDs 6052 and 6152. These take all prefixes announced in BGP, and target the ::1 address in each. This has some shortcomings: IPv6 prefixes are very large, so a single address says little about the rest of the prefix, and the measurements may miss covering prefixes if the ::1 address matches on a longer prefix.
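
The covering-prefix problem can be shown with Python's standard `ipaddress` module. The prefixes below are from the IPv6 documentation range and are purely illustrative, and the helper names are our own; the sketch is not how the built-in measurements pick their targets.

```python
import ipaddress


def one_target(prefix):
    """Return the ::1 address inside an announced prefix."""
    return ipaddress.ip_network(prefix).network_address + 1


def matched_prefix(address, prefixes):
    """Return the longest announced prefix containing the address."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    covering = [n for n in nets if address in n]
    if not covering:
        return None
    return max(covering, key=lambda n: n.prefixlen)


# A covering /32 and a more specific /48 that starts at the same address.
announced = ["2001:db8::/32", "2001:db8::/48"]
target = one_target("2001:db8::/32")       # 2001:db8::1
best = matched_prefix(target, announced)   # 2001:db8::/48
```

Here the ::1 target derived from the /32 is actually routed according to the /48, so a traceroute towards it tells us nothing about where the rest of the /32 leads.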

A couple of years ago, we published on aggregating IPv6 "hitlists", which collect active targets from multiple data sources. Measurements 24304869 and 24304870 have been using these hitlists for some time, but there is analysis to be done.

For instance:

How much more likely are the hitlist targets to respond to a traceroute measurement than the ::1 addresses?
How much more of the network do the hitlist targets reveal?
How much more of the routing table appears to be covered when using one measurement set or the other?
Is a hitlist more or less useful than the ::1 list?
What other approaches are now out there for measuring the topology of the IPv6 network?

There's Always More...

We're curious about how successful this call for participation will be -- so this approach itself is a study 🙂

We'll of course evaluate the reactions and results in order to determine if this is a useful approach or not. We'll also evaluate whether it is useful to extend the idea into an "idea marketplace" where the RIPE NCC is not an exclusive source of such ideas. Please leave a comment below if you have an opinion on this!