Bias in Internet Measurement Infrastructure
This article aims to raise awareness about bias in Internet measurement infrastructure, e.g., RIPE Atlas and RIPE RIS. What is bias? Is the infrastructure biased? Is it a problem? The article discusses these issues and presents data that can help users interpret or conduct measurements.
“Unless I'm wrong, the bias for RIPE Atlas probes is measured by the number of probes in the AS. Isn't there also a bias when asking N probes for a measurement, without specifying area/country/AS? Are we guaranteed that the set of probes we obtain respects the general population of probes? Or is there an extra bias here?”
Thanks, Stéphane, for bringing this up! The bias for RIPE Atlas shown in the plots corresponds to the entire set of RIPE Atlas probes (i.e., if someone ran measurements using all 11k probes, they would see this bias). When asking for only N probes, which is what happens in practice, the bias is higher: the smaller the number of probes N, the higher the bias. You can see some preliminary results and visuals on this (number of probes N vs. bias) on this readme page in our project's GitHub repository: https://github.com/sermpezis/ai4netmon/blob/main/use_cases/bias_in_monitoring_infrastructure/Bias_vs_sampling.md
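To make the "smaller N, higher bias" effect concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the four regions, the probe counts, the uniform reference distribution, and the use of total variation distance as the bias score. It is not the metric or the data used in the AI4NetMon plots, and it picks probes uniformly at random, which may differ from how RIPE Atlas selects probes.

```python
import random
from collections import Counter


def total_variation(sample, reference):
    """Total variation distance between the empirical distribution of
    `sample` (a list of category labels) and `reference` (label -> probability)."""
    counts = Counter(sample)
    n = len(sample)
    labels = set(counts) | set(reference)
    return 0.5 * sum(abs(counts.get(c, 0) / n - reference.get(c, 0.0)) for c in labels)


def mean_bias_of_subsets(probes, reference, n, trials=200, seed=0):
    """Average bias over `trials` random subsets of `n` probes,
    drawn uniformly without replacement."""
    rng = random.Random(seed)
    total = sum(total_variation(rng.sample(probes, n), reference) for _ in range(trials))
    return total / trials


# Hypothetical reference: networks spread evenly over four regions.
REFERENCE = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
# Hypothetical probe population (11k probes), skewed toward region A.
PROBES = ["A"] * 5000 + ["B"] * 3000 + ["C"] * 2000 + ["D"] * 1000

full_bias = total_variation(PROBES, REFERENCE)
print(f"all probes: bias = {full_bias:.3f}")
for n in (10, 100, 1000):
    print(f"N = {n:4d}: mean bias = {mean_bias_of_subsets(PROBES, REFERENCE, n):.3f}")
```

In this toy setup the average bias of a small random subset sits above the bias of the full probe set and approaches it as N grows, because a small sample adds sampling noise on top of the skew already present in the probe population.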
Thanks @Gergana and @RACI! For those interested in the AI4NetMon project, more info can be found on the project website (https://sermpezis.github.io/ai4netmon/). Feel free to contact us for more info or collaboration.