This year's Internet Measurement Conference (IMC) was held in Amsterdam from 21-23 October. In this article we highlight some of the presented work that we think is interesting and that the RIPE community might find useful.
This year's IMC presented 29 long papers (each around 13 pages), 10 short papers (each around six pages) and 26 posters. IMC is a key academic venue for Internet measurement research, and many of the results provide interesting insights that the operational and standards communities can use in their work. As in previous years, data collected by the RIPE NCC through projects such as the Routing Information Service (RIPE RIS) and RIPE Atlas played a key part in several of the studies.
The IMC 2019 venue (photo credits: Mattijs Jonker)
The programme also hints at where researchers are focussing their efforts. For instance, there were two (!) sessions dedicated to DNS, one on privacy and one on the advertising ecosystem. As advertising is a big driver of investment in Internet infrastructure, it makes sense for it to feature at an Internet measurement conference. On the other hand, it is still amazing (and worrisome?) that advertising has become an ecosystem in its own right that people find worth studying.
As in previous years, the quality of the papers was generally very high. Rather than summarise the whole conference, we cherry-pick the papers that stuck in our minds. All papers, their abstracts and the presenters' slides are available online:
- Encryption of DNS traffic has received a lot of attention recently, first prompted by concerns about pervasive surveillance and then by debate over the appropriate response to it. Two complementary papers on DNS-over-Encryption were presented:
- An Empirical Study of the Cost of DNS-over-HTTPS studies protocol-level performance from the client's perspective. The authors find that DNS-over-TLS and DNS-over-HTTPS over HTTP/1.1 both suffer from head-of-line blocking on TCP, whereas DNS-over-HTTPS over HTTP/2 avoids this because HTTP/2 multiplexes requests as independent streams on one connection. This is probably unsurprising; indeed, the DoH specification (RFC 8484) recommends HTTP/2 as the minimum HTTP version.
- An End-to-End, Large-Scale Measurement of DNS-over-Encryption takes a broader view of the prevalence of encrypted DNS in general. The authors combine IPv4-wide scans, client-side measurements and passive traffic monitoring to understand real-world DNS-over-Encryption (DoE) usage. They locate around 2,000 open DNS-over-TLS (DoT) servers in the wild, but geographic diversity is low: 951 of them appear to be in Ireland and 531 in the USA. In addition, 25% of the servers identified use invalid TLS certificates. They find only 17 open DNS-over-HTTPS servers. Finally, with regard to reachability, they resurface an old problem: 1.1.1.1 (Cloudflare's public resolver address) is often hijacked by older on-path devices, though only specific ports are affected.
- Serial Hijackers: This research seeks to identify serial BGP hijackers by finding characteristics that distinguish how their networks announce prefixes over time. The authors found that hijackers' announcements are much more volatile, with some interesting examples such as BitCanal. Using their method, they found ~900 ASNs with characteristics similar to their ground-truth set of serial hijackers. Will this start an arms race in which hijackers become stealthier about their network footprint (or has it already started)?
- Scanning the scanners: This research takes an interesting slant on understanding IPv4-wide scans by leveraging a typical CDN architecture. On edge nodes, typically only specific addresses are publicly exposed via the DNS; other addresses are used internally and are not advertised, but are still externally reachable. The key observation is that legitimate traffic will only use the advertised addresses, whereas address-space scanners (broad, or targeted at particular networks) will naturally also reach the addresses that are not advertised. This lets the authors capture scanner traffic much as a traditional darknet does, but distributed across over 1,000 networks. They measure the baseline "radiation" on their servers at around 3,000 packets per day, from sources that reach both the advertised and the non-advertised addresses. Comparing their traffic with the UCSD /8 darknet, they observe similar target port numbers for broad address sweeps, but targeted scans show slightly different characteristics.
- Mastodon: This research was interesting because it measures Mastodon, a decentralised alternative to Twitter, and the researchers have collected a great dataset. The presentation focused on the point that 'decentralisation is tough': various aspects of this distributed system are not very distributed. For instance, most servers are hosted in just a few ASNs, such as those of Amazon and Cloudflare. Despite this, I think the data also showed that a platform like this has a very long tail, which is widely distributed and evolvable.
- Privacy exposure by IoT devices: This analysis, which you might also remember from the RACI presentation at RIPE 79's IoT WG session, found that unencrypted traffic from IoT devices, although only a minority of all traffic, exposed substantial information in plaintext. Sadly, even when devices use encryption, an eavesdropper can consistently profile users by observing the timing patterns of their network traffic. Even more worryingly, the authors found several instances of devices unexpectedly sending audio or video. How many IoT devices do you have at home?
- Facebook performance: Facebook is a major source and sink of Internet packets, so it is interesting to see how it measures things. Its size means it could overload its transit links if it were not careful; for Facebook, peering is a way to 'get rid' of traffic without causing congestion. It measures performance at massive scale using round-trip times observed in TCP sessions with clients.
- BBR-CUBIC interaction: Not directly related to inter-domain routing, but a fascinating peek into TCP congestion control. This research measures and then models how the BBR-based and CUBIC-based TCP congestion control algorithms interact. The unfairness of BBR is clearly explained in the presentation, with examples featuring the Obama family.
- RPKI over time: This work looks at the evolution of RPKI, using both BGP collector data and an archive of daily snapshots of the public RPKI repositories (we make this data available at: ftp.ripe.net/rpki). The work details various misconfigurations and finds that these have decreased dramatically over time, concluding that RPKI is ready for prime time.
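To make the head-of-line blocking point from the DNS-over-HTTPS study concrete, here is a toy Python model. It is our own illustration, not the paper's methodology, and it assumes that all queries are sent at once over a single connection, that each has a fixed resolution time, and that in the blocking case responses must be returned in request order (as with HTTP/1.1 pipelining). It ignores packet loss, which can still cause TCP-level head-of-line blocking under HTTP/2.

```python
"""Toy model of head-of-line (HOL) blocking for pipelined DNS queries.

Assumptions (ours, not the paper's): every query is sent at t=0 over one
connection and needs a fixed resolution time; packet loss is ignored.
"""


def in_order_completion(resolve_ms):
    """Responses must be delivered in request order (as with HTTP/1.1
    pipelining), so a query cannot finish before any query ahead of it."""
    done, latest = [], 0.0
    for t in resolve_ms:
        latest = max(latest, t)  # blocked behind the slowest earlier query
        done.append(latest)
    return done


def multiplexed_completion(resolve_ms):
    """Responses are delivered as soon as they are ready (as with
    independent HTTP/2 streams), so each query finishes at its own time."""
    return list(resolve_ms)


if __name__ == "__main__":
    # Hypothetical resolution times: one slow cache miss, then cache hits.
    times = [120.0, 5.0, 5.0, 5.0]
    print(in_order_completion(times))     # [120.0, 120.0, 120.0, 120.0]
    print(multiplexed_completion(times))  # [120.0, 5.0, 5.0, 5.0]
```

With one slow cache miss at the front of the queue, every query waits 120 ms in the in-order case, while the multiplexed cache hits complete after 5 ms.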
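For readers curious about what finding open DoT servers involves at the protocol level, the sketch below sends a single DNS query over TLS to port 853, using the same two-byte length prefix as DNS over TCP (RFC 7858). It is a minimal illustration under our own assumptions, not the scanner used in the paper: the function names are hypothetical, it only checks the certificate chain against the local trust store (there is no hostname to verify when scanning by IP address), and it treats any length-prefixed reply as a sign that the server answers DNS.

```python
import socket
import ssl
import struct


def build_dot_query(name, qid=0x1234):
    """Build a DNS A query for `name`, prefixed with the two-byte length
    field used by DNS over TCP and therefore by DoT (RFC 7858)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    question = qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    message = header + question
    return struct.pack("!H", len(message)) + message


def probe_dot(ip, name="example.com", timeout=3.0):
    """Hypothetical probe: return (answers, chain_valid) for ip:853.

    chain_valid only reflects certificate-chain validation against the
    local trust store; hostname checking is disabled because a scan
    only knows the IP address.
    """
    query = build_dot_query(name)
    for verify in (True, False):
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be disabled before CERT_NONE
        if not verify:
            ctx.verify_mode = ssl.CERT_NONE
        try:
            with socket.create_connection((ip, 853), timeout=timeout) as raw:
                with ctx.wrap_socket(raw) as tls:
                    tls.sendall(query)
                    reply_len = tls.recv(2)  # length prefix of the answer
                    return len(reply_len) == 2, verify
        except ssl.SSLCertVerificationError:
            continue  # chain did not validate; retry without verification
        except OSError:
            return False, False  # closed port, timeout, TLS failure, ...
    return False, False
```

For example, `probe_dot("192.0.2.1")` (a documentation address) returns `(False, False)` when nothing answers on port 853; a server with a self-signed certificate that does answer would yield `(True, False)`.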
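The serial hijacker study relies on features describing how volatile an AS's announcements are. As a rough, hypothetical illustration of that idea (the `Announcement` record, the feature names and the seven-day threshold are our assumptions, not the paper's classifier), one could summarise each origin AS's prefix-origin lifetimes like this:

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Announcement:
    """Hypothetical record: one prefix-origin pair seen in BGP, with
    first-seen and last-seen times measured in days."""
    prefix: str
    origin_asn: int
    first_seen: float
    last_seen: float

    @property
    def duration(self) -> float:
        return self.last_seen - self.first_seen


def volatility_features(announcements, asn, short_days=7.0):
    """Hypothetical per-AS features: number of distinct prefixes, median
    announcement lifetime, and the share of announcements that disappear
    within `short_days`. Returns None if the AS originated nothing."""
    mine = [a for a in announcements if a.origin_asn == asn]
    if not mine:
        return None
    durations = [a.duration for a in mine]
    short = sum(1 for d in durations if d < short_days)
    return {
        "prefixes": len({a.prefix for a in mine}),
        "median_days": median(durations),
        "short_lived_share": short / len(mine),
    }
```

An AS whose announcements mostly vanish within days would stand out with a high `short_lived_share`; a real classifier would combine several such signals rather than rely on one threshold.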
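The CDN-based telescope in Scanning the scanners boils down to a simple split: traffic to advertised addresses is treated as regular, while traffic to reachable but unadvertised addresses is treated as scanning. A minimal sketch of that split, assuming packet records are available as `(src, dst, port)` tuples (a hypothetical format, not the authors' data pipeline):

```python
import ipaddress
from collections import Counter


def split_traffic(packets, advertised):
    """Separate packets hitting DNS-advertised addresses from packets
    hitting addresses that are reachable but never advertised.

    packets: iterable of hypothetical (src_ip, dst_ip, dst_port) tuples.
    advertised: set of addresses published in the DNS.
    Returns (regular, telescope) lists of packet tuples.
    """
    adv = {ipaddress.ip_address(a) for a in advertised}
    regular, telescope = [], []
    for src, dst, port in packets:
        bucket = regular if ipaddress.ip_address(dst) in adv else telescope
        bucket.append((src, dst, port))
    return regular, telescope


def top_scanned_ports(telescope, n=3):
    """Most frequently targeted destination ports in telescope traffic."""
    return Counter(port for _, _, port in telescope).most_common(n)
```

Counting ports in the telescope bucket is the kind of comparison the authors make against the UCSD /8 darknet, though their analysis goes well beyond this.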
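One standard way to quantify the kind of unfairness discussed in the BBR-CUBIC work is Jain's fairness index, J = (Σx)² / (n·Σx²) over the throughputs x of n flows. We are not claiming the paper uses this metric; it is shown only as a common yardstick, and the throughput figures below are hypothetical.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 when all flows get equal throughput,
    falling towards 1/n as a single flow takes everything."""
    xs = list(throughputs)
    if not xs or all(x == 0 for x in xs):
        raise ValueError("need at least one non-zero throughput")
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))
```

For two flows sharing a link, `jain_fairness([50, 50])` gives 1.0, while a hypothetical 80/20 split between a BBR flow and a CUBIC flow gives about 0.74.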
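The RPKI misconfigurations mentioned above show up as RPKI-invalid routes. The route origin validation procedure from RFC 6811 is short enough to sketch; two common ways a route becomes invalid are an origin AS that does not match any covering ROA, and a prefix longer than the covering ROA's maxLength. The `ROA` class and `validate` function below are our own illustration, not part of any validator's API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Hypothetical ROA record: authorised prefix, maxLength, origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix, origin_asn, roas):
    """Route origin validation following the RFC 6811 procedure:
    'not-found' if no ROA covers the route, 'valid' if some covering ROA
    matches both the origin AS and maxLength, otherwise 'invalid'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of the other address family or that do not cover the route.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"
```

With a single ROA for 192.0.2.0/24 (maxLength 24, AS64500), announcing 192.0.2.0/25 from AS64500 is invalid because it exceeds maxLength, which is exactly the kind of misconfiguration a too-strict ROA produces.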
We were impressed by the quality of the papers presented at IMC this year and congratulate the authors on their research. Through initiatives such as RACI, which funds academics to attend our meetings, the RIPE NCC tries to showcase such amazing work to the RIPE community (and, more generally, to encourage dialogue between researchers and practitioners). For those of you who have read this far: a reminder that the RACI call for presentations for our meetings in Spring 2020 is now open. If you, your colleagues or your students are working on something interesting, we would love to hear from you!
We try to cover IMC most years. We have notes from the last couple of years at: