RIPE Atlas is constantly evolving, with the development team changing the implementation on a regular basis. When proposed changes relate to behaviour, community input can be very helpful in choosing the best way forward. In this article, we look at proposals we have lined up and invite your feedback.
We’d like to present you with five proposals that were triggered by various discussions with our users as well as our own reflections on how the system currently behaves. We would strongly encourage members of the RIPE Atlas community – should you have an opinion on supporting, challenging, or changing these proposals – to let your voice be heard on the public RIPE Atlas mailing list.
Remove per-hop “late” packets from traceroute
Traceroutes are among the most low-level RIPE Atlas measurements. Their results can contain indications of "late" packets, i.e. response packets that arrive after the pre-set timeout. Few other traceroute tools report this, and in many cases these indications are confusing, in particular when working out what the responses for each traceroute hop are.
Proposal and assessment
In an upcoming firmware version we will modify the traceroute measurement code and remove the “per hop” late packet indications. Instead, we’ll maintain a simple counter to indicate how many of these in total were observed while executing the particular traceroute.
We are not aware of researchers or network operators relying on this feature, so we believe the impact to be minimal or none.
- Traceroute results will be syntactically simpler, therefore easier to parse and interpret
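To make the proposed change concrete, here is a minimal Python sketch of the same transformation applied to an existing result. It assumes the current result layout, in which each hop carries a `result` list and a late reply is an entry with a `late` key. The `late_total` field name is invented for this example, since the proposal does not name the counter.

```python
def collapse_late_packets(traceroute):
    """Return a copy of a traceroute result without per-hop "late" entries.

    The number of removed entries is stored under "late_total", a
    hypothetical field name used only in this sketch.
    """
    late_total = 0
    hops = []
    for hop in traceroute.get("result", []):
        if "result" not in hop:
            # Hops that only report an error are passed through unchanged.
            hops.append(hop)
            continue
        kept = []
        for reply in hop["result"]:
            if "late" in reply:
                late_total += 1  # count the late reply instead of keeping it
            else:
                kept.append(reply)
        hops.append({**hop, "result": kept})
    return {**traceroute, "result": hops, "late_total": late_total}
```

A consumer that only needs round-trip times can then iterate over the hops without special-casing late replies, and still see from the counter that some were observed.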
Measure well-known CDNs
Community members have previously expressed a desire to execute HTTP measurements with RIPE Atlas. In many cases, this is meant to measure their connectivity to “well known” providers, to check whether their outgoing connectivity is working in general, or perhaps to measure performance over time. We believe many of these user requests can be fulfilled by curated, pre-defined HTTP measurements executed on all probes.
Proposal and assessment
- We will define HTTP measurements to be executed on all probes, running once a day or perhaps once an hour.
- These will target "well-known" major providers and CDNs, and fetch only pre-defined, safe-enough content, namely the favicon.ico file. This is expected to be of very limited size (typically a few hundred bytes, up to a few kilobytes) and safe in terms of content.
- We will define an initial set of targets - e.g. the likes of Google (and/or GCP), Facebook, Twitter, Amazon (and/or AWS), Azure, Akamai, Cloudflare, Fastly, etc. We will invite other providers to be added later, maybe on an ongoing basis.
- The intention is to measure popular providers, not any-and-all websites. We will need to impose a hard limit on how many of these we include built into the system.
Most websites, when asked on HTTP, provide a redirect to the HTTPS version of the content; we believe this will be the case in basically all providers here. However, the HTTP answer itself is already useful in its own right. Handling the HTTPS variations (e.g. TLS, QUIC, …) can be evaluated separately at a later stage.
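The kind of request described above can be sketched in a few lines of Python. This is an illustration only, not the probe firmware: the function names, the 4096-byte cap and the result fields are assumptions made for the example. It requests `/favicon.ico` over plain HTTP, does not follow redirects, and records whether the answer points to HTTPS.

```python
import http.client
import time

MAX_BODY_BYTES = 4096  # assumed cap; the text only says "up to a few kilobytes"


def summarise(status, location, body):
    """Condense one HTTP answer into the fields a reachability check needs."""
    is_redirect = 300 <= status < 400
    to_https = is_redirect and (location or "").lower().startswith("https://")
    return {"status": status, "redirects_to_https": to_https, "bytes_read": len(body)}


def fetch_favicon(host, port=80, timeout=5.0):
    """GET /favicon.ico over plain HTTP without following redirects."""
    started = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/favicon.ico", headers={"Connection": "close"})
        response = conn.getresponse()
        body = response.read(MAX_BODY_BYTES)  # never read more than the cap
        result = summarise(response.status, response.getheader("Location"), body)
    finally:
        conn.close()
    result["elapsed_ms"] = round((time.monotonic() - started) * 1000, 1)
    return result
```

Calling `fetch_favicon("example.com")` (a placeholder host) returns the status code, the redirect flag, the number of body bytes read and the elapsed time. Even a bare redirect answer therefore yields a reachability and timing data point.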
- Adding such built-in measurements addresses a lot of requests about HTTP measurements we heard so far.
- By targeting CDNs, we can provide comprehensive reachability and performance indicators to the vast majority of hosted content.
- They are considered safe for content and non-intrusive for traffic volumes.
- They facilitate future features answering questions about general health check of individual probe connections and correlated behaviour across larger populations.
- This feature provides benefits to all hosts, including ones that otherwise don’t see benefits of participating in low level network measurements.
- Increased bandwidth use for probes (though the change in amount is likely minimal).
- Possible arguments about which provider to include in this set and which to refuse.
Generic HTTP measurements
While the CDN-HTTP proposal covers a large number of HTTP measurement use cases, it does not offer the flexibility to measure any-and-all HTTP targets with RIPE Atlas. Some users (including sponsors) expressed their desire to introduce HTTP measurements to their own custom targets.
The team behind RIPE Atlas has looked at this topic before. Until now we have controlled HTTP measurements because of the privacy and real-life risks to probe hosts if measurements were allowed towards targets that are sensitive from the perspective of the probe host. However, as RIPE Atlas is being used more and more to monitor hosts' own infrastructure, the need to create HTTP measurements is increasing.
Proposal and assessment
- Each probe host will be allowed to opt their probe in for HTTP measurements (likely via adding a probe tag). We’ll clearly explain the potential risks involved.
- Each user will be allowed to schedule HTTP measurements to target any HTTP server, but the involved probes must:
a. Be hosted by themselves OR
b. Have been opted in by their hosts
- Anchors will be treated like probes, i.e. they have to be opted in by their hosts first.
- The HTTP measurements will be limited to HEAD requests or GET requests with a very low, enforced body size limit (e.g. 1-4 KB).
- The measurement can only be a public one (see also: PUBLIC-MSMS).
- The protocol is limited to HTTP in the first instance. HTTPS, QUIC and others will be evaluated in later steps / proposals.
This proposal is an attempt to address the topic in a way that could be acceptable for the community at large. It is worth considering whether an extra step would be useful for the target of the measurements to opt in (e.g. via a DNS TXT record expressing “measure me”).
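The rules listed in this proposal can be read as a validation step applied when a measurement is scheduled. The Python sketch below is one way to express them. It is not the RIPE Atlas implementation, and the class names, the tag name `http-measurements-ok` and the 4096-byte limit are all assumptions made for illustration.

```python
from dataclasses import dataclass

HTTP_OPT_IN_TAG = "http-measurements-ok"  # hypothetical probe tag
MAX_BODY_BYTES = 4096  # assumed value from the "1-4 KB" range


@dataclass(frozen=True)
class Probe:
    probe_id: int
    host_user: str
    tags: frozenset = frozenset()  # anchors get no special treatment


@dataclass(frozen=True)
class HttpMeasurement:
    requester: str
    method: str
    is_public: bool
    scheme: str = "http"
    max_body_bytes: int = 0


def probe_allowed(measurement, probe):
    """A probe may be used if the requester hosts it or its host opted in."""
    return probe.host_user == measurement.requester or HTTP_OPT_IN_TAG in probe.tags


def validate(measurement, probes):
    """Return a list of rule violations; an empty list means acceptable."""
    problems = []
    if not measurement.is_public:
        problems.append("measurement must be public")
    if measurement.scheme != "http":
        problems.append("only plain HTTP is supported initially")
    if measurement.method == "GET":
        if not 0 < measurement.max_body_bytes <= MAX_BODY_BYTES:
            problems.append("GET requires a body limit within the enforced cap")
    elif measurement.method != "HEAD":
        problems.append("only HEAD and GET are allowed")
    for probe in probes:
        if not probe_allowed(measurement, probe):
            problems.append(f"probe {probe.probe_id} is not opted in")
    return problems
```

Returning every violation at once, rather than stopping at the first, would let a user fix a rejected request in a single round trip.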
- This approach aims for a balance between safety for hosts and flexibility for measurers.
- This feature provides benefits to all (future) hosts, including ones that otherwise don’t see benefits of participating in low level network measurements.
- The feature increases the likelihood of sponsors stepping up and thereby helping the growth of RIPE Atlas, which is ultimately beneficial for the whole community.
- It may be that the number of opted-in probes will stay low, limiting the usefulness of the feature.
- The opposite issue: the volume of HTTP measurements over time may be significant enough to warrant revision of our data retention policies. In particular, if this feature becomes popular enough to dominate the result set, it’s possible that the data retention for HTTP measurements will have to be very different from how we handle other results.
- Malicious actors, even though they would need to register for the service explicitly, may execute measurements towards questionable content. The limitations (e.g. only HEAD, limited content, etc.) offer a limited mitigation for this.
Add support for STARTTLS measurements
RIPE Atlas has supported, for a long time now, a measurement type to fetch TLS certificates from websites and other servers. Some users feel that adding the possibility to measure STARTTLS support would be useful - and we feel these two types can be treated similarly.
Proposal and assessment
- We will either add a new measurement type for STARTTLS, or extend the existing TLS measurement with STARTTLS support. (We will decide on which is better by evaluating the pros/cons of these options.)
- Initially only up to TLS v1.2 will be supported
- The following protocols will have STARTTLS implemented initially
e. The various protocols would only be minimally implemented to support starting STARTTLS. No other protocol elements will be supported (e.g. attempting to log in, send email, fetch content, etc. are out of scope)
- The measurement will return the certificate (i.e. like the current TLS certificate check behaviour). Certificate chain validation is not included.
- Users will have to select the STARTTLS protocol to use when the measurement is created; it is not automatically recognised. A future version could default to a STARTTLS protocol based on the port number.
- Implementation does not trigger security / spam filtering infrastructure any more than a TCP ping or TLS measurement currently would
- The implementation allows for hosts to do rudimentary service monitoring
- Malicious actors would not obtain more information than would currently be possible with the TLS measurement
- No full validation is performed, such as DANE. Results of measurement would need further interpretation for this to be realised
- Some hosts would like to have a more thorough validation than this solution would offer
Remove support for non-public measurements
The RIPE Atlas platform is of greatest value to its users when measurement results can be shared. Most measurements run on RIPE Atlas are public measurements.
RIPE Atlas allowed, from its inception, “non-public” measurements, i.e. a measurement where the results are not contained in the regular output channels. Arguments supporting this included the ability to test some infrastructure before it becomes widely available (or “production”) or cases where revealing the existence of the component the measurement targeted was not desired. We’ve also heard arguments that allowing this in RIPE Atlas was not a good choice to begin with, and therefore we should remove this feature.
Nevertheless, because of the special treatment needed for such results, the infrastructure contains complicated elements that are meant to filter the results in various ways.
At the moment, it is only possible to use this feature via the API; the UI does not support scheduling of non-public measurements. Changing this property after the measurement has been specified is only possible one way (non-public to public) and it is generally not really meaningful.
In 2020 we surveyed our users who ran at least one non-public measurement about how such a change would impact them. While some (single digit) respondents claimed they would probably stop using the platform altogether, the majority stated no impact or that they would “do things differently”.
Proposal and assessment
- At a pre-determined point in time our API will refuse requests to schedule non-public measurements.
- The already collected results will not be re-categorised, i.e. they will not be made public. In fact, after determining the complexity of this, we may even remove them from the data set after some grace period.
We believe that, if we do implement the change, doing it with the combination of the above steps follows the principle of least surprise.
However, we also note that some free / public network troubleshooting services rely on RIPE Atlas to “do the legwork” for them and these (at least the ones we are aware of) are consciously using the non-public flag. Due to the number of indirections involved, it’s hard to assess the impact of such a change on the “end users” of such a service.
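The cut-off behaviour in this proposal could look roughly like the Python sketch below. It assumes a measurement definition is a dictionary with an `is_public` flag that defaults to true when absent. The function name and the messages are made up for the example, and the cut-off time is passed in because no date has been set.

```python
from datetime import datetime, timezone


def check_definition(definition, cutoff, now=None):
    """Return (accepted, message) for one measurement definition.

    `cutoff` is the pre-determined point in time after which requests for
    non-public measurements are refused; it is a parameter here because
    the proposal does not name a date.
    """
    now = now or datetime.now(timezone.utc)
    is_public = definition.get("is_public", True)  # assumed default
    if not is_public and now >= cutoff:
        return False, "non-public measurements are no longer accepted"
    return True, "ok"
```

Under this sketch, existing clients that omit the flag or set it to true are unaffected, while clients that explicitly ask for a non-public measurement get a clear refusal rather than a silently public one.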
- Of all the measurements ever defined in the system, about 17.3% are non-public.
- In our “working set” (i.e. measurements active in the last couple of weeks):
- About 13.4% of the measurements are non-public. Almost all of these (98%) are one-off measurements.
- A total of 173 users scheduled at least one and 81 users scheduled at least two; one specific user scheduled 91.5% of all of these.
- We are happy to provide further details on these numbers, if they are useful for the discussion.
- Some infrastructure components can be simplified.
- The risk of “leaking” non-public measurements in the public set is reduced to zero since there are no more results to leak.
- Users who were specifically using RIPE Atlas because of this feature will stop using the service.
- Other users may reduce / change their use of the service and perhaps ultimately disengage completely.
- As a result we could lose some connected probes, as their hosts no longer see value in keeping them connected.