The adoption of the RPKI system is growing rapidly. To make sure the system scales, we’ve developed a new protocol that should drastically improve fetch times for RPKI repositories. This article explains how.
In the Resource Public Key Infrastructure (RPKI), Certification Authorities (CAs) publish their 'products' (certificates, manifests, Certificate Revocation Lists (CRLs) and Route Origin Authorisations (ROAs)) in repositories. These repositories are often shared with other CAs - for example, all RIPE NCC members that use RPKI have their own hosted CA that publishes its manifests, CRLs and ROAs in the RIPE NCC repository. Validators such as the RIPE NCC RPKI Validator and rcynic retrieve objects from these repositories regularly in order to perform validation.
What's the problem?
When this system was designed, there was a strong desire not to "over-engineer", and therefore rsync was chosen as the transfer protocol. Rsync is widely used, proven technology that allows for both full synchronisations and faster exchanges in which just the updates are transmitted to a client (validator).
Rsync is optimised to reduce data transfer between the server (repository) and the client (validator). However, this optimisation requires an investment of CPU and memory on both sides to work out the list of changes - the 'delta', as we'll call it below. This works very well in 'symmetric' scenarios where one client needs changes from one server, but in 'asymmetric' scenarios - such as many validators needing changes from one central repository - the server can get swamped with load. Rsync is optimised for the first scenario and pretty bad at the second.
The RIPE NCC realised that this was a substantial ongoing issue with 'classic' RPKI, and that achieving wide-scale deployment of the current model would demand a better method. Accordingly, they began the design of a new protocol. The basic model of this new protocol is to send 'delta' data, rather than describe the entire state of the repository.
How does it work?
In RPKI, the concept has now moved from a repository and a fetch of its current state to a 'snapshot' in time of the state of a repository, plus a list of the changes applied since that snapshot. The entire repository still exists: the list of changes simply records what changed since the last snapshot. Because this list is held as a single file - or a set of files, one for each successive change from a snapshot - it's much simpler to design a protocol that brings a local and a remote repository into synchronisation:
- If you have nothing, fetch the entire repository and record its snapshot serial (or date)
- For subsequent changes, ask the other end for the list of delta files for each successive change from that snapshot, and apply them, incrementing the serial (or date) to match each change
This mechanism initially fetches everything (slow), but afterwards fetches just a catalogue of changes, a compressed set of changes or, at worst, the specific files that varied since last time (fast). If files have been removed, the change set can simply say "remove this file", so nothing has to be fetched except the names to remove.
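As a concrete illustration, the two steps above can be sketched in a few lines of Python. Everything here - the `Server` and `Client` classes, the publish/withdraw actions, the in-memory dictionaries - is a hypothetical stand-in for the idea, not the actual protocol or its wire format:

```python
class Server:
    """In-memory stand-in for a repository publication server."""

    def __init__(self):
        self.serial = 0
        self.objects = {}   # name -> content (the current snapshot)
        self.deltas = {}    # serial -> list of (action, name, content)

    def apply(self, changes):
        """Record one set of changes as a new delta under the next serial."""
        self.serial += 1
        self.deltas[self.serial] = changes
        for action, name, content in changes:
            if action == "publish":
                self.objects[name] = content
            else:  # "withdraw": only the name needs to travel
                self.objects.pop(name, None)


class Client:
    """Validator-side state: a serial plus a local copy of the objects."""

    def __init__(self):
        self.serial = None
        self.objects = {}

    def synchronise(self, server):
        if self.serial is None:
            # First contact: fetch the whole repository once (slow).
            self.objects = dict(server.objects)
            self.serial = server.serial
            return
        # Afterwards: fetch and apply only the deltas past our serial (fast).
        for serial in range(self.serial + 1, server.serial + 1):
            for action, name, content in server.deltas[serial]:
                if action == "publish":
                    self.objects[name] = content
                else:
                    self.objects.pop(name, None)
            self.serial = serial
```

Note that the server never has to compare state with any individual client: each client knows its own serial and simply asks for everything after it.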
The RIPE NCC has specified this protocol in collaboration with the other RIRs and with Rob Austein from Dragon Research Labs. The RIPE NCC has also implemented a version of this specification in Java, to complement other implementations. This means that we have both 'rough consensus' amongst implementors of validating software systems in RPKI and running code in the form of these initial implementations.
Where do we go from here?
A good next stage will be to test the performance of this mechanism compared to rsync under real load, and over long-distance Internet connections. We can explore running parallel systems to see which one converges faster on the current state of the RPKI information model.
The RIPE NCC code is running, and will be released into pilot service before the Prague IETF. Assuming there are no substantive changes to the design, it's likely this protocol will become more widespread; the hope is that all future repositories worldwide will support this mechanism and avoid the pitfalls of the rsync fetch.
When the RPKI system was being designed, the basic model for distributing cryptographically signed products was to use an LDAP directory. This is because X.509-certified products are almost always used to prove identity, and the primary qualities of a PKI certificate's identity are the Holder's Subject name and the Issuer's Subject name. Since these are expressed using a notation derived from X.500 (a CCITT standard, part of the ITU's global standards process) and are used as the names of LDAP objects, it's natural to expect the certificates and signed products to be placed into LDAP, because the names reflect LDAP naming and distribution.
LDAP names are formed of components, each of a named type. Elements of the names identify the Organisation (O), Organisational sub-units (OU), Country (C) and personal names (Common Names, or CN, built from Surname (S), Given name (G), Initial (I) and other elements). So an LDAP name might be:
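A purely illustrative name built from those components (the organisation and person here are invented) could be:

```
CN=Jane Smith, OU=Engineering, O=Example Corp, C=NL
```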
But in the Resource PKI (RPKI) we aren't concerned with names: we assign names based on random identifiers, precisely because we didn't want to imply we were certifying names. What we certify is the set of Internet Number Resources (INRs) in the certificate, expressed as RFC 3779-defined properties of that certificate. So an LDAP hierarchy didn't seem to make much sense to us at the time: we didn't care about the names, and it wasn't a good "fit" for the information model we had in our minds. We decided not to use it.
Instead, we looked to existing protocols already used on the Internet to copy file hierarchies and directory systems between hosts, and settled on the rsync protocol. This was an expedient choice, reflecting a desire not to 'over-engineer' the system at that stage. Rsync is widely used and has a public copy function, which makes it logistically simple to run: you point an rsync daemon at the file hierarchy where you write the products of your RPKI cryptography processes. The data model in RPKI is certificates, manifests, signed objects and CRLs (all files) written into a directory, one directory per RPKI instance. This is easy to imagine as a tree of files and directories, with an rsync service pointed at the root of the tree. Voila! Problem solved.
Alas, like many other solutions, all this did was create a future problem while solving a present one. Rsync is a hugely inefficient process to run if you have more than a few changes in the tree of files. Its basic model is to try very hard to send only the changed bits, but this requires the rsync daemon to "walk" the entire file hierarchy checking for changes, talking to the rsync client to check whether it has an older copy (or anything at all). Consequently, a huge amount of small back-and-forth dialogue takes place, and a lot of work is done on the server side walking the filesystem. If you have more than a few clients, the server can rapidly get swamped with load. And on long, slow Internet links, because rsync does one thing at a time, you suffer a much slower end-to-end cycle of fetch-compare-skip before you even begin to transfer files.
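A back-of-the-envelope model makes the scaling difference vivid. The functions and cost units below are invented purely for illustration (one unit per file examined or per file served), not measurements of either system:

```python
def rsync_server_work(clients, files_in_tree):
    # rsync: each client connection makes the server walk the whole tree again.
    return clients * files_in_tree


def delta_server_work(clients, changed_files):
    # Delta model: the change list is computed once per update,
    # then served to every client as a static file.
    return changed_files + clients


# With 1,000 validators and a 50,000-file repository of which 20 files changed:
print(rsync_server_work(1000, 50_000))  # 50000000 units of server work
print(delta_server_work(1000, 20))      # 1020 units of server work
```

In this toy model the rsync server's work grows with the product of clients and repository size, while the delta server's work grows only with the number of changes plus the number of clients.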
At APNIC, we did some basic investigation of rsync and rapidly decided it wasn't a good fit for the problem. At one level, "our job ended there", because in the research group we felt the design of routing security around a repository was flawed anyway; we've begun exploring alternate models, such as embedding the RPKI data in-line in BGP, and alternate validation models.
A 'delta' is a fundamental concept found throughout physics and mathematics, representing the amount of change a value is undergoing. For example, the change in velocity is called delta-v: how much you are speeding up or slowing down. In computer science, deltas are often associated with changes to source code, as in revision control systems such as Git, RCS or CVS: the file is recorded in its head state, and a set of delta values is held showing how it changed from the prior state, all the way back to the beginning. (Older revision control systems such as SCCS held the original state plus a set of deltas to produce each subsequent version; for the head version at the end of a long list of changes this is more complicated to work with, so we now routinely hold the head version, with delta changes recording how to go backwards.) The delta being applied here to the repository takes a 'snapshot' view close to the current state and says 'apply these changes to the snapshot', and then you're at the same level as the head. Since the snapshot is the last known good place you synchronised to (based on your serial), all you get told is the set of differences from there to the present time. This is always the smallest possible set of changes - far smaller than checking every element, as rsync has to do.
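The revision-control idea can be shown in miniature with Python's standard difflib module. The two 'file versions' below are invented examples, and a real revision control system stores its deltas far more compactly than this; the point is simply that one recorded delta lets you move between versions without storing both in full forever:

```python
import difflib

# Two versions of a tiny 'file', as lists of lines.
old = ["route 10.0.0.0/8 via AS64496\n",
       "route 192.0.2.0/24 via AS64511\n"]
head = ["route 10.0.0.0/8 via AS64496\n",
        "route 198.51.100.0/24 via AS64500\n"]

# Record only how the two versions differ ...
delta = list(difflib.ndiff(old, head))

# ... and either endpoint can then be rebuilt from that single delta:
assert list(difflib.restore(delta, 1)) == old   # backwards to the old version
assert list(difflib.restore(delta, 2)) == head  # forwards to the head version
```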