We kicked off our RPKI Resiliency Project last May. In this update, find out how the project has progressed since then and what's coming up in the year ahead.
Every user of the RPKI system - whether they’re on the signing side, the validating side, or both - deserves to safely rely on our systems and must be able to trust that the data we publish is accurate and accessible.
Last May, my colleague Felipe Victolla Silveira introduced the RPKI Resiliency Project here on RIPE Labs, the goal of which has been to ensure an RPKI Trust Anchor and Certificate Authority that is secure, reliable and highly available. The project has been ongoing ever since, taking a holistic approach across a distinct set of defined work areas.
RFC and Cryptographic Compliance
Over the past few years, a large number of RFCs have been published that relate to RPKI and cryptography. We’ve implemented each of these on our RPKI core infrastructure and Publication points.
To guard against ambiguity, since certain elements of RFCs can sometimes leave room for interpretation, we asked security company Radically Open Security to perform an assessment of our code. The aim was to get an independent measure of the degree to which we comply with the RFCs and to identify areas where we might need to improve. The code audit took place between 3 August and 25 September 2020.
In the conclusion of their assessment, Radically Open Security wrote:
“The code bases of RPKI-core and the publication server are of high quality. The code bases have been nicely compartmentalized in discrete logic units, tied together in established design patterns. RPKI-core adheres to interface-based programming patterns, which allows for this compartmentalization. There is always a risk in over-engineering a program using the object oriented paradigm, but this hardly seems the case regarding this project. Methods generally do only one thing, and do it well.”
They also added:
“The implementation of RPKI-core complies with the RPKI RFCs to a high degree, although some issues are present.”
While we’re glad that the results of the assessment were very positive, we are of course mainly interested in the issues that were flagged. Work is underway to implement a set of fixes that will resolve these. You can expect a full report of the findings and mitigations to be published during RIPE 82.
In addition to this, looking ahead to next year, we also plan to carry out the following:
- Crystal box penetration test
- Full evaluation of our Hardware Security Module (HSM)
- Red Team test performed by an external security company
- Further work on improving unit tests
- Further work on improving integration tests
Procedural and Operational Compliance
We constantly have to ask ourselves: are we doing things the right way? Are our Trust Anchor and Certificate Authority secure? What needs to improve, and to what extent?
There are hundreds of audit frameworks out there, but none of them are an exact fit for RPKI. We needed a well-recognised audit framework that both encompasses all important IT security elements and can be tailored towards the design principles and RFCs of RPKI. For this purpose, we’ve chosen to team up with The British Standards Institution to develop an RPKI audit framework that can potentially also be used by other Trust Anchors. The aim is to develop a SOC 2 Type II audit framework that takes into account security, availability, integrity and confidentiality. Such a framework would usually take into account privacy as well, but as there’s no sensitive private data in RPKI, we decided privacy is out of scope for the RPKI framework.
One of the good things about a SOC 2 Type II audit framework is that we can tailor it towards RPKI and then, once the audit is completed, we’ll be able to publish a SOC 3 report about its findings. As always, such transparency is very important to us and the community.
As for the timeframe, the framework itself will be finished in early 2021, and we’re currently looking for another party to perform the actual audit later in 2021.
Certification Practice Statement
Every Certificate Authority (CA) should have a publicly available Certification Practice Statement (CPS). Our CPS is available as RIPE document 549, but this was published back in 2012 and it needed a complete rewrite to reflect all the changes we have made since then.
For example, in 2017, we added an additional CA: the “all resources” CA. This is currently not reflected in our CPS.
For the last five months, we have been working with multiple teams inside our organisation to update our CPS. It’s currently in review by our Legal team and will be followed by a review from our communications team. We expect to publish our new CPS under a different RIPE document number by the end of 2020 and we will work to keep it up to date in the future.
RPKI Repositories Resiliency
We’ve also been working on improving the redundancy of our repositories. In RPKI you have two types of repositories: rsync and RRDP. We maintain both. Currently, the rsync repository is hosted in-house and our RRDP repository is in the Amazon cloud (AWS). We are working on deploying our rsync repository in AWS, with a new, more redundant architecture. The goal is to have a very high uptime, as RPKI repositories are critical infrastructure. For next year, we aim to scale up our RRDP repository in AWS as well, following the same architecture as rsync.
Of course, we will also build a fallback scenario in-house, just in case AWS suffers a catastrophic failure. My colleague Felipe Victolla Silveira will publish a more detailed RIPE Labs article soon about the RPKI Repositories in the cloud.
Monitoring and Alerting
I also want to mention our efforts to improve our monitoring and alerting. After the outages we encountered in March and April, we learned that there was plenty of room for improvement in this area and we defined and added more and better metrics. For example, if we plan to delete a lot of ROAs because of an IP address transfer, we now get an alert that intervention is required.
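As a rough illustration of the bulk-deletion alert described above, here is a minimal Python sketch. It is not our production code: the `Roa` type, the function names and the threshold value are all hypothetical and chosen only to show the idea of comparing the current ROA set with the planned one before a change goes through.

```python
from dataclasses import dataclass

# Hypothetical threshold, not taken from the article: how many ROAs may be
# deleted in one change before a human has to look at it.
MAX_ROA_DELETIONS_WITHOUT_REVIEW = 100


@dataclass(frozen=True)
class Roa:
    """Simplified, hypothetical view of a ROA: origin ASN, prefix, max length."""
    asn: int
    prefix: str
    max_length: int


def roas_to_delete(current: set[Roa], planned: set[Roa]) -> set[Roa]:
    """Return the ROAs that exist now but would be gone after the change."""
    return current - planned


def needs_intervention(
    current: set[Roa],
    planned: set[Roa],
    threshold: int = MAX_ROA_DELETIONS_WITHOUT_REVIEW,
) -> bool:
    """True when the planned change deletes more ROAs than the threshold."""
    return len(roas_to_delete(current, planned)) > threshold
```

In a setup like this, a `True` result would hold the change and raise an alert instead of silently removing the ROAs.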
Another example of an alert that users may be able to relate to is the one for resource cache updates. The resource cache contains a current view of the resources that members/certificate authorities hold. This is updated from our registry information system every few minutes. We want to make sure that this process is alive (so new members can create a CA, and CAs are updated for new resources), but also make sure that this information can only be updated when it’s safe to do so.
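The two concerns for the resource cache, liveness and safety, can be sketched as two small checks. This is only an illustration under stated assumptions: the maximum cache age, the rule that an update is unsafe when it removes too large a share of resources, and all names below are hypothetical and not a description of how our systems decide this.

```python
from datetime import datetime, timedelta

# Both limits are hypothetical values picked for the example.
MAX_CACHE_AGE = timedelta(minutes=15)
MAX_SHRINK_FRACTION = 0.05


def cache_is_stale(
    last_update: datetime,
    now: datetime,
    max_age: timedelta = MAX_CACHE_AGE,
) -> bool:
    """Liveness: True when the last successful update is older than max_age."""
    return now - last_update > max_age


def update_is_safe(
    old_count: int,
    new_count: int,
    max_shrink: float = MAX_SHRINK_FRACTION,
) -> bool:
    """Safety (assumed rule): reject updates that drop too many resources."""
    if old_count == 0:
        # Nothing to lose yet, so any first update is accepted.
        return True
    shrink = (old_count - new_count) / old_count
    return shrink <= max_shrink
```

A stale cache would trigger an alert that the update process has stopped, while an unsafe update would be held back rather than applied automatically.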
We gather the metrics with Prometheus and send alerts through Alertmanager. We visualise the data in Grafana. As you know, with monitoring you’re never really done, so we keep tuning and optimising.
In this overview you can see the estimation of the above-mentioned sections of the RPKI Resiliency Project.
As always, we look forward to hearing any feedback you might have on any of the above. Please reach out to us on firstname.lastname@example.org.