Our draft cloud strategy framework is an attempt to bring everything together in a way that sets out some boundaries, identifies critical elements, and indicates where we need to be strict vs where we can afford to be a little more relaxed. We hope this will make it clearer how we are approaching the use of cloud providers and provide a solid basis for future discussions when we look at moving specific elements to the cloud.
We recently stepped up our engagement on the use of third-party cloud providers to support key RIPE NCC services. This was triggered by a stronger-than-expected response at RIPE 82 which, in hindsight, shouldn’t have been all that surprising. After the meeting, we recognised that we needed to start this discussion over, beginning with a summary of what we’d heard from you.
That summary formed the basis of a further discussion on the RIPE NCC Services Working Group mailing list, after which we said we would go away and draft a set of principles and a strategy framework to share with you in July.
Last week, we went over this with the WG in an interim session. The response there was positive, and we think we’re now ready to publish the full framework for your feedback – which is what the rest of this article will cover. So, with the backstory covered, let’s get started.
Principles and Requirements
One of the first things we did when taking a fresh look at this was to think in terms of the underlying principles. We ended up with the following list. We don’t expect these principles will seem particularly new to members of the RIPE community. This is really more of a restatement or a description of an implicit understanding that has existed for most of the time we’ve been working with you.
1. The RIPE NCC solicits input from the RIPE community for all services that 1) are critical for the operation of the global Internet, or 2) directly affect the operations of our members or the RIPE community.
Requirements for these services are discussed in an open community process with guidance from the appropriate RIPE working group. We publish implementation and deployment plans and seek input from the community from an early stage until successful deployment. We regularly report on the performance of our services and conduct reviews with the appropriate working group.
2. The RIPE NCC has full authority and responsibility for the design, deployment and operation of its services.
This is standard corporate governance. The RIPE NCC Association is a legal entity that assumes full responsibility for its actions and therefore needs authority regarding what it does. In an association like ours, this authority is granted by the membership and comes to us via the board it elects to provide oversight and direction to our staff.
3. The RIPE NCC must remain neutral
We have the responsibility to operate our services on a neutral and impartial basis for the benefit of all members, who are often in competition with one another.
4. Integrity of RIPE NCC services must be maintained
We are trusted by the Internet community to keep our services available in the face of geopolitical, economic and regulatory threats. We are accountable to the community to protect the security and integrity of the data and services we are entrusted to manage.
5. Open standards should be used
We will prefer open standards and open technologies. Where open standards are not viable, we will prefer industry standards over proprietary interfaces.
Based on what we heard from you over the course of our recent engagement, we identified a series of requirements that we need to meet in order to provide our services effectively.
1. Ensure resilience, accessibility, availability, and low latency for our services
This is a key requirement. Providing stable and effective services is a core function and we must be able to do this well.
2. Minimise vendor lock-in
The need to keep switching costs to a minimum was one of the most repeated concerns that we heard from you. As much as possible, we need to avoid becoming dependent on vendor-specific features or too deeply entangled in the proprietary environments of various providers. Preferring open standards and technologies can help us to achieve this.
3. Avoid dependence on any single cloud provider
We can’t rely on any single third party to run mission-critical Internet infrastructure. We should favour a distributed architecture that avoids single points of failure and circular dependencies between the cloud infrastructure and RIPE NCC services.
4. Engineers can innovate and improve the quality of our services
While this may come as a surprise to some, the RIPE NCC is not made of magic and, just as with any other company, our resources are not infinite. We only have so many engineers and there are only so many hours in the day – making the best use of both to create value for our members and the community is important.
5. Comply with laws and regulations
There is not a lot of wiggle room here. We have a strict vetting process to ensure that we comply with all applicable regulations, such as EU sanctions or GDPR. We should publish details of this vetting for the community.
6. Ensure the security of our services
This is another hard requirement. As with our legal compliance above, we should share details about our vetting process to support confidence and trust that we’re getting this right.
7. Prefer providers in our service region
We saw a strong preference from the community that we use local providers. This is something that we support, with the caveat that we need to consider this alongside any trade-offs in terms of the other requirements above.
Acknowledging Tensions and Trade-offs
Before we continue, it is important to recognise that as we seek to maximise the principles and requirements outlined above, there will be costs attached and trade-offs will be necessary. For example, an absolute requirement that we only use providers in our service region might come at a cost to our first requirement of ensuring resilience, accessibility, availability and low latency for our services. Similarly, maximising this first requirement of service quality might cost us in terms of avoiding dependence on a single cloud provider or avoiding vendor lock-in. There are tensions here as well, notably between the first two principles: #1, that we seek the community’s guidance, and #2, that we have full authority over our services. This almost sounds like a contradiction, and this cloud discussion is a good example of how the two can get out of balance.
The point is that we shouldn’t fool ourselves into thinking we can have everything or that things will always run smoothly. Instead, we should keep in mind that these trade-offs and tensions exist and discuss them openly and in good faith.
It’s in this context that a comment from last week’s WG session seems worth referencing here: while it’s good to define principles and requirements, we should take care to avoid painting ourselves into a corner. It is not in anyone’s interest if we end up choosing low-quality solutions simply because they are the best fit for the criteria we’ve agreed with the community. We should apply a measure of sanity and go back to you if something’s not working out or changes are needed.
Now that we’ve laid out some principles and requirements, let’s look at our draft strategy framework. It is important to be clear that this is still at an early stage and will be discussed further, both with the RIPE community and our Executive Board.
This framework is an attempt to bring everything together in a way that sets out some boundaries, identifies critical elements, and indicates where we need to be strict in terms of our requirements vs where we can afford to be a little more relaxed. We hope this will also make it clearer how we are approaching the use of cloud providers and provide a solid basis for future discussions when we look at moving specific elements to the cloud.
To start, we have defined three different levels of strictness for each of the requirements we identified (Strict, Heightened and Standard). We then identified what each level means for each requirement, which you can see in the table below. Some requirements, such as ‘Comply with laws and regulations’, apply equally across all levels and so lack any differentiation (the final four rows of the table).
|Requirement||Strict||Heightened||Standard|
|Ensure resiliency, accessibility, availability and low latency of services||Uptime > 99.999%||Uptime > 99.9%||Uptime > 99%|
|Minimise vendor lock-in||Only use bare-metal or VMs||Managed services can be used but only with open standards||No restriction on managed services but keep track of switching costs|
|Cloud provider independence||Fully distributed architecture; no downtime allowed||Stand-by backup infrastructure required; fail-over within one hour||Ability to spin-off a new instance within 48 hours; maximum outage of 48 hours|
|Enable our engineers to improve product quality and innovate||Applies to all levels|
|Comply with laws and regulations||Applies to all levels; details of legal vetting process should be published|
|Ensure security of our services||Checks according to level; details of infosec vetting should be published|
|Prefer providers in our service region||Applies to all levels|
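To give a feel for what the uptime thresholds in the table imply, the short sketch below converts each one into a yearly downtime budget. It is only an illustration: the 365-day year, the dictionary and the function name are our own assumptions, not part of the framework.

```python
# Convert the uptime thresholds from the table into a yearly downtime budget.
# Assumption: a 365-day year. The level names come from the framework;
# UPTIME_THRESHOLDS and max_downtime_minutes are illustrative names only.
MINUTES_PER_YEAR = 365 * 24 * 60

UPTIME_THRESHOLDS = {
    "Strict": 99.999,
    "Heightened": 99.9,
    "Standard": 99.0,
}


def max_downtime_minutes(uptime_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for an uptime target."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


for level, uptime in UPTIME_THRESHOLDS.items():
    print(f"{level}: {max_downtime_minutes(uptime):.1f} minutes per year")
```

Under these assumptions, Strict allows a little over five minutes of downtime per year, Heightened about 8.8 hours, and Standard about 3.65 days.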
So far, we’ve described how we will interpret our requirements on a scale from strict to more relaxed. The next step is to look at how we map specific services against this framework. Here, we have been thinking of our services in terms of two categories:
- Global Internet Services: required for the Internet to function properly (e.g. RPKI)
- Core RIPE NCC Services: critical for the RIPE NCC, but will not have a noticeable impact on the wider Internet if offline for a short period (e.g. LIR Portal)
Further, services within each of these categories can have differing levels of criticality (meaning, the importance of these services either to the operation of the Internet or the RIPE NCC). Looking at criticality, we have identified three levels:
- High: outages have a direct operational impact
- Medium: outages have an operational impact within a few hours
- Low: we can afford to be more forgiving regarding outages
The table below then indicates how we map the strictness of requirements above to specific services depending on their criticality. We have included examples of services to make this a little more concrete. It's important to note that these examples are merely illustrative at this point – we intend to define the criticality of specific services with you at a later stage as part of this work.
|Service category||High criticality||Medium criticality||Low criticality|
|Global Internet Services||Strict (RPKI)||Heightened (RIPE Database)||Standard (RIR statistics)|
|Core RIPE NCC Services||Heightened (Registry software)||Standard (LIR Portal)||Standard (Meeting registration software)|
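One way to read the mapping above is as a simple lookup from service category and criticality to a strictness level. The sketch below assumes the three columns correspond to High, Medium and Low criticality, in that order; the `Strictness` enum and the `required_strictness` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Strictness(Enum):
    STRICT = "Strict"
    HEIGHTENED = "Heightened"
    STANDARD = "Standard"


# Assumption: columns of the table are High, Medium and Low criticality.
# Example services from the table are noted in the comments.
STRICTNESS_BY_SERVICE = {
    ("Global Internet Services", "High"): Strictness.STRICT,        # RPKI
    ("Global Internet Services", "Medium"): Strictness.HEIGHTENED,  # RIPE Database
    ("Global Internet Services", "Low"): Strictness.STANDARD,       # RIR statistics
    ("Core RIPE NCC Services", "High"): Strictness.HEIGHTENED,      # Registry software
    ("Core RIPE NCC Services", "Medium"): Strictness.STANDARD,      # LIR Portal
    ("Core RIPE NCC Services", "Low"): Strictness.STANDARD,         # Meeting registration software
}


def required_strictness(category: str, criticality: str) -> Strictness:
    """Look up the strictness level for a service category and criticality."""
    return STRICTNESS_BY_SERVICE[(category, criticality)]


print(required_strictness("Global Internet Services", "High").value)
print(required_strictness("Core RIPE NCC Services", "Medium").value)
```

As with the service examples themselves, this is illustrative only: the criticality of specific services has yet to be defined with the community.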
Now that we’ve published this draft, the ball is back in your court. We hope what we have presented here will help the discussion to progress. With that in mind, please let us know what you think. Of course, detailed feedback is helpful – but so are brief expressions of support. We would like to hear from as many voices as possible. You can comment below or on the RIPE NCC Services Working Group mailing list (email@example.com).
The chairs of the working group have been kind enough to schedule a second interim WG session for 6 September where we can discuss this framework in more detail. Our senior management and engineers will be at the session to hear what you think and answer any questions. Following this, we will update our draft strategy based on your feedback, before we present it to our Executive Board at its meeting in September.
Until then, we hope to see you on the mailing list!