Felipe Victolla Silveira

RIPE NCC Cloud Strategy Framework

Contributors: Kaveh Ranjbar, Razvan C. Oprea, Vesna Manojlovic, Daniel Karrenberg, Fergal Cunningham, Antony Gollan

Our cloud strategy framework provides a starting point that we will use when developing cloud implementations in the future. It also forms a solid basis for discussions with the community on specific proposals relating to our services.


This August, we shared a draft version of our cloud strategy here on RIPE Labs. The topic was then discussed further on the RIPE NCC Services Working Group mailing list, and we also held two interim WG sessions (on 28 July and 6 September) where the community was invited to give input on the draft framework. After receiving positive feedback, we made some final edits to produce a version that our Executive Board approved at its meeting in late September.

Here, we share the final version of our cloud strategy framework. It outlines a set of underlying principles and requirements and describes the framework we will use as a starting point when developing any cloud implementations going forward. We think it will also provide a good basis for discussing specific proposals for our services.

In terms of content, everything here is consistent with our previous draft; we have only cut down the text to make it more concise and straightforward. One part is still outstanding: agreeing with the community on the criticality of our various services. This is in the works, and we aim to share something soon.

Cloud Principles and Requirements

Principles

Shaping this strategy are a series of basic principles regarding how we work with the RIPE community:

  • We solicit input on all services that are critical for the operation of the global Internet, or that directly affect the operations of our members or the RIPE community.
  • We have full authority and responsibility for the design, deployment, and operation of our services.
  • We must remain neutral and operate our services for the benefit of all members.
  • The integrity of our services and data must be maintained in the face of geopolitical, regulatory, and economic threats.
  • Open standards should be used; where open standards are not viable, we should prefer industry standards over proprietary interfaces.

Requirements

Taking the principles above, and using input from the RIPE community, we have identified the following set of requirements for our use of cloud providers:

  • Ensure resilience, accessibility, availability and low latency for our services.
  • Minimise vendor lock-in.
  • Avoid dependence on any single cloud provider.
  • Our engineers can innovate and improve the quality of our services.
  • Comply with applicable laws and regulations.
  • Ensure the security of our services.
  • Prefer providers in our service region.

It is important to recognise that some of our principles and requirements are in tension with each other, so trade-offs will be necessary. There is no point in choosing low-quality solutions simply because they are the best fit for this framework. We will apply common sense and engage with the community if something is not working.

Cloud Strategy Framework

Our cloud strategy framework uses the principles and requirements outlined above to set boundaries and identify where we need to be strict and where we can be more relaxed. This provides clarity about how we will approach the use of cloud providers and supports future discussions with the community when we look at moving specific elements to the cloud.

Requirements According to Strictness

We have defined three levels of strictness for each of our requirements (Strict, Heightened and Standard). What these levels mean for each requirement is outlined in the table below. Some requirements, such as ‘Comply with applicable laws and regulations’, apply equally across all levels and so lack any differentiation (the final four rows in the table).

Table 1: Requirements According to Level of Strictness

Ensure resilience, accessibility, availability and low latency of services
  • Strict: > 99.999% availability*
  • Heightened: > 99.9% availability
  • Standard: > 99% availability

Minimise vendor lock-in
  • Strict: Only use bare metal, VMs or containers**
  • Heightened: Managed services can be used, but only with open standards
  • Standard: No restriction on managed services, but keep track of switching costs

Avoid dependence on any single cloud provider
  • Strict: Fully distributed architecture; no downtime allowed
  • Heightened: Standby backup infrastructure required; fail-over within one hour
  • Standard: Ability to spin up a new instance within 48 hours; maximum outage of 48 hours

Engineers can innovate and improve our services
  • Applies to all levels

Comply with applicable laws and regulations
  • Applies to all levels; details of the legal vetting process should be published

Ensure security of our services
  • All levels: checks according to the level of strictness; details of infosec vetting should be published

Prefer providers in our service region
  • Applies to all levels
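To make the differentiated rows of Table 1 more concrete, here is a minimal Python sketch that encodes three of them (availability, vendor lock-in and dependence on a single provider) as a checklist. It is a hypothetical illustration, not part of the framework itself: the `Deployment` fields, the `check` function, and the reading of "no downtime allowed", "fail-over within one hour" and "maximum outage of 48 hours" as 0, 1 and 48 tolerated hours of outage are all assumptions. The rows that apply equally to all levels (legal, security, region) are not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Strictness(Enum):
    STRICT = "Strict"
    HEIGHTENED = "Heightened"
    STANDARD = "Standard"


# Availability must exceed these percentages (Table 1, first row).
MIN_AVAILABILITY_PCT = {
    Strictness.STRICT: 99.999,
    Strictness.HEIGHTENED: 99.9,
    Strictness.STANDARD: 99.0,
}

# Assumption: worst-case hours of outage tolerated if one cloud provider fails.
MAX_PROVIDER_OUTAGE_HOURS = {
    Strictness.STRICT: 0.0,
    Strictness.HEIGHTENED: 1.0,
    Strictness.STANDARD: 48.0,
}


@dataclass
class Deployment:
    """Hypothetical summary of a proposed cloud deployment."""
    availability_pct: float       # expected availability per quarter
    uses_managed_services: bool   # anything beyond bare metal, VMs or containers
    managed_services_open: bool   # managed services rely only on open standards
    provider_outage_hours: float  # expected outage if a single provider fails


def check(deployment: Deployment, level: Strictness) -> list[str]:
    """Return the Table 1 requirements that the deployment does not satisfy."""
    problems = []
    if not deployment.availability_pct > MIN_AVAILABILITY_PCT[level]:
        problems.append("availability below target")
    if level is Strictness.STRICT and deployment.uses_managed_services:
        problems.append("managed services not allowed")
    if (level is Strictness.HEIGHTENED and deployment.uses_managed_services
            and not deployment.managed_services_open):
        problems.append("managed services must use open standards")
    # At Standard level managed services are unrestricted; tracking switching
    # costs is left outside this check.
    if deployment.provider_outage_hours > MAX_PROVIDER_OUTAGE_HOURS[level]:
        problems.append("single-provider outage too long")
    return problems


vm_only = Deployment(availability_pct=99.95, uses_managed_services=False,
                     managed_services_open=False, provider_outage_hours=0.5)
print(check(vm_only, Strictness.HEIGHTENED))  # []
print(check(vm_only, Strictness.STRICT))
# ['availability below target', 'single-provider outage too long']
```

The same hypothetical deployment passes at Heightened but fails at Strict, which mirrors how Table 1 tightens each requirement as the level rises.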

* Availability is measured per quarter. The corresponding maximum downtime per quarter is approximately 1m 18s at 99.999%, 2h 11m 29s at 99.9%, and 21h 54m 52s at 99%.

** The addition of 'or containers' here is the only change made to this table since we last shared it on RIPE Labs.
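The quarterly downtime figures for the availability targets follow from simple arithmetic: allowed downtime = (1 - availability) × length of a quarter. The Python sketch below reproduces them. It assumes a quarter is one quarter of an average Gregorian year (365.2425 days, i.e. 7,889,238 seconds) and truncates partial seconds; the source does not state its convention, but this assumption matches all three figures. The function names are illustrative.

```python
import math
from fractions import Fraction

# Assumption: a quarter is a quarter of an average Gregorian year (365.2425 days).
QUARTER_SECONDS = Fraction("365.2425") * 24 * 60 * 60 / 4


def downtime_budget(availability_pct: str) -> int:
    """Maximum whole seconds of downtime per quarter, e.g. for "99.9" (%)."""
    # Fraction parses the decimal string exactly, avoiding float rounding.
    unavailable = 1 - Fraction(availability_pct) / 100
    return math.floor(unavailable * QUARTER_SECONDS)  # drop partial seconds


def format_duration(seconds: int) -> str:
    """Format seconds as e.g. '2h 11m 29s', omitting leading zero units."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


for target in ("99.999", "99.9", "99"):
    print(f"{target}%: {format_duration(downtime_budget(target))}")
# 99.999%: 1m 18s
# 99.9%: 2h 11m 29s
# 99%: 21h 54m 52s
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why the Strict level leaves barely more than a minute per quarter.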

Level of Strictness According to Criticality

The table above describes how we interpret our requirements on a scale from Strict to Standard. To determine where a specific service should sit within this framework, we need to determine its criticality. To do this, we start by defining services as falling into one of two categories:

  • Global Internet Services: required for the Internet to function properly (e.g. RPKI)
  • RIPE NCC-specific Services: critical for the RIPE NCC, but will not have a noticeable impact on the wider Internet if offline for a short period (e.g. LIR Portal)

Services within each of these categories can have differing levels of criticality (in terms of their importance either to the operation of the Internet or the RIPE NCC). We have identified three levels of criticality:

  • High: outages have a direct operational impact
  • Medium: outages have an operational impact within a few hours
  • Low: we can afford to be more forgiving regarding outages

The table below indicates how we identify the level of strictness that should apply to specific services, according to their criticality. Example services are included to make this more concrete. Note that we intend to work with the community to define the criticality of specific services, so these examples could change.

Table 2: Strictness According to Criticality

Global Internet Services
  • High/Very High criticality: Strict (e.g. RPKI)
  • Medium criticality: Heightened (e.g. RIPE Database)
  • Low/Very Low criticality: Standard (e.g. RIR statistics)

RIPE NCC-specific Services
  • High/Very High criticality: Heightened (e.g. Registry software)
  • Medium criticality: Standard (e.g. LIR Portal)
  • Low/Very Low criticality: Standard (e.g. Meeting registration software)
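Table 2 is effectively a two-key lookup from a service's category and criticality to a strictness level. The Python sketch below encodes it that way; the enum and function names are hypothetical, and the example criticality assignments are the illustrative ones from the table, which may still change after community discussion.

```python
from enum import Enum


class Category(Enum):
    GLOBAL_INTERNET = "Global Internet Services"
    RIPE_NCC_SPECIFIC = "RIPE NCC-specific Services"


class Criticality(Enum):
    HIGH = "High/Very High"
    MEDIUM = "Medium"
    LOW = "Low/Very Low"


# Table 2 as a lookup: (category, criticality) -> strictness level.
STRICTNESS = {
    (Category.GLOBAL_INTERNET, Criticality.HIGH): "Strict",
    (Category.GLOBAL_INTERNET, Criticality.MEDIUM): "Heightened",
    (Category.GLOBAL_INTERNET, Criticality.LOW): "Standard",
    (Category.RIPE_NCC_SPECIFIC, Criticality.HIGH): "Heightened",
    (Category.RIPE_NCC_SPECIFIC, Criticality.MEDIUM): "Standard",
    (Category.RIPE_NCC_SPECIFIC, Criticality.LOW): "Standard",
}


def strictness_for(category: Category, criticality: Criticality) -> str:
    """Return the Table 1 strictness level that applies to a service."""
    return STRICTNESS[(category, criticality)]


# Illustrative assignments taken from the example services in Table 2.
examples = {
    "RPKI": (Category.GLOBAL_INTERNET, Criticality.HIGH),
    "RIPE Database": (Category.GLOBAL_INTERNET, Criticality.MEDIUM),
    "LIR Portal": (Category.RIPE_NCC_SPECIFIC, Criticality.MEDIUM),
}
for service, (cat, crit) in examples.items():
    print(f"{service}: {strictness_for(cat, crit)}")
# RPKI: Strict
# RIPE Database: Heightened
# LIR Portal: Standard
```

Written this way, one property of the table is easy to see: a RIPE NCC-specific service never reaches the Strict level, because its mapping sits one step lower than a Global Internet Service of the same criticality (bottoming out at Standard).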

About the author

I am the Chief Operations Officer of the RIPE NCC, responsible for the registry, member-related services and software development, including the RIPE Database, LIR Portal, and RPKI. I joined the RIPE NCC in 2012 as a Software Engineer and have since worked in different roles across the organisation. I have an MSc in Computer Science.
