RIPE Atlas Anchor to RIPE NCC Service Node
The RIPE Atlas anchors pilot is well under way, with 13 anchor nodes currently deployed at various locations. We recently invited the next group of RIPE Atlas anchor hosts to actively join the project.
While the first RIPE Atlas anchor nodes have been deployed, we have also been investigating further possibilities to use the RIPE Atlas anchors, through the use of virtualisation technology. The bulk of this work was done by Luigi Corsello as part of his (bachelor) graduation project as an intern at the RIPE NCC. This short article describes the main results of his work. While the full report of the project is mostly useful for internal use at the RIPE NCC, it is available for those interested (see below).
During the project, three main aspects were taken into consideration:
- Time keeping at an accuracy suitable for RIPE Atlas anchors *
- Resource separation under high load conditions
- Manageability aspects in the existing operational context
These characteristics were compared for three candidate virtualisation technologies: VMware ESXi, OpenVZ and KVM. The technologies were investigated using a combination of existing open source benchmarking tools and home-grown (network) load generation, measurement and monitoring tools.
The results of this project showed that:
- Separation of disk I/O, CPU load and memory-utilisation between guest systems on all virtualisation platforms was good and sufficient for the foreseen use cases.
- Separation of network resources was confirmed not to be available ‘natively’ on OpenVZ and KVM, due to the nature of the underlying technologies. Traffic shaping techniques were not investigated due to time constraints for the internship project.
- VMware showed effective separation of network I/O capacity under most test conditions (see the note below regarding one exception**).
- Test results on time and clock behaviour confirmed the expected impact of prolonged high network load on NTP time keeping. The other tested system parameters (disk I/O and CPU load) showed little impact on the clock accuracy of guest systems on either VMware or OpenVZ. Time keeping on VMware and OpenVZ was generally observed to be stable within a band of 1-2 ms.
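As an illustration of the time keeping result, the following minimal sketch shows one way to check whether a series of clock offset samples stays within a given band. It is not the tooling used in the project. The function names, the sample values and the reading of "band" as a peak-to-peak spread are all assumptions made for this example.

```python
def offset_spread_ms(offsets_ms):
    """Return the peak-to-peak spread of clock offset samples, in milliseconds."""
    if not offsets_ms:
        raise ValueError("need at least one offset sample")
    return max(offsets_ms) - min(offsets_ms)


def within_band(offsets_ms, band_ms=2.0):
    """Return True if all samples fit inside a band of band_ms milliseconds.

    Assumption: 'band' is interpreted as the peak-to-peak spread of the samples.
    """
    return offset_spread_ms(offsets_ms) <= band_ms


# Hypothetical offsets (ms) of a guest clock relative to a reference clock.
samples = [0.3, -0.4, 0.6, -0.2, 0.5]
print(f"spread: {offset_spread_ms(samples):.2f} ms, stable: {within_band(samples)}")
```

In practice the offsets would come from whatever the NTP daemon or monitoring system records. The 2 ms default only mirrors the upper end of the range reported above.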
The RIPE NCC evaluated the results of the internship project and has come to the following conclusions:
- Based on results of the resource separation tests, OpenVZ and VMware are both considered good candidates for implementation in the next phase of the pilot.
- Based on manageability aspects and ease of integration in the current configuration and management environment, there is a preference to use OpenVZ.
- Based on the limited capabilities of the VMware host environment and the lack of options to secure access to its host system, VMware is less preferred.
- The test results were inconclusive with regard to time stability on a sub-second timescale. Further work is needed in this area, which the RIPE NCC will undertake in the coming months.
Based on the above conclusions, we will continue the RIPE Atlas anchor pilot using OpenVZ. We previously announced that we would start deploying virtualisation on the RIPE Atlas anchors during this phase of the pilot. However, since we are still performing some lab tests regarding time stability, the roll out of virtualised anchors will be delayed by a month.
The current planning for the second phase of the pilot includes:
- Roll out of the second group of anchors starting March 2013
- Further in-house testing of time stability on virtualised systems, as well as on 'metal'
- Roll out of virtualised anchors on RIPE Atlas anchor systems in April 2013
The full report of the RIPE Atlas anchors virtualisation internship project can be found here.
See RIPE Atlas Anchors Pilot - Update on Phase One for a detailed roadmap of the RIPE Atlas anchor pilot.
* Note that the goal of the RIPE Atlas anchor systems is not to replace the fine-grained measurements possible with the RIPE NCC TTM services. Accuracy in the sub-millisecond range was therefore considered out of scope for this work.
** High memory load and paging rates showed an unexplained impact on network traffic on VMware. Due to the time constraints of the internship project, no further investigation was done to find the root cause.