Kim Davies, President of PTI and Vice President of IANA Services for ICANN, describes recent and new challenges in managing the DNS trust anchor.
This article is an updated version of a presentation I recently gave to the DNS Working Group about DNSSEC Root Key Signing Key (KSK) ceremonies, the challenges we face in running those ceremonies, and some thoughts on how to improve our processes in the future.
Why do we need KSK ceremonies? The DNS root zone contains information on the name servers for top-level domains (TLDs) such as .com, .edu and .nl, which allows Internet users to land on the correct website when typing a domain name. DNSSEC adds verification mechanisms to give confidence that the results are authentic. Underpinning this security is keeping the cryptographic keys for the root zone safe: ensuring they are accessible so that the root zone can be signed, whilst ensuring they are not tampered with. The use of public KSK ceremonies allows us to fulfil both of these goals.
Under normal circumstances, we organise KSK ceremonies every three months and fly in Trusted Community Representatives (TCRs) from around the world to one of two facilities in the US: one in El Segundo, California and the other in Culpeper, Virginia. TCRs are community members who regularly participate in the ceremonies, vouch for their correctness, and play a role in operating the cryptographic devices that contain the private key used to sign the root DNS zone’s public keying information for the following few months.
In 10 years of ceremony operations, we’ve been able to recover from every issue we’ve encountered on the day of the ceremony without major difficulty. We’ve always held the following day as a ‘standby day’, but had never needed to use it. However, 2020 has surprised us.
KSK Ceremony 40
The first ceremony of 2020 was scheduled for 12 February. On 11 February, pre-ceremony maintenance work was being conducted to replace a lock assembly with a newer model. Unfortunately, the safe would not open. The device indicated that the combination had been dialed correctly, but the bolt did not retract to allow access to the safe. The lock had suffered either an electrical or a mechanical failure.
The remedy exercised one of the worst-case disaster recovery scenarios, one that had previously only been contemplated: “drilling the safe”.
It took approximately 20 hours across two days to drill into the lock assembly and remove the bolt so that the safe could be opened. This was followed by safe remediation and the installation of a new lock. The procedure was complicated by anti-defeat mechanisms that were triggered in the lock due to novel materials in the safe’s construction.
Ultimately, the ceremony was successful and we gained valuable experience in drilling a safe, which might help us in future disaster recovery. Everything was back on track for future ceremonies, until COVID-19 happened.
KSK Ceremony 41
This most recent ceremony was scheduled for 23 April and marked our 10-year anniversary of holding key ceremonies. But due to the COVID-19 crisis, we faced unprecedented challenges and were asking ourselves a lot of questions, such as: Can people still attend? What if people get sick? Can we even continue to access our facility?
In the end, we managed to successfully perform KSK Ceremony 41, but with minimal in-person participation to limit the risks associated with the COVID-19 pandemic.
TCRs participated remotely, and we bolstered our normal remote participation to make it a more active part of the ceremony. We revised the DNSSEC Practice Statement (DPS), and four TCRs transferred their credentials to IANA staff to act as surrogates. We also minimised the scope of the ceremony to the most essential items, deferring tasks relating to TCR replacement and Hardware Security Module induction. In addition, we decided to sign key materials to cover nine months, rather than the normal three, to give us more time to deal with the pandemic before needing to hold another ceremony. As a result, the next ceremony should take place around February 2021.
2020 has definitely been a challenging year so far. It has tested our ability to adapt and to exercise the worst-case scenarios that were built into the design. Thankfully, with the support of the operational community, we’ve been able to make the necessary changes while retaining full confidence in the security of the KSK materials.
The Future of KSK Ceremonies
We feel that our current approach to KSK management is transparent and accountable, thanks to independent auditors, TCRs and the community overseeing us and providing us with feedback on how to maintain trust in the system. We are also proud to provide information and best practices to the cryptographic community on how these key ceremonies are conducted.
However, constant renewal is part of our DNA and we’re always looking to re-evaluate how to do things better. Some of the issues raised in the recent ceremonies were already on our radar, but COVID-19 makes addressing them more important.
Key Management Facilities
At the moment, both of our KSK facilities are located in the US, and a task for us to explore with the community is whether this needs to change. Having facilities in different locations may add resiliency and make it easier to hold ceremonies, but it increases the risk of attack as there are more locations to defend. More facilities will also increase the cost of running the KSK, as these facilities are expensive to run and we need to have staff nearby to manage them. Rotating through more facilities means each one lies idle for longer, which increases the opportunity for surreptitious activity or decay in the operational environment. It will be a complicated discussion to strike the right set of tradeoffs between cost, complexity, resiliency and security.
Global mobility and physical-based security
In a post-pandemic 21st century, is a model founded on distributing trust around the world using physical separation still appropriate? Now that we know that international travel can be impacted so severely, should we rely more on logical sharing of essential elements? Do fundamental aspects need a redesign? For the long-term health of the model, we need to think about the existing design to see if it is still the best way to manage the system.
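To make the idea of "logical sharing" more concrete: this article does not say what form it would take, but one well-known possibility is threshold secret sharing, where a secret is split into shares and any sufficiently large subset of shareholders can reconstruct it without all of them being present. The sketch below is purely illustrative and is not a description of how the root KSK is or would be managed. The function names, the choice of Shamir's scheme, the field modulus and the 3-of-5 parameters are all assumptions made for the example.

```python
import secrets

# Assumption for this sketch: arithmetic is done modulo the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")

    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, reduced modulo PRIME at every step.
        result = 0
        for coeff in reversed(coeffs):
            result = (result * x + coeff) % PRIME
        return result

    # Each shareholder receives one point (x, f(x)) with x = 1..shares.
    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        # pow(d, -1, p) is the modular inverse (Python 3.8+).
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    # Hypothetical 3-of-5 split: any three shareholders can reconstruct the value.
    all_shares = split_secret(123456789, threshold=3, shares=5)
    print(recover_secret(all_shares[:3]) == 123456789)
```

In a scheme like this, a quorum of shareholders can act even if some cannot travel. It does nothing by itself to show that the reconstruction happened in a trustworthy environment, which is the part that physical ceremonies currently address.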
Another area of consideration is whether to implement the notion of a “standby key”. If we generate and pre-populate an alternate trust anchor, then it can be put into action if needed via different mechanisms. Benefits of this approach include recovery from force majeure events, but it requires the added cost and complexity of storing the standby key elsewhere to avoid fate sharing with the production key. If we use alternate mechanisms or different facilities from those used for the production key, how do we then secure it to a satisfactory level? And if it is stored in a scaled-down facility, how would we perform ceremony operations when we need to use it?
Your help is needed
Ensuring trust in the system is fundamental to how we operate, and we think radical transparency that shines a light on the process helps foster trust, as messy as the details may be. New participants are always welcome, whether by participating in future ceremonies (either remotely, or in person when circumstances allow), volunteering to be a TCR, or engaging in discussions on how to evolve the system. New perspectives hone our approach and bring our attention to ideas that we may not have considered before.