Daniel Karrenberg

Timeline of Reverse DNS Events


Last week there were several problems with the RIPE NCC's reverse DNS (rDNS) service. This article is a first report about the events. It is not intended to analyse the causes or make detailed recommendations for action.


Scope

We briefly describe the systems involved and the timeline of events, and then list a number of immediate actions we have taken to improve our systems and procedures. Over the coming weeks, we will analyse the causes of these events and take the appropriate actions. We will conclude this process with a short report.

Introduction

rDNS is the part of the DNS that translates IP addresses to domain names, the inverse function of the regular DNS. rDNS can be used for diagnostic, logging and verification purposes. It is not typically used in a way that makes it critical for browsing the web; it is, however, used frequently in the process of delivering electronic mail.
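As an illustration of the translation described above, the following minimal sketch derives the domain name under which a reverse look-up for a given address is made. It relies only on Python's standard `ipaddress` module; the function name `reverse_name` and the documentation addresses used are illustrative choices, not taken from the RIPE NCC's systems.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Return the rDNS name queried for a PTR record of `address`.

    IPv4 addresses map to names under in-addr.arpa (octets reversed),
    IPv6 addresses to names under ip6.arpa (hex nibbles reversed).
    """
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    # Documentation addresses, used purely as examples.
    print(reverse_name("192.0.2.1"))    # 1.2.0.192.in-addr.arpa
    print(reverse_name("2001:db8::1"))  # 32 reversed nibbles, ending in 8.b.d.0.1.0.0.2.ip6.arpa
```

A resolver then asks for the PTR record at that name, walking the same delegation hierarchy as any other DNS look-up.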

rDNS look-ups follow the IP address hierarchy. Therefore, the Regional Internet Registries (RIRs) have a major role in providing the rDNS service. The RIPE NCC publishes rDNS information from two sources. The majority of the rDNS data comes from the IP address registry stored in the RIPE Database; this covers all address space allocated via the RIPE NCC since around the mid-1990s. Address space users can update this information in the RIPE Database, and the rDNS provisioning system then translates it into rDNS zone files. A small amount of the rDNS data, mostly pertaining to address space distributed before the mid-1990s, comes from "zonelets" exchanged among the RIRs. The rDNS provisioning system combines all that information, passes it through DNSSEC signers and transfers it to the authoritative name servers.
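Because reverse zones mirror the address hierarchy, each parent zone name corresponds to an address block. The sketch below converts a parent zone name into the prefix it covers, which may help when reading the zone lists in the timeline. The helper name `zone_to_prefix` is hypothetical, and the sketch assumes zones on octet (IPv4) or nibble (IPv6) boundaries; it does not handle RFC 2317 style names.

```python
import ipaddress


def zone_to_prefix(zone: str):
    """Map a reverse zone name to the IP prefix it covers.

    Assumes the zone sits on an octet boundary (in-addr.arpa) or a
    nibble boundary (ip6.arpa). Other names raise ValueError.
    """
    zone = zone.rstrip(".").lower()
    if zone.endswith(".in-addr.arpa"):
        # Labels are octets in reverse order, most specific first.
        octets = zone[: -len(".in-addr.arpa")].split(".")[::-1]
        prefix_len = 8 * len(octets)
        octets += ["0"] * (4 - len(octets))
        return ipaddress.ip_network(".".join(octets) + f"/{prefix_len}")
    if zone.endswith(".ip6.arpa"):
        # Labels are hex nibbles in reverse order.
        nibbles = zone[: -len(".ip6.arpa")].split(".")[::-1]
        prefix_len = 4 * len(nibbles)
        digits = "".join(nibbles).ljust(32, "0")
        groups = [digits[i:i + 4] for i in range(0, 32, 4)]
        return ipaddress.ip_network(":".join(groups) + f"/{prefix_len}")
    raise ValueError(f"not a reverse zone name: {zone!r}")


if __name__ == "__main__":
    for name in ("212.in-addr.arpa", "0.4.1.0.0.2.ip6.arpa"):
        print(name, "->", zone_to_prefix(name))
```

For example, 212.in-addr.arpa covers 212.0.0.0/8 and 0.4.1.0.0.2.ip6.arpa covers 2001:4000::/24.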

Please see below a high-level overview of the system:

Figure 1: High-level overview of reverse DNS provisioning at the RIPE NCC

Below you can find a table showing the series of events. You can also view the PDF version of the table.

| Time (UTC) | Events | Impact Assessment |
|---|---|---|
| Wed, 13 June | | |
| 13:30 | We discover that several zone files are missing from the DNS provisioning system. [The cause of this is still unknown and under investigation. A circumstantial lead is a routine BIND update that morning.] | No impact on reverse DNS operations, but zone updates are broken for delegations in parent zones: 0.4.1.0.0.2.ip6.arpa, 185.in-addr.arpa, 4.1.1.0.0.2.ip6.arpa, 5.1.0.0.2.ip6.arpa, 6.1.1.0.0.2.ip6.arpa, 7.0.1.0.0.2.ip6.arpa, 7.1.1.0.0.2.ip6.arpa, 7.4.1.0.0.2.ip6.arpa, 8.0.1.0.0.2.ip6.arpa, a.0.1.0.0.2.ip6.arpa, a.1.1.0.0.2.ip6.arpa, a.4.1.0.0.2.ip6.arpa, b.0.1.0.0.2.ip6.arpa, b.1.1.0.0.2.ip6.arpa, b.4.1.0.0.2.ip6.arpa (a total of 425 delegations; 185/8 is in de-bogonising: no operational impact) |
| 13:45 | Decision to reload zone files from backup storage | |
| 14:00 | Discovery that backups are not available | |
| 14:15 | Decision to cold start the provisioning system. Because the state of the remaining zone files was unclear, we decide to rebuild all zone files from scratch. | |
| 15:00 | Start DNS provisioning system from scratch (empty zone files). By mistake we do not disable transfers to the authoritative servers. Empty zones for the entire reverse tree start propagating. | Impact on the whole of the reverse DNS tree, limited initially by caching |
| 16:00 | Reports of reverse tree breakage start to come in | |
| 16:00 - 20:00 | Investigation of problems and consideration of possible workarounds for the slow cold start of the provisioning system | |
| 20:00 | Found incidental backup of zone files with state of 13/6/2012 13:30 UTC | |
| 20:15 | Stopped DNS provisioning system. Reloaded DNS provisioning system with data from backup files. ERX-related zones are missing from these backups, as are the above-mentioned ip6.arpa delegations. Missing: 0.4.1.0.0.2.ip6.arpa, 185.in-addr.arpa, 4.1.1.0.0.2.ip6.arpa, 5.1.0.0.2.ip6.arpa, 6.1.1.0.0.2.ip6.arpa, 7.0.1.0.0.2.ip6.arpa, 7.1.1.0.0.2.ip6.arpa, 7.4.1.0.0.2.ip6.arpa, 8.0.1.0.0.2.ip6.arpa, a.0.1.0.0.2.ip6.arpa, a.1.1.0.0.2.ip6.arpa, a.4.1.0.0.2.ip6.arpa, b.0.1.0.0.2.ip6.arpa, b.1.1.0.0.2.ip6.arpa, b.4.1.0.0.2.ip6.arpa (together containing a total of 425 delegations). Authoritative servers reloading. However, a race condition in the provisioning system causes the zone serial numbers for two zones to be incorrectly updated. Therefore two large zones (212.in-addr.arpa and 213.in-addr.arpa) are propagating in an incomplete form. This causes severe breakage for these zones. In total approx. 6% of the reverse delegations are affected during this period. | Restored zone files start propagating for all but the below-mentioned parent zones (state of 13/6/2012 13:30 UTC). Due to negative caching, impact on restored zones may have been prolonged. Details of impacted zones: affected 6.1% of total reverse DNS delegations. Parent zones 212.in-addr.arpa and 213.in-addr.arpa are distributed incompletely; affected: 43% of delegations in 212.in-addr.arpa, 54% of delegations in 213.in-addr.arpa; in total 33,996 delegations affected in these parent zones. ERX zones: ERX import zones: 4,426 delegations across 22 zones absent during this period; ERX exports: updates delayed. The above-mentioned missing zones in ip6.arpa (475 delegations total) are still lacking during this period. RFC 2317 delegations: a total of 31 RFC 2317 delegations are lacking the associated CNAME records at this time. |
| 20:30 | Restarted DNS provisioning system, starting with state of 13:30 UTC. The DNS provisioning system is still running at an unexpectedly low insertion rate. | At this time we believed the remaining impact to be limited to a small number of ERX imported zones and a limited number of ip6.arpa zones. The problems with 212.in-addr.arpa and 213.in-addr.arpa went unnoticed until the early morning of Thursday 14 June. |
| Thu, 14 June | | |
| 7:00 | First reports received about remaining breakage | |
| 7:00 - 10:00 | Investigation of reported remaining issues | |
| 10:45 | We discover that 212.in-addr.arpa and 213.in-addr.arpa are incomplete due to the above-mentioned race condition. After updating serial numbers, both zones start propagating properly again. | Zones 212.in-addr.arpa and 213.in-addr.arpa, complete and up to date to the then-current state, start propagating again. ERX import zones (4,426 delegations), ip6.arpa (475 delegations) and RFC 2317 zones (31 delegations) still not restored |
| 16:00 | Based on RIPE DB dump of 14/6/2012 00:00, all regular zones are restored, incl. ip6.arpa zones | |
| 16:00 - 16:30 | Processing of updates for period after 00:00 14/6/2012 | |
| 16:30 | All updates processed for all zones, with the exception of ERX zones | All regular zones restored and current. ERX import zones (4,426 delegations) and RFC 2317 zones (31 delegations) still not restored |
| 19:30 | Restart processing of ERX delegations (much slower than anticipated) | |
| 20:00 | Majority of ERX zones handled | ERX import zones (2 delegations) and RFC 2317 zones still not restored |
| Fri, 15 June | | |
| 7:30 | All regular zones restored, including the last remaining 2 ERX zones | All zones, incl. ERX imports, fully functional, with the exception of 31 RFC 2317 delegations that had not yet been discovered to be missing their CNAME records |
| Mon, 18 June | | |
| 11:00 | Discovered error with 31 delegations lacking RFC 2317 CNAME records | |
| 13:45 | Restored remaining RFC 2317 delegations | |
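
The timeline attributes the incomplete propagation of 212.in-addr.arpa and 213.in-addr.arpa to incorrectly updated zone serial numbers. The details of the race condition are not described here, so the following is only a generic sketch of why a serial number matters: a secondary server normally transfers a zone only when the primary's SOA serial is newer than its own, compared using RFC 1982 serial number arithmetic. The function names and the serial values are hypothetical.

```python
SERIAL_BITS = 32
MODULUS = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(a: int, b: int) -> bool:
    """True if serial `a` is newer than `b` under RFC 1982 arithmetic.

    Serials are 32-bit values that wrap around; `a` is newer when it is
    ahead of `b` by less than half of the number space.
    """
    return a != b and ((a - b) % MODULUS) < HALF


def secondary_would_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """A secondary refreshes a zone only if the primary's serial is newer."""
    return serial_gt(primary_serial, secondary_serial)


if __name__ == "__main__":
    # Hypothetical serials: the secondary already holds an incomplete copy.
    held = 2012061302
    # A complete zone published with the same serial is not picked up...
    print(secondary_would_transfer(2012061302, held))  # False
    # ...but it is once the serial has been increased.
    print(secondary_would_transfer(2012061303, held))  # True
```

This is consistent with the 10:45 entry on 14 June, where the zones start propagating properly again after the serial numbers are updated.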

 

Immediate Actions Taken

We acknowledge that our operational performance was not up to the standards the RIPE community expects from the RIPE NCC in this instance and apologise for the considerable inconvenience caused. We have taken the following immediate steps and we will take further actions once the events have been fully analysed:

  • We have started ad-hoc backups of the BIND zone files
  • We have re-emphasised the 4-eyes principle in case of operational irregularities
  • We have clarified the service announcements procedures

About the author

Daniel Karrenberg Based in Western Europe, NL&DE mostly

https://www.ripe.net/about-us/press-centre/publications/speakers/daniel-karrenberg

Ample information about his past sins can be found using your favourite search engine. Following are a few additional keywords you might use, arranged by decade: 1980s: GUUG EUUG EUnet unido mcvax cwi RARE iepg RIPE; 1990s: RIPE+NCC rir iana postel terena ebone centr k.root-servers.net; 2000s: dnsmon nsd ris internet+society rssac; 2010s: ripe+labs ripestat ripe+atlas
