After the RIPE NCC implemented a system that checks for lame DNS servers in the part of the DNS tree it maintains, I was curious to find out how big a problem DNS lameness really is. I wrote a tool and presented it at the DNS WG during RIPE 59 in Lisbon.
A DNS server can be lame in two ways:
1. The name in the NS record does not resolve to an A or AAAA record. For example, the following NS record is lame if ns1.example.com has no address records:
example.org NS ns1.example.com
2. The name resolves, but nothing at its IPv4 or IPv6 address returns an answer. For example:
ns.example.net A 0.0.1.16
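The two cases can be told apart mechanically. The following is a minimal sketch, not the tool described in this article: `classify_nameserver` is a hypothetical helper that takes already-collected lookup results (a mapping from name server name to its A/AAAA addresses) plus the set of addresses that answered a probe, and reports which case, if any, applies.

```python
from typing import Mapping, Sequence, Set

def classify_nameserver(
    ns_name: str,
    address_records: Mapping[str, Sequence[str]],
    responsive: Set[str],
) -> str:
    """Say which of the two lameness cases, if any, applies to ns_name.

    address_records and responsive are hypothetical inputs standing in for
    real A/AAAA lookups and real probes of each address.
    """
    addresses = address_records.get(ns_name, ())
    if not addresses:
        # Case 1: the NS target has no A or AAAA record.
        return "bad NS record"
    if not any(address in responsive for address in addresses):
        # Case 2: the name resolves, but nothing at any of its addresses answers.
        return "no answering address"
    return "ok"
```

This sketch only reports the second case when none of a server's addresses answer; a per-address check would also flag servers where just some addresses are dead.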
A bad or lame server either does not answer at all or times out. Either way, the user sees a delay.
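Whether an address answers can be checked by sending it a DNS query and waiting a bounded time for a reply; that wait is the delay a user would experience. Below is a minimal stdlib-only sketch, assuming a single non-recursive SOA query over UDP is an adequate probe. `build_query` and `server_answers` are hypothetical names, and a real checker would also retry, fall back to TCP, and inspect the reply's RCODE and AA bit.

```python
import random
import socket
import struct

def build_query(qname: str, qtype: int = 6) -> bytes:
    """Build a minimal, non-recursive DNS query (QTYPE 6 = SOA, QCLASS 1 = IN)."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags = 0 (standard query, RD clear), QDCOUNT = 1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
        if label
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)

def server_answers(server_ip: str, qname: str, timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if server_ip replies to a query for qname within timeout seconds."""
    query = build_query(qname)
    family = socket.AF_INET6 if ":" in server_ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server_ip, port))
            reply, _addr = sock.recvfrom(4096)
    except OSError:  # timeouts, unreachable addresses and refused ports all land here
        return False
    # Count it as an answer only if it echoes our transaction ID and has QR set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Probing an address like the 0.0.1.16 example either fails immediately or blocks for the full `timeout` before giving up, which is the delay described above.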
The results of my lameness checker tool were as follows:
- 0.3% of all queries had a bad NS record
- 0.8% of all queries had an A or AAAA record resulting in a DNS lookup failure
Together, this means that a bit more than 1% of queries (0.3% + 0.8% = 1.1%) were affected by lameness. Note that although 5% of servers are lame, only about 1% of all queries were affected. The difference between these two numbers is what the tool was designed to measure.
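The gap between the share of lame servers and the share of affected queries comes from weighting each server by the traffic it receives. Here is a toy illustration with made-up numbers, not the measured data; `lameness_fractions` is a hypothetical helper.

```python
from typing import Mapping, Set, Tuple

def lameness_fractions(
    queries_per_server: Mapping[str, int], lame: Set[str]
) -> Tuple[float, float]:
    """Return (share of servers that are lame, share of queries sent to lame servers)."""
    lame_servers = sum(1 for server in queries_per_server if server in lame)
    lame_queries = sum(n for server, n in queries_per_server.items() if server in lame)
    return (
        lame_servers / len(queries_per_server),
        lame_queries / sum(queries_per_server.values()),
    )

# Made-up traffic: one lame server out of four, but it sees very few queries.
servers, queries = lameness_fractions({"a": 1, "b": 99, "c": 50, "d": 50}, {"a"})
```

In this toy case 25% of servers are lame, yet only 1 of 200 queries (0.5%) reaches a lame server, because the lame server is rarely used.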
I have to include a few caveats here:
- the results do not account for caching
- the code is based on IP, not on IP+domain
- the effects of bad name servers are very uncertain
- the effects of bad servers are slightly uncertain
Having measured the number of lame servers, I suggested at the DNS WG meeting that, instead of sending blanket e-mails to all LIRs, targeted e-mails be sent to the administrators whose servers affect users the most. In addition, one could consider sending an annual report to all LIRs.
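Picking the administrators whose servers affect users the most amounts to ranking lame server addresses by how many queries hit them. A minimal sketch, assuming each affected query has already been attributed to a lame server address; `worst_offenders` is a hypothetical name, and mapping addresses to administrator contacts is not covered.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def worst_offenders(lame_hits: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank lame server addresses by the number of affected queries, highest first."""
    return Counter(lame_hits).most_common(top_n)
```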
As proposed during the meeting, I am publishing here the tool I wrote to make the measurements. It consists of two programs (see the programs attached at the bottom):
1. The first program, SQLite2pickle.py, takes the SQLite database that the RIPE NCC uses for its lameness checks and loads some of its information into a Python dictionary (Python's name for an associative array, equivalent to a hash in Perl). It then saves the dictionary to a file for later processing. The two pieces of information are:
- The total number of IP addresses for each name server
- The number of good IP addresses for each name server
2. The second program, ScanPcap.py, does the real work. It reads a packet dump in pcap format and checks each packet to see whether it is an answer, whether the answer has good NS records, and whether the IP addresses for those name servers work. At the end, it prints the resulting counts.
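The per-answer decision can be sketched as follows. This is a hypothetical illustration, not ScanPcap.py: it assumes the NS target names have already been extracted from each answer, and it uses a per-name-server (total IPs, good IPs) mapping like the one the first program saves. Counting an answer with both problems only as a bad NS record is an assumption of this sketch.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence, Tuple

def tally_answers(
    answers: Iterable[Sequence[str]],
    ns_counts: Mapping[str, Tuple[int, int]],
) -> Counter:
    """Count answers with a bad NS record or with an NS address that fails.

    answers: one sequence of NS target names per DNS answer (hypothetical input).
    ns_counts: name server -> (total IP addresses, good IP addresses).
    """
    counts = Counter()
    for ns_names in answers:
        counts["answers"] += 1
        # Unknown names are treated as having no addresses at all.
        totals = [ns_counts.get(ns.lower().rstrip("."), (0, 0)) for ns in ns_names]
        if any(total == 0 for total, _good in totals):
            # At least one NS name has no A or AAAA record (assumed to take precedence).
            counts["bad NS record"] += 1
        elif any(good < total for total, good in totals):
            # Every NS name resolves, but at least one of its addresses does not answer.
            counts["failing address"] += 1
    return counts
```

The final report is then just each count divided by `counts["answers"]`.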
The first program requires an SQLite database in the format the RIPE NCC uses, as well as the Python SQLite module. What it does should be more or less obvious, so others can modify it to extract the same information from their own lameness scans.
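As a starting point for such an adaptation, here is a minimal sketch of the same idea using Python's built-in `sqlite3` and `pickle` modules. The table `ns_addresses(ns_name, ip, is_good)` is a hypothetical schema, not the RIPE NCC's actual format, and `load_ns_counts` and `save_counts` are hypothetical names; for another data source, only the query should need to change.

```python
import pickle
import sqlite3
from typing import Dict, Tuple

def load_ns_counts(db_path: str) -> Dict[str, Tuple[int, int]]:
    """Map each name server to (total IP addresses, good IP addresses).

    Assumes a hypothetical table ns_addresses(ns_name TEXT, ip TEXT, is_good INTEGER).
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT ns_name, COUNT(DISTINCT ip), "
            # CASE yields NULL for bad addresses, and COUNT skips NULLs.
            "COUNT(DISTINCT CASE WHEN is_good THEN ip END) "
            "FROM ns_addresses GROUP BY ns_name"
        ).fetchall()
    finally:
        conn.close()
    return {name: (total, good) for name, total, good in rows}

def save_counts(counts: Dict[str, Tuple[int, int]], pickle_path: str) -> None:
    """Write the dictionary to disk for the packet-scanning step to load."""
    with open(pickle_path, "wb") as f:
        pickle.dump(counts, f)
```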
The second program requires dnspython, which it uses to parse packets. It also uses pcap.py, a pcap parser I wrote that is incredibly slow but Good Enough for this kind of non-real-time checking. I used the psyco module to speed it up, but psyco is optional. You should be able to use the second program without modification.
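pcap.py itself is not reproduced here. As a hypothetical stand-in, this stdlib-only sketch reads a classic libpcap file and picks out DNS answers, assuming Ethernet framing and IPv4/UDP (IPv6, VLAN tags, and pcapng are not handled, and the file's link-type field is not checked). `read_pcap`, `udp_dns_payload`, and `is_answer` are hypothetical names; parsing the NS records inside each answer is left to a DNS library such as dnspython.

```python
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

# Classic libpcap magic numbers as they appear in the first 4 bytes of the file.
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanosecond timestamps
}

def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, captured bytes) for each record in a classic pcap file."""
    global_header = f.read(24)
    if global_header[:4] not in _MAGIC:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    endian, ts_scale = _MAGIC[global_header[:4]]
    record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    while True:
        header = f.read(record.size)
        if len(header) < record.size:
            return  # end of file
        ts_sec, ts_frac, incl_len, _orig_len = record.unpack(header)
        data = f.read(incl_len)
        if len(data) < incl_len:
            return  # truncated last record
        yield ts_sec + ts_frac * ts_scale, data

def udp_dns_payload(frame: bytes) -> Optional[bytes]:
    """Return the UDP payload of an Ethernet/IPv4 frame sent from port 53, else None."""
    if len(frame) < 14 + 20 + 8 or frame[12:14] != b"\x08\x00":
        return None  # too short, or not IPv4
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ip[9] != 17 or len(ip) < ihl + 8:
        return None  # not UDP
    src_port, _dst_port, udp_len = struct.unpack("!HHH", ip[ihl:ihl + 6])
    if src_port != 53 or udp_len < 8:
        return None
    # Use the UDP length field so Ethernet padding is not included.
    return ip[ihl + 8:ihl + udp_len]

def is_answer(payload: bytes) -> bool:
    """True if the payload has a complete DNS header with the QR (response) bit set."""
    return len(payload) >= 12 and bool(payload[2] & 0x80)
```

Each payload for which `is_answer` is true could then be handed to dnspython's `dns.message.from_wire` to extract the NS records.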
Have fun with the tool. I am curious to hear whether you have used it for other parts of the DNS tree, and to see the results. Please use the forum listed below for that, and for any questions or comments.