Analysis of Increased Query Load on Root Name Servers

Wolfgang Nagele — 11 Jul 2011
Contributors: Emile Aben, Daniel Karrenberg
After recently observing an increased query load on the root name servers, we carried out an investigation and analysis, which is described in the article below.

Earlier, we reported on an Increased Query Load on Root Name Servers. At no time was there any operational problem for K-root, the root name server the RIPE NCC is responsible for. The traffic load has since returned to normal, and we would like to present some preliminary results:

We noticed this increase immediately and kept a close watch on developments. Because the increased load did not pose problems for K-root, we decided not to block the traffic. However, we took some measures to spread the load across other instances of K-root. Collaboration with our peers was excellent throughout.

Initially, a few small drops could be observed in our DNS monitoring tool DNSMON. DNSMON is extremely sensitive and shows the slightest changes immediately. We took immediate action to limit the bandwidth impact of the increased load and the drops never appeared again. The engineering reason for these drops is an internal bandwidth limitation in our global nodes. This was known and upgrades were already scheduled. We have rescheduled these upgrades to happen earlier where possible.

By working with our peers and their upstreams, we established early on that the majority of the traffic originated from a few Autonomous Systems (ASes) located mostly in China. We contacted the operators of these networks to find out what was going on. This was relatively complicated because of the geographic distance and because we don't have much contact with the operators in question. We will be looking into streamlining communications with distant ISPs. However, this will always remain difficult because of language and time zone differences.

One of the ASes originating the traffic is very big compared to the others, so initially it looked like it only came from one single AS. Consequently, anycast did not spread the query load as effectively as usual: because the bulk of the queries came from one AS and because this AS was very large, most queries were targeted to one particular anycast instance of K-root. This is the first occurrence of this kind of traffic distribution, or lack thereof. We will work with the routing community and seek methods to make anycast more effective at a more granular level than just by AS. Ideas are welcome.

Detailed analysis 

The increased load was caused by queries to a single .com domain, called <domain> in the text below. The www.<domain>.com hostname in this domain was by far the most popular. The web server for www.<domain>.com was associated with a game site in China, which is not in the Alexa 1M list. A zone transfer of this zone from all 4 authoritative name servers revealed a typical "one host small zone", and there are no indications that this zone was using fast-flux or similar detection evasion techniques.

If we look at queries for *.<domain>.com. on 29 June, we find that 99.996% of queries were of type A (type value 1), and interestingly there were no queries of type AAAA. The very small fraction of other query types includes both assigned and unassigned types; the top 3 are listed below. If we look at query class, 99.998% of queries were of class 'IN'; the top 3 other query classes are also listed below. Interestingly, the top value for both query type and query class is 2561, for which we don't have an explanation (although Confucius was born 2561 years ago). Queries for www.<domain>.com had the Recursion Desired (RD) flag set. Communication with the operators of .com revealed that their authoritative servers did not receive the same kind of query storm for www.<domain>.com.

Query type    Count
TYPE2561      4437
TYPE3841      2566
TYPE1281      1198

Query class   Count
CLASS2561     2260
CLASS513      980
CLASS3585     20
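
One pattern in these values can be checked directly by writing them in hexadecimal. The short Python sketch below does only that: it splits each 16-bit value from the tables above into its high and low byte. In every case the low byte is 0x01, which is also the numeric value of both type A and class IN. This is an observation about the numbers, not an explanation of how they came about, and the helper name `split_bytes` is made up for this illustration.

```python
# Values copied from the query type and query class tables above.
ODD_TYPES = [2561, 3841, 1281]
ODD_CLASSES = [2561, 513, 3585]


def split_bytes(value):
    """Return the (high, low) bytes of a 16-bit DNS type or class value."""
    return value >> 8, value & 0xFF


for label, values in (("TYPE", ODD_TYPES), ("CLASS", ODD_CLASSES)):
    for value in values:
        high, low = split_bytes(value)
        print(f"{label}{value}: 0x{value:04X} (high=0x{high:02X}, low=0x{low:02X})")
```

Whether this points to a byte-level encoding problem in the querying software cannot be concluded from the counts alone.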

Spot checks of a couple of IP addresses that queried for www.<domain>.com revealed that these addresses did not query for anything other than <domain>.com, which suggests that there are no typical DNS resolvers behind them. Speculating: this could be caused by either a misconfiguration in a piece of software, or something initiated by a botnet.

If we look at the ASes of the sources querying for www.<domain>.com, we find sources from 287 distinct ASes that map to 48 distinct countries. As indicated earlier, the main sources were ASes in China. Of the top 10 ASes in terms of the number of /24s that traffic came from, 8 were in China; the other 2 were in Taiwan and India (based on the delegation files from the RIRs). This indicates that the traffic did not originate exclusively from China, but there seems to be some correlation with countries with Chinese-speaking populations. Speculating on what could cause this correlation: it could be due to either a bug in a piece of software that is tied to Chinese languages, or a botnet whose mode of infection is specific to the Chinese language, such as language-specific malware or spam.
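
Ranking ASes by the number of source /24s can be sketched as follows. This is a minimal illustration, not the tooling used for the analysis: it assumes IPv4 sources, assumes the IP-to-AS mapping has already been done elsewhere, and uses made-up sample data from documentation address space and documentation AS numbers. The function names are hypothetical.

```python
import ipaddress
from collections import defaultdict


def count_slash24_per_as(observations):
    """Count distinct source /24 prefixes per origin AS.

    observations: iterable of (source_ip, asn) pairs; the IP-to-AS
    mapping is assumed to have been done beforehand.
    """
    prefixes = defaultdict(set)
    for source_ip, asn in observations:
        # strict=False masks the host bits, giving the covering /24.
        net = ipaddress.ip_network(f"{source_ip}/24", strict=False)
        prefixes[asn].add(net)
    return {asn: len(nets) for asn, nets in prefixes.items()}


def top_ases(observations, n=10):
    """Return the n ASes with the most distinct source /24s."""
    counts = count_slash24_per_as(observations)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data, for illustration only.
sample = [
    ("192.0.2.10", 64500),
    ("192.0.2.200", 64500),   # same /24 as the line above
    ("198.51.100.7", 64500),
    ("203.0.113.5", 64501),
]
print(top_ases(sample))
```

Counting /24s rather than individual addresses makes the ranking less sensitive to a few networks with many active hosts.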

If we look at the total query rate for www.<domain>.com to K-root (see Figure 1), we see 2 distinct changes in query rate: one on 29 June around 06:00 UTC, and the other on 30 June around noon UTC.

Figure 1: Query rate distribution for www.<domain>.com on 29 and 30 June

If we look at the query rate per source address, we see that it is quite low. Figure 2 shows boxplots of the distribution of queries per minute per source for 30-minute intervals.

Figure 2: Query rate distribution for queries to www.<domain>.com on 29 and 30 June

As you can see, the query rate per source address is typically under 100 packets per minute. The same 2 distinct changes (on 29 June at 06:00 UTC, and on 30 June at 12:00 UTC) as in the overall query rate graph (Figure 1) are visible here.
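
A per-source rate distribution of this kind could be derived from a query log roughly as shown below. This is a sketch under stated assumptions, not the method used to produce Figure 2: the input is assumed to be (Unix timestamp, source address) pairs, one value is produced per source and minute in which that source sent at least one query, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import quantiles

INTERVAL = 30 * 60  # seconds covered by one boxplot


def per_source_rates(queries):
    """Group queries-per-minute-per-source values by 30-minute interval.

    queries: iterable of (unix_timestamp, source_ip) pairs.
    Returns {interval_start: [count, ...]} with one count per
    (source, minute) in which that source sent at least one query.
    """
    per_minute = Counter()
    for timestamp, source in queries:
        minute = int(timestamp) // 60 * 60
        per_minute[(minute, source)] += 1

    by_interval = defaultdict(list)
    for (minute, _source), count in per_minute.items():
        by_interval[minute // INTERVAL * INTERVAL].append(count)
    return dict(by_interval)


def box_stats(values):
    """Return (first quartile, median, third quartile); needs >= 2 values."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3
```

Minutes in which a source sent nothing are not counted here, so the resulting medians describe active sources only.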

In Figure 3, the number of unique IP addresses per 30-minute interval querying for www.<domain>.com is plotted for 29 and 30 June. As can be seen from this plot, about 60,000 to 65,000 distinct IP addresses were sending queries to K-root on 29 June and the first part of 30 June. About midway through 30 June the number of distinct IP addresses dropped significantly, at the same time as the total query rate decreased (see Figure 1) and the median query rate per IP address increased (see Figure 2).

Figure 3: Unique IP addresses querying for www.<domain>.com on 29 and 30 June
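
Counting distinct source addresses per interval, as plotted in Figure 3, can be sketched in a few lines. The input format ((Unix timestamp, source address) pairs) and the function name are assumptions made for this illustration.

```python
from collections import defaultdict


def unique_sources_per_interval(queries, interval=30 * 60):
    """Count distinct source addresses in each fixed-length interval.

    queries: iterable of (unix_timestamp, source_ip) pairs.
    Returns {interval_start: number_of_distinct_sources}, ordered by time.
    """
    seen = defaultdict(set)
    for timestamp, source in queries:
        seen[int(timestamp) // interval * interval].add(source)
    return {start: len(sources) for start, sources in sorted(seen.items())}
```

Using a set per interval means a source that sends many queries in the same 30 minutes is still counted once.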

Conclusions

Even though we did not experience any disruption in service at any time, we decided to increase the over-provisioning of all global nodes of K-root in order to have even more headroom in the future. We will prepare methods to make anycast more effective at a more granular level than just by AS, in case such an incident repeats. We will work to improve communication with ISPs that are distant from our global nodes and our operations center. 

As to the cause of this incident, we are still in the dark. While it is plausible that this was caused by a botnet, it can hardly be classified as a DDoS attack on K-root, since the packet rates per individual source were too low to have a significant impact. One speculation is that this was a test of the capabilities of a specific botnet.

If you have more information or have an idea about the cause of this incident, please let us know in a comment below.

6 Comments

Anonymous says:
14 Jul, 2011 03:52 PM
Some additional analysis, which I posted to the ncc-services@ripe.net and dns-wg@ripe.net mailing lists:

This issue appears not to be related to a misconfigured zone; the zone looked (and still looks) like this:

<domain>.com. 7200 IN SOA ns1.<nsdomain>. root.ns1.<domain>.com. 20091027 28800 600 604800 86400
<domain>.com. 300 IN A <ipv4_1>
<domain>.com. 300 IN A <ipv4_2>
<domain>.com. 7200 IN NS ns1.<nsdomain>.
<domain>.com. 7200 IN NS ns2.<nsdomain>.
<domain>.com. 7200 IN NS ns3.<nsdomain>.
<domain>.com. 7200 IN NS ns4.<nsdomain>.
www.<domain>.com. 300 IN A <ipv4_1>
www.<domain>.com. 300 IN A <ipv4_2>
<domain>.com. 7200 IN SOA ns1.<nsdomain>. root.ns1.<domain>.com. 20091027 28800 600 604800 86400

The <nsdomain> was a different domain, not in COM.
We asked the folks who operate COM, and they didn't see the same query storm for this domain. If these were all 'normal' resolvers dealing with a misconfigured zone, I'd expect them to follow the delegation chain. Also, when spot-checking some 20 source IPs for these queries, we didn't find that they sent any queries to K-root other than for names in <domain>.com.

As mentioned in the article, we have several indications that this was caused by a botnet.

It is unlikely this was a reflector attack with spoofed source addresses, as there are some 60,000 unique source IPs per hour in the queries for this specific domain. For targeted spoofing I'd expect this number to be very low; for random spoofing I'd expect it to be far higher.

When looking at the query load for www.<domain>.com on 28 June 2011, we see 2 queries for this domain before 16:28 UTC (00:28 China Standard Time), then it all starts:
#queries timestamp
1 1309252434
1 1309274472
8603 1309278521
9630 1309278522
11277 1309278523
14123 1309278524
12271 1309278525
12457 1309278526
12118 1309278527
12369 1309278528
12234 1309278529
12402 1309278530
12202 1309278531
12469 1309278532
12138 1309278533
12149 1309278534
... (continues in the range of 10,000-12,000 queries per second for a while)

So either the misconfiguration started at around 16:28 UTC, or this wasn't a misconfiguration. The third possibility, an already misconfigured zone combined with a Slashdot-style surge in traffic, is I think not impossible but unlikely, both because it was past midnight at the ASes that were a major source of queries and because of the very sudden increase in load.
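
The Unix timestamps in the table above can be converted to UTC and to China Standard Time (UTC+8) with the Python standard library. This is only a convenience sketch for checking the times mentioned; the function name `describe` is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time is a fixed offset of UTC+8 (no daylight saving).
CST = timezone(timedelta(hours=8), "CST")


def describe(epoch_seconds):
    """Return (UTC, China Standard Time) ISO 8601 strings for a Unix timestamp."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc.isoformat(), utc.astimezone(CST).isoformat()


# The two isolated queries and the first second of the surge.
for ts in (1309252434, 1309274472, 1309278521):
    print(ts, *describe(ts))
```

The first second of the surge, 1309278521, corresponds to 16:28:41 UTC on 28 June 2011, which is 00:28:41 on 29 June in China Standard Time.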
Mesh says:
05 Dec, 2011 10:27 PM
China is testing to see how hard it'd be to take out the root name servers?
chaz says:
10 Feb, 2012 05:56 AM
Unlikely, but were they trying to hide something else? Since a lot of Chinese software is pirated, it looks like someone, or some country, embedded extra code in that software and wanted to try out their new botnet to see what it could do. With their limited IPs, the attack was confined to a single root location. I would have shut them down.
Exaybachay says:
14 Mar, 2012 12:30 PM
It sounds like the beginning of the end =)
Alan says:
26 Feb, 2014 05:56 AM
The owner of <domain>.com was a victim of a DDOS attack against their authoritative name servers. They attempted (and probably succeeded) at mitigating the attack by adding glue for the www record directly into the root.
Dan says:
14 Jan, 2015 08:22 PM
I could not even sign in to my router initially. The IP address was changed? I had to do a full factory reset and firmware update to regain control over the DNS. Once done, I changed passwords. A notice sent from Charter here in Michigan alerted me, or I would not have known. I will spread the word on Facebook if it will help awareness of this botnet attack, but it looks like a very extreme test of servers, and it also affects users in terms of the accessibility of their routers. Perhaps users may help identify these activities if they are aware. Most people I know would not even open an email as I did and check their IP address.