More and more governments, authorities and courts are requesting censorship of Internet content. It is often done via a lying DNS resolver. Can we use RIPE Atlas probes to see it, and how?
Censorship (sometimes under other names) is now pervasive on the Internet. Most countries have some means of censoring part of the content on the Web (other Internet services are typically not yet on the radar of the authorities). It can be ordered by courts, by "independent" authorities or by the government but, technically, it raises the same issues.
There have been many discussions about the legitimacy of this kind of censorship, or about its technical consequences (such as an increased brittleness of the Internet). But this article focuses only on the measurements: can we see censorship and can we have an idea of its actual deployment? For various reasons, most of the examples here are in Europe.
I hesitated for a long time before publishing this article, because it raises serious ethical issues. Documenting the effects of censorship can be seen as helping censors. For instance, if measurements show that censorship is very limited in practice, this may motivate some authorities to increase the pressure, with all its negative consequences. But I believe that censors are already better informed than the average citizen, and that factual information is necessary for an informed debate in a democracy.
Another major ethical issue concerns the measurements themselves. Is there a risk of endangering people who host a probe by making DNS lookups for illegal, forbidden or questionable things (for instance, a DNS lookup for a porn site from a probe in Iran)? Today, the DNS is typically "under the radar" of most surveillance activities. Making an HTTP request for an illegal site attracts attention to you in some countries (this is one of the reasons why RIPE Atlas probes do not perform HTTP queries for arbitrary URLs), but this does not seem to be the case (yet) for DNS requests. (See "DNS Privacy Considerations".)
The DNS is a rendezvous system: it brings together a client who wants to connect to a service with a server or a peer that will provide the expected service. As such, it is a critical component: no DNS, no access to the service. That's why it is a tempting target for censorship.
The data in the DNS is provided by authoritative name servers. For instance, when you visit the Wikipedia website, three name servers provide the data for the domain wikipedia.org. Authoritative name servers can be targeted by censorship (a famous case is RojaDirecta). But this works only if the registry (or in some cases the registrar) is under the control of local authorities (the example above "worked" because .com is under US law). If the registry is located abroad, this form of censorship does not work.
So, most of the time, DNS-based censorship targets the DNS resolvers. These are the servers, traditionally managed by IAPs (Internet Access Providers) or local networks, that answer questions from end-users. They have no data of their own: they query the authoritative name servers, starting from the root, and cache the results for better performance. Users typically trust the answers they get from the resolver, or rather what they believe the resolver sent (in China, in-network mangling of DNS replies for censored names is common).
But a resolver can lie. Some do it for business reasons (a typical case is the IAP redirecting non-existent domains to the IP address of an advertising server). Some do it because they are compelled to by a court order or a request from the government or a similar authority. In that case, when queried for the IP address of www.something.example, the resolver will not return the real IP address of www.something.example, but an IP address of its choosing. This feature (DNS lying) is already implemented in some programs, for instance the BIND resolver, where it is called RPZ (Response Policy Zones). For other programs, it can be added more or less easily if you have access to the source code and can program a bit.
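To make the mechanism concrete, here is a minimal Python sketch of the decision a lying resolver takes before answering: it looks the queried name (and its parent domains) up in a locally configured blocklist and, on a match, substitutes its own answer. It only mimics the idea behind RPZ and is not BIND's implementation; the `BLOCKLIST` contents and the `apply_policy` function are hypothetical, and the addresses come from the reserved documentation ranges.

```python
from typing import Dict, List, Tuple

# Hypothetical policy actions, loosely modelled on what RPZ can express.
NXDOMAIN = "NXDOMAIN"   # pretend the name does not exist
REDIRECT = "REDIRECT"   # answer with addresses chosen by the resolver operator

# Hypothetical blocklist: censored domain -> (action, replacement addresses).
BLOCKLIST: Dict[str, Tuple[str, List[str]]] = {
    "censored.example": (NXDOMAIN, []),
    "gambling.example": (REDIRECT, ["127.0.0.1"]),    # loopback address
    "forbidden.example": (REDIRECT, ["192.0.2.10"]),  # hypothetical warning page
}


def apply_policy(qname: str, real_answer: List[str]) -> Tuple[str, List[str]]:
    """Return (response code, addresses) as a lying resolver would send them.

    The name and each of its parent domains are checked, so that
    www.censored.example is caught by the entry for censored.example.
    """
    labels = qname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):        # never match on the bare TLD
        candidate = ".".join(labels[i:])
        if candidate in BLOCKLIST:
            action, data = BLOCKLIST[candidate]
            if action == NXDOMAIN:
                return "NXDOMAIN", []
            return "NOERROR", list(data)
    # Name not listed: the authentic answer is passed through unchanged.
    return "NOERROR", real_answer
```

One simplification to keep in mind: here any subdomain of a listed name is caught automatically, whereas in a real RPZ zone subdomains are only covered when an explicit wildcard entry (such as *.censored.example) is present.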
Note that encrypting the DNS channel (with DNSCrypt or with DNS over TLS from the IETF's DPRIVE working group) won't help, since it only protects the channel: a secure channel to a lying resolver will not stop the lies. DNSSEC, on the other hand, allows a validating resolver downstream of the lying resolver to detect the tampering. But the censored domains (see the examples later) are almost never signed (probably out of ignorance).
Technical people know quite well that this censorship technique is far from perfect (see the excellent report by the AFNIC scientific council, mentioned at the end). For instance, people can switch to another DNS resolver, provided it accepts their requests. This is the case with big public resolvers such as Verisign Public DNS or Yandex DNS. Note that these resolvers can lie, too (some even advertise it as a feature, "for security" or "to protect children"). They also raise serious privacy issues. And even if you completely trust the public resolver, remember that it is not perfect when it comes to security, since the (long) path between the user and the nearest instance of the resolver is not protected yet (it is an ordinary UDP exchange).
Another way to work around this censorship technique is to install your own local resolver on your machine and/or your local network. Software like Unbound can easily run even on small machines, such as a Raspberry Pi. This is certainly the wisest solution against lying resolvers.
Seeing it with RIPE Atlas probes
The DNS was originally conceived to be global: a name has the same meaning and returns the same result everywhere. (RFC 2826, "IAB Technical Comment on the Unique DNS Root", discusses this.) Some "deviations" appeared later, mostly for technical reasons (for instance, serving different IP addresses depending on the presumed location of the DNS client, to send a reader to the closest web server).
A long time ago, in the unlikely event that a remote DNS resolver gave different results from yours, you could simply query it to see what was going on. Such "open resolvers" are now regarded as bad practice (see RFC 5358), and most DNS resolvers are now closed to remote clients. (There are still many open resolvers, enough to feed the excellent DNS debugging tool dnsyo.) This makes debugging DNS content more difficult: "looking glasses" (websites that let you query remote DNS servers) are less common for the DNS than for BGP.
This is where RIPE Atlas probes are useful: they can make DNS requests with many possible options, allowing you to perform remote DNS content analysis. The probes that will perform a measurement can be selected on various criteria, and one is especially important for censorship issues: you can select them by country. So, questions like "what is the IP address of www.wikileaks.org as seen from the Republic of Elbonia?" can be answered.
RIPE Atlas probes can send DNS requests to a specific resolver or to their default resolver. The latter option (used in almost all the measurements shown later) means that the probe will use whatever resolver it was given, either in a DHCP response or in a Router Advertisement with the "Recursive DNS Server Option". So the default resolver can be one provided by the Internet Access Provider, by the LAN, or by a local configuration.
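The difference between the default resolver and an explicitly chosen one is easy to reproduce from your own machine. The following Python sketch (standard library only, no dnspython) builds a minimal DNS query for an A record, sends it over plain UDP to a resolver address you pick, and extracts the response code and the IPv4 addresses. It is a simplified illustration, not what RIPE Atlas probes run: it ignores truncation, EDNS and retries, and the function names are hypothetical. The unencrypted UDP exchange it performs is exactly the kind of traffic that can be observed or rewritten on the path.

```python
import random
import socket
import struct
from typing import List, Tuple

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qid: int) -> bytes:
    """Encode a DNS query for the A record of `name`, recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD bit, 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, name ends
            return offset + 2
        offset += 1 + length


def parse_response(msg: bytes, qid: int) -> Tuple[str, List[str]]:
    """Return (rcode name, sorted IPv4 addresses) from a DNS response."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    if rid != qid:
        raise ValueError("response ID does not match query ID")
    rcode = RCODES.get(flags & 0x000F, f"RCODE{flags & 0x000F}")
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record; CNAMEs etc. are skipped
            addresses.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength
    return rcode, sorted(addresses)


def query_resolver(name: str, resolver: str, timeout: float = 3.0) -> Tuple[str, List[str]]:
    """Ask `resolver` (an IPv4 address) for the A records of `name` over UDP."""
    qid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (resolver, 53))
        reply, _ = sock.recvfrom(4096)
    return parse_response(reply, qid)
```

Running query_resolver("www.wikileaks.org", "192.0.2.53") against your IAP's resolver and against a resolver you trust would give a single-machine version of the comparison; 192.0.2.53 is only a placeholder address.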
RIPE Atlas probes, like open resolvers, are not evenly distributed. They are absent from some "interesting" countries, and where they are present they are probably in more "geeky" networks, because they are installed by volunteers. These networks may use DNS resolvers other than the one provided by default by the IAP, for instance a public resolver or a local resolver installed on the local network. They may even be on a LAN whose traffic is entirely tunnelled to a remote place in another country. So, when results show that censorship is far from 100%, keep in mind that RIPE Atlas probes are probably "privileged" compared to the typical local user.
RIPE Atlas has an API to submit measurement requests. We use it here through a Python program named resolve-name.py. To install it, download the Python file and its RIPEAtlas library, and check that you have the prerequisite, dnspython. Then create an API key in the RIPE Atlas web interface and paste it into ~/.atlas/auth.
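resolve-name.py hides the API behind its RIPEAtlas library. For readers who want to see roughly what such a request looks like, here is a sketch that builds a one-off DNS measurement and POSTs it with the standard library. The JSON field names (`definitions`, `probes`, `query_argument`, `use_probe_resolver`, `is_oneoff`, etc.), the `Authorization: Key` header and the `measurements` field of the reply reflect an understanding of version 2 of the RIPE Atlas REST API and are assumptions to check against the current API documentation; this is not the code of resolve-name.py, and both function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://atlas.ripe.net/api/v2/measurements/"  # assumed v2 endpoint


def build_dns_measurement(name: str, qtype: str = "A", requested: int = 500,
                          country: Optional[str] = None, asn: Optional[int] = None,
                          resolver: Optional[str] = None) -> dict:
    """Build a one-off DNS measurement request (field names are assumptions)."""
    definition = {
        "type": "dns",
        "af": 4,                       # IPv4 transport assumed throughout
        "query_class": "IN",
        "query_type": qtype,
        "query_argument": name,
        "description": f"DNS resolution of {name}/{qtype}",
    }
    if resolver is None:
        definition["use_probe_resolver"] = True   # the probe's default resolver
    else:
        definition["target"] = resolver           # like --nameserver
    if asn is not None:
        probes = {"requested": requested, "type": "asn", "value": asn}
    elif country is not None:
        probes = {"requested": requested, "type": "country", "value": country}
    else:
        probes = {"requested": requested, "type": "area", "value": "WW"}
    return {"definitions": [definition], "probes": [probes], "is_oneoff": True}


def submit(payload: dict) -> int:
    """POST the request using the key stored in ~/.atlas/auth; return its ID."""
    with open(os.path.expanduser("~/.atlas/auth")) as f:
        key = f.read().strip()
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Key {key}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["measurements"][0]
```

For instance, submit(build_dns_measurement("www.etha.com.tr", country="TR")) would roughly correspond to a -c TR run, but it only returns the measurement ID; the results have to be fetched separately once the probes have reported.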
This program is used this way:
% python resolve-name.py www.wikileaks.org
Measurement #3049180 for www.wikileaks.org/A uses 500 probes
[ERROR: SERVFAIL] : 1 occurrences
[126.96.36.199 188.8.131.52 184.108.40.206 220.127.116.11 18.104.22.168 22.214.171.124] : 483 occurrences
Test done at 2015-11-28T15:54:19Z
Here, we make a DNS request for the name www.wikileaks.org with the default parameters: query type A (an IPv4 address), 500 probes requested, and no constraint on their location.
The results are all the same, a set of six IP addresses. Note that not all probes reported in time. (The SERVFAIL - Server Failure - seems spurious: some probes have broken resolvers, some have resolvers with a poor connection, leading to timeouts and SERVFAILs. The options --severalperprobe and --displayprobes may help to investigate these problems.)
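The summary lines printed above group probes by the exact set of answers they received. A minimal sketch of that aggregation step follows, assuming each probe's result has already been reduced to either a list of addresses or an error string such as "SERVFAIL" (this simplified input format and the `tally`/`report` names are hypothetical, not the raw RIPE Atlas result JSON nor resolve-name.py's internals):

```python
from collections import Counter
from typing import Iterable, List, Union

ProbeResult = Union[List[str], str]  # list of addresses, or an rcode string


def tally(results: Iterable[ProbeResult]) -> Counter:
    """Count identical answers, ignoring the order of addresses in an answer."""
    counts: Counter = Counter()
    for result in results:
        if isinstance(result, str):
            key = f"[ERROR: {result}]"
        else:
            key = "[" + " ".join(sorted(result)) + "]"  # empty answer -> "[]"
        counts[key] += 1
    return counts


def report(counts: Counter) -> str:
    """Format the tally in a style similar to resolve-name.py, rarest first."""
    return "\n".join(f"{answer} : {n} occurrences"
                     for answer, n in sorted(counts.items(), key=lambda kv: kv[1]))
```

Sorting the addresses inside each answer matters: two probes that receive the same addresses in a different order (as round-robin servers do) must be counted together.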
Option -h displays all the possible options. The one we will use most often here is --country (alias -c), to request probes from a given country. Below is an example using only probes in Turkey (ISO 3166 code TR):
% python resolve-name.py -r 500 -c TR www.etha.com.tr
Measurement #2905528 for www.etha.com.tr/A uses 32 probes
[126.96.36.199] : 5 occurrences
[188.8.131.52] : 6 occurrences
[184.108.40.206] : 20 occurrences
Test done at 2015-11-03T08:47:09Z
The name etha.com.tr points to an organisation monitoring the recent elections. 220.127.116.11 belongs to the IAP Türk Telekom, 18.104.22.168 to TellCom, and 22.214.171.124 is the real address, as shown by another test, with probes in Germany (code DE):
% python resolve-name.py -r 500 -c DE www.etha.com.tr
Measurement #2905529 for www.etha.com.tr/A uses 498 probes
[ERROR: REFUSED] : 3 occurrences
[ERROR: SERVFAIL] : 3 occurrences
[126.96.36.199] : 463 occurrences
Test done at 2015-11-03T08:50:45Z
As always on the Internet, things change with time. It is therefore extremely important, when reading resolve-name's results, to note the date of the test, printed at the end.
Is censorship efficient?
In practice, is this form of censorship efficient? Let us examine actual cases with RIPE Atlas probes. China is well known for exercising a lot of censorship. It uses several techniques, and one of them is not a lying DNS resolver but DNS interception and rewriting in the network (see the paper "The great DNS wall of China"). Let's use our program to see it, requesting 30 probes in China:
% python resolve-name.py --country=CN --requested=30 www.facebook.com
Measurement #3048986 for www.facebook.com/A uses 8 probes
[188.8.131.52] : 1 occurrences
[184.108.40.206] : 1 occurrences
[220.127.116.11] : 5 occurrences
Test done at 2015-11-28T13:44:17Z
None of these IP addresses belong to Facebook, and none of them appears in an authentic answer. The case of China is well documented.
But now, let's try a European country, with a music sharing site censored by lying DNS resolvers:
% python resolve-name.py --requested=100 --country=DK allofmp3.com
Measurement #3048975 for allofmp3.com/A uses 100 probes
[] : 2 occurrences
[18.104.22.168] : 2 occurrences
[22.214.171.124] : 12 occurrences
[126.96.36.199] : 1 occurrences
[188.8.131.52] : 1 occurrences
[184.108.40.206] : 80 occurrences
Test done at 2015-11-28T13:40:03Z
The only authentic IP address is 220.127.116.11. You can easily test that it is indeed censorship and not some form of geography-based load balancing by querying a neighbouring country:
% python resolve-name.py --requested=100 --country=DE allofmp3.com
Measurement #3048991 for allofmp3.com/A uses 100 probes
[18.104.22.168] : 94 occurrences
Test done at 2015-11-28T13:46:01Z
The other IP addresses we got in Denmark (such as 22.214.171.124) belong to Danish providers and presumably point to websites displaying a warning message for the user.
This first example of censorship by a lying DNS resolver shows the important characteristics of this technique: various IP addresses are used as destinations, and the censorship is far from 100%. Actually, the majority of RIPE Atlas probes in Denmark see the correct answer. There can be many reasons for that: for instance, probes connected to an Internet Access Provider that did not implement the censorship, or probes in local networks that use an alternative DNS resolver.
The same phenomenon can be seen in another European country, where access to the famous Pirate Bay is blocked by court order:
% python resolve-name.py --country=IE --requested=100 thepiratebay.se
Measurement #3049034 for thepiratebay.se/A uses 100 probes
[126.96.36.199] : 2 occurrences
[188.8.131.52] : 26 occurrences
[184.108.40.206 220.127.116.11] : 71 occurrences
Test done at 2015-11-28T13:59:35Z
Again, censorship is far from perfect, with a majority of probes seeing the correct answer (the one with two IP addresses).
Sites that are often censored tend to use various escape techniques, so it is important to always compare with another country:
% python resolve-name.py --country=DE --requested=100 thepiratebay.se
Measurement #3049042 for thepiratebay.se/A uses 100 probes
[18.104.22.168] : 1 occurrences
[22.214.171.124 126.96.36.199] : 91 occurrences
Test done at 2015-11-28T14:01:32Z
This confirms that the answer with two IP addresses is indeed the right one.
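This comparison with a neighbouring country can be automated. The sketch below takes two tallies (mappings from answer string to number of probes, in the resolve-name.py output style) and lists the answers seen in the country under study that never appear in the reference country, most frequent first. The function name is hypothetical, and such answers are candidates for censorship, not proof: CDNs legitimately return different addresses in different places.

```python
from typing import Dict, List


def suspicious_answers(target: Dict[str, int], reference: Dict[str, int]) -> List[str]:
    """Answers seen in `target` but never in `reference`, most frequent first.

    Error answers are kept on purpose: an NXDOMAIN seen in only one
    country is as interesting as an unexpected address.
    """
    only_here = [answer for answer in target if answer not in reference]
    # sorted() is stable, so answers with equal counts keep their input order.
    return sorted(only_here, key=lambda answer: target[answer], reverse=True)
```

Comparing whole answers keeps the sketch short; a refinement would compare individual addresses, so that a load balancer returning a different subset of the real addresses is not flagged.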
The responses sent by lying DNS resolvers vary. As mentioned above, it can be the IP address of a website displaying a message for the user. It can be a dummy IP address (probably the case of 188.8.131.52, seen in the China example above). It can also be the IP address of someone you do not like, who will then receive all the traffic of the censored website (this has been used in China, as an attack).
And the response can also be a clearly invalid address, such as 127.0.0.1. Let's try a gambling site censored in Bulgaria:
% python resolve-name.py --country=BG --requested=100 www.bet365.com
Measurement #3049045 for www.bet365.com/A uses 81 probes
[] : 1 occurrences
[184.108.40.206] : 1 occurrences
[220.127.116.11] : 1 occurrences
[18.104.22.168] : 1 occurrences
[22.214.171.124] : 1 occurrences
[126.96.36.199] : 3 occurrences
[ERROR: SERVFAIL] : 1 occurrences
[188.8.131.52] : 61 occurrences
[127.0.0.1] : 3 occurrences
Test done at 2015-11-28T14:02:40Z
Among several possibilities, we see that three probes receive a 127.0.0.1 (the loopback address).
The response from the lying DNS resolver can also be a DNS error code, such as NXDOMAIN (No Such Domain). Here, a gambling site censored in France:
% python resolve-name.py --country=FR --requested=100 romecasino.com
Measurement #3049070 for romecasino.com/A uses 100 probes
[184.108.40.206] : 64 occurrences
[ERROR: SERVFAIL] : 6 occurrences
[ERROR: NXDOMAIN] : 11 occurrences
[127.0.0.1] : 15 occurrences
Test done at 2015-11-28T14:14:27Z
We see here false 127.0.0.1 answers but also lying NXDOMAIN ones. In this case, the online gambling authority, ARJEL, requests blocking but does not specify the technical details, so different providers do it differently. As an example of the variety of possible lying responses, let's examine the music sharing site T411 today in France:
% python resolve-name.py --country FR t411.io
Measurement #3049724 for t411.io/A uses 500 probes
[ERROR: SERVFAIL] : 41 occurrences
[220.127.116.11 18.104.22.168] : 187 occurrences
[ERROR: NXDOMAIN] : 43 occurrences
[127.0.0.1] : 197 occurrences
[22.214.171.124] : 2 occurrences
Test done at 2015-11-29T16:04:34Z
We see here NXDOMAIN, SERVFAIL, localhost and a specific false IP address (126.96.36.199, which is OpenDNS, a DNS provider that provides voluntary filtering).
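Once the authentic addresses are known, the kinds of responses seen so far (NXDOMAIN, loopback addresses such as 127.0.0.1 or ::1, the address of a warning page, another unexpected address) can be sorted mechanically. A minimal sketch using Python's standard ipaddress module; the function name and labels are hypothetical, and the authentic and warning-page addresses are inputs you must supply from your own measurements (documentation addresses are used in the test).

```python
import ipaddress
from typing import Iterable, List, Set, Union


def classify(answer: Union[List[str], str], authentic: Set[str],
             warning_pages: Iterable[str] = ()) -> str:
    """Label one probe's answer: a list of addresses, or an rcode string."""
    if isinstance(answer, str):
        # NXDOMAIN for a domain known to exist is a typical lie; SERVFAIL or
        # REFUSED more often point to a broken or unreachable resolver.
        return "NXDOMAIN" if answer == "NXDOMAIN" else f"other error: {answer}"
    if not answer:
        return "empty"
    addrs = {ipaddress.ip_address(a) for a in answer}
    if addrs <= {ipaddress.ip_address(a) for a in authentic}:
        # A subset of the real addresses is fine: load balancers may return fewer.
        return "authentic"
    if any(a.is_loopback for a in addrs):
        return "loopback"
    if addrs & {ipaddress.ip_address(a) for a in warning_pages}:
        return "warning page"
    # Could be a dummy address, a hijack or simply a CDN: compare with another country.
    return "unexpected address"
```

Parsing with ipaddress rather than comparing strings means that the IPv6 loopback (::1, as seen later with AAAA queries) is recognised as well as 127.0.0.1 and the rest of 127.0.0.0/8.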
Sometimes, the authority requests a redirection to a specific address. This is the case in France for sites "promoting terrorism" (decree no. 2015-125 of 5 February 2015):
% python resolve-name.py --country=FR islamic-news.info
Measurement #1895736 for islamic-news.info/A uses 498 probes
[] : 22 occurrences
[188.8.131.52] : 346 occurrences
[184.108.40.206] : 403 occurrences
Test done at 2015-03-15T15:39:15Z
Here, the real IP address was 220.127.116.11. About half of the probes were redirected to an IP address managed by the Ministry of Interior, with an HTTP server displaying a warning to users (the warning was accompanied by a picture of a red hand; the current version can be seen through a domain name pointing to the official address).
Note that the government apparently did not specify anything for IPv6. At least one provider decided to redirect to the IPv6 loopback address:
% python resolve-name.py -t AAAA -c FR islamic-news.info
Measurement #1895755 for islamic-news.info/AAAA uses 498 probes
[] : 586 occurrences
[::1] : 191 occurrences
Test done at 2015-03-15T20:25:57Z
Most probes saw an empty list because this domain has no IPv6 address.
Here is another case of a domain blocked by an administrative decision (not a court order) for promoting terrorism, a more recent one:
% python resolve-name.py -r 100 -c FR shahamat-english.com
Measurement #3068044 for shahamat-english.com/A uses 100 probes
[18.104.22.168] : 46 occurrences
[22.214.171.124 126.96.36.199] : 39 occurrences
Test done at 2015-12-07T07:52:44Z
The real IP addresses are the pair of Cloudflare addresses; the single address is the Ministry's. Here, 54% of the responding probes see the lie. Remember what I wrote earlier about RIPE Atlas probes not being evenly spread? It would be very risky to claim, for instance, that "only 54% of the users are blocked".
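Such a percentage depends on two choices: which answers count as authentic (decided from the comparison with other countries) and which probes are counted at all. A small sketch making both explicit; the `blocked_share` name is hypothetical, the input is a tally in the resolve-name.py output style, and SERVFAIL and REFUSED are excluded from the denominator because they say nothing about lying. With the counts of the measurement above, 46 lying answers and 39 authentic ones, it gives 54%.

```python
from typing import Dict, Iterable, Set


def blocked_share(counts: Dict[str, int], authentic: Set[str],
                  ignored: Iterable[str] = ("[ERROR: SERVFAIL]", "[ERROR: REFUSED]")) -> float:
    """Share of answering probes whose answer is not one of the authentic ones.

    NXDOMAIN is deliberately *not* ignored: for a domain known to exist,
    it is one of the ways resolvers lie.
    """
    skip = set(ignored)
    answered = {answer: n for answer, n in counts.items() if answer not in skip}
    total = sum(answered.values())
    if total == 0:
        raise ValueError("no usable answers")
    lying = sum(n for answer, n in answered.items() if answer not in authentic)
    return lying / total
```

As stressed above, the result is a share of RIPE Atlas probes, not of users.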
We have seen that censorship is far from 100% implemented in most European countries. One likely reason is that many users configured their network (and therefore their RIPE Atlas probe) to use another, more truthful, resolver. Since RIPE Atlas probes can query not only the default resolver they learned from DHCP or RA but any IP address, can we see the difference between the default resolver and the rest of the Internet? Let's look at a domain censored in France, from a cable IAP:
% python resolve-name.py -r 500 --as 21502 thepiratebay.se
Measurement #3067610 for thepiratebay.se/A uses 28 probes
[ERROR: NXDOMAIN] : 18 occurrences
[188.8.131.52 184.108.40.206] : 8 occurrences
Test done at 2015-12-06T16:33:35Z
We see that only 8 of the 26 responding probes (about 30%) get the correct answer. If you want to be sure that the IAP's resolver lies, query it directly (option --nameserver):
% python resolve-name.py -r 500 --as 21502 --nameserver 220.127.116.11 thepiratebay.se
Measurement #3067612 for thepiratebay.se/A uses 29 probes
[ERROR: NXDOMAIN] : 28 occurrences
Test done at 2015-12-06T16:36:47Z
We can see that the default resolver of the IAP indeed lies. (This test will not work if the default resolver uses a private IP address, because RIPE Atlas will refuse to query it, for security reasons, with the error "target: Invalid target".)
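Since such targets are rejected, it can save a round trip to check the resolver address locally before submitting the measurement. A sketch with the standard ipaddress module; the helper name is hypothetical, and the assumption that RIPE Atlas rejects exactly the addresses that `is_global` excludes is not something documented here.

```python
import ipaddress


def usable_as_atlas_target(resolver: str) -> bool:
    """Rough local pre-check for a --nameserver address (hypothetical helper).

    Only globally routable addresses are accepted: RFC 1918 space, loopback,
    link-local, carrier-grade NAT and similar ranges are rejected. Whether
    this matches RIPE Atlas's own rule exactly is an assumption.
    """
    return ipaddress.ip_address(resolver).is_global
```

A resolver handed out by a home router (typically 192.168.x.x) fails this check, which is precisely the situation where the direct query described above cannot be made from the probes.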
A common problem with blacklists is their long-term management: it is typically much easier to get onto a blacklist than off it. For instance, when a censorship ruling has a limited duration, some providers do not remove the domain when the ruling expires and keep it blocked forever. The following domain is no longer officially censored in France, but three RIPE Atlas probes still see the redirection to the Ministry of Interior:
% python resolve-name.py --country=FR --requested=100 jihadmin.com
Measurement #3049114 for jihadmin.com/A uses 100 probes
[18.104.22.168 22.214.171.124] : 92 occurrences
[126.96.36.199] : 3 occurrences
[ERROR: NXDOMAIN] : 1 occurrences
Test done at 2015-11-28T14:54:35Z
Another consequence of this delay in updating blacklists is that sometimes the domain no longer exists except on lying DNS resolvers, which still provide the old information.
This is why it can be useful to select RIPE Atlas probes by AS rather than by country if you suspect that something is operator-dependent rather than country-dependent. Here is the handling of the music sharing site T411 by a specific AS:
% python resolve-name.py --requested 100 --as 21502 t411.io
Measurement #2209665 for t411.io/A uses 38 probes
[188.8.131.52 184.108.40.206] : 20 occurrences
[ERROR: NXDOMAIN] : 18 occurrences
Test done at 2015-07-30T08:23:49Z
And here, with another AS, where the lying response is localhost and not NXDOMAIN:
% python resolve-name.py --as=12322 --requested 100 t411.io
Measurement #3068597 for t411.io/A uses 100 probes
[220.127.116.11 18.104.22.168] : 23 occurrences
[127.0.0.1] : 71 occurrences
Test done at 2015-12-07T17:12:43Z
It is interesting to note that, given the prevalence of censorship, a technical glitch is often mistaken for censorship. People on social networks are quick to blame censorship for everything. It happened in the USA when a DNSSEC error by NASA was described as "Comcast blocks NASA" (see the real analysis). And it happens often in France with the noblogs.org service, which hosts political content, typically radical. It has a risky DNSSEC configuration (a combination of NSEC3 with DNS wildcards) that triggers a bug in the DNS resolvers of Free, the second-largest French IAP, which hosts many RIPE Atlas probes:
% python resolve-name.py -r 500 -c FR -t A ladiscordia.noblogs.org
Measurement #2490878 for ladiscordia.noblogs.org/A uses 500 probes
[ERROR: REFUSED] : 3 occurrences
[ERROR: SERVFAIL] : 116 occurrences
[22.214.171.124 126.96.36.199] : 338 occurrences
Test done at 2015-10-08T08:58:30Z
Each time it happens, people claim that Free is censoring noblogs while evidence shows that it is a technical bug. (There is a detailed technical analysis in French.) To test these sorts of problems, the ability to select RIPE Atlas probes by AS and not by country is very useful: one can easily see that the problem appears only in Free's AS.
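Spotting such an operator-specific pattern amounts to breaking the results down by the probe's AS rather than by country. A minimal sketch, assuming each result has been reduced to a (probe ASN, answer string) pair; this input format, the function names and the 50% threshold are hypothetical choices, and the test uses documentation ASNs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def by_asn(results: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Group answers per AS of the probe that received them."""
    groups: Dict[int, Counter] = defaultdict(Counter)
    for asn, answer in results:
        groups[asn][answer] += 1
    return dict(groups)


def failing_asns(groups: Dict[int, Counter], error: str = "[ERROR: SERVFAIL]",
                 threshold: float = 0.5) -> List[int]:
    """ASes where more than `threshold` of the probes got `error`."""
    return sorted(asn for asn, counts in groups.items()
                  if counts[error] / sum(counts.values()) > threshold)
```

If the SERVFAILs concentrate in one AS while other networks in the same country resolve the name normally, a resolver bug at that operator is a more likely explanation than national censorship.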
It should also be noted that there are examples of non-DNS censorship. For instance, Bangladesh censored Facebook, but the DNS resolvers in that country appear to tell the truth:
% python resolve-name.py -r 500 -c BD facebook.com
Measurement #3043580 for facebook.com/A uses 15 probes
[] : 1 occurrences
[ERROR: SERVFAIL] : 2 occurrences
[188.8.131.52] : 9 occurrences
Test done at 2015-11-26T08:02:06Z
We see the correct IP address of Facebook: the censorship is done at the IP level, not in the DNS. There is no way to make an HTTP request to Facebook with RIPE Atlas, but there is a trick for HTTPS sites: RIPE Atlas can retrieve the certificate of a TLS server. This way, with the cert-name.py program, we can check that Facebook is indeed blocked:
% python cert-name.py -r 500 -c BD www.facebook.com
Measurement #3043683 to www.facebook.com uses 17 probes
17 probes reported
[FAILED TO GET A CERT: connect: No route to host] : 1 occurrences
[FAILED TO GET A CERT: connect: Network is unreachable] : 1 occurrences
[FAILED TO GET A CERT: connect: timeout] : 15 occurrences
Test done at 2015-11-26T08:09:07Z
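RIPE Atlas performs this TLS connection from each probe; from a single machine, the same check can be sketched with Python's standard ssl module. This is not cert-name.py, only an illustration of the test it relies on: if the TCP connection or the TLS handshake fails, the site is probably blocked (or down) from where you are. The function name is hypothetical. Certificate validation is left on, so an interception with a forged certificate also shows up as a failure.

```python
import socket
import ssl


def fetch_cert_subject(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's commonName, or a failure message."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except OSError as exc:
        # Timeouts, "Network is unreachable", refused connections and TLS or
        # certificate errors are all subclasses of OSError.
        return f"FAILED TO GET A CERT: {exc}"
    subject = dict(rdn[0] for rdn in cert["subject"])
    return subject.get("commonName", "(no commonName)")
```

Calling fetch_cert_subject("www.facebook.com") from inside and outside the censoring network and comparing the results mirrors, on a very small scale, what the measurement above does with 17 probes.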
To summarise, DNS censorship today is clearly far from "perfect". It may be sufficient for the censors: after all, 100% success is always hard to achieve, and the fact that a few geeks manage to evade censorship through a local resolver may not matter to most censors. This may change in the future: for instance, operating systems may ship with a pre-configured local resolver, making this anti-censorship solution much more widely available. In that case, censors may escalate by blocking or hijacking port 53 (used by the DNS), as the Chinese government does. Operating systems will counter-attack with encrypted tunnels over port 443, and so on and so forth. This is a bleak prospect for Internet simplicity and reliability.
Another way to evade DNS censorship is, of course, to migrate to a non-DNS system for naming and rendezvous. One example is Tor with its .onion addresses, but there are also Namecoin, GNUnet, etc. Today, except perhaps for Tor hidden services, the deployment of these systems is very limited.
The reference for Internet censorship analysis and measurement in one country is of course GreatFire. Another very good and comprehensive survey of one country is the study "Understanding Internet Censorship Policy: The Case of Greece".
The best technical analysis of DNS censorship and filtering is the one made by the AFNIC scientific council.