In this article, we describe how we detected a large number of fraudulent sites using tools developed by SIDN Labs.
Suppose you're looking for some new shoes or a designer bag. You find a webshop offering just what you want at a really good price, so you decide to grab yourself a bargain while you can. But your order never arrives, or, if it does, you receive a low-quality counterfeit instead of the designer brand you ordered. Sound familiar? Chances are the webshop was a scam, like the one below.
Figure 1: Counterfeit webshop in the .nl zone (2016)
Websites like the one above offer the moon but deliver shoddy counterfeit goods or nothing at all. For several years, SIDN Labs has been hunting out counterfeit webshops. We have now written an academic paper, to appear at the forthcoming Passive and Active Measurement conference (PAM2020), describing our detection systems and what we've learnt about the way the scammers work.
The counterfeit industry and online scams
Counterfeit goods are big business. The industry includes books, aircraft parts, electronics, pharmaceuticals and many other categories of products. In the EU alone, the 2016 border seizures of counterfeit products had an estimated value of EUR 670 million, while in the U.S. the 2017 seizures amounted to USD 1.2 billion.
Counterfeiters may advertise original products, often at very large discounts (60% or more), but then deliver nothing or cheap knockoffs, leaving buyers scammed. These scams have been widely reported in the Dutch national news (e.g., 1, 2, 3). And the problem is not limited to .nl: Germany's .de was found to have more than 16,000 counterfeit webshops, many of them active for several years.
First counterfeit webshop detector: BrandCounter
Back in 2017, together with our Registrations and Services colleagues, we started to notice a strange pattern in the registration of domain names: after expiring, some domain names were quickly re-registered and used for webshops selling Nike shoes at steep discounts, as shown in Figure 1 above.
Even more suspicious, these domain names seemed random and bore no relation to the products being sold. Domain names that had belonged to dental practices, delis and bakeries were now all hosting different yet similar webshops selling the same kinds of shoes at large discounts.
That is when we set out to determine how many of these shops were present in the Netherlands' .nl DNS zone (run by SIDN, a not-for-profit foundation). We wanted to get an idea of how big the problem was.
The shops we initially came across all shared a striking characteristic: very long HTML titles listing multiple luxury brand names, such as "Nike, Reebok, Adidas, Tommy and Gucci shoes at a great discount!"
That gave us an easy way to detect these shops. Because we regularly crawl the .nl zone using DMAP, we could analyse the HTML titles of all .nl web pages and build a detector on top of them, which we named BrandCounter. It splits each HTML title into individual words and compares them against a manually compiled list of more than 1,100 brand terms (in both English and Dutch).
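The title check described above can be sketched in a few lines. This is a minimal illustration, not BrandCounter's actual code: the brand list is a tiny hypothetical excerpt, the tokenisation (lower-casing and splitting on non-word characters) is an assumption, and it counts every title word that appears in the list, using the "more than five matches" threshold from our first run.

```python
import re

# Hypothetical excerpt of a brand list; BrandCounter's real list was compiled
# manually and held more than 1,100 English and Dutch terms.
BRAND_TERMS = {
    "nike", "reebok", "adidas", "tommy", "gucci",
    "puma", "vans", "converse", "asics", "timberland",
}

# Titles with MORE than this many brand matches are flagged as suspicious.
MATCH_THRESHOLD = 5


def brand_matches(title: str, terms: set = BRAND_TERMS) -> int:
    """Count the words in an HTML title that appear in the brand list (assumed tokenisation)."""
    words = re.findall(r"\w+", title.lower())
    return sum(1 for word in words if word in terms)


def is_suspicious(title: str) -> bool:
    """Flag a title whose brand-match count exceeds the threshold."""
    return brand_matches(title) > MATCH_THRESHOLD


if __name__ == "__main__":
    for title in (
        "Nike, Reebok, Adidas, Tommy and Gucci shoes at a great discount!",
        "Nike Air Max, Adidas, Puma, Vans, Converse, Asics sale",
        "Bakkerij Jansen - vers brood",
    ):
        print(brand_matches(title), is_suspicious(title), title)
```

With this toy list, the example title quoted earlier scores exactly five and would not be flagged under a strict "more than five" rule, while the second title scores six. Multi-word brand names would need extra handling, since this sketch compares single words only.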
The results were staggering: 12,000 domain names had more than five words matching this list and were classified as suspicious. You can see this in the first column of Figure 2 below. Because our regulations do not allow us, as the .nl registry, to take down such domains directly, we contacted the most affected registrars and notified them of these suspicious domains. The registrars then used their internal processes to determine whether the registrants had violated their policies.
Figure 2: BrandCounter suspicious domain results for the .nl zone
Figure 2 above shows the effectiveness of these notifications and the resulting decrease in the number of shops. Registrar A (Reg. A), which we notified in the first round, had been used by counterfeiters to register more than 6,000 suspicious shops (blue line in the first column). After our notification, Registrar A contacted its registrants and ultimately decided to suspend the webshops, as the drop in the blue line from 2018-01 onwards shows.
BrandCounter is a rather trivial method. Yet its success demonstrates how little pressure the owners of these suspicious domains were under: they didn't even have to try to hide their methods. We continued running BrandCounter every month, as shown in Figure 2, and saw a steep decline in the number of webshops as we kept notifying our registrars.
Later, however, we realised that BrandCounter's effectiveness had deteriorated, especially after August 2018. Had the counterfeiters learnt how to dodge our detector? Or was the .nl zone simply free of these webshops now?
New counterfeit webshop detector: FaDe
Because BrandCounter's effectiveness declined over time, we developed a new counterfeit webshop detector that did not rely on the same method. We called this new tool FaDe (Fake Detector).
To develop FaDe, we teamed up with International Card Services (ICS), a major credit card issuer in the Netherlands, which had a list of 231 webshops associated with online scams. This list was the ground truth on which we built FaDe.
FaDe uses a support vector machine (SVM) to detect counterfeit webshops. An SVM is a supervised classification method, and we trained it on the labelled data set provided by ICS. Unlike BrandCounter, FaDe does not rely on HTML titles. Instead, it uses nine features related to the domain name registration itself and to the infrastructure.
Figure 3: List of features employed by FaDe
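To illustrate the idea behind a supervised SVM classifier, here is a minimal linear SVM written in plain Python and trained by full-batch sub-gradient descent on the hinge loss. It is not FaDe's implementation: the three binary features and the six labelled examples are hypothetical stand-ins for the nine features in Figure 3 and the ICS ground truth, and a production system would more likely use an established library.

```python
from typing import List, Sequence, Tuple

# Hypothetical binary features, NOT FaDe's actual nine features:
#   [re_registered, registrar_offers_api, registrant_uses_foreign_free_mail]
X_TRAIN: List[List[float]] = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1],  # labelled fraudulent (+1)
    [0, 0, 0], [0, 1, 0], [0, 0, 1],  # labelled legitimate (-1)
]
Y_TRAIN: List[int] = [1, 1, 1, -1, -1, -1]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 1000) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean hinge loss by sub-gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (suspicious) or -1 (legitimate) from the sign of the decision function."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    w, b = train_linear_svm(X_TRAIN, Y_TRAIN)
    print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
    print("re-registered via API registrar:", predict(w, b, [1, 1, 0]))
```

The regularisation strength, learning rate and epoch count are arbitrary illustrative choices; on this toy data the learned weights lean almost entirely on the hypothetical "re-registered" feature, because that alone separates the two classes.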
By applying FaDe to the .nl zone, we classified another 1,407 domains as suspicious. Our ICS colleagues validated the results manually, confirming that 894 were indeed malicious, leading to a 73% true positive rate. We sent notifications to the registrars of these domains. Of the 894 notified domains, 747 were taken down by registrars, which decided to suspend them.
Further analysis showed that BrandCounter was unable to detect the counterfeit webshops found by FaDe. The counterfeiters had somehow stopped using brand names in their HTML titles, thereby evading BrandCounter. Registration patterns, however, are much harder to change, which explains why FaDe works.
Who is behind these webshops?
These two controlled studies led to 4,455 webshops being taken down by registrars. We confirmed the takedowns ourselves by analysing authoritative DNS and HTML data. But some questions remained: who was behind the counterfeit shops? And how do the counterfeiters operate?
Let's start with the "who" question. We do not know who is behind the shops, but our data suggests that the domains are registered from China. First, we looked at which email providers the registrants used to register the suspicious domains. Figure 4 below shows that 163.com, a Chinese email provider, tops the list for both detectors. That provider is not very popular in the Netherlands or the West in general, where most people use EU- or US-based cloud email services.
Figure 4: Number of shops by registrant's e-mail domain
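A breakdown like the one in Figure 4 boils down to grouping registrant email addresses by domain and counting. The sketch below assumes the registrant emails of the flagged domains are available as a plain list of strings; the addresses in the example are made up.

```python
from collections import Counter
from typing import Iterable


def email_domain(address: str) -> str:
    """Return the lower-cased part after the last '@', e.g. '163.com'."""
    return address.rsplit("@", 1)[-1].strip().lower()


def shops_by_email_domain(registrant_emails: Iterable[str]) -> Counter:
    """Count flagged shops per registrant e-mail domain."""
    return Counter(email_domain(addr) for addr in registrant_emails)


if __name__ == "__main__":
    # Hypothetical registrant addresses of flagged domains.
    emails = ["shop1@163.com", "Shop2@163.COM", "owner@qq.com", "info@gmail.com"]
    for domain, count in shops_by_email_domain(emails).most_common():
        print(domain, count)
```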
Second, the registration hours of the domains we detected coincide with East China working hours, as shown below.
Figure 5: Number of shops by registration hour
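A histogram like Figure 5 can be built by converting each registration timestamp to China Standard Time (UTC+8, which has no daylight saving time) and counting by hour. The sketch assumes timezone-aware registration timestamps; the 09:00 to 18:00 "working hours" window is a hypothetical choice for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable

# China Standard Time is a fixed UTC+8 offset (no DST).
CHINA_STANDARD_TIME = timezone(timedelta(hours=8), "CST")


def registration_hour_histogram(timestamps: Iterable[datetime]) -> Counter:
    """Count registrations per hour of day in China Standard Time."""
    hours = Counter()
    for ts in timestamps:
        if ts.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        hours[ts.astimezone(CHINA_STANDARD_TIME).hour] += 1
    return hours


def share_in_working_hours(hist: Counter, start: int = 9, end: int = 18) -> float:
    """Fraction of registrations whose local hour falls in [start, end)."""
    total = sum(hist.values())
    in_window = sum(n for hour, n in hist.items() if start <= hour < end)
    return in_window / total if total else 0.0


if __name__ == "__main__":
    sample = [  # hypothetical registration times, given in UTC
        datetime(2018, 3, 1, 1, 30, tzinfo=timezone.utc),
        datetime(2018, 3, 1, 6, 0, tzinfo=timezone.utc),
        datetime(2018, 3, 1, 20, 0, tzinfo=timezone.utc),
    ]
    hist = registration_hour_histogram(sample)
    print(sorted(hist.items()), share_in_working_hours(hist))
```

Using a fixed offset rather than a time zone database keeps the sketch dependency-free; that is only safe here because China does not observe daylight saving time.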
How do counterfeiters operate?
All our results indicate that counterfeiters rely heavily on automation. For example, 80 per cent of the suspicious domain names we identified were re-registrations, which lets counterfeiters draw on the domains' "residual trust", i.e., the reputation they previously had. We also found that they chose registrars offering an API for automated domain name registration, as well as very competitive prices.
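A share like the 80 per cent above requires knowing, for each flagged domain, whether the name had been registered and deleted before. The sketch assumes a hypothetical record holding the new registration date and the previous deletion date (if any); the 30-day cut-off for a "quick" re-registration is also an assumption, not a figure from our study.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Registration:
    """Hypothetical registration record for a flagged domain."""
    domain: str
    registered_on: date
    previously_deleted_on: Optional[date] = None  # None: never registered before


def is_reregistration(reg: Registration) -> bool:
    return reg.previously_deleted_on is not None


def is_quick_reregistration(reg: Registration, max_gap_days: int = 30) -> bool:
    """True if re-registered within max_gap_days of the previous deletion."""
    if reg.previously_deleted_on is None:
        return False
    gap = (reg.registered_on - reg.previously_deleted_on).days
    return 0 <= gap <= max_gap_days


def reregistration_share(regs: List[Registration]) -> float:
    """Fraction of records that are re-registrations of previously used names."""
    if not regs:
        return 0.0
    return sum(is_reregistration(r) for r in regs) / len(regs)


EXAMPLE = [  # made-up domains and dates
    Registration("bakkerij-example.nl", date(2018, 1, 3), date(2018, 1, 1)),
    Registration("tandarts-example.nl", date(2018, 2, 1), date(2017, 6, 1)),
    Registration("deli-example.nl", date(2018, 3, 10), date(2018, 3, 10)),
    Registration("brandnew-example.nl", date(2018, 4, 1)),
]

if __name__ == "__main__":
    print("re-registration share:", reregistration_share(EXAMPLE))
    print("quick:", [r.domain for r in EXAMPLE if is_quick_reregistration(r)])
```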
The similarity of the website templates suggests that content creation for these websites is also automated. Automation lets counterfeiters easily create thousands of webshops. In doing so, they overwhelm brand protection departments, which cannot cope with the sheer number of domains using their traditional, lengthy court cases.
In the end, it's simply a matter of economics: domain names and hosting are cheap, so it pays to automate and create hundreds of shops. That way, the profits from a few sales can cover the entire infrastructure.
Registries have an ideal vantage point
Counterfeit webshops are hard for banks, brand protection providers, government agencies and others to detect, because those companies and organisations don't have an overview of the domain namespace. Registries such as SIDN, by contrast, know all the domain names in their zones and have access to registration data, in which patterns are sometimes detectable.
That's exactly what the work reported in our paper involved: extracting patterns from information about known counterfeit webshops and using those patterns to identify thousands more scams. In this way, the two controlled studies in partnership with registrars and ICS resulted in the removal of 4,455 counterfeit webshops.
A safer .nl and the ongoing fight
Selling counterfeit goods is a lucrative activity and setting up counterfeit webshops is easy. That's clear both from our research and from information published by NOS and the Dutch Consumers' Association. We therefore don't expect the counterfeit webshop problem to be resolved any time soon. So we'll keep using our unique position as a registry to fight the scammers. We'll also keep re-evaluating the relevance of the patterns we use to identify problem sites, on the assumption that fraudsters will always be looking for new ways to avoid detection.
We will continue to document the problem in the .nl zone, but other ccTLDs have also seen large numbers of such websites (e.g., Germany's .de). We also intend to raise awareness of the problem, so that other TLDs and registrars can check whether they are affected by such shops too.
Ultimately, our main goal is to protect .nl users. We hope that our work helps prevent users from falling for these scams and suffering the associated financial losses. We also hope it encourages other TLDs to do the same.