For many years, the research community, practitioners, and regulators have used myriad methods and tools to understand the complex structure and behavior of ISPs from the edge of the network. Unfortunately, the nature of these techniques forces researchers to trade off ISP coverage, user scale, and accuracy.
In this article, we present AdTag, a new methodology for performing network measurements that leverages the nature of ad networks.
Deploying network measurements
Figure 1: AdTag architecture, distribution channel and client-server components for measurements
We deployed AdTag measurements using real advertising campaigns configured through a Demand Side Platform (DSP). A DSP is an intermediary platform in the online advertising ecosystem that gives advertisers unified access to multiple vendors (Ad Networks and Ad Exchanges), each selling ad space from a pool of websites and mobile apps. It also enables advertisers to configure targeting parameters for their campaigns (geographical location, device type, etc.). As a proof of concept, we ran a seven-day campaign using nine of the more than 20 ad networks provided by a DSP. This campaign provided us with more than three million measurements from 2.5 million unique IP addresses covering 185 different countries.
Targeting ISPs and locations
Targeting measurements to specific ISPs and geographical locations allows researchers to analyse particular providers in depth. This ability is determined by the accuracy of the targeting mechanisms provided by the DSP. Most DSPs allow targeting campaigns based on location, device type (e.g. desktop vs. mobile), and even operating system. We used this feature to configure the campaigns to the experiment’s needs and to target specific ISPs.
We performed several experiments to analyse the feasibility of targeting ISPs and platforms and to evaluate the precision of the DSP’s targeting mechanisms. We used MaxMind’s database to geolocate client IP addresses. While research has shown that the use of IP geolocation databases can introduce biases, we believe they are still indicative of the overall deployment. Our global ad campaign covers 185 countries, with the majority of measurements coming from clients in the US (28%), UK (8.8%), Brazil (6.8%), and Canada (5.1%). Figure 2 shows the overall geographical coverage obtained with this campaign.
Figure 2: Distribution of user IPs around the world
This coverage distribution is expected: we did not use precise geographical targeting, so impressions were biased towards the US, where most of the websites are hosted, and towards ISPs with a large customer base. To target a particular ISP, the researcher can adapt the campaign using various features offered by DSPs. Some DSPs allow deployment at the country or city level. This can be used to target the area where a desired ISP is known to operate, maximising the number of valid samples. To validate this approach, we ran two one-day, 50K-sample experiments targeting the USA and NYC, respectively. In the country-level experiment, 97% of the users had a US-based IP address. The rest of the samples came from a handful of countries, mainly Canada (2% of the total samples). The results of the city-level experiment showed similar accuracy.
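The targeting check described above can be sketched in a few lines. This is only an illustration, not the code used in the study: it assumes each sample has already been geolocated to a country code (for example with a geolocation database), and the function names and the sample list are hypothetical. The "OTHER" label stands in for the remaining countries, which the article does not enumerate.

```python
from collections import Counter


def country_shares(country_codes):
    """Return each country's share of the samples, largest share first."""
    counts = Counter(country_codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples to analyse")
    return {code: n / total for code, n in counts.most_common()}


def targeting_precision(country_codes, target):
    """Fraction of samples whose geolocated country matches the target."""
    return country_shares(country_codes).get(target, 0.0)


# Hypothetical sample mirroring the reported proportions: 97% US, 2% Canada.
samples = ["US"] * 97 + ["CA"] * 2 + ["OTHER"]
print(targeting_precision(samples, "US"))
print(country_shares(samples))
```

The same share computation applies to an untargeted campaign, where it gives the per-country breakdown of the measurements.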
Running online advertising campaigns comes at a cost. However, it is possible to leverage different strategies to maximise the geographical coverage while keeping the budget under control. For instance, in our global campaign we fixed the CPM (Cost Per Mille, the price of one thousand impressions). Our DSP allows CPMs starting at $0.10. Therefore, it is possible to launch campaigns at this minimum CPM and raise it only when needed to increase geographical and ISP coverage (for instance, to target under-represented geographical areas).
For the majority of network measurements, user clicks are irrelevant. User interaction is only needed when user feedback is required, as in the case of QoE experiments. As a result, AdTag does not need to apply any campaign optimisation based on CPC (Cost per Click), which notably reduces the budget required to launch measurement campaigns.
Assuming an average CPM of $0.10 and a conservative efficiency ratio of 80%, we estimate that a $125 budget yields approximately 1M measurements.
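The arithmetic behind this estimate is short enough to write down. The sketch below assumes that the efficiency ratio means the fraction of paid impressions that produce a usable measurement, which is our reading of the term. The function name is hypothetical.

```python
def estimated_measurements(budget_usd, cpm_usd, efficiency):
    """Estimate usable measurements for a budget at a given CPM.

    CPM is the price of 1,000 impressions. `efficiency` is assumed to be
    the fraction of impressions that yield a usable measurement.
    """
    if cpm_usd <= 0:
        raise ValueError("CPM must be positive")
    if not 0 <= efficiency <= 1:
        raise ValueError("efficiency must be between 0 and 1")
    impressions = budget_usd / cpm_usd * 1000
    return round(impressions * efficiency)


# $125 at a $0.10 CPM buys 1.25M impressions; 80% of those is 1M.
print(estimated_measurements(125, 0.10, 0.80))  # 1000000
```

Raising the CPM to reach under-represented areas reduces the yield proportionally: under the same assumptions, doubling the CPM halves the number of measurements per dollar.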
We used the data provided by our global campaign to estimate the expected execution window, that is, how long an ad stays active and can run a measurement. Our results suggest that 75% of ads are active for more than 11s, regardless of end-user platform, with a median time of 33s.
We saw significant differences in the execution window depending on the platform: 75% of ads rendered on desktop were active for at least 15s, whereas this decreases to just 8s on mobile devices. This analysis suggests that being time-conscious is critical to the experiment’s design. Tests should launch and complete quickly, and should be scheduled opportunistically to make use of long-running ad displays.
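One way to turn execution-window data into a test time budget is sketched below. It is an illustrative assumption rather than the analysis used in the study: the function names are hypothetical, and the durations are made-up values chosen only so that the result echoes the 11s figure above.

```python
import math


def survival_fraction(active_seconds, test_seconds):
    """Fraction of ads that stay active for at least `test_seconds`."""
    if not active_seconds:
        raise ValueError("no durations to analyse")
    alive = sum(1 for s in active_seconds if s >= test_seconds)
    return alive / len(active_seconds)


def max_test_duration(active_seconds, target_fraction):
    """Longest test duration that `target_fraction` of ads would outlast."""
    if not active_seconds:
        raise ValueError("no durations to analyse")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    ordered = sorted(active_seconds, reverse=True)
    # The k-th longest duration is outlasted (or matched) by k ads.
    k = math.ceil(target_fraction * len(ordered))
    return ordered[k - 1]


# Hypothetical ad lifetimes in seconds (not data from the campaign).
durations = [5, 8, 11, 20, 33, 33, 60, 120]
budget = max_test_duration(durations, 0.75)
print(budget, survival_fraction(durations, budget))
```

Computing the budget separately for desktop and mobile samples would reflect the platform differences noted above.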
JS libraries can be used to bootstrap a wide range of network measurements through AdTag. Below, we present a non-exhaustive list of interesting network measurements (some based on previous measurement tools written in full-stack programming languages) that can be successfully ported to JS.
- Detecting middleboxes and traffic manipulation: Careful instrumentation of both the client and the server side of AdTag can reveal the presence of HTTP and HTTPS middleboxes and whether they perform any traffic manipulation. Using the WebSocket and XHR libraries, we can force the client and the server to speak custom variants of HTTP over TCP, a technique proven valid for identifying and characterising HTTP(S) proxies.
- NAT detection and characterisation: WebRTC allows performing STUN and TURN requests that can be used to study NATs at scale. In this case, a STUN/TURN server is required. Because WebRTC gives the client direct access to UDP-based NAT traversal protocols through STUN and TURN, the client can obtain data about its IP address, probe for the existence of a NAT, check for middlebox state, and identify port allocation policies.
- CDN performance: CDN performance depends heavily on the replica selection algorithm and DNS resolution. AdTag clients can fetch one or more small objects from a CDN provider, thereby providing detailed performance metrics such as the time to first byte (TTFB) and the location of the assigned replica.
- IP classification: AdTag-based tests can help to classify a given IP address along different dimensions: by network type (i.e., residential, enterprise or mobile) and by characteristics (e.g., proxied or NATed). The mapping of an IP address to user agents (UAs) reveals whether the address is shared. This can complement existing IP intelligence datasets, helping to further contextualise the data provided by IP blacklists, WHOIS records, and geo-IP services.
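The IP-to-UA mapping mentioned in the last item can be sketched on the server side as follows. This is a minimal illustration under stated assumptions: "UAs" is taken to mean User-Agent strings, the function names are hypothetical, the threshold of three distinct agents is an arbitrary example rather than a value from the study, and the addresses come from the reserved documentation ranges.

```python
from collections import defaultdict


def user_agents_per_ip(records):
    """Group the distinct User-Agent strings observed for each client IP."""
    agents = defaultdict(set)
    for ip, user_agent in records:
        agents[ip].add(user_agent)
    return dict(agents)


def likely_shared_ips(records, min_agents=3):
    """IPs seen with at least `min_agents` distinct User-Agent strings.

    The threshold is an assumed heuristic, not a validated classifier.
    """
    grouped = user_agents_per_ip(records)
    return sorted(ip for ip, uas in grouped.items() if len(uas) >= min_agents)


# Hypothetical (ip, user_agent) pairs as a measurement server might log them.
log = [
    ("192.0.2.10", "UA-desktop-1"),
    ("192.0.2.10", "UA-mobile-1"),
    ("192.0.2.10", "UA-mobile-2"),
    ("198.51.100.7", "UA-desktop-1"),
    ("198.51.100.7", "UA-desktop-1"),
]
print(likely_shared_ips(log))  # ['192.0.2.10']
```

A count like this is only a hint: several devices in one household also share an address, so it is best combined with the other signals listed above.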
This work was originally published in the Sixteenth ACM Workshop on Hot Topics in Networks.
You can watch the webcast of Patricia presenting this research at RIPE 76.