Danilo Giordano

Five Years at the Edge: Recording the Evolution of Web Usage from an ISP


While 2018 has no doubt been another big year in the world of the Internet, it often pays to look at how it compares to previous years to understand and forecast growth or decline trends.


Since 2013, we at the SmartData@Polito lab, Politecnico di Torino, have been conducting a large-scale measurement study from the vantage point of a Tier-1 ISP based in Europe. Our goal is to better characterise Internet usage trends, as well as the technology and infrastructure changes the industry is deploying, so that future changes and costs can be forecast.

In this post, we share some highlights from the past five years: a 2.5x average increase in daily traffic consumption per user, driven by video streaming and by popular social messaging and media applications; the birth of CDN caches with sub-millisecond latency; and the rise and fall of various web protocols.

Collecting and Analysing 250 Billion Traffic Records

From 2013 to 2017 we collected, processed and analysed 250 billion traffic records (more than 30 TB of compressed data) from a nation-wide ISP in Italy.

We used big data techniques to extract high-level information from the measurements, focusing explicitly on the characterisation of the Internet services people use. For this step, we needed to associate every single flow to a service. We relied on the server domain names, which we extracted from:

  1. The HTTP Host header field; or
  2. The Server Name Indication (SNI) sent during TLS negotiation, for HTTPS flows; or
  3. The DNS exchange observed before the TCP connection was opened.

Each domain name is then mapped to the corresponding service using a flexible set of regular-expression rules. Figure 1 shows our measurement set-up.

Figure 1: Schematic of measurement infrastructure and processing steps
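
As a rough illustration, the lookup order and rule matching described above could be sketched in Python as follows. The record field names (`http_host`, `tls_sni`, `dns_name`), the rule list and the labels are assumptions made for this example; they are not the record format or the rule set used in the study.

```python
import re
from typing import Optional

# Hypothetical rules: (service label, pattern matched against the server domain name).
# Illustrative only; the study's actual rule set is not reproduced here.
SERVICE_RULES = [
    ("YouTube", re.compile(r"(^|\.)(youtube\.com|googlevideo\.com|ytimg\.com)$")),
    ("Facebook", re.compile(r"(^|\.)(facebook\.com|fbcdn\.net)$")),
    ("WhatsApp", re.compile(r"(^|\.)whatsapp\.(com|net)$")),
    ("Bing", re.compile(r"(^|\.)bing\.com$")),
]


def server_name(flow: dict) -> Optional[str]:
    """Return the server domain name, trying HTTP Host, then TLS SNI, then DNS."""
    for field in ("http_host", "tls_sni", "dns_name"):  # assumed field names
        name = flow.get(field)
        if name:
            # Normalise case and drop a trailing dot left by DNS notation.
            return name.lower().rstrip(".")
    return None


def classify(flow: dict) -> str:
    """Map a flow record to a service label using the first matching rule."""
    name = server_name(flow)
    if name is None:
        return "unknown"
    for service, pattern in SERVICE_RULES:
        if pattern.search(name):
            return service
    return "other"
```

Anchoring each pattern on a leading dot or the start of the string keeps a rule for `facebook.com` from also matching an unrelated domain such as `notfacebook.com`.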

 

Daily Usage Increased 2.5 Times

We first characterised the amount of traffic consumed by subscribers per day, over the past five years. This analysis is useful for understanding and predicting expenses for ISPs in terms of capacity.

Figure 2: Average daily download per subscription

 

Figure 2 depicts the average traffic consumption for an ordinary broadband subscriber per day. From 2013 to 2017, traffic more than doubled, increasing from less than 300MB/day/user to more than 700MB/day/user. FTTH users consumed even more, topping 1GB/day/user. Bandwidth-hungry video services are thought to be driving this change, while traffic from social messaging applications spikes (and falls) as quickly as their popularity.
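
A minimal sketch of how a per-subscriber daily average like the one in Figure 2 could be computed from flow records is shown below. The record layout `(day, subscriber_id, bytes_downloaded)` and the choice to average over subscribers that were active on a given day are assumptions for this example, not a description of the study's pipeline.

```python
from collections import defaultdict


def average_daily_download(records):
    """Return {day: mean downloaded volume per active subscriber}.

    `records` is an iterable of (day, subscriber_id, bytes_downloaded)
    tuples, one per flow (assumed layout).
    """
    per_user = defaultdict(lambda: defaultdict(int))
    for day, subscriber, nbytes in records:
        # Sum all flows of the same subscriber within the same day.
        per_user[day][subscriber] += nbytes
    return {
        day: sum(users.values()) / len(users)
        for day, users in per_user.items()
    }
```

Dividing the value for a day in 2017 by the value for the matching day in 2013 gives the growth factor discussed in the text.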

 

Instagram, WhatsApp, Facebook and Bing Rise, as Snapchat Falls

We next characterised the popularity of services over time. Figure 3 details how many users accessed a service on a given day. Google’s search engine was the most consistently used application over the five years (>60% of users), while the use of Bing grew from less than 10% to more than 45%. Note that the Bing figure is inflated by Windows telemetry, which reports to bing.com.

Figure 3: Popularity of selected services over time

 

YouTube and WhatsApp usage also grew substantially over this period; by the end of the study, each was being used by more than 50% of users on a daily basis.
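
The popularity metric behind Figure 3 (the share of users contacting a service on a given day) could be computed along the following lines. This is only a sketch: the `(day, subscriber_id, service_label)` record layout is assumed, and a subscriber counts as active if any flow was seen for them that day.

```python
from collections import defaultdict


def daily_popularity(records, service):
    """Return {day: share of active subscribers that contacted `service`}.

    `records` is an iterable of (day, subscriber_id, service_label)
    tuples (assumed layout).
    """
    active = defaultdict(set)  # day -> subscribers seen that day
    hits = defaultdict(set)    # day -> subscribers that contacted the service
    for day, subscriber, label in records:
        active[day].add(subscriber)
        if label == service:
            hits[day].add(subscriber)
    return {day: len(hits[day]) / len(active[day]) for day in active}
```

Because a single flow is enough to count a subscriber, background traffic such as the Windows telemetry mentioned above raises a service's share even when the user never opens it.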

We recorded constant growth in the popularity of newly released, content-rich social media applications, particularly Snapchat, WhatsApp, and Instagram, all of which corresponded with significant growth in data usage, as shown in Figure 4.

Interestingly, while all three initially grew at similar rates, Snapchat started to fall away dramatically from mid-2016. By mid-2017, Instagram users were consuming 150MB of data per day on average, the same amount as YouTube users, and WhatsApp users were exchanging about 10MB daily even though the app is designed mainly for messaging. Finally, notice that WhatsApp traffic peaks during holidays, when everybody exchanges wishes, doubling its usual daily volume.

 

Figure 4: Average daily download per ADSL subscription for Snapchat, WhatsApp and Instagram

 

For Facebook, popularity remained constant, with 70% of users contacting the service. Interestingly, Figure 5 shows that when Facebook started enabling video auto-play in its applications in March/April 2014, the effect on ISP traffic was immediate. The average daily traffic per subscriber towards Facebook grew from around 35 MB to around 70 MB within a month, and by July it had reached up to 2.5 times its March 2014 level!

This figure illustrates once more how the big players controlling key client software and servers can deploy impactful changes in the Internet, complicating the planning and management of ISP networks.

Figure 5: Facebook average daily per-user traffic before and after automatic video play

 

How Protocols Evolved Over Time

Next, we studied how protocols and service infrastructures evolved over time, highlighting unpredictable events that may hamper traffic management policies. Figure 6 shows the breakdown of web traffic by application protocol.

Figure 6: Web protocol breakdown over five years

 

Sudden changes and custom protocol deployments are highlighted by the letters, for example, the decline of HTTP (A) and introduction of QUIC (B).

In 2013, 90% of traffic was over HTTP with just 10% over HTTPS (A). Since then, we have witnessed the introduction of new protocols by big players that control both the server infrastructure and the client applications.

Google started deploying QUIC in October 2014 (B), experimenting with this new protocol years before bringing it to the IETF for standardisation. The same happened later with SPDY (C), which was abandoned shortly after in favour of HTTP/2.

Facebook also developed and deployed its own custom solution, Facebook-Zero (E). Because the protocol is proprietary, its existence went largely unnoticed, even though it carried more than 10% of web traffic from 2017.
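
A breakdown like the one in Figure 6 boils down to the share of bytes carried by each protocol in a given period. The sketch below assumes each record has already been labelled with a protocol name and reduced to a `(protocol, bytes)` pair; how flows are labelled (for example, how QUIC or Facebook-Zero flows are recognised) is outside its scope.

```python
from collections import Counter


def protocol_shares(records):
    """Return {protocol: fraction of total bytes} for one time period.

    `records` is an iterable of (protocol, bytes) pairs (assumed layout).
    """
    totals = Counter()
    for protocol, nbytes in records:
        totals[protocol] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {protocol: nbytes / grand_total for protocol, nbytes in totals.items()}
```

Running this once per month and stacking the results over time would give a view comparable to the figure, in which a new label appearing in the output marks the deployment of a new protocol.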

 

The Evolution of the Infrastructure

In the rush to bring servers closer to users, we witnessed the birth of the sub-millisecond Internet, with caches located directly at the ISP edge. Figure 7 shows the cumulative distribution function (CDF) of the Round Trip Time (RTT) from the client to the server for YouTube flows. While in 2014 80% of flows were served by a cache about 3ms away, by 2017 YouTube had pushed its servers even closer to end users: 30% of traffic is now served by a cache just 0.3ms away!

 

Figure 7: CDF of RTT in 2014 and 2017 for YouTube
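
For readers less familiar with CDFs, the curves in Figure 7 answer the question "what fraction of flows had an RTT of at most x?". A minimal sketch of an empirical CDF over a list of RTT samples follows; the function name and the use of milliseconds are assumptions for this example.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function cdf(x) giving the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf
```

Given the per-flow RTTs for one year, `empirical_cdf(rtts)(0.3)` would return the share of flows served from within 0.3ms, which is how statements such as "30% of traffic is served by a cache 0.3ms away" can be read off the curve.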

 

Trends Help to Plan for Future Usage

By processing large-scale, longitudinal measurements from a nationwide ISP from 2013 to 2017, we characterised the traffic consumption of broadband subscribers and the infrastructure that web services deploy to reach customers.

We believe the figures presented in this post are important for researchers, ISPs and even web service providers, helping them to better understand and plan for the liveliness of the Internet.

 

Danilo Giordano was a RACI fellow and presented his research at the RIPE 78 meeting in May 2019.


About the author

Danilo Giordano, based in Turin

His research interests are focused on data analysis in small data and big data environments, exploiting statistical and machine learning techniques. In particular, his interests relate to network measurements and to experiments meant to analyse network data and possibly highlight anomalies. In addition to network measurements, he is currently involved in a project studying the future evolution of car sharing mobility in smart cities. He has actively participated in 7 conference papers and 2 journal papers, and was awarded the best student paper award at the ITC 2015 conference and the IETF Applied Networking Research Prize 2017. He also spent two periods abroad: at the CAIDA research center at UC San Diego, and at Narus Inc., a company headquartered in the San Francisco Bay Area.
