Jac Kloots

RPKI Routing Policy Decision-Making - a SURFnet Perspective


Before deciding to implement stricter operational practices, we needed to find out how many routes with invalid origins are actually being used and how much traffic is exchanged over those routes. The results are presented below.


Current setup

SURFnet has been working on RPKI for a while. One of the results of this is the RPKI dashboard [1]. This dashboard gives an overview of the use of RPKI looking at routes and their validation state.

For some time now we have been running RPKI validation software and using its data in our production routers. All routes we learn from our external BGP sessions are labeled with their validation state. Customer prefixes aren’t labeled.

Figure 1: SURFnet RPKI import groups

Figure 1 above shows SURFnet's current routing policy. We always prefer routes learned from customers over all external connectivity. For external connectivity we prefer routes learned from research networks over private peerings, followed by IX peerings, and as a last resort we have two global upstream connections.

All these connections result in a BGP table that currently holds approximately 478,000 active IPv4 routes. Of these, approximately 2,560 routes are marked invalid, 18,800 are valid and 456,000 are unknown (not found).
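To put these counts in proportion, the short sketch below converts them into shares of the table. It only uses the approximate figures quoted above; the helper name `share` is introduced here for illustration.

```python
# Approximate route counts as quoted in the text.
total_routes = 478_000
by_state = {"invalid": 2_560, "valid": 18_800, "unknown": 456_000}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {state: share(count, total_routes) for state, count in by_state.items()}
for state, pct in shares.items():
    print(f"{state:8s} {pct:6.2f}%")
```

Because the quoted figures are rounded, the three counts do not add up to exactly 478,000.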

Policy decision making

For RPKI to make any sense, the RPKI origin validation operational draft [2] suggests dropping (or at least lowering the preference of) all invalid routes, and preferring routes marked valid over unknown routes.
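The recommendation can be sketched as a small import-policy function. This is an illustration only, not SURFnet's router configuration: the names `State`, `BONUS` and `import_policy`, and the local-preference adjustments of plus or minus 10, are assumptions made for this example.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN = "unknown"


# Hypothetical local-preference adjustments; real values depend on local policy.
BONUS = {State.VALID: 10, State.UNKNOWN: 0, State.INVALID: -10}


def import_policy(base_localpref: int, state: State,
                  drop_invalid: bool = True) -> Optional[int]:
    """Return the local preference to assign, or None if the route is rejected."""
    if state is State.INVALID and drop_invalid:
        return None  # strict policy: invalid routes are not accepted at all
    # Otherwise prefer valid over unknown over invalid within the same base value.
    return base_localpref + BONUS[state]


print(import_policy(100, State.VALID))
print(import_policy(100, State.INVALID))
print(import_policy(100, State.INVALID, drop_invalid=False))
```

In a setup with several import groups, such as the one in Figure 1, the adjustment would presumably have to stay smaller than the gap between the groups so that validation state only breaks ties within a group.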

Currently SURFnet is not adhering to this recommendation. Before doing this we needed to have more detailed information and have confidence that we won’t introduce major problems by filtering on RPKI validation states. By using the RPKI system we now know the routes and origin Autonomous System Numbers (ASNs) from which we receive routes with invalid origins. But what is going on there? How much traffic is it and what kind of traffic?

Using a measurement tool from Deepfield [3] and using BGP communities we are able to measure traffic for each of the different validation states (valid, invalid and unknown origins).
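The idea of measuring traffic per validation state via BGP communities can be sketched as follows. This is not how the Deepfield system works internally; the community values, the flow record layout and the sample numbers are all hypothetical and only illustrate the aggregation step.

```python
from collections import defaultdict

# Hypothetical community values; the article does not state the ones actually used.
STATE_BY_COMMUNITY = {
    "64512:100": "valid",
    "64512:200": "unknown",
    "64512:300": "invalid",
}


def traffic_by_state(flows):
    """Sum bytes per validation state.

    Each flow is a (bytes, communities) pair, where communities is the set of
    BGP communities attached to the route that carried the flow.
    """
    totals = defaultdict(int)
    for nbytes, communities in flows:
        states = {STATE_BY_COMMUNITY[c] for c in communities if c in STATE_BY_COMMUNITY}
        # Routes without exactly one validation label (e.g. customer prefixes,
        # which are not labeled) are counted separately.
        state = states.pop() if len(states) == 1 else "unlabeled"
        totals[state] += nbytes
    return dict(totals)


# Made-up sample flows, for illustration only.
sample_flows = [
    (1_500, {"64512:200"}),
    (900, {"64512:100"}),
    (40, {"64512:300"}),
    (700, set()),
    (300, {"64512:200"}),
]
print(traffic_by_state(sample_flows))
```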

Figure 2: Daily total traffic breakdown table

As most routes in the routing table are of unknown origin and the smallest number of prefixes are of invalid origin, total traffic levels are distributed as expected.

But can we drop this small amount of traffic? Would customers start complaining if we applied a routing policy that drops invalid origins, and so adhered to the operational practices described in the IETF draft? To answer that, we needed to know more about the traffic from invalid origins.

Using the capabilities of the Deepfield system we were able to classify traffic from invalid origins into groups.

Figure 3: Distribution of invalid traffic

The picture above clearly shows that most traffic is classified as ‘grid’ and ‘research’ traffic. If we dive deeper into the 'grid' traffic we see multiple origins as shown in Figure 4 below.

Figure 4: Grid traffic invalid origins

The top players here are Renater with AS2200 and NORDUnet with AS39590.

So can we drop these invalids? Using the RPKI dashboard it is visible what the problem is:

Figure 5: RPKI Dashboard overview for AS2200

There is a prefix-length mismatch for a lot of /24s. According to the route origin authorisation (ROA), there is a covering route [4], but does this covering route also exist in the routing table? For this particular example it does:

 > show route protocol bgp 193.48.0.0
 

 193.48.0.0/14      *[BGP/170] 5w4d 17:46:51, localpref 100
                      AS path: 1103 20965 2200 I, validation-state: valid

This route has the same Origin AS, has a valid origin and is even following the same AS path as the more specific but invalid /24. Dropping these routes with invalid origin would cause no problems whatsoever.
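The prefix-length mismatch can be reproduced with a minimal sketch of origin validation in the style of RFC 6811. The ROA below is an assumption: the article only tells us that the /14 from AS2200 is valid and that its /24s fail on length, so a maximum length of 14 is a guess that reproduces this behaviour. The names `Roa` and `validate` are introduced here for illustration.

```python
import ipaddress
from typing import List, NamedTuple


class Roa(NamedTuple):
    prefix: str
    max_length: int
    asn: int


def validate(prefix: str, origin_asn: int, roas: List[Roa]) -> str:
    """Classify a route as 'valid', 'invalid' or 'unknown'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA says nothing about the route
        covered = True  # at least one ROA covers the route
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by a ROA but matched by none: invalid. Not covered at all: unknown.
    return "invalid" if covered else "unknown"


# Assumed ROA; the max_length value is not taken from the article.
roas = [Roa("193.48.0.0/14", 14, 2200)]

print(validate("193.48.0.0/14", 2200, roas))    # the covering route
print(validate("193.48.83.0/24", 2200, roas))   # same origin, but too specific
print(validate("192.0.2.0/24", 64500, roas))    # documentation prefix, no ROA
```

Under this assumed ROA the /24 is invalid purely because of its length, even though the origin AS matches.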

Does this also hold for the other 2,500 routes with invalid origins? Do all these routes have covering routes from the same origin AS?

Our investigation found 2,564 routes with an invalid origin, and for 1,856 of them (72%) we found a covering route with a valid origin.

Interestingly, in 1,248 (67%) of these 1,856 cases the covering route has the same origin AS as the invalid route it covers, while in 608 cases (33%) the covering route has a different origin AS.
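The covering-route check described above can be sketched as follows. This is a simplified illustration rather than the tooling used for the research: the names `Route`, `find_covering` and `summarise` are assumptions, and the small table contains only routes quoted elsewhere in this article.

```python
import ipaddress
from typing import Dict, List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    origin_asn: int
    state: str  # "valid", "invalid" or "unknown"


def find_covering(invalid: Route, table: List[Route]) -> Optional[Route]:
    """Return the most specific valid route that is less specific than `invalid`."""
    target = ipaddress.ip_network(invalid.prefix)
    best: Optional[Route] = None
    best_len = -1
    for route in table:
        if route.state != "valid":
            continue
        net = ipaddress.ip_network(route.prefix)
        if net.version != target.version or net.prefixlen >= target.prefixlen:
            continue  # wrong address family, or not less specific
        if target.subnet_of(net) and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best


def summarise(table: List[Route]) -> Dict[str, int]:
    """Count invalid routes by the kind of covering route they have."""
    counts = {"same_origin": 0, "different_origin": 0, "uncovered": 0}
    for route in table:
        if route.state != "invalid":
            continue
        cover = find_covering(route, table)
        if cover is None:
            counts["uncovered"] += 1
        elif cover.origin_asn == route.origin_asn:
            counts["same_origin"] += 1
        else:
            counts["different_origin"] += 1
    return counts


table = [
    Route("193.48.0.0/14", 2200, "valid"),
    Route("193.48.83.0/24", 1942, "invalid"),
    Route("193.48.99.0/24", 789, "invalid"),
    Route("193.48.100.0/24", 789, "invalid"),
    Route("109.105.124.0/22", 39590, "invalid"),
]
print(summarise(table))
```

A linear scan is fine for a handful of routes; for a full table of several hundred thousand routes a prefix tree would be the more obvious choice.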

#    Origin AS   Invalid routes announced   Invalid routes covered
1    6147        608                        552
2    237         174                        184
3    23752       122                        123
4    15557       38                         85
5    7303        1                          74
6    2200        1                          56
7    39501       53                         53
8    23383       50                         50
9    9873        48                         48
10   286         41                         35

Table 1: Top 10 origin ASes with covering routes

Some of these origin ASes have more covering routes than invalid routes. This means they announce covering routes not only for their own routes with invalid origin, but also for routes with invalid origin announced by other ASes.

For example, AS237 covers 184 routes with invalid origin, while it announces only 174 routes with an invalid origin itself. Ten of the routes covered by AS237 are thus announced by other ASes.

AS237 is covering routes from AS40044 and AS25773.

Figure 6: RPKI Dashboard overview of AS237

If we dropped the routes with invalid origin announced by AS40044 and AS25773, AS237 would attract their traffic and we would probably not reach the destination we were trying to reach.
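The comparison of the two columns in Table 1 is simple arithmetic, sketched below. The rows are copied from Table 1; the function name `covers_others` is introduced here for illustration.

```python
# (origin AS, invalid routes announced, invalid routes covered), from Table 1.
TABLE_1 = [
    (6147, 608, 552), (237, 174, 184), (23752, 122, 123), (15557, 38, 85),
    (7303, 1, 74), (2200, 1, 56), (39501, 53, 53), (23383, 50, 50),
    (9873, 48, 48), (286, 41, 35),
]


def covers_others(rows):
    """Return {asn: surplus} for ASes that cover more invalid routes than they announce."""
    return {asn: covered - invalid for asn, invalid, covered in rows if covered > invalid}


print(covers_others(TABLE_1))
```

The surplus is best read as a lower bound: an AS may also fail to cover some of its own invalid routes, in which case it covers even more routes from other ASes than the difference suggests.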

So if we look at the total traffic using routes with invalid origin, we'll get the following results:

Figure 7: Total traffic exchanged using routes with invalid origin

The largest amount of traffic is exchanged with AS1942. We receive only one route with invalid origin from AS1942, but it is covered by a route from AS2200.

When we dive deeper into this route:

 193.48.83.0/24     *[BGP/170] 2w0d 00:56:08, localpref 100
                      AS path: 1103 20965 2200 1942 I, validation-state: invalid

We see that AS2200 is the transit AS for AS1942. Using the looking glass of AS2200 we see that we probably would be able to reach the destination with the covering route with valid origin:

 193.48.0.0/14      *[BGP/170] 11w1d 07:25:52, localpref 100
                      AS path: 1103 20965 2200 I, validation-state: valid

For the AS with which we exchange the second largest amount of traffic, the situation is different. We receive only one route from them:

 109.105.124.0/22   *[BGP/170] 10w6d 21:50:35, localpref 100
                      AS path: 1103 2603 39590 I, validation-state: invalid

And this route isn’t covered by a route with a valid origin, so dropping this invalid route would make the prefix (or prefixes) unreachable. This is clearly caused by missing ROAs for the routes originated by AS39590.
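The reasoning used in these two examples can be written down as a rough heuristic. This is a sketch of the argument, not a guarantee of reachability: the function name `drop_impact` and its result labels are assumptions, and the check that the covering origin appears in the AS path is only an approximation of "the covering AS is a transit for the invalid origin".

```python
from typing import List, Optional


def drop_impact(as_path: List[int], covering_origin: Optional[int]) -> str:
    """Estimate what happens if a route with invalid origin is dropped.

    as_path is the AS path of the invalid route, with the origin AS last.
    covering_origin is the origin AS of a valid covering route, or None.
    """
    if covering_origin is None:
        return "unreachable"
    if covering_origin == as_path[-1]:
        return "safe: same origin"
    if covering_origin in as_path:
        return "probably safe: covering origin is a transit AS on the path"
    return "risky: covering origin is not on the path"


# The two routes discussed above.
print(drop_impact([1103, 20965, 2200, 1942], 2200))
print(drop_impact([1103, 2603, 39590], None))
```

A covering route from an AS that is not on the path, as with AS237 covering routes from AS40044 and AS25773, falls into the last category and needs a closer look before dropping.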

We could continue doing this for all the ASes mentioned. The top 10 would then look as follows:

#    AS      Invalid   Covered   Covered by
1    1942    1         1         2200
2    39590   1         0         none
3    2200    1         1         2200
4    789     2         2         2200
5    35017   24        24        35017
6    31042   8         8         31042
7    6830    3         3         6830
8    3320    6         6         3320
9    15802   6         6         15802
10   36947   1         1         36947

Table 2: Top 10 total traffic ASes and their covering routes

All prefixes from these ASes are covered, except the one from AS39590. There will be no traffic exchange possible anymore if we drop routes with invalid origin from AS39590.

A route with origin AS2200 covers the routes from AS789. Again, if we dropped the routes with invalid origin from AS789, traffic exchange could continue via AS2200.

Routes with invalid origin:

 193.48.99.0/24     *[BGP/170] 6w1d 05:33:16, localpref 100
                      AS path: 1103 20965 2200 789 I, validation-state: invalid

 193.48.100.0/24    *[BGP/170] 6w1d 05:33:16, localpref 100
                      AS path: 1103 20965 2200 789 I, validation-state: invalid

And their covering routes:

 193.48.0.0/14      *[BGP/170] 11w1d 08:17:07, localpref 100
                      AS path: 1103 20965 2200 I, validation-state: valid

Conclusion

Can we use RPKI and adhere to the operational practices by dropping routes with invalid origins? We tried to answer this question by investigating how many routes with invalid origins are actually being used and how much traffic is exchanged using those routes.

Using traffic measurements we tried to determine on which routes with invalid origin we had to focus. And we had to determine whether traffic could still be exchanged using alternative routes with a valid origin.

Looking at the results of this research, we can say that we could adhere to a strict routing policy by dropping routes with an invalid origin without losing too many routes. Of course, this result is based on SURFnet data from last week. Whether you can adhere to this strict policy also depends on your own traffic patterns.

We will continue to monitor the validation state of routes and the traffic patterns for some more time to see whether the conclusions from this research remain valid. If we have enough evidence that we can safely drop routes with invalid origins, we will start doing so before the end of this year.

 

[1] The SURFnet RPKI Dashboard

[2] The RPKI origin validation operational draft

[3] The Deepfield system

[4] Covering route: a less specific route that contains a given more specific route

 


About the author

Jac Kloots is a Technical Product Manager responsible for the SURFnet IP core and all external connectivity, including peering. In his 10+ years at SURFnet, Jac has also worked on the design of the previous SURFnet networks and is now responsible for the SURFnet test team and the technical development of the SURFnet network services.
