BGP Route Flap Damping (RFD) and its use have been controversial in the past. Recommendations have been revised multiple times over the past two decades and still differ from vendor default values. In this article we dive into how we measured Route Flap Damping in the real world, uncover which configurations are in use, and provide RFD parameter sets based on past recommendations for your router.
BGP connects Autonomous Systems (ASes) in the Internet by announcing (and withdrawing) routes. To limit the impact of oscillating routes, Route Flap Damping, standardised in 1998, suppresses repeated BGP updates for a prefix. Route flaps cause performance problems on routers, but RFD as a mitigation technique might also suppress well-behaved or stable routes and, in the worst case, lead to unreachability of networks. Because of these drawbacks, the common belief is that RFD is not widely deployed.
History
A variety of recommendations on the use of RFD have existed in the past. Figure 1 attempts to summarise the key events of the last 25 years. RFD was introduced in the mid-nineties and standardised in RFC 2439 in 1998. The community quickly noticed that RFD can cause unreachability issues during the convergence process of updates. Mao et al. later proved these observations, and RIPE consequently recommended disabling RFD in 2006 (ripe-378).
In 2011, Pelsser et al. suggested slight adjustments to the previously recommended RFD configuration, thus making RFD usable without the need to adjust vendor implementations. Based on their findings, RIPE (ripe-580) and the IETF (RFC 7454) published recommendations to use RFD with adjusted parameters. The harmful vendor default parameters were not revised.
Figure 1: Timeline of Route Flap Damping
In all this time there has been no study attempting to measure real-world deployment and configuration of RFD. Understanding the configurations network operators use in practice is crucial for Internet operations and measurements.
The Route Flap Damping Signature
An RFD-enabled router maintains a penalty value per prefix per BGP session that defines when a prefix should be suppressed or released. This value is additively increased with each announcement or withdrawal for that prefix, and decreases exponentially over time. When it exceeds a threshold the prefix is suppressed until the penalty decays below a second threshold.
Figure 2 visualises how the penalty for one of our prefixes behaves in an RFD-enabled router. The router starts to receive updates at t0 for the prefix and additively increments the penalty. At t1 the penalty is larger than the suppress-threshold and therefore the prefix is withdrawn from peers and any further received updates will not be propagated. At t2 the router no longer receives updates for the prefix and therefore the penalty can reduce below the reuse-threshold at t3. The reuse-threshold defines when a prefix is considered usable again. As a result the router re-advertises the prefix to its peers.
Figure 2: RFD router perspective: The penalty for a prefix that oscillates between announcement (green) and withdrawal (orange)
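The penalty mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor implementation: the class and function names are hypothetical, the default values are the Cisco defaults listed in Table 1, attribute-change penalties and the max-suppress-time cap are ignored, and time is measured in minutes.

```python
import math
from dataclasses import dataclass


@dataclass
class RfdParams:
    # Defaults follow the Cisco column of Table 1 (assumption for this sketch).
    withdrawal_penalty: float = 1000.0
    readvertisement_penalty: float = 0.0
    suppress_threshold: float = 2000.0
    reuse_threshold: float = 750.0
    half_life_min: float = 15.0


def decay(penalty, minutes, half_life_min):
    """Exponential decay: the penalty halves every half_life_min minutes."""
    return penalty * 0.5 ** (minutes / half_life_min)


def minutes_until(penalty, target, half_life_min):
    """Minutes of silence needed for penalty to decay down to target."""
    if penalty <= target:
        return 0.0
    return half_life_min * math.log2(penalty / target)


def simulate(updates, params):
    """updates: time-sorted list of (time_min, kind), kind 'W' or 'A'.

    Returns a list of (time_min, 'suppress' | 'reuse') events, i.e. the
    moments called t1 and t3 in Figure 2.
    """
    events = []
    penalty, last_t, suppressed = 0.0, 0.0, False
    for t, kind in updates:
        if suppressed:
            # Would the penalty have fallen below reuse before this update?
            release_t = last_t + minutes_until(
                penalty, params.reuse_threshold, params.half_life_min)
            if release_t <= t:
                events.append((release_t, "reuse"))
                suppressed = False
        penalty = decay(penalty, t - last_t, params.half_life_min)
        penalty += (params.withdrawal_penalty if kind == "W"
                    else params.readvertisement_penalty)
        last_t = t
        if not suppressed and penalty > params.suppress_threshold:
            events.append((t, "suppress"))
            suppressed = True
    if suppressed:
        # No more updates: the penalty decays freely until reuse.
        events.append((last_t + minutes_until(
            penalty, params.reuse_threshold, params.half_life_min), "reuse"))
    return events


# A flap every minute, starting with a withdrawal and ending with an announcement.
burst = [(float(i), "W" if i % 2 == 0 else "A") for i in range(20)]
for when, what in simulate(burst, RfdParams()):
    print(f"{when:6.1f} min: {what}")
```

With these inputs the sketch crosses the suppress-threshold at the third withdrawal (minute 4) and releases the prefix only well after the last update, which is the shape shown in Figure 2.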
Measurement Infrastructure: Rapid Beacons to Trigger RFD
The re-advertisement (t3 in Figure 2) would not occur if the router continued to receive updates at a sufficiently quick rate. Therefore, in our experiment, we announce and withdraw Beacon prefixes in a so-called Burst and Break pattern (light-blue and white bands in Figure 2). In Bursts we begin with a withdrawal, alternate between announcement and withdrawal, and end with an announcement. In Breaks we do not send any updates.
The interval between two consecutive updates in the Burst determines which kind of RFD configuration is triggered. We did not expect configurations stricter than the vendor default values, which already suppress 14% of all prefixes. A Juniper or Cisco router would start damping a prefix that flaps at least once every 9 or 8 minutes respectively. The Minimum Route Advertisement Interval (MRAI) prevents us from going much lower than 1 minute, because 30 seconds is the Cisco default and other routers are probably configured similarly. We used 1, 2, and 3 minutes as update intervals with a 6 hour Break in our first experiment, and 5, 10, and 15 minutes with a 2 hour Break in our second measurement.
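To make the Burst and Break pattern concrete, here is a small sketch that generates such a schedule. The function name and its parameters are hypothetical, and the number of updates per Burst is a free parameter here because it is not stated in this article.

```python
def burst_break_schedule(update_interval_min, burst_updates, break_min, cycles):
    """Return a list of (time_min, kind) with kind 'W' (withdraw) or 'A' (announce).

    Each Burst starts with a withdrawal, alternates W/A, and ends with an
    announcement, so burst_updates must be even. No updates are sent during
    the Break that follows each Burst.
    """
    if burst_updates < 2 or burst_updates % 2 != 0:
        raise ValueError("burst_updates must be an even number >= 2")
    schedule = []
    t = 0
    for _ in range(cycles):
        for i in range(burst_updates):
            schedule.append((t, "W" if i % 2 == 0 else "A"))
            if i < burst_updates - 1:
                t += update_interval_min
        t += break_min  # silence until the next Burst begins
    return schedule


# Example: 1-minute update interval, 4 updates per Burst, 6-hour Break, 2 cycles.
for when, kind in burst_break_schedule(1, 4, 360, 2):
    print(when, kind)
```

Ending each Burst with an announcement means the prefix is reachable throughout the Break, so a re-advertisement by a damping router can be observed once its penalty has decayed.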
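The relation between update interval and parameter set can be estimated with a short calculation. The sketch below is an idealised steady-state model, not the method used in the measurement: it assumes an endless alternation of withdrawals and announcements at a fixed interval T, that the withdrawal penalty is at least as large as the re-advertisement penalty, and it ignores MRAI and propagation effects. With decay factor d = 2^(-T/half-life), the peak penalty right after a withdrawal converges to (a*d + w) / (1 - d^2), where w and a are the withdrawal and re-advertisement penalties. Solving for the point where this peak equals the suppress-threshold gives the largest interval that still triggers damping. The function name is hypothetical; the parameter values come from Table 1.

```python
import math


def max_damped_interval(withdrawal, readvertisement, suppress, half_life_min):
    """Largest update interval (minutes) whose steady-state peak penalty
    still reaches the suppress threshold, under the assumptions above."""
    if suppress <= withdrawal:
        return math.inf  # a single withdrawal already reaches the threshold
    # Solve suppress*d^2 + readvertisement*d + (withdrawal - suppress) = 0 for d.
    d_min = (-readvertisement
             + math.sqrt(readvertisement ** 2
                         + 4 * suppress * (suppress - withdrawal))) / (2 * suppress)
    return -half_life_min * math.log2(d_min)


cisco = max_damped_interval(1000, 0, 2000, 15)
juniper = max_damped_interval(1000, 1000, 3000, 15)
recommended_a = max_damped_interval(1000, 0, 6000, 15)
recommended_b = max_damped_interval(1000, 1000, 6000, 15)
print(f"Cisco default:   {cisco:.1f} min")
print(f"Juniper default: {juniper:.1f} min")
print(f"Suppress 6000, re-advertisement 0:    {recommended_a:.1f} min")
print(f"Suppress 6000, re-advertisement 1000: {recommended_b:.1f} min")
```

Under this model the vendor defaults come out at 7.5 minutes (Cisco) and about 8.8 minutes (Juniper), in line with the rounded 8 and 9 minutes quoted above. A suppress-threshold of 6000 yields roughly 2 to 4 minutes, depending on the re-advertisement penalty.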
To achieve a great variety in path data we announce our Beacons from seven different locations in the world, namely: Bangkok, Johannesburg, København, München, São Paulo, Seattle, and Tokyo. To pick up the Beacon pattern that has been altered by RFD-enabled routers, we use three route collector projects: Isolario, RIPE RIS, and RouteViews. We refer to peers of route collector projects as vantage points.
To detect RFD we interpret all received updates for each AS path. We can decipher only the announcement pattern, because withdrawals do not contain AS paths. Figures 3 and 4 visualise what we observe at the vantage point 137.39.3.55 for two AS paths. The topmost axis reflects exactly when we receive announcements for the given path from the vantage point. The two axes below depict when updates were sent from the Beacon router and whether they were received or not.
Figure 3 clearly shows the RFD signature sketched in Figure 2, whereas in Figure 4 almost all announcements were exported by the vantage point and not damped. This means we can infer that either 701 or 2914 uses RFD (3130 is the Beacon AS).
Figure 3: RFD Signature. At least one AS on the path (701, 2497, 3130) has RFD enabled.
Figure 4: Non-altered Beacon pattern. RFD does not occur on this path.
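The comparison drawn in Figures 3 and 4 (which sent announcements show up at the vantage point) can be approximated as follows. This is only an illustrative sketch: the function names, the matching window `max_delay`, and the idea of summarising the second half of a Burst in one ratio are assumptions, not the procedure used in the study. Timestamps are in seconds.

```python
from bisect import bisect_left


def match_announcements(sent_times, received_times, max_delay):
    """For each sent announcement, report whether some announcement was
    received within max_delay seconds after it (hypothetical matching rule)."""
    received = sorted(received_times)
    flags = []
    for sent in sent_times:
        i = bisect_left(received, sent)  # first reception at or after 'sent'
        flags.append(i < len(received) and received[i] - sent <= max_delay)
    return flags


def tail_delivery_ratio(flags, tail_fraction=0.5):
    """Share of announcements delivered in the last part of a Burst."""
    tail = flags[int(len(flags) * (1 - tail_fraction)):]
    return sum(tail) / len(tail) if tail else 0.0


# Made-up timestamps: one path stops delivering mid-Burst, the other does not.
sent = [60, 180, 300, 420, 540, 660]
damped_like = match_announcements(sent, [65, 186, 304], max_delay=30)
undamped_like = match_announcements(sent, [65, 186, 304, 425, 546, 668], max_delay=30)
print(damped_like, tail_delivery_ratio(damped_like))
print(undamped_like, tail_delivery_ratio(undamped_like))
```

A path on which the later announcements of every Burst go missing resembles Figure 3, while a ratio close to 1 resembles Figure 4. Real data would also need to account for measurement noise and session resets.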
Based on whether we can observe the RFD signature we label paths with RFD true or false. Although the resulting dataset gives an idea of RFD deployment, we want to know exactly which AS is damping. With that aim we face the challenge that a non-negligible share of ASes uses RFD selectively, e.g., suppresses only churn from customers. On top of the usual measurement noise, selective damping results in a contradictory dataset.
We developed three heuristics to determine which AS deploys RFD on the Internet. We will very briefly introduce them: The first heuristic simply computes the relative occurrence of an AS on damped and non-damped paths. The second method relies on alternative paths that are announced after the damped path has been suppressed. The last heuristic uses the characteristic that RFD-enabled ASes export on average fewer updates towards the end of Bursts.
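As an illustration of the first heuristic, the sketch below computes, for each AS, the share of labelled paths containing it that were marked as damped. It is a simplified reading of the idea, not the published implementation: the function name is hypothetical, the AS numbers are from the documentation range, and no threshold for deciding that an AS damps is implied.

```python
from collections import Counter


def damped_share_per_as(labelled_paths, beacon_asns=frozenset()):
    """labelled_paths: iterable of (as_path, is_damped) pairs, where as_path
    is a sequence of AS numbers. Returns {asn: share of its paths damped}."""
    damped, total = Counter(), Counter()
    for path, is_damped in labelled_paths:
        # Count each AS once per path and skip the Beacon origin AS.
        for asn in set(path) - set(beacon_asns):
            total[asn] += 1
            if is_damped:
                damped[asn] += 1
    return {asn: damped[asn] / total[asn] for asn in total}


# Made-up example: AS 64496 originates the Beacon prefix.
paths = [
    ((64500, 64501, 64496), True),
    ((64500, 64502, 64496), False),
    ((64503, 64501, 64496), True),
]
for asn, share in sorted(damped_share_per_as(paths, {64496}).items()):
    print(asn, share)
```

In this toy input AS 64501 appears only on damped paths, which makes it the most likely damper. An AS that damps selectively would show a share between 0 and 1, which is why a single ratio cannot settle every case.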
Real-World Deployment and Configurations
At this time there are two relevant parameter sets: vendor default values and the recommendations by the IETF (BCP 194) and RIPE (ripe-580). These are displayed in Table 1. Default parameters have proven to be harmful in the past (Mao et al.), because they can lead to reachability issues, hence the difference to the recommended parameters.
RFD parameter | Cisco | Juniper | BCP 194 / RIPE-580 |
---|---|---|---|
Withdrawal penalty | 1000 | 1000 | 1000 |
Re-advertisement penalty | 0 | 1000 | 0/1000 |
Attributes change penalty | 500 | 500 | 500 |
Suppress-threshold | 2000 | 3000 | 6000 |
Half-life (min) | 15 | 15 | 15 |
Reuse-threshold | 750 | 750 | 750 |
Max suppress time (min) | 60 | 60 | 60 |
Table 1: Vendor default parameters and recommendations
We used six different update intervals in our experiment: 1, 2, 3, 5, 10, and 15 minutes. Although Cisco and Juniper have different suppress-thresholds and penalty increments for re-advertisements, both start damping at the 5-minute update interval. Figure 5 shows the number of damping ASes that we identified for each update interval. There is a clear increase visible from 10 to 5 minutes. This shows that many ASes use harmful vendor default values. This observation is confirmed by ground truth, where 60% are using vendor defaults. The slight increase for the smaller update intervals is likely caused by some network operators following the recommendations. The ASes damping at the larger update intervals are likely also using vendor default values, but receive more updates than we send from the Beacon routers due to topology phenomena, and therefore dampen the low-frequency Beacon prefixes.
Figure 5: Number of Damping ASes for each update interval. Total measured ASes: 610
We contacted network operators to validate our results and found 95% precision with one false positive.
Recommended RFD Parameter Sets
In the network operator community there seems to be much confusion about how to apply current recommendations correctly. As a result some operators just use the default values supplied by vendors because they do not expect vendors to ship harmful configurations. Therefore, we chose to supply exact configuration parameters in Table 1.
It is worth noting that these values are based on previous measurements (RIPE-580) and might need to be updated based on recent data. This will be part of our future work.
Conclusion
In contrast to the expectation of the networking community, we found that RFD is in use by at least 8% of the measured ASes. Tier 1 providers as well as small ISPs deploy RFD, and most of them use deprecated, harmful vendor defaults. Please consider checking and updating your configuration. We will report on our ongoing work at http://rfd.rg.net/.
This research was presented at RIPE 80.