IPv6-only at Microsoft
Microsoft has been running IPv6 in some fashion on its corporate network for many years now.
We have a substantial IPv6 footprint across more than 100 countries, but as you have probably guessed from the title of the post, dual-stack doesn’t meet our needs.
What is driving this push to IPv6-only?
The first – and most obvious – reason is exhaustion of IPv4 address space. The depletion of public IPv4 space is well-known, but Microsoft IT has exhausted almost all RFC1918 space. There are small discontiguous pockets remaining, but not enough for our current and future needs.
This situation has been exacerbated by two main factors: integration of corporate acquisitions (for example, Nokia) and Azure expansion. Even worse, these last factors have resulted in an overlap of RFC1918 space.
As interconnectivity is required between the corporate network and both Azure and the acquired company’s network, this has necessitated sub-optimal NAT solutions. These, of course, are potentially fragile – and certainly operationally challenging.
The second reason to consider a move to IPv6-only is the complexity of dual-stack. For helpdesk as well as network (and systems) operations staff, dual-stack more than doubles the complexity of dealing with issues. Equally, having to consider both IPv4 and IPv6 in the network engineering and design process, makes life more complicated than it needs to be.
With this in mind, during the last couple of years, we have been experimenting with IPv6-only.
How we have been experimenting with IPv6-only
We started out with a couple of test networks in Redmond, USA: one on the wireless guest network in one building, and a segment on the wired corporate network. This latter network was an opt-in. We configured these two networks with various address acquisition schemes (SLAAC, DHCP stateful and stateless) to get a picture of what worked and what did not.
Surprisingly, most scenarios worked. It was also interesting to gain experience with various NAT64/DNS64 platforms, which served as input into our shortlist for platform selection.
Following this initial pilot, we decided we needed to move aggressively to widen the deployment to more production networks. We elected to concentrate on the guest network as this gave us the most flexibility with the minimum of risk. We soon encountered a problem, however.
Testing has its challenges
The routers in buildings where we wanted to deploy did not support RDNSS in our current code. You’ll be aware RDNSS (RFC6106) is a facility to provide DNS resolver information in RAs. As we needed to support Android devices on our guest network, and Android doesn’t support DHCPv6, not having this feature was a problem.
Our temporary solution was to move the default gateway function to a centralized pair of routers that supported RDNSS. These routers are reached using a L2VPN overlay.
This isn’t ideal (operationally), so we are currently testing new code from the router vendor which supports RDNSS. This will allow us to move the default gateway back to the building. This highlights the fact that some IPv6 features are not consistent among platforms.
This change enabled IPv6-only on the guest network, but we have encountered two issues that have slowed us down. One vendor-based, and the other more home-grown.
The first is that our new wireless design allows employees to authenticate for network access using Azure AD. This, in turn, requires reauthorization ACLs on the wireless controller. These ACLs are dynamic (name-based) and are not supported with IPv6 by one of our vendors, and only by the second vendor in code, which we have not finished testing. Clearly, feature parity between IPv4 and IPv6 in vendors’ equipment can be difficult to achieve.
We expect to complete testing of both the wireless controller code and the router code supporting RDNSS in the next month or so, meaning we should be able to roll out IPv6-only to the guest network in our Redmond campus (100+ buildings) in the next six months.
The second issue slowing us down was a DHCPv6 bug in Windows 10. This affected both stateful and stateless schemes. Needless to say, IPv6-only expansion was impossible until we resolved this issue. We have reported it to the product group, and they are duly working on a fix.
The final bit of the puzzle is IPv6-only on the corporate network. This is complicated by the disparate requirements, which are different between regions, and even between buildings in the same city.
Specifically, this is about types of users – such as developers – having vastly different requirements to corporate functions such as HR or Finance. Technically, this means assessing whether NAT64 will break applications in use.
There’s also the question of NAT64 and DNS64 placement. This is complicated when we assess ideas such as SD-WAN (meaning no more than sites having local instead of centralized Internet egress).
In an environment such as this, centralizing functions such as DNS64 is easier to manage but less flexible. However, distributing the function allows more flexibility, but is more to manage. Issues such as these are what we will have to grapple with as we plan our move to IPv6-only in the corporate network.
So, is all this trouble worth it?
Hopefully. Migrating to IPv6 (dual-stack) is uncontroversial at this stage. But for us, moving to IPv6-only as soon as possible solves our problems with IPv4 depletion and address oversubscription. But it also moves us to a simpler world of network operations where we can concentrate on innovation and providing network services, instead of wasting energy battling with such a fundamental resource as addressing. It will be an interesting journey to get there!
Marcus Keane is Principal Network Engineer at Microsoft Corporation. This article was first published on the APNIC Blog.