“Reverse” Anchoring?

What?

There are many instances where we want guest traffic to stay off our enterprise networks. From a security standpoint, quarantining guest traffic in a non-routed VLAN and terminating it in a DMZ provides a secure method for handling all of this untrusted traffic.

What if we could pick up guest traffic (no matter where it is) and tunnel it to a single place like our DMZ? You can – this is exactly what the Anchor WLC does for us.

By having the Anchor WLC live out in the DMZ, we are able to build an EoIP tunnel between our Foreign WLC and the Anchor WLC. This serves as a mechanism to securely transport traffic from the AP all the way to the DMZ, without the guest traffic ever gaining visibility into the rest of the network.

How?

In the Cisco world, there are 2 main types of tunnels. The first is CAPWAP, the tunneling mechanism that APs and WLCs use to carry Control and (sometimes) Data traffic. The second is Ethernet over IP (EoIP), which the WLCs use to communicate with each other. This is the logical underpinning that allows WLCs to share information such as client and AP data, and overall just enables the WLCs to be “aware” of each other. We build these EoIP tunnels between WLCs to enable seamless roaming of clients between WLCs, and even to enable L3 roaming between WLCs.

Another great feature of the EoIP tunnels is that they allow us to take an SSID that is configured on our local (it’s actually called “Foreign” – but whatever) WLC and terminate it on another WLC. This provides great flexibility in choosing which IP network we actually want to terminate the SSID on, especially in the case of guest networks.

The way we form these WLC relationships is with something called a “Mobility Group”. A Mobility Group is a set of WLCs that are “aware” of each other, share information such as AP & client statistics, and allow us to terminate SSIDs on a separate WLC.

Within the Mobility Group, messages are shared amongst the group members to enable features like seamless roaming, AP load balancing, anchoring SSIDs, and failover support for APs. Every AP is aware of every WLC in the Mobility Group and can fail over to a neighboring WLC in the event of an outage (assuming the AP has L3 access to the remaining WLCs in the Mobility Group).

Every time a client associates to an AP or performs a roam, the WLC sends a unicast message about the event to each of the other Mobility Group members. As you can imagine, this can get EXTREMELY chatty in large-scale deployments. In these deployments you can enable Multicast Messaging, where the WLC sends a single message to the multicast group and everyone in that group hears it. This is the preferred method for large-scale deployments, as it reduces chatter and the overall load on the WLCs.
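
To put rough numbers on "chatty", here is a minimal Python sketch. It assumes the simplified model from the paragraph above (unicast means one copy per peer, multicast means one packet per event); real controllers exchange other mobility message types too, so treat the numbers as illustrative only. The 24-member group and the hourly event count are hypothetical.

```python
"""Rough count of mobility announcements per client event (association or roam).

Assumption (simplified from the text): with unicast messaging the WLC sends one
message to every other Mobility Group member; with multicast it sends a single
message to the group address. Other mobility message types are ignored.
"""


def unicast_messages(group_size: int) -> int:
    # One copy of the announcement per peer (every member except the sender).
    return max(group_size - 1, 0)


def multicast_messages(group_size: int) -> int:
    # One packet to the multicast group, no matter how many members listen.
    return 1 if group_size > 1 else 0


def messages_per_hour(group_size: int, client_events_per_hour: int, multicast: bool) -> int:
    per_event = multicast_messages(group_size) if multicast else unicast_messages(group_size)
    return per_event * client_events_per_hour


if __name__ == "__main__":
    # Hypothetical deployment: 24 WLCs, 10,000 associations/roams per hour.
    for mode in (False, True):
        label = "multicast" if mode else "unicast"
        print(label, messages_per_hour(24, 10_000, mode))
```

With these hypothetical inputs it prints 230000 for unicast and 10000 for multicast: the unicast cost grows with every WLC you add, while the multicast cost stays flat.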

Now that we’ve had an overview of what a Mobility Group is, where it’s used and why, let’s start approaching it from the other direction.

“Standard” Anchoring

In Enterprise environments, we typically see a single Anchor WLC that lives out in the DMZ somewhere.  All of the Foreign WLCs will anchor their guest SSIDs to this WLC and life goes on as usual.

[Figure: Regular Anchor]

For starters, each WLC has its own “internal” Mobility Group name defined in its configuration. When we add Mobility Group members, we point the OTHER WLCs at THIS WLC’s Mobility Group. This is why the Anchor is aimed at the Mobility Group “Anchor”, and the converse is true of the Foreign WLCs.

What you will notice here is that each of the Foreign WLCs only has the “Anchor” WLC in its Mobility Group member list. This means that WLCs A, B, and C each form an EoIP tunnel only to the Anchor WLC. Even though all 4 WLCs are in the same Mobility Group, Mobility messages ARE NOT shared between the 3 Foreign WLCs.
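
The member lists described above can be modeled in a few lines of Python. This is a minimal sketch that assumes, as a simplification, that a mobility tunnel only comes up when both WLCs list each other; the WLC names are hypothetical. It shows why A, B, and C each peer with the Anchor but never with one another.

```python
"""Model of the "standard" anchoring member lists.

Assumption: a mobility tunnel between two WLCs only comes up when each WLC lists
the other in its member list. WLC names here are hypothetical.
"""
from itertools import combinations

# Each WLC -> set of peer WLC names configured in its Mobility Group member list.
member_lists: dict[str, set[str]] = {
    "Anchor": {"WLC-A", "WLC-B", "WLC-C"},
    "WLC-A": {"Anchor"},
    "WLC-B": {"Anchor"},
    "WLC-C": {"Anchor"},
}


def tunnels(lists: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return the WLC pairs that list each other, i.e. the tunnels that form."""
    up = set()
    for a, b in combinations(sorted(lists), 2):
        if b in lists[a] and a in lists[b]:
            up.add(frozenset((a, b)))
    return up


if __name__ == "__main__":
    for pair in sorted(tuple(sorted(p)) for p in tunnels(member_lists)):
        print(" <-> ".join(pair))
```

Running it prints the three Anchor pairs and no Foreign-to-Foreign pair, which matches the "Mobility messages ARE NOT shared between the 3 Foreign WLCs" behavior.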

This typically works fine and life will go on. There is, however, one problem with this design.

It doesn’t scale well.

As of this writing, Cisco has both the Catalyst and AireOS controllers on the market. The downside is that regardless of which platform you use, you are limited to the following:

  • A WLC may only have 24 Members PER Mobility Group defined in its Member List
  • A WLC may only have 72 entries in its Member List

The picture below shows an example of this very limitation.

[Figure: Mobility Group member list reaching the 24-member limit]

You will notice in the list that we max out at 24 members of the “standard” Mobility Group before the WLC starts barking at us, and we have to move to a new Mobility Group name.

The issue here is that all the Foreign WLCs are anchoring against the Anchor WLC, so what happens at WLC #25? Where does this WLC anchor?

The short answer is: we simply create another Mobility Group (i.e., group name “standard1” in the picture above) and start pairing Foreign WLCs to the Anchor in the new group.
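
Here is a minimal Python sketch of that overflow logic on the Anchor side. It assumes the limits quoted in this post (24 members per Mobility Group, 72 entries total) and the “standard”, “standard1”, … naming from the picture. The Foreign WLC names are hypothetical, and the sketch does not model whether the Anchor's own local entry counts toward the 72.

```python
"""Pack Foreign WLCs into Anchor-side Mobility Groups under the limits quoted above.

Assumptions: at most 24 members per Mobility Group and 72 entries in total
(limits as stated in this post); group names follow the "standard",
"standard1", ... pattern. The Anchor's own entry is not counted here.
"""

MAX_PER_GROUP = 24
MAX_ENTRIES = 72


def group_name(base: str, index: int) -> str:
    # The first group keeps the bare name; later ones get a numeric suffix.
    return base if index == 0 else f"{base}{index}"


def allocate(foreign_wlcs: list[str], base: str = "standard") -> dict[str, list[str]]:
    if len(foreign_wlcs) > MAX_ENTRIES:
        raise ValueError(f"{len(foreign_wlcs)} WLCs exceed the {MAX_ENTRIES}-entry member list")
    groups: dict[str, list[str]] = {}
    for start in range(0, len(foreign_wlcs), MAX_PER_GROUP):
        name = group_name(base, start // MAX_PER_GROUP)
        groups[name] = foreign_wlcs[start:start + MAX_PER_GROUP]
    return groups


if __name__ == "__main__":
    wlcs = [f"foreign-{n:02d}" for n in range(1, 61)]  # the "60 sites" case
    for name, members in allocate(wlcs).items():
        print(name, len(members))
```

For 60 sites it prints `standard 24`, `standard1 24`, `standard2 12`. It works, but you now have to track which site landed in which group, which is the naming headache described next.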

This is a perfectly valid config and will work just fine. After all, not many networks have dozens of WLCs anchoring guest traffic back to a single place…right?

For those of us that are lucky enough to walk into these types of accounts, it provides a true head-scratching moment mostly around the following:

  • What if I have 60 sites I need to Anchor to 1 or 2 WLCs? They can’t all live in the same Mobility Group, after all.
  • How do I maintain a common config across all my Anchors?
  • How do I address the Mobility Group naming issue while staying on a standard?

“Reverse” Anchoring

This is where “Reverse” Anchoring can help us work around these headaches. The only thing we are REALLY changing is that there is no longer a shared Mobility Group that we anchor against. This solves a few of my nagging OCD points:

  • By using unique, Foreign-specific Mobility Groups, we will never approach the limit of 24 members per Mobility Group
  • It maintains congruence of configs if you are utilizing multiple Anchors for Redundancy.
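
A minimal Python sketch of what the Anchor-side member list could look like under this approach. It assumes each Foreign WLC uses its own unique Mobility Group name, here built from a hypothetical “guest-<site>” pattern. Every group then holds exactly one Foreign member, and two Anchors built from the same site list end up with identical member lists.

```python
"""Sketch of "Reverse" Anchoring member lists, assuming each Foreign WLC uses its
own unique Mobility Group name (hypothetical pattern "guest-<site>").

Every Mobility Group then has exactly one Foreign member, and every Anchor can
carry an identical member list.
"""
from collections import Counter


def anchor_member_list(foreign_wlcs: list[str]) -> list[tuple[str, str]]:
    # (peer WLC, peer's Mobility Group) entries configured on an Anchor.
    return [(wlc, f"guest-{wlc}") for wlc in foreign_wlcs]


def largest_group(entries: list[tuple[str, str]]) -> int:
    # Size of the most populated Mobility Group in a member list.
    counts = Counter(group for _, group in entries)
    return max(counts.values(), default=0)


if __name__ == "__main__":
    sites = [f"site{n:02d}" for n in range(1, 61)]
    anchor1 = anchor_member_list(sites)
    anchor2 = anchor_member_list(sites)
    print(largest_group(anchor1), anchor1 == anchor2)
```

It prints `1 True`. Note that this only addresses the per-group limit; the total member-list entry limit quoted earlier still applies to each Anchor.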

Redundancy

While we are at it, let’s take a look at what our options really are for “Anchor Redundancy”.

  1. Anchor WLCs can be deployed as an SSO pair to give you box-level redundancy
  2. Additional Anchor WLCs can be deployed as standalone WLCs for failover & client load balancing

[Figure: Reverse Anchoring - Redundant]

For my money’s worth, I don’t see any real value in having an SSO pair on your Anchor WLCs. For the exact same amount of hardware and licensing, you can stand up a secondary Anchor. By having 2 discrete Anchors, you can scale up your guest counts and still achieve failover redundancy. If you set the Anchor priority to the same value on your Foreign WLC, clients will round-robin between the two Anchors. This not only gives you greater scale in the number of clients you can anchor, but also provides non-stateful failover should one of the Anchors go down.
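
To make "round-robin with non-stateful failover" concrete, here is an illustrative Python model. It is not the controller's actual selection code, and the Anchor names are hypothetical. New clients are spread across whichever equal-priority Anchors are currently up; when one goes down, new assignments simply land on the survivor and no session state moves over.

```python
"""Illustrative model of equal-priority Anchors: new clients are spread round-robin
across the Anchors that are currently up. Anchor names are hypothetical, and the
real controller's selection logic may differ in detail.

Failover is non-stateful: nothing is migrated; clients just get a live Anchor the
next time they are assigned.
"""
from itertools import count


class AnchorSelector:
    def __init__(self, anchors: list[str]) -> None:
        self.anchors = list(anchors)   # configured order, all at the same priority
        self.up = set(anchors)         # Anchors currently reachable
        self._counter = count()        # global round-robin position

    def mark_down(self, anchor: str) -> None:
        self.up.discard(anchor)

    def mark_up(self, anchor: str) -> None:
        if anchor in self.anchors:
            self.up.add(anchor)

    def next_anchor(self) -> str:
        live = [a for a in self.anchors if a in self.up]
        if not live:
            raise RuntimeError("no Anchor WLC reachable")
        return live[next(self._counter) % len(live)]


if __name__ == "__main__":
    sel = AnchorSelector(["anchor-1", "anchor-2"])
    print([sel.next_anchor() for _ in range(4)])
    sel.mark_down("anchor-1")
    print([sel.next_anchor() for _ in range(2)])
```

The first line alternates between `anchor-1` and `anchor-2`; after `anchor-1` is marked down, every new client goes to `anchor-2`.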

This was written on 3/25/2020 while quarantining at home during the COVID-19 pandemic. I finally had some time to sit down during my quarantine and put all of this on paper, as it’s been bouncing around my brain a lot lately. I hope everyone is staying safe and healthy, cheers!

C9800-CL on Windows

Introduction

After the release of the new Catalyst 9800 Controller, I have been wanting to really get my hands on one for my home lab. My biggest hurdle is that I am in a Windows-only environment and don’t have VMware at my disposal.

I built a pretty beefy gaming PC last year that I use a lot for work, and I have been tinkering with getting the C9800-CL VM running in my environment for a while now. It wasn’t until this morning that I FINALLY got it working.

The components that I used to finally get this working are:

  • Oracle VirtualBox (freeware)
  • C9800-CL .ISO File
  • Windows 10 PC with an Ethernet Interface

Step 1:  Downloading the WLC Image

For this setup, you will need to obtain the .ISO image of the new C9800-CL WLC.

This requires a valid CCO account with the appropriate permissions to access the files.

Step 2:  Create the VM inside of VirtualBox

The VM will need 8 GB of dynamically allocated Virtual Hard Disk (VDI) space, 1 dedicated CPU, and 4096 MB of RAM.

Once you have provisioned the VM, open the Optical Drive settings and select your C9800-CL .ISO image file.
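
If you prefer scripting to clicking through the VirtualBox GUI, the settings above map onto `VBoxManage` subcommands. The Python sketch below only builds and prints the commands unless run with `--execute`. The VM name, the `Linux_64` OS type, the IDE controller, and the file paths are hypothetical placeholders rather than values from this post, and it is worth checking the flags against `VBoxManage --help` for your VirtualBox version.

```python
"""Build (and optionally run) VBoxManage commands matching the Step 2 settings.

Assumptions: VBoxManage is on PATH; the VM name, ostype ("Linux_64"), IDE
controller, and file paths are hypothetical placeholders. Default is a dry run.
"""
import subprocess
import sys


def create_vm_commands(vm: str, iso: str, vdi: str) -> list[list[str]]:
    return [
        ["VBoxManage", "createvm", "--name", vm, "--ostype", "Linux_64", "--register"],
        # 4096 MB of RAM and 1 CPU, as in the post.
        ["VBoxManage", "modifyvm", vm, "--memory", "4096", "--cpus", "1"],
        # 8 GB (8192 MB) dynamically allocated VDI ("Standard" variant).
        ["VBoxManage", "createmedium", "disk", "--filename", vdi,
         "--size", "8192", "--variant", "Standard"],
        ["VBoxManage", "storagectl", vm, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", vm, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", vdi],
        # Optical drive pointed at the C9800-CL .ISO.
        ["VBoxManage", "storageattach", vm, "--storagectl", "IDE",
         "--port", "1", "--device", "0", "--type", "dvddrive", "--medium", iso],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    execute = "--execute" in sys.argv
    cmds = create_vm_commands(
        "C9800-CL",
        r"C:\ISO\C9800-CL.iso",           # hypothetical path to the downloaded .ISO
        r"C:\VMs\C9800-CL\C9800-CL.vdi",  # hypothetical path for the new disk
    )
    run(cmds, dry_run=not execute)
```

Absolute paths are used on purpose so the disk created by `createmedium` is the same file that `storageattach` picks up.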

The next part is the MOST important: after many variations, I have settled on the network settings below, as they enable the VM to function properly.

Under the VM settings, ensure that Adapter 1 is attached to the adapter you use to connect to your network (mine is Eth 0). It must be set to Bridged Adapter, and under Advanced, the Adapter Type MUST be virtio-net.
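
The same Adapter 1 settings can be applied from the command line with `VBoxManage modifyvm` while the VM is powered off. This is a minimal Python sketch. It assumes a VM named “C9800-CL” (hypothetical) and uses a placeholder host adapter name; on Windows, the exact string `--bridgeadapter1` expects is whatever `VBoxManage list bridgedifs` reports for your NIC.

```python
"""Apply the Adapter 1 settings from this step with VBoxManage (VM powered off).

Assumptions: VBoxManage is on PATH and the VM is named "C9800-CL" (hypothetical).
The host adapter name must match `VBoxManage list bridgedifs`; the one below is a
placeholder. Default is a dry run that only prints the command.
"""
import subprocess


def nic1_command(vm: str, host_adapter: str) -> list[str]:
    return [
        "VBoxManage", "modifyvm", vm,
        "--nic1", "bridged",               # Attached to: Bridged Adapter
        "--bridgeadapter1", host_adapter,  # the physical NIC you use for your network
        "--nictype1", "virtio",            # Adapter Type: Paravirtualized Network (virtio-net)
    ]


def apply(vm: str, host_adapter: str, dry_run: bool = True) -> list[str]:
    cmd = nic1_command(vm, host_adapter)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    apply("C9800-CL", "Intel(R) Ethernet Connection (placeholder)")
```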

Step 3:  Launch the VM

Start the VM and it will boot from the .ISO file. Aside from the very start, when it says “press any key”, you won’t need to touch the keyboard. This is a great time to go grab a cup of coffee.

Step 4: Initial Configuration

Setting up the WLC via the CLI is much easier than via the GUI, and it also lets you get around some of the odd traps that the Day-0 provisioning GUI forces on you. François Vergès wrote an awesome blog post on how he performed this, and I have shamelessly copied his last section into this section of the blog.

Start by terminating the auto install so that it drops you into the WLC CLI.

From here you will need to configure the following:

  1. Configure the Enable Password
  2. Create an Admin Account
  3. Configure the Network Interface g1
  4. Configure the default route
  5. Configure the Country Code (this is required to avoid the Day-0 Provisioning)
  6. Configure which interface will be used for management (g1 for our case)
  7. Generate the Certificate that will be used to establish DTLS connections with the APs

Use the commands below, in order, to configure these items.

[Screenshot: initial configuration commands]

Notes:

  • The IP addresses used here are specific to my setup. Ensure you use IPs relevant to your network.
  • The passwords have not been disclosed; please replace “secret_password” and “user password” with the passwords you want to use
  • Configure these items in the order outlined in this blog
  • The last command doesn’t configure anything; it’s just used to validate that the trustpoint has been generated properly
  • Since we are disabling the 802.11a and 802.11b radios to configure the country code, you will have to re-enable them later if you want your APs to be operational

From this point, you should be able to ping your WLC VM, as well as browse to it and log in to the GUI with the credentials that you selected. Good luck and Happy New Year to everyone!
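
Before opening a browser, you can quickly confirm that the GUI port answers from the Windows host. This is a minimal Python sketch. The management IP is a placeholder, and it assumes the web GUI listens on TCP 443 (HTTPS). It only tests TCP reachability and does not log in.

```python
"""Quick reachability check for the new WLC VM from the Windows host.

Assumptions: the WLC management IP below is a placeholder, and the web GUI
answers on TCP 443 (HTTPS). This checks TCP reachability only; it does not log in.
"""
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    # True if a TCP handshake to host:port completes within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    wlc_ip = "192.0.2.10"  # hypothetical; use the IP you gave interface g1
    print("HTTPS GUI reachable:", tcp_reachable(wlc_ip, 443))
```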

 

SHAtastic “Features”

Over the past two weeks, I have been working on a deployment that “seemed” pretty straightforward.

  • Client has 250 APs in autonomous mode to be converted to FlexConnect
    • The motivation here is that the APs are deployed across the globe
    • This sounds like a perfect use case for a vWLC
    • APs are a mix of 1142, 1242, & 2702

Sounds pretty cut & dry right? All we have to do is find a code rev that supports all the different AP models, and we should be good to go…

The saga started by deploying the 8.0.152.0 .ova into the environment – easy peasy.

The APs from this decade (2702) joined right up, no problem at all. The REAL fun started when we tried to join the old 1242s to the vWLC. At this point, I was seeing an error from my test AP that read something like this:

“*Nov 11 18:07:36.000: %CAPWAP-5-DTLSREQSEND: DTLS connection request sent peer_ip: x.x.x.x peer_port: 5246
*Nov 11 18:07:36.033: Failed to get CF_CERT_ISSUER_NAME_DECODEDPeer certificate verification failed 000B
*Nov 11 18:07:36.038: %CAPWAP-3-ERRORLOG: Certificate verification failed!
*Nov 11 18:07:36.038: DTLS_CLIENT_ERROR: ../capwap/base_capwap/capwap/base_capwap_wtp_dtls.c:447 Certificate verified failed!
*Nov 11 18:07:36.038: %DTLS-5-SEND_ALERT: Send FATAL : Bad certificate Alert to x.x.x.x:5246
*Nov 11 18:07:36.039: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to x.x.x.x:5246
*Nov 11 18:07:36.040: %CAPWAP-3-ERRORLOG: Invalid event 38 & state 3 combination.”

I couldn’t for the LIFE of me figure this one out. So after 2 hours on the phone with TAC, we found an awesome bug… er, feature. It turns out that, for whatever reason, the old APs didn’t like the MIC certificate that came native with the 8.0.152.0 vWLC. The workaround is to deploy an older (8.0.121.0) vWLC and then upgrade from there. It has something to do with the 8.0.121.0 vWLC having a MIC certificate that the old APs can actually play nicely with.

Fine. I’ll just get TAC to publish this older vWLC to me (I can’t download it from CCO because it has been pulled) and we’ll deploy it in the environment – seems straightforward enough.

So we successfully deployed the 8.0.121.0 vWLC, and now the old 1242 is fussin’ at me with the following:

The AP logger will show messages similar to the following:

*Oct 29 18:01:56.107: %PKI-3-CERTIFICATE_INVALID_EXPIRED: Certificate chain validation has failed. The certificate (SN: 7E3446C40000000CBD95) has expired. Validity period starts on 14:38:08 UTC Oct 26 2021 Peer certificate verification failed 001A

*Oct 29 18:01:56.107: DTLS_CLIENT_ERROR: ../capwap/base_capwap/capwap/base_capwap_wtp_dtls.c:496 Certificate verified failed!
*Oct 29 18:01:56.107: %DTLS-5-SEND_ALERT: Send FATAL : Bad certificate Alert to 192.168.10.10:5246
*Oct 29 18:01:56.107: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to 192.168.10.10:5246

On the WLC side, you will only see a message like this:

*osapiBsnTimer: Oct 29 11:05:04.571: #DTLS-3-HANDSHAKE_FAILURE: openssl_dtls.c:2962 Failed to complete DTLS handshake with peer 192.168.202.8

Weeeeee! So now I get to re-engage TAC and ask them what all this nonsense is about. It turns out that when you deploy a vWLC, it sets the start of the MIC certificate’s validity period to something like 8 hours AFTER the vWLC comes online (Bug ID: CSCuq19142). This means that the NTP-synced time on the vWLC is earlier than the moment the MIC certificate becomes valid, so APs won’t be able to join.

So the workaround? For the first day or so that the vWLC is online, you outright lie to the vWLC about what time it is. I just changed the “year” field to 2019 – nothing like living in the future! *Note* I had to delete any NTP servers configured on the WLC before the manual time change took effect.

From here, the AP joined up just fine and behaved normally. I was also able to upgrade to 8.0.151.0 without issue, because I started with a “correct” vWLC version. After 24 hours, I was able to sync the vWLC back to NTP as the MIC validity “start” time was sometime late last night.
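
The whole saga comes down to a certificate validity window: an X.509 certificate is only valid between its notBefore and notAfter times, as judged by the clock of whoever checks it. The Python sketch below uses made-up timestamps that mirror the bug as described (validity starting about 8 hours after the vWLC comes online). It shows why the NTP-correct clock fails, why jumping the year forward passes, and why NTP can be restored once real time moves past notBefore. It is an illustration only; the real check happens inside the AP's DTLS/PKI code.

```python
"""Certificate validity-window check, illustrating the vWLC MIC start-time bug.

Assumptions: all timestamps are hypothetical; the ~8-hour offset follows the
bug description in this post. The real validation is done by the AP, not this.
"""
from datetime import datetime, timedelta, timezone


def cert_valid_at(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    # A certificate is accepted only inside its [notBefore, notAfter] window.
    return not_before <= now <= not_after


if __name__ == "__main__":
    online = datetime(2018, 11, 11, 18, 0, tzinfo=timezone.utc)  # hypothetical boot time
    not_before = online + timedelta(hours=8)                     # per the bug description
    not_after = not_before + timedelta(days=365 * 10)            # hypothetical lifetime

    print("NTP-correct clock:", cert_valid_at(online, not_before, not_after))
    # Workaround from the post: move the clock forward (e.g. to 2019).
    print("Clock set to 2019:", cert_valid_at(online.replace(year=2019), not_before, not_after))
    # After roughly a day, real time is past notBefore and NTP can be restored.
    print("24 hours later:", cert_valid_at(online + timedelta(hours=24), not_before, not_after))
```

It prints False, True, True, matching the sequence above: broken on day one, working with the faked clock, and fine again on real time after 24 hours.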

Many of you will ask “why do these folks have APs from last decade?”, and the answer is real simple – money. Why would a company go out and replace a ton of equipment that isn’t broken? If one dies, they can just replace it with a new one; all we have to do is ensure the vWLC can support both old AND new equipment. I’ve run into the exact same issue with one of the world’s largest airlines as well – why fix something that ain’t broke?

Now that we have everything up and running, I can say I certainly learned a lot from all of this. Most of it doesn’t make a whole lot of sense as to why it happens (i.e., the SHA cert start date being set to some arbitrary value), but at the end of the day, as long as it’s all working, nobody really cares how you got there.

There are many ways to get to 5. Is 4+1 better than 2+3? And more importantly – the client/business owners don’t really care.