Over the past two weeks, I have been working on a deployment that “seemed” pretty straightforward.
- Client has 250 APs in autonomous mode to be converted to FlexConnect
- The motivation is that the APs are deployed across the globe
- This sounds like a perfect use case for a vWLC
- APs are a mix of 1142, 1242, & 2702
Sounds pretty cut and dried, right? All we have to do is find a code rev that supports all the different AP models, and we should be good to go…
The saga started by deploying the 18.104.22.168 .ova into the environment – easy peasy.
The APs from this decade (2702) joined right up, no problem at all. The REAL fun started when we tried to join the old 1242s to the vWLC. At this point, I was seeing an error from my test AP that read something like this:
“*Nov 11 18:07:36.000: %CAPWAP-5-DTLSREQSEND: DTLS connection request sent peer_ip: x.x.x.x peer_port: 5246
*Nov 11 18:07:36.033: Failed to get CF_CERT_ISSUER_NAME_DECODED Peer certificate verification failed 000B
*Nov 11 18:07:36.038: %CAPWAP-3-ERRORLOG: Certificate verification failed!
*Nov 11 18:07:36.038: DTLS_CLIENT_ERROR: ../capwap/base_capwap/capwap/base_capwap_wtp_dtls.c:447 Certificate verified failed!
*Nov 11 18:07:36.038: %DTLS-5-SEND_ALERT: Send FATAL : Bad certificate Alert to x.x.x.x:5246
*Nov 11 18:07:36.039: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to x.x.x.x:5246
*Nov 11 18:07:36.040: %CAPWAP-3-ERRORLOG: Invalid event 38 & state 3 combination.”
I couldn’t for the LIFE of me figure this one out. So after 2 hours on the phone with TAC we found out about an awesome “feature” (read: bug). It turns out that, for whatever reason, the old APs didn’t like the MIC certificate that came native with the 22.214.171.124 vWLC. The workaround is to deploy an older (126.96.36.199) vWLC model and then upgrade from there. It has something to do with the 188.8.131.52 vWLC having a MIC certificate that the old APs actually can play nicely with.
Fine. I’ll just get TAC to publish this older vWLC to me (as I can’t download it on CCO because it’s redacted) and we’ll deploy it in the environment – seems straightforward enough.
So we successfully deployed the 184.108.40.206 vWLC, and now the old 1242 is fussin’ at me with a brand new error.
The AP logger will show messages similar to the following:
*Oct 29 18:01:56.107: %PKI-3-CERTIFICATE_INVALID_EXPIRED: Certificate chain validation has failed. The certificate (SN: 7E3446C40000000CBD95) has expired. Validity period starts on 14:38:08 UTC Oct 26 2021 Peer certificate verification failed 001A
*Oct 29 18:01:56.107: DTLS_CLIENT_ERROR: ../capwap/base_capwap/capwap/base_capwap_wtp_dtls.c:496 Certificate verified failed!
*Oct 29 18:01:56.107: %DTLS-5-SEND_ALERT: Send FATAL : Bad certificate Alert to 192.168.10.10:5246
*Oct 29 18:01:56.107: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to 192.168.10.10:5246
On the WLC side, you will only see a message like this:
*osapiBsnTimer: Oct 29 11:05:04.571: #DTLS-3-HANDSHAKE_FAILURE: openssl_dtls.c:2962 Failed to complete DTLS handshake with peer 192.168.202.8
Weeeeee! So now I get to reengage TAC and ask them what all this nonsense is about. It turns out that when you deploy a vWLC, the MIC cert validity period starts something like 8 hours AFTER the vWLC comes online (Bug ID: CSCuq19142). This means that the NTP time on the vWLC is before the point at which the MIC certificate becomes valid, so APs won’t be able to join.
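Since the two AP-side failures look nearly identical at first glance (both end in “Certificate verified failed!”), here is a rough Python sketch for telling them apart when you are staring at a pile of AP and WLC console output. The labels (`issuer-decode`, `validity-window`, `wlc-handshake`) and the `classify` helper are my own made-up names, not anything Cisco ships; the match strings are taken straight from the logs above.

```python
import re

# Log signatures copied from the console output in this post.
# The labels on the right are hypothetical names chosen for this sketch.
SIGNATURES = [
    (re.compile(r"Failed to get CF_CERT_ISSUER_NAME_DECODED"), "issuer-decode"),
    (re.compile(r"%PKI-3-CERTIFICATE_INVALID_EXPIRED"), "validity-window"),
    (re.compile(r"DTLS-3-HANDSHAKE_FAILURE"), "wlc-handshake"),
]


def classify(log_text):
    """Return the labels of every known signature found in log_text, in table order."""
    found = []
    for pattern, label in SIGNATURES:
        if pattern.search(log_text):
            found.append(label)
    return found


if __name__ == "__main__":
    sample = (
        "*Oct 29 18:01:56.107: %PKI-3-CERTIFICATE_INVALID_EXPIRED: "
        "Certificate chain validation has failed."
    )
    print(classify(sample))
```

In this post, `issuer-decode` was the “old AP doesn’t like the new MIC” problem, and `validity-window` was the clock problem. The WLC-side handshake message shows up in both cases, so on its own it tells you very little.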
So the workaround? For the first day or so the vWLC is online, you outright lie to the vWLC about what the time is. I just changed the “year” field to 2019 – nothing like living in the future! *Note* I had to delete any NTP server configured on the WLC before the manual time change took effect.
From here, the AP joined up just fine and behaved normally. I was also able to upgrade to 220.127.116.11 without issue, because I started with a “correct” vWLC version. After 24 hours, I was able to sync the vWLC back to NTP as the MIC validity “start” time was sometime late last night.
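The timing logic behind all of this is just a window check: the clock has to fall between the certificate’s “valid from” and “valid to” dates. A minimal Python sketch of that reasoning is below. The dates, the 8-hour offset, and the 10-year lifetime are illustrative assumptions for the example (the 8 hours is the rough figure TAC gave me), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def cert_is_valid_at(clock, not_before, not_after):
    """True if the given clock time falls inside the certificate validity window."""
    return not_before <= clock <= not_after


def hours_until_valid(clock, not_before):
    """Hours left before the certificate becomes valid (0.0 if it already is)."""
    remaining = (not_before - clock).total_seconds() / 3600.0
    return max(remaining, 0.0)


# Hypothetical values: a controller that came online at `boot`, with a
# certificate whose validity starts 8 hours later and runs for 10 years.
boot = datetime(2014, 11, 11, 10, 0, tzinfo=timezone.utc)
not_before = boot + timedelta(hours=8)
not_after = not_before + timedelta(days=3650)

if __name__ == "__main__":
    print(cert_is_valid_at(boot, not_before, not_after))
    print(hours_until_valid(boot, not_before))
    print(cert_is_valid_at(boot + timedelta(days=1), not_before, not_after))
```

Either you wait out the gap, or you push the clock forward to somewhere inside the window, which is what the manual time change did. Once real time has passed the “valid from” date, NTP can go back on.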
Lots of us will ask “why do these folks have APs from last decade” and the answer is real simple – money. Why would a company go out and replace a ton of equipment that isn’t broken? If one dies, they can just replace it with a new one – all we have to do is ensure the vWLC can support both old AND new equipment. I’ve run into the exact same issue with one of the world’s largest airlines as well – why fix something that ain’t broke?
Now that we have everything up and running, I certainly learned a lot from all of this. Most of it doesn’t make a whole lot of sense as to why it happens (i.e., the SHA cert start date being set to some arbitrary value), but at the end of the day – as long as it’s all working – nobody really cares how you got there.
There are many ways to get to 5. Is 4+1 better than 2+3? And more importantly – the client/business owners don’t really care.