Friday, September 13, 2019

Cisco FN70330 upgrade results

In April 2019, Cisco announced field notice FN70330.

Cisco field notice 70330
https://www.cisco.com/c/en/us/support/docs/field-notices/703/fn70330.html

This field notice covers an issue in which the AP flash memory becomes corrupted over time.  It affects many of the older Cisco AP platforms prior to the x800 series.  To verify whether your APs were affected, you needed to SSH or Telnet to each AP and run the flash commands outlined in the notice.

Details from the notice of the various bugs:

Defect Information

Defect ID    Headline
CSCvk15043   Wave 1 APs - AP radio FW image install failure in the bootup loop
CSCvk15068   IOS APs, recovery logic for failure on primary Image
CSCvk26732   New Flash recovery logic
CSCvm33617   Configuration file should not be modified due to low flash memory
CSCvf16302   Flash on lightweight IOS APs gets corrupted
CSCvf28459   Write of the Private File nvram:/lwapp_ap.cfg Failed on compare RCA needed (try = 1)


The workaround section of the notice references a companion article:

Cisco Article - Understanding Various AP-IOS Flash Corruption Issues
https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/213317-understanding-various-ap-ios-flash-corru.html

In this second article Cisco presents a "wlanpoller" script with installation instructions for Mac and PC.  The script automates the connections to all of your APs and can recover certain APs that it determines are recoverable.  It also produces a .csv report to help you see which APs are currently having this issue.

Please note that this issue is a moving target.  The real solution here is to get off of the offending code/platform combination.

I had a WLAN area of concern that was on an affected HA WLAN controller pair servicing critical inpatient areas of a hospital.  I also had a non-HA pair (N+1) in another hospital.  The wlanpoller found a lot of flash corruption issues on the HA pair, while the non-HA pair did not show any.  I am not sure why that is, but those were my results.  Both were running the same code and the same model mix of APs.

In areas where I had this issue, my goal was to cure the APs, or at least to identify the APs with the issue so I could take action prior to the upgrade.  In certain cases, if you attempt the upgrade while the flash is corrupted, there is a chance that the AP becomes stranded.  That would require direct access to the AP and/or replacing it with a working unit while you recover the failed AP.  In critical inpatient areas access to rooms is difficult and outages can affect critical patient safety systems, so a conservative and careful approach is always best.

I have run the wlanpoller multiple times on multiple controllers, and the list of affected APs does change over time.  Just because an AP is on the list in one pass does not mean it will be on the list in the next pass.  This did not help me in my attempt to control the possible bad outcome of an AP becoming stranded.  As I said, while you remain on the bad code version, any APs you "fix" may be followed by others that then become affected by this bug.  I really tried to clear the list of APs showing "zero" flash by rebooting these units one at a time prior to the controller code upgrade.  I am not sure whether APs showing "zero" flash were the primary candidates for becoming stranded.  I opened a case with Cisco TAC to try to better control and/or define the issue so I could be sure of a positive outcome, but they were of no help, mainly because the bug lies in the controller code and AP model combination and remains active at all times.  So I needed to upgrade to truly find out what my results would be.  Prior to starting, I made sure I had replacement inventory for any APs that failed, so network staff would have equipment on hand to replace them.
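To keep track of how the list moves between passes, something like the following could be used to compare two poller reports.  This is only a rough sketch: I am not reproducing the real wlanpoller .csv layout here, so the column names (ap_name, flash_status), the "corrupt" value, and the AP names are all made-up placeholders that you would need to adjust to match your own report.

```python
import csv
import io


def flagged_aps(report_text, name_col="ap_name", status_col="flash_status",
                bad_value="corrupt"):
    """Return the set of AP names that a report marks as affected.

    The column names and the bad_value marker are hypothetical; change
    them to whatever your own .csv report actually contains.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return {
        row[name_col]
        for row in reader
        if row[status_col].strip().lower() == bad_value
    }


def compare_runs(earlier, later):
    """Classify APs by how their status changed between two passes."""
    return {
        "still_flagged": sorted(earlier & later),
        "newly_flagged": sorted(later - earlier),
        "no_longer_flagged": sorted(earlier - later),
    }


# Made-up sample data standing in for two poller passes.
run1 = "ap_name,flash_status\nap-icu-01,corrupt\nap-icu-02,ok\nap-icu-03,corrupt\n"
run2 = "ap_name,flash_status\nap-icu-01,ok\nap-icu-02,corrupt\nap-icu-03,corrupt\n"

changes = compare_runs(flagged_aps(run1), flagged_aps(run2))
for label, names in changes.items():
    print(f"{label}: {', '.join(names) or 'none'}")
```

An AP that drops off the list is not proven healthy, so the "no_longer_flagged" group is probably still worth watching during the upgrade.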

My results:

1 percent of APs failed but were fully recoverable by manually power cycling them at the PoE switch port.

1 percent of APs failed and were recoverable by manually power cycling them at the PoE switch port, but they returned to service at default configuration and needed to be reconfigured.

1 percent of APs failed in the abandoned (stranded) state and required staff to replace them with a working unit.
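If you want to turn these percentages into a rough staffing and spares estimate before your own upgrade, the arithmetic can be sketched as below.  The 1 percent rates are simply what I saw on this one upgrade, not a prediction for your environment, and the AP count of 500 is a made-up example.

```python
import math


def plan_for_upgrade(ap_count, power_cycle_pct=1, reconfigure_pct=1,
                     replace_pct=1):
    """Estimate how many APs may land in each failure group.

    The default percentages come from a single upgrade and are only a
    starting assumption. Counts are rounded up so that the estimate
    errs on the side of having too many spares rather than too few.
    """
    return {
        "power_cycle_only": math.ceil(ap_count * power_cycle_pct / 100),
        "power_cycle_and_reconfigure": math.ceil(ap_count * reconfigure_pct / 100),
        "replace_with_spare": math.ceil(ap_count * replace_pct / 100),
    }


# Hypothetical controller pair with 500 APs.
plan = plan_for_upgrade(500)
for group, count in plan.items():
    print(f"{group}: {count}")
```

Rounding up matters on smaller sites: with 250 APs, 1 percent is 2.5, which this sketch treats as 3 spare units.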

I hope my experience navigating this Cisco field notice helps you make decisions about your own upgrade.  It required a lot of attention and, in my case, a plan for responding to the failed devices.

Please note this was an upgrade of a WiSM2 HA pair servicing mostly x600 and x700 model Cisco APs.




