Friday, September 13, 2019

Cisco FN70330 upgrade results

In April 2019 Cisco announced FN70330.

Cisco field notice 70330
https://www.cisco.com/c/en/us/support/docs/field-notices/703/fn70330.html

This field notice involved an issue with AP flash memory becoming corrupted over time.  It affected many of the older Cisco AP platforms prior to the x800 series.  To verify whether your APs were affected you needed to SSH or Telnet to each AP and run the flash commands outlined in the notice.

Details from the notice of the various bugs:

Defect Information

Defect ID    Headline
CSCvk15043   Wave 1 APs - AP radio FW image install failure in the bootup loop
CSCvk15068   IOS APs, recovery logic for failure on primary Image
CSCvk26732   New Flash recovery logic
CSCvm33617   Configuration file should not be modified due to low flash memory
CSCvf16302   Flash on lightweight IOS APs gets corrupted
CSCvf28459   Write of the Private File nvram:/lwapp_ap.cfg Failed on compare RCA needed (try = 1)


Referenced in the workaround section of the notice is a companion article:

Cisco Article - Understanding Various AP-IOS Flash Corruption Issues
https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/213317-understanding-various-ap-ios-flash-corru.html

In this second article Cisco presents a "wlanpoller" script with installation instructions for Mac and PC.  The script automates the connections to all of your APs and can recover certain APs that it determines are recoverable.  It also gives you a .csv report to help you see which APs are currently having this issue.

Please note that this issue is a moving target.  The real solution here is to get off of the offending code/platform combination.

One WLAN area of concern was on an affected HA WLAN controller pair that services critical inpatient areas of a hospital.  I also had a non-HA pair (N+1) in another hospital.  The wlanpoller found a lot of flash corruption issues on the HA pair, while the non-HA pair did not show any.  I am not sure why, but those were my results.  Both were running the same code and the same model mix of APs.

In areas where I had this issue, my goal was to cure the APs, or at least identify the APs with the issue, so I could take action prior to the upgrade.  In certain cases, if you attempt the upgrade while the flash is corrupted, there is a chance the AP becomes stranded, which requires direct access to the AP and/or replacing it with a working unit while you recover the failed one.  In critical inpatient areas access to rooms is difficult and outages can affect critical patient safety systems, so a conservative and careful approach is always best.

I ran the wlanpoller multiple times on multiple controllers, and the list of affected APs changes over time.  Just because an AP is on the list in one pass does not mean it will be on the list in the next pass.  This did not help my attempt to control the possible bad outcome of an AP becoming stranded.  As I said, while you are running the bad code version, for any APs you "fix", others may come forward affected by this bug.  I really tried to clear the list of APs showing "zero" flash by rebooting these units one at a time prior to the controller code upgrade.  I am not sure whether the APs showing "zero" flash were the primary candidates for becoming stranded.  I opened a case with Cisco TAC to try to better control and/or define the issue for a sure positive outcome, but they were of no help, mainly because the bug is in the controller code and AP model combination and is ongoing.  So I needed to upgrade to truly find out what my results would be.  Prior to starting, I made sure I had replacement inventory for any APs that failed, so network staff would have equipment on hand to replace them.
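
Because the list shifts from pass to pass, it can help to diff the reports between runs.  Here is a minimal Python sketch of that idea.  It assumes each report has already been filtered down to the affected APs and has a column holding the AP name.  The column name "ap_name", the "status" column, and the sample rows are all made up for illustration, so adjust them to match what your wlanpoller report actually contains.

```python
import csv
import io


def ap_names(csv_text, column="ap_name"):
    """Return the set of AP names in one report (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def compare_passes(earlier, later):
    """Group APs by whether they were flagged in both passes, only the earlier, or only the later."""
    return {
        "still_flagged": sorted(earlier & later),
        "cleared": sorted(earlier - later),
        "newly_flagged": sorted(later - earlier),
    }


# Made-up sample data standing in for two report files.
pass_1 = "ap_name,status\nAP-101,flagged\nAP-102,flagged\n"
pass_2 = "ap_name,status\nAP-102,flagged\nAP-203,flagged\n"

result = compare_passes(ap_names(pass_1), ap_names(pass_2))
for group, names in result.items():
    print(group, names)
```

APs that stay in the "still_flagged" group across several passes are probably the ones to look at first, though as noted above I could not confirm which APs were the most likely to end up stranded.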

My results:

1 percent of APs failed but were fully recoverable by manually power cycling them at the PoE switch port.

1 percent of APs failed and were recoverable by manually power cycling them at the PoE switch port, but after they returned to service they were at default configuration and needed to be reconfigured.

1 percent of APs failed in the stranded state and required staff to replace them with a working unit.
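
If you want to turn these percentages into rough planning numbers, a short Python sketch follows.  The 1 percent figures are only what I saw on this one upgrade, and the AP count of 250 is a made-up example, so treat the output as a starting point rather than a prediction.

```python
def planned_count(total_aps, percent=1):
    """Round up with integer math so a fraction of an AP still counts as one unit."""
    return (total_aps * percent + 99) // 100


def upgrade_plan(total_aps):
    """Estimate how many APs fall into each outcome, assuming 1 percent for each."""
    return {
        "power_cycle_only": planned_count(total_aps),
        "power_cycle_then_reconfigure": planned_count(total_aps),
        "replace_with_spare": planned_count(total_aps),
    }


# 250 APs is a hypothetical controller size, not my actual count.
plan = upgrade_plan(250)
for outcome, count in plan.items():
    print(outcome, count)
```

The "replace_with_spare" number is the minimum spare inventory I would want on hand, and the other two give an idea of how much hands-on time to schedule for power cycling and reconfiguration.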

I hope my experience navigating this Cisco field notice helps you make decisions as you move forward with your upgrade.  This required a lot of attention and, in my case, a plan to respond to the failed devices.

Please note this was an upgrade of a WiSM2 HA pair servicing mostly x600 and x700 model Cisco APs.





Wednesday, September 11, 2019

Experience as a Radio Engineer at One WTC

Every year on 9/11 I reflect back on my feelings on that day.  Watching the buildings burn and the impacts of those planes touched me personally.

I used to work on and maintain radio equipment at One World Trade Center as an engineer for Southern New England Telephone's paging system in the NY/NJ area, including at the time of the first bombing on February 26, 1993.


At One WTC we maintained a 72 MHz terrestrial radio link that fed data to all our paging transmitters in the NY/NJ area.  The data for this transmitter originated at our main paging terminal at 20 Exchange Place, a few blocks away from the Trade Center buildings.

A week prior to the bombing our primary transmitter failed over to our secondary transmitter, and I visited the site to investigate and repair the issue.  Yes, real electronics, as I used to do component-level repair on radio circuits.  Repairing this equipment in the radio equipment area just below the roof of 1WTC was difficult and uncomfortable, as it was very hot and the lighting was very poor.  When you needed tools or test gear it was about a 40 minute journey from the roof to the parking garage below 1WTC, so after one or two attempts at making the repair I decided to pull the primary unit and bring it home to my repair bench, where I could better repair it and burn it in afterward.

The next day I worked on the unit and replaced the components and aligned the drive for proper operation.

The following day I scheduled myself to do some system checks in the morning and after the peak NYC rush hour I would start my journey from my home in Northern NJ to One WTC hoping to have the unit installed and back in operation by just after lunch that day......Or that was the plan.....

While doing my system checks my wife told me she was not feeling well that day, and there were signs of a few snowflakes coming down, so I decided to push this replacement off another day rather than leave home and deal with snow in lower Manhattan.

I continued my system checks and had CNN on the TV in the background.  When the breaking news about smoke coming out of the bottom of the WTC interrupted the concentration on my work I was curious about what this issue was but at this point I was glad that I decided to delay my trip as this would have caused some issue getting near the building.

As details started to come out as to the extent of the damage, and the realization set in that this was the work of an explosion, I continued to keep an eye on our system.  As a lifelong radio engineer I have always taken my responsibility to keep systems operational very seriously, and I realized that thousands of people depended on my keeping things running, especially in an emergency.

Later that night I received alerts that our WTC radio link had lost AC power and was running on battery backup.  Time was very limited on the batteries and I was hoping this was a temporary power issue.  We were able to make contact with building personnel and were informed that power and steam to the upper floors needed to be shut down due to the damage from the bomb.  Once the batteries ran down, this would leave all of the greater NYC area out of service for our paging customers.

Realizing I had a good 72 MHz link transmitter in my trunk, I let my manager at SNET know that I had an option to keep the system running at some level.  I took the transmitter to our main paging terminal site at 20 Exchange Place.  20 Exchange Place is an older building and we had windows that could open.  I was able to bridge the modem audio that fed the analog circuit to WTC over to this transmitter.  I then fashioned a simple dipole antenna out of a run of coax and suspended this vertically polarized dipole out the window with a broom stick.  I was not really pleased with the SWR reading off of this antenna, but with some tweaking I was able to get it to an acceptable level so the transmitter would not clip off.  That got about 80 percent of our transmitters in the NJ/NYC area back on the air, keeping our customers' and hospitals' pagers in service.
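
For anyone curious about the numbers behind a makeshift antenna like this, here is a small Python sketch of the usual textbook approximations.  It shows the approximate overall length of a half-wave dipole at 72 MHz and the fraction of power reflected at a given SWR.  The 0.95 shortening factor is a common rule of thumb, and the SWR values in the loop are examples only, not readings I took that night.

```python
SPEED_OF_LIGHT_M_S = 299_792_458


def half_wave_dipole_length_m(freq_hz, shortening_factor=0.95):
    """Approximate tip-to-tip length of a half-wave dipole.

    The 0.95 factor is a rule-of-thumb correction for end effects, an assumption here.
    """
    return shortening_factor * SPEED_OF_LIGHT_M_S / (2 * freq_hz)


def reflected_power_fraction(swr):
    """Fraction of forward power reflected back toward the transmitter at a given SWR."""
    gamma = (swr - 1) / (swr + 1)  # magnitude of the reflection coefficient
    return gamma ** 2


length = half_wave_dipole_length_m(72e6)
print(f"Half-wave dipole at 72 MHz: about {length:.2f} m overall")

# Example SWR values only; these are not measurements from the actual antenna.
for swr in (1.5, 2.0, 3.0):
    print(f"SWR {swr}: {reflected_power_fraction(swr):.1%} of power reflected")
```

The higher the SWR, the more power comes back toward the transmitter instead of being radiated, which is why I kept tweaking the antenna until the reading came down.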



It took some time to reflect on the timing of events on the day of the bombing.  I was glad that I procrastinated a little that day due to my wife's illness and my desire not to head into lower Manhattan on a possible snow day.  If I had left at my planned time there is a REAL good chance I would have been in the parking garage area when the bomb went off.