Friday, September 13, 2019

Cisco FN70330 upgrade results

In April 2019 Cisco announced FN70330

Cisco field notice 70330
https://www.cisco.com/c/en/us/support/docs/field-notices/703/fn70330.html

This field notice involved an issue with the AP flash memory getting corrupted over time.  This affected many of the older Cisco AP platforms prior to the x800 series.  To verify whether your AP's were affected you needed to SSH or Telnet to each AP and run the flash commands that are outlined in the notice.

Details from the notice of the various bugs:

Defect Information

Defect ID    Headline
CSCvk15043   Wave 1 APs - AP radio FW image install failure in the bootup loop
CSCvk15068   IOS APs, recovery logic for failure on primary Image
CSCvk26732   New Flash recovery logic
CSCvm33617   Configuration file should not be modified due to low flash memory
CSCvf16302   Flash on lightweight IOS APs gets corrupted
CSCvf28459   Write of the Private File nvram:/lwapp_ap.cfg Failed on compare RCA needed (try = 1)


Referenced in the workaround section of the notice there is a companion article:

Cisco Article - Understanding Various AP-IOS Flash Corruption Issues
https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/213317-understanding-various-ap-ios-flash-corru.html

In this 2nd article they present a "wlanpoller" script with installation instructions for Mac and PC.  This script automates the connections to all of your AP's and has the ability to recover certain AP's that it determines it can recover.  It also gives you a .csv report to help you see AP's that are currently having this issue.

Please note that this issue is a moving target.  The real solution here is to get off of the offending code/platform combination.

I had a WLAN area of concern that was on an affected HA WLAN controller pair servicing critical inpatient areas of a hospital.  I also had a non-HA pair (N+1) in another hospital.  The HA pair seemed to have a lot of flash corruption issues found by the wlanpoller while the non-HA pair hospital did not show any.  I am not sure why that is but those were my results.  Both were running the same code and the same model mix of AP's.

In areas where I had this issue my goal was to cure the AP's or at least identify the AP's with the issue so I could take action prior to the upgrade.  In certain cases if you attempt the upgrade and the flash is corrupted there is a chance that the AP becomes stranded, which would require you to have direct access to the AP and/or replace the AP with a working unit while you recover the failed AP.  In critical inpatient areas access to rooms is difficult and outages can affect critical patient safety systems so a conservative and careful approach is always best.

I have run the wlanpoller multiple times on multiple controllers and the list of affected AP's does change over time.  Just because an AP is on the list one pass does not mean it will be on the list the next pass.  This did not help me in my attempt to control the possible bad outcome of an AP becoming stranded.  As I said, even if you "fix" some AP's while running this bad version of code, others may then come forward affected by this bug.  I really tried to clear the list of AP's showing "zero" flash by rebooting these units one at a time prior to the controller code upgrade.  I am not sure if the AP's with flash showing "zero" were the primary prospects for becoming stranded.  I opened a case with Cisco TAC to try to better control and/or define the issue for a sure positive outcome but they were of no help, mainly because the bug is in the controller code and AP model combination and continues at all times.  So I needed to upgrade to truly find out what my results would be.  Prior to starting this I made sure I had replacement inventory for any AP's that failed so network staff would have equipment on hand to replace them.
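Since the list changes from pass to pass, it can help to compare two reports side by side.  Below is a minimal Python sketch of that idea.  The column name "ap_name", the sample AP names and the CSV layout are all made up for illustration; they are not the actual wlanpoller report format, so adjust them to whatever your report contains.

```python
import csv
import io


def flagged_aps(report_text, name_field="ap_name"):
    """Return the set of AP names found in one report (given as CSV text).

    name_field is a hypothetical column name, not the real report header.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return {row[name_field].strip() for row in reader if row.get(name_field)}


def compare_passes(earlier, later):
    """Group AP names by whether they appear in both passes or only one."""
    return {
        "both": sorted(earlier & later),     # flagged in both passes
        "cleared": sorted(earlier - later),  # flagged earlier, not later
        "new": sorted(later - earlier),      # flagged only in the later pass
    }


# Made-up sample data standing in for two report files.
pass_1 = "ap_name,status\nAP-ICU-01,flagged\nAP-ICU-02,flagged\n"
pass_2 = "ap_name,status\nAP-ICU-02,flagged\nAP-ICU-07,flagged\n"

result = compare_passes(flagged_aps(pass_1), flagged_aps(pass_2))
print(result)
```

An AP that shows up under "cleared" is not proven healthy; as noted above, it may simply reappear on a later pass.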

My results:

1 percent of AP's failed but were fully recoverable by manually cycling the power at the POE switch port.

1 percent of AP's failed and were recoverable by manually cycling the power at the POE switch port, but after they returned to service they were at default configuration and needed to be reconfigured.

1 percent of AP's failed in the stranded state and required staff to replace them with a working unit.
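For planning purposes, the percentages above can be turned into rough head counts.  This is only a back of the envelope Python sketch: the AP count of 400 is a made-up number, and the 1 percent rates are simply what I saw on this one upgrade, not a prediction for yours.

```python
import math


def expected_count(ap_count, percent):
    """Round up so that a fraction of an AP still counts as a whole unit."""
    return math.ceil(ap_count * percent / 100)


ap_count = 400  # hypothetical number of AP's on the controller pair

power_cycle_only = expected_count(ap_count, 1)   # recovered by a POE power cycle
need_reconfig = expected_count(ap_count, 1)      # came back at default configuration
need_replacement = expected_count(ap_count, 1)   # stranded, swapped for a spare

total_touched = power_cycle_only + need_reconfig + need_replacement
print(f"Spares to have on hand: {need_replacement}")
print(f"AP's needing some manual attention: {total_touched}")
```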

I hope my experience with navigating this Cisco field notice helps you in making your decisions moving forward with your upgrade.  This required a lot of attention and in my case a plan to respond for the failed devices.

Please note this was an upgrade of a WISM2 HA pair servicing mostly x600 and x700 model Cisco AP's.





Wednesday, September 11, 2019

Experience as a Radio Engineer at One WTC

Every year on 9/11 I reflect back on my feelings on that day.  Watching the buildings burn and the impacts of those planes touched me personally.

I used to work and maintain radio equipment on One World Trade Center when I worked as an engineer for Southern New England Telephone's Paging system in NY,NJ at the time of the 1st bombing on February 26, 1993.


At One WTC we maintained a 72 MHz terrestrial radio link that fed data to all our paging transmitters in the NY,NJ area.  The data for this transmitter originated at our main paging terminal at 20 Exchange Place a few blocks away from the Trade Center buildings.

A week prior to the bombing our primary transmitter failed over to our secondary transmitter and I visited the site to investigate and repair the issue.  Yes real electronics as I used to do component level repair on radio circuits.  Trying to repair this equipment in the radio equipment area just below the roof of 1WTC was difficult and uncomfortable as it was very hot and the lighting was very poor.  When you needed tools or test gear it was about a 40 minute journey from the roof to the parking garage below 1WTC so after 1 or 2 attempts at making the repair I decided to pull the Primary unit and bring it home to my repair bench to better repair and burn in the unit after repair.

The next day I worked on the unit and replaced the components and aligned the drive for proper operation.

The following day I scheduled myself to do some system checks in the morning and after the peak NYC rush hour I would start my journey from my home in Northern NJ to One WTC hoping to have the unit installed and back in operation by just after lunch that day......Or that was the plan.....

While doing my system checks my wife told me she was not feeling well that day and there were signs of a few snowflakes coming down so I decided to push this replacement off another day rather than leave home and deal with snow in lower Manhattan.

I continued my system checks and had CNN on the TV in the background.  When the breaking news about smoke coming out of the bottom of the WTC interrupted the concentration on my work I was curious about what this issue was but at this point I was glad that I decided to delay my trip as this would have caused some issue getting near the building.

As details started to come out as to the extent of the damage and the realization that this was the work of an explosion I continued to keep an eye on our system.  As a lifelong radio engineer I have always taken my responsibility to keep systems operational very seriously and realized that thousands of people depend on my keeping things running especially in an emergency.

Later that night I received alerts that our WTC radio link lost AC power and was running on battery backup.  Time was very limited on the batteries and I was hoping this was a temporary power issue.  We were able to get in contact with building personnel and were informed that power and steam to the upper floors needed to be shut down due to the damage from the bomb.  Once the batteries ran down this would leave all of the greater NYC area out of service for our paging customers.

Realizing I had a good 72 MHz link transmitter in my trunk I let my manager at SNET know that I had an option to keep the system running at some level.  I took the transmitter to our main paging terminal site at 20 Exchange Place.  20 Exchange Place is an older building and we had windows that could open.  I was able to bridge the modem audio that fed the analog circuit to WTC to this transmitter.  I then fashioned a simple dipole antenna out of a run of coax and suspended this vertically polarized dipole out the window with a broom stick.  I was not really pleased with the SWR reading off of this antenna but with some tweaking I was able to get it to an acceptable level so the transmitter would not clip off.  I was able to get about 80 percent of our transmitters in the NJ/NYC area back on the air, keeping our customers' and hospitals' pagers in service.



It took some time to reflect on the timing of events on the day of the bombing.  I was glad that I procrastinated a little on that day due to my wife's illness and my desire to not head into lower Manhattan on a possible snow day.  If I had left at my planned time there is a REAL good chance I would have been in the parking garage area when the bomb went off.

Wednesday, July 10, 2019

Client Authenticates - Yet no connectivity

Today an issue with a new set of wireless EKG client devices was escalated to me to work.  These devices were onboarded and appeared in PRIME to be working okay.

1. Devices authenticated
2. NAC state went to RUN
3. Learned the IP address and mapped to a L2 interface.

Still I could not ping the devices from across the network or even from the directly connected router.  The MAC address was showing on the correct VLAN.

When I ran a debug on the wireless controller and removed the client so I could see the full set of messages, at first it appeared to me that all looked good....Till I looked closer at the detail towards the end.....


In the debug it was showing that the "Client learned IP from Orphan Packet"

This statement tells you that the controller is mapping this IP to this client's MAC address for L2 to L3 mapping.  After this statement the debug displayed a gateway and netmask that did not agree with the subnet of the client's assigned address.

For some reason this device is not being placed on the subnet intended for this client.
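The mismatch between the client address and the gateway/netmask is easy to check by hand, but here is a minimal Python sketch of the same check using the standard ipaddress module.  The addresses are made-up examples, not the values from my debug output.

```python
import ipaddress


def client_matches_subnet(client_ip, gateway, netmask):
    """Return True if client_ip falls inside the subnet implied by gateway/netmask."""
    # strict=False lets us pass a host address (the gateway) and still get its network.
    network = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    return ipaddress.ip_address(client_ip) in network


# Hypothetical values for illustration only.
good = client_matches_subnet("10.20.30.45", "10.20.30.1", "255.255.255.0")
bad = client_matches_subnet("192.168.1.50", "10.20.30.1", "255.255.255.0")
print(good, bad)
```

If the check comes back False for a client the controller says is in RUN state, a locally configured static address is a good first suspect.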

Since the controller is learning the IP from a packet sent by the client (orphan packet) this tells me the client is not configured for DHCP.  In this case someone in the field, while troubleshooting, decided to take matters into their own hands and configure the IP locally on the device.

The other question I had is why is the controller allowing this client behavior?

Looking at the WLAN configuration I found the other side of this issue.
DHCP required


In order to enforce the use of DHCP addressing and not allow a client to override your address assignment, DHCP required needs to be enabled on the WLAN so you can maintain control of your addressing.



Friday, March 15, 2019

WLANPros Phoenix 2019 ECSE experiences and notes


This year my employer supported my attendance to the fantastic WLANPros Conference plus my attendance to the Ekahau Certified Survey Engineer class to help me get familiar with the Ekahau software with the goal of converting over from the Airmagnet Survey and Planner toolset.

The instructor for the ECSE class was the amazing Ferney Munoz.  Mr Munoz is an extremely knowledgeable and engaging instructor who really knows the material thoroughly and it is apparent he has lived most of what he teaches.  If you ever have the opportunity to take one of his classes please go out of your way to attend as you will be a better network engineer after he is done with you.  This class was MUCH more than I expected.  I expected to get decent instruction on how to operate the Ekahau Site Survey and Planner software but what I got was a huge refresh on many of the issues we need to think about and consider as we design wireless networks.  Mr Munoz did a great job keeping the material fresh and interesting even though much was review for me, and he also gave me new ways to consider what I do every day.

Notes for me to remember

The WLAN Design Steps:
Define - Devices - Use - Coverage - Quantity - Construction - Budget
Design - Remember the Least Capable Most Important Device
Deploy - How will you get it done?  Coordination etc..
Validate - Did you achieve the design goals - Post survey heat maps are your coverage documentation.

Measure the RF Loss on each typical wall




Measuring Wall Attenuation - Basic - dB loss = x - y
Do not use an active connection with the AP while measuring.
Allow for about 10 feet between the RF source and the wall.
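
As a worked example of the x - y formula above, here is a minimal Python sketch.  My notes do not define x and y, so this assumes x is the signal reading on the source side of the wall and y is the reading on the far side, both in dBm.  The readings are made-up numbers.

```python
def wall_loss_db(source_side_dbm, far_side_dbm):
    """Wall attenuation in dB: reading before the wall minus reading behind it."""
    return source_side_dbm - far_side_dbm


# Hypothetical readings: -40 dBm in front of the wall, -46 dBm behind it.
loss = wall_loss_db(-40, -46)
print(f"Wall attenuation: {loss} dB")
```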