Friday, September 13, 2019

Cisco FN70330 upgrade results

In April 2019 Cisco announced FN70330

Cisco field notice 70330
https://www.cisco.com/c/en/us/support/docs/field-notices/703/fn70330.html

This field notice involved an issue with the AP flash memory getting corrupted over time.  This affected many of the older Cisco AP platforms prior to the x800 series.  To verify if your AP's were affected you needed to SSH or Telnet to each AP and run the flash commands that are outlined in the notice.

Details from the notice of the various bugs:

Defect Information

Defect ID    Headline
CSCvk15043   Wave 1 APs - AP radio FW image install failure in the bootup loop
CSCvk15068   IOS APs, recovery logic for failure on primary Image
CSCvk26732   New Flash recovery logic
CSCvm33617   Configuration file should not be modified due to low flash memory
CSCvf16302   Flash on lightweight IOS APs gets corrupted
CSCvf28459   Write of the Private File nvram:/lwapp_ap.cfg Failed on compare RCA needed (try = 1)


Referenced in the workaround section of the notice there is a companion article:

Cisco Article - Understanding Various AP-IOS Flash Corruption Issues
https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/213317-understanding-various-ap-ios-flash-corru.html

In this 2nd article they present a "wlanpoller" script with installation instructions for Mac and PC.  This script automates the connections to all of your AP's and has the ability to recover certain AP's that it determines it can recover.  It also gives you a .csv report to help you see the AP's that are currently having this issue.

Please note that this issue is a moving target.  The real solution here is to get off of the offending code/platform combination.

I had one WLAN area of concern that was on an affected HA WLAN controller pair servicing critical inpatient areas of a hospital.  I also had a non-HA pair (N+1) in another hospital.  The wlanpoller found a lot of flash corruption issues on the HA pair, while the non-HA pair did not show any.  I am not sure why that is, but those were my results.  Both were running the same code and the same model mix of AP's.

In the areas where I had this issue my goal was to cure the AP's, or at least to identify the AP's with the issue so I could take action prior to the upgrade.  In certain cases, if you attempt the upgrade and the flash is corrupted, there is a chance that the AP becomes stranded.  That would require you to have direct access to the AP and/or replace it with a working unit while you recover the failed AP.  In critical inpatient areas access to rooms is difficult and outages can affect critical patient safety systems, so a conservative and careful approach is always best.

I have run the wlanpoller multiple times on multiple controllers and the list of affected AP's does change over time.  Just because an AP is on the list on one pass does not mean it will be on the list on the next pass.  This did not help me in my attempt to control the possible bad outcome of an AP becoming stranded.  As I said, while you are running this bad version of code, any AP's you "fix" may simply be replaced on the list by others that become affected by this bug.  I really tried to clear the list of AP's showing "zero" flash by rebooting these units one at a time prior to the controller code upgrade.  I am not sure if the AP's showing "zero" flash were the primary candidates to become stranded.  I opened a case with Cisco TAC to try to better control and/or define the issue for a sure positive outcome, but they were of no help, mainly because the bug is in the controller code and AP model combination and is present at all times.  So I needed to upgrade to truly find out what my results would be.  Prior to starting this I made sure I had replacement inventory on hand so network staff could replace any AP's that failed.

My results:

1 percent of AP's failed but were fully recoverable by manually cycling the power at the PoE switch port.

1 percent of AP's failed and were recoverable by manually cycling the power at the PoE switch port, but after they returned to service they were at default configuration and needed to be reconfigured.

1 percent of AP's failed in the stranded state and required staff to replace them with a working unit.
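
For planning purposes, the three 1 percent outcomes above can be turned into rough head counts for a given AP population.  This is only a minimal sketch: the rates are the ones reported above for this one upgrade, while the function name, the outcome labels and the 400 AP population are made up for illustration, and your own failure rates may differ.

```python
# Outcome rates in whole percent, taken from the results listed above.
# The labels are illustrative names, not anything defined by Cisco.
OUTCOME_PERCENT = {
    "power_cycle_only": 1,
    "power_cycle_then_reconfigure": 1,
    "replace_unit": 1,
}


def estimate_failures(ap_count, outcome_percent=OUTCOME_PERCENT):
    """Return the expected AP count per outcome, rounded up to whole units."""
    # Integer ceiling division avoids floating point rounding surprises.
    return {
        outcome: (ap_count * percent + 99) // 100
        for outcome, percent in outcome_percent.items()
    }


if __name__ == "__main__":
    # 400 AP's is a hypothetical population used only for illustration.
    for outcome, count in estimate_failures(400).items():
        print(f"{outcome}: {count}")
```

Rounding up is deliberate: for spare inventory and staffing it is safer to plan for one AP too many than one too few.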

I hope my experience with navigating this Cisco field notice helps you in making your decisions as you move forward with your upgrade.  This required a lot of attention and, in my case, a plan to respond to the failed devices.

Please note this was an upgrade of a WISM2 HA pair servicing mostly x600 and x700 model Cisco AP's.





Wednesday, September 11, 2019

Experience as a Radio Engineer at One WTC

Every year on 9/11 I reflect back on my feelings on that day.  Watching the buildings burn and the impacts of those planes touched me personally.

I used to work on and maintain radio equipment at One World Trade Center when I was an engineer for Southern New England Telephone's paging system in NY/NJ at the time of the 1st bombing.


At One WTC we maintained a 72 MHz terrestrial radio link that fed data to all our paging transmitters in the NY,NJ area.  The data for this transmitter originated at our main paging terminal at 20 Exchange Place a few blocks away from the Trade Center buildings.

A week prior to the bombing our primary transmitter failed over to our secondary transmitter and I visited the site to investigate and repair the issue.  Yes, real electronics, as I used to do component-level repair on radio circuits.  Trying to repair this equipment in the radio equipment area just below the roof of 1WTC was difficult and uncomfortable as it was very hot and the lighting was very poor.  When you needed tools or test gear it was about a 40 minute journey from the roof to the parking garage below 1WTC, so after 1 or 2 attempts at making the repair I decided to pull the primary unit and bring it home to my repair bench, where I could better repair it and burn it in afterward.

The next day I worked on the unit and replaced the components and aligned the drive for proper operation.

The following day I scheduled myself to do some system checks in the morning and after the peak NYC rush hour I would start my journey from my home in Northern NJ to One WTC hoping to have the unit installed and back in operation by just after lunch that day......Or that was the plan.....

While doing my system checks my wife told me she was not feeling well that day and there were signs of a few snowflakes coming down, so I decided to push this replacement off another day rather than leave home and deal with snow in lower Manhattan.

I continued my system checks and had CNN on the TV in the background.  When the breaking news about smoke coming out of the bottom of the WTC interrupted my concentration, I was curious about what the issue was, but at this point I was glad that I had decided to delay my trip as this would have caused some trouble getting near the building.

As details started to come out as to the extent of the damage, along with the realization that this was the work of an explosion, I continued to keep an eye on our system.  As a lifelong radio engineer I have always taken my responsibility to keep systems operational very seriously and realized that thousands of people depend on my keeping things running, especially in an emergency.

Later that night I received alerts that our WTC radio link had lost AC power and was running on battery backup.  Time was very limited on the batteries and I was hoping this was a temporary power issue.  We were able to make contact with building personnel and were informed that power and steam to the upper floors needed to be shut down due to the damage from the bomb.  Once the batteries ran down, this would leave all of the greater NYC area out of service for our paging customers.

Realizing I had a good 72 MHz link transmitter in my trunk, I let my manager at SNET know that I had an option to keep the system running at some level.  I took the transmitter to our main paging terminal site at 20 Exchange Place.  20 Exchange Place is an older building and we had windows that could open.  I was able to bridge the modem audio that fed the analog circuit to the WTC over to this transmitter.  I then fashioned a simple dipole antenna out of a run of coax and suspended this vertically polarized dipole out the window with a broom stick.  I was not really pleased with the SWR reading off of this antenna, but with some tweaking I was able to get it to an acceptable level so the transmitter would not clip off.  I was able to get about 80 percent of our transmitters in the NJ/NYC area back on the air, keeping our customers' and hospitals' pagers in service.



It took some time to reflect on the timing of events on the day of the bombing.  I was glad that I procrastinated a little on that day due to my wife's illness and my desire to not head into lower Manhattan on a possible snow day.  If I had left at my planned time there is a REAL good chance I would have been in the parking garage area when the bomb went off.

Wednesday, July 10, 2019

Client Authenticates - Yet no connectivity

Today an issue with a new set of wireless EKG client devices was escalated to me to work.  These devices were onboarded and appeared in PRIME to be working okay.

1. Devices authenticated
2. NAC state went to RUN
3. Learned the IP address and mapped it to an L2 interface

Still I could not ping the devices from across the network or even from the directly connected router.  The MAC address was showing on the correct VLAN.

When I ran a debug on the wireless controller and removed the client so I could see the full set of messages, at first it appeared that all looked good....Till I looked closer at the detail towards the end.....


In the debug it was showing "Client learned IP from Orphan Packet"

This statement tells you that the controller is mapping this IP to this client's MAC address for the L2 to L3 mapping.  After this statement, the debug displays a gateway and netmask that do not agree with the subnet of the client's assigned address.

For some reason this device is not being placed on the correct subnet for this client.

Since the controller is learning the IP from a packet sent by the client (orphan packet), this tells me the client is not configured for DHCP.  In this case someone in the field, in their troubleshooting, decided to take matters into their own hands and configure the IP locally on the device.
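
The mismatch described above (a client address that does not belong to the subnet implied by the gateway and netmask in the debug) is easy to check by hand, but here is a minimal sketch of the same test using Python's standard ipaddress module.  The function name and the addresses are made up for illustration; they are not values from my debug output.

```python
import ipaddress


def client_matches_subnet(client_ip, gateway_ip, netmask):
    """Return True if client_ip is inside the subnet implied by gateway/netmask."""
    # strict=False lets us build the network from a host address (the gateway).
    network = ipaddress.ip_network(f"{gateway_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    # Hypothetical values: a statically configured client on the wrong subnet.
    print(client_matches_subnet("192.168.1.50", "10.20.30.1", "255.255.255.0"))
    # Hypothetical values: a client that picked up an address from DHCP.
    print(client_matches_subnet("10.20.30.57", "10.20.30.1", "255.255.255.0"))
```

A False result for a client that otherwise shows as authenticated and in the RUN state is a strong hint that the address was set by hand on the device.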

The other question I had is why is the controller allowing this client behavior?

Looking at the WLAN configuration I found the other side of this issue.
DHCP required


In order to enforce the use of DHCP addressing and not allow a client to override your address assignment, DHCP required needs to be enabled on the WLAN so you can maintain control of your addressing.



Friday, March 15, 2019

WLANPros Phoenix 2019 ECSE experiences and notes


This year my employer supported my attendance at the fantastic WLANPros Conference plus the Ekahau Certified Survey Engineer class, to help me get familiar with the Ekahau software with the goal of converting over from the AirMagnet Survey and Planner toolset.

The instructor for the ECSE class was the amazing Ferney Munoz.  Mr Munoz is an extremely knowledgeable and engaging instructor who really knows the material thoroughly, and it is apparent he has lived most of what he teaches.  If you ever have the opportunity to take one of his classes please go out of your way to attend, as you will be a better network engineer after he is done with you.  This class was MUCH more than I expected.  I expected to get decent instruction on how to operate the Ekahau Site Survey and Planner software, but what I got was a huge refresh on many of the issues we need to think about and consider as we design wireless networks.  Mr Munoz did a great job keeping the material fresh and interesting even though much was review for me, and he also gave me new ways to consider what I do every day.

Notes for me to remember

The WLAN Design Steps:
Define - Devices - Use - Coverage - Quantity - Construction - Budget
Design - Remember the Least Capable Most Important Device
Deploy - How will you get it done?  Coordination etc..
Validate - Did you achieve the design goals - Post survey heat maps are your coverage documentation.

Measure the RF Loss on each typical wall




Measuring Wall Attenuation - Basic - dB loss = x - y
Do not use an active connection with the AP while measuring
Allow for about 10 feet between the RF source and the wall.
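
As a worked example of the x - y note above, here is a minimal sketch.  It assumes x is the signal level (in dBm) measured on the RF source side of the wall and y is the level measured on the far side, and it averages a few readings on each side to smooth out fluctuation.  The function name and the sample readings are made up for illustration.

```python
from statistics import mean


def wall_loss_db(source_side_dbm, far_side_dbm):
    """Return wall attenuation in dB from readings taken on each side of a wall."""
    # Assumption: loss = x - y, with x the source side average and y the far side average.
    return mean(source_side_dbm) - mean(far_side_dbm)


if __name__ == "__main__":
    # Hypothetical readings in dBm, three on each side of the same wall.
    source_side = [-40, -42, -41]
    far_side = [-46, -48, -47]
    print(f"Wall loss: {wall_loss_db(source_side, far_side)} dB")
```

With these made-up readings the averages are -41 dBm and -47 dBm, so the wall works out to 6 dB of loss.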


Tuesday, February 20, 2018

Evolution into wireless - How not to do it today


In the beginning when I first started designing WiFi networks, wireless was always considered more of a "toy" or "convenience" item.  My 1st real deployment was for a hospital when they were converting over to electronic medical records in their inpatient areas.  I had the usual discussions we have today with key stakeholders and asked them the crystal ball questions to determine not just the immediate needs but also to try to design for at least the next few years.  In answer to this question I was told very specifically by hospital leadership that they could not imagine any more than 150 wireless clients ever on this wireless network and that is what I should design the network to support.  This was a 500 bed 7 story hospital.  Back then it was not uncommon to simply line up the AP's in the hallway...mainly because they did not allow AP's in the patient rooms.

So I determined typical AP locations in the hallway in reference to the patient rooms on either side of the hallway to get a few "rules of thumb".  Back then I had no tools or survey software to work with but I did have the "bars" on the Windows network desktop and continuous pings.

To deploy I marked up the floor plans of the hospital and had the cable and network staff start deploying the AP's as I directed.  I was making way too many assumptions, being WAY too conservative on the number of AP's and WAY too generous on the output power of the AP's....did I mention these were 802.11b radios?

Please note....when YOU are the only wireless client on a wireless network and you do coverage testing the wireless coverage is amazingly great!  This was the first time I learned that coverage is NOT everything. 

Also note I BEGGED to have some kind of soft roll out of the wireless carts for the nurses...maybe a phased approach?.....maybe just get the devices deployed to the floors before go live.....No cooperation on any of this!

Go live day(weekend) comes and the PC group rolls out all these large battery powered carts to the staff to now start using this software that they were trained on a few months ago.

Overall this 1st roll out was not horrible...considering how much was left up to chance and at the time I had no idea what I was doing....At least to that scale.


As time rolled forward they started to add more and more devices.....without communicating to network that this was happening.  Complaints rolled in.....I got better at learning and recognizing the issues.....reading a LOT.....Learned about CCI (something I already knew about from my previous career in radio systems).  High utilization....Using smaller cells at lower power....beacon rates etc.....


What I love about WiFi has been the never ending learning involved.  This has always kept it interesting to me.

Thursday, August 31, 2017

A Note about Band Select, AP Groups, and RF Profiles on Cisco WLC

This note is not a full explanation of these items but only a point of clarification on an item I was confused over.

In my network, as with many larger wireless networks, I have found it useful to deploy AP groups along with RF profiles to help customize the service in different areas and for different wireless use cases.


When you apply an RF profile I was a little confused about the Band Select section of the Client Distribution tab.  At first I was concerned that if you did not select "Probe Response" it would disable Band Select on the AP's where this profile is applied.  But....I was wrong about this.  The section only allows you to override the specific Band Select settings on those particular AP's, allowing you to customize Band Select but not disable it.

This was just something I was led astray by and I wanted to put it out there in the event it confused anyone else.

JC




Thursday, January 12, 2017

Children's Hospital - Annual Cyber Santa Visit



Every year I help organize a cyber visit with Santa for the patients at a Children's Hospital. This is an event that is always a fun distraction for patients and families going through a difficult time.


This event originally started as a nationwide program supported by Cisco Systems but at some point they ended national support and our local Cisco account team personally stepped up and continued the program at our hospital along with our local IT and Child Life staff.


As we approach the Christmas holiday Santa Claus is, as expected, a very busy fellow. Using collaboration and wireless technologies he is able to have personal one on one visits with children in the hospital by using portable wireless devices (tablets), video collaboration services (Cisco Spark, Skype, etc) and a wireless network. When we have ambulatory patients we try to use a conference room with a large screen TV to enable the children to visit with Santa along with their families and siblings. To visit with non ambulatory patients and patients that can't leave their rooms for various reasons we use a wireless tablet device. The tablet makes it easy to bring the visit right to the bed for the patient and can also be easily sanitized or bagged for protection of the patient. As we progress room to room with the tablet Santa also gets to visit with the nursing staff. They always get a kick out of Santa and always make sure to put in their gift requests!


Don't ignore proper security practice and patient confidentiality!
It is extremely important to protect our patients. Meetings with and reviews by the cyber security and patient privacy teams help develop all the proper procedures and sign-offs for our patients' participation in this event. These meetings happen months in advance to ensure a successful event. We also review media releases with patients' parents so we do not violate any parental wishes.


Below is a sample of some of the coverage by the media.
Penn State Health youtube channel
https://www.youtube.com/watch?v=utAT3wifivw

Penn State News
http://news.psu.edu/story/383128/2015/12/02/santa-makes-cyber-stop-penn-state-hershey-children%E2%80%99s-hospital

Event made national coverage on ABC in 2014
http://abcnews.go.com/US/cyber-santa-claus-connects-patients-pennsylvania-childrens-hospital/story?id=27359975

Local abc27 Coverage
http://abc27.com/2014/12/03/cyber-santa-connects-with-children-at-penn-state-hershey-childrens-hospital/




I hope this article helps you think about how you can use your skills as an IT professional to benefit the community you serve.