Thursday, September 20, 2007

WLAN IDS and the bizarre world of security exploits

If you make security software (or any software, for that matter), sooner or later you will create what I technically refer to as a booboo: a security vulnerability in your software that raises the ire of your customers and makes you feel foolish and sad. Not to worry, mateys, this happens to all software manufacturers. The important thing to remember here is how you handle it. Are you going to be a pro or a schmuck? Recently, AirDefense (why no dot com?), a WLAN IDS manufacturer, had just such an incident. Is this uncommon? Relatively so. Is it dire? Not really. Are you just sniping at your competitor? Kind of, but in the interest of disclosure, we had an incident a long time ago as well so, dear friends, I feel their pain.

Let's talk about what happened first. The vulnerability, as explained here, is triggered by a specially crafted HTTPS request that causes the HTTPS service on the system to crash. From my quick glance, it appears that you need to authenticate first and also be on the segment from which you can administer the system. So what is this? Granted, it can bring down the sensor, but it appears to be a "tempest in a teacup". You need to be the admin, or snarf the admin login, in order to cause a denial of service to one of probably many tens or hundreds of sensors. Unlikely at best.

So how was this handled? Professionally, in my humble opinion. AirDefense contacted the people who reported the exploit and directed them to a patch for it as reported here, "Solution: Update to the latest firmware version"

AirMagnet had a similar experience last October, and we handled it the same way. Here is our official response to the problem from back then:

Re: Airmagnet management interfaces multiple vulnerabilities
AirMagnet vendor response below -

(1) The vulnerabilities are tested against an over-a-year old AirMagnet Enterprise product,
(2) Some of these vulnerabilities have been patched and fixed in AirMagnet Enterprise version 7.0.x,
(3) All vulnerabilities are now completely fixed by AirMagnet Enterprise version 7.5 build 6307 and later.
(4) AirMagnet customers can download patches from MyAirMagnet support web site (http://www.airmagnet.com/my_airmagnet/index.php)

So to summarize, there are a lot of security professionals out there who are trying to make a name for themselves and do it in an industry, like the WLAN industry, that is going places. They spend all their time looking for these exploits and I, for one, am glad they do. They keep us honest and ensure that we are doing our very best to protect our customers. Are their motives pure? Debatable but mostly. Do they sit down afterwards and talk amongst themselves about what l@m3rz those software guys are? You bet! Should I take it personally? Nah.




Monday, July 30, 2007

The Myth of the Self-Monitoring WLAN

Recently, as you all probably know by now, Duke University had a WLAN meltdown. The CIO, Tracy Futhey (Comment here), and the assistant IT director, Kevin Miller (Comment here), have put to rest the notion that the Apple iPhone caused it. Cisco has issued an advisory to that effect and Apple assisted in the effort.

I am not going to go into the details of what happened or why. Suffice it to say that mobile handhelds of all types, not just iPhones, send a lot of ARP traffic and the Cisco infrastructure was not ready for it. The quote at Network World explains that, "The advisory finally makes it clear that the iPhone simply triggered the ARP storms that were made possible by the controller vulnerabilities. Any other wireless client device, moving from one subnet to another apparently could have done the same thing."
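
For anyone who wants to eyeball ARP volume in their own captures, here is a minimal sketch of the idea. It assumes you have already exported ARP requests from a trace as (timestamp, source MAC) pairs; the function names, the sample data, and the 100-per-second threshold are all hypothetical and have nothing to do with what Cisco or Duke actually used.

```python
from collections import Counter

def arp_rate_per_second(records):
    """Count ARP requests in each whole-second bucket.

    records: iterable of (timestamp_seconds, source_mac) tuples,
    assumed to be pre-filtered to ARP requests from a capture.
    """
    return Counter(int(ts) for ts, _mac in records)

def storm_seconds(records, threshold=100):
    """Return the sorted seconds whose ARP request count exceeds threshold.

    The default threshold is an arbitrary illustration, not a vendor value.
    """
    counts = arp_rate_per_second(records)
    return sorted(sec for sec, n in counts.items() if n > threshold)

# Hypothetical capture: a quiet second followed by a burst from one client
sample = [(0.1, "aa:aa")] * 5 + [(1.0 + i / 1000, "bb:bb") for i in range(500)]
flagged = storm_seconds(sample, threshold=100)
```

A sensible threshold depends entirely on your network, so treat the number as something to tune against a known-good baseline.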

What I will point out, however, is the problem we in the Wi-Fi community have today with the following simple delusion, "Your WLAN infrastructure as a cohesive, integrated, single-vendor solution is all anybody needs. It is self-monitoring and self-healing." I talk to a lot of people about which WLAN solution they are going to purchase and implement and I am always surprised by how many believe that the AP and controller vendor has all the answers. Don't get me wrong, I am a huge fan of this type of solution. Central management is critical for even medium-sized organizations of 50 or more APs, much less larger ones that may have a few hundred or even thousands. Manually changing the configuration of each AP is not a viable solution in these cases. The admin needs assistance. And the story sounds so great, "Implement our solution and it will fix itself when it breaks and protect itself when security policies are breached." Who wouldn't want that?

But the truth is a little more complicated. As we have seen from previous posts, sometimes the solution doesn't behave the way your business practices need. Similarly, sometimes there are security problems within the infrastructure itself. So what to do?

This will sound like an advertisement for the company I work for and I apologize ahead of time but there is a very good reason I continue to work there. Mainly, I believe in the message.

When the Duke network went down and the assistant IT director looked at his WLAN infrastructure dashboard, what did he see? I have not spoken with him directly but my guess would be it said, "Hey man, it ain't me. Everything looks good from my end." So what did he do? He pulled out a sniffer and got to work. With packet traces in hand and assistance from Cisco and Apple, he solved the problem. Did the infrastructure fix itself? Did it correctly identify the problem and solution? No. A patch is now needed to keep this from happening again.

One should not blame the infrastructure for not getting this right at the outset nor should one blame Mr. Miller. He was correctly reading what the controllers were telling him. But it shows how important it is to have a separate, 3rd party solution also available to get down to the bits and bytes or even spectrum analysis (if the problem should be something other than 802.11 protocol madness.)

There are a few great WLAN security vendors out there and they make 3rd party, best-of-breed solutions for monitoring the security of your WLAN (one of which recently got snatched up for pennies on the dollar and will probably be rolled into another integrated, self-healing, self-monitoring role, against my better judgment). There are an even smaller number who monitor both your security and your connectivity and performance, and give you great troubleshooting tools built in (insert shameless plug here). These should be your trusted advisors when things go wrong. I am in no way suggesting that they would have identified the problem and cause and given a solution at Duke either (although I think they at least would have shown alerts for denial of service and strange traffic behavior). What I am suggesting is that with them in place you now have a set of tools to assist in solving the problem. Remote packet and/or spectrum analysis. Alarm thresholds that can be set by the admin and will continue surveillance. Reports. System-to-system notifications. Graphs of speed and traffic type. Lists of who is connected to what and how. All the things you would need to get to the bottom of any problem in that invisible Luminiferous Ether.





Friday, July 27, 2007

Cisco Ripples - DCA and RRM - Help is on the way

Since I first published "The Ripple Effect" back in February I have heard from many folks who have validated the effect but, to my chagrin, I have had no solution to offer. Well, thankfully there are smarter people than me out there and solutions have started to appear.

I was alerted to the fact that Medical Connectivity Consulting recently put Cisco in their sights and quoted my blog with regard to Dynamic Channel Assignment and RRM causing issues. The Web, being the great time waster that it is, led me on a journey. As I read the article I clicked here and there and next thing I knew I was looking at a forum at Cisco that was talking about this exact phenomenon.

One of the forum posters had some great suggestions to eliminate this problem in the future. Bruce Johnson at Partners Healthcare offered this solution,

"We saw the majority of DCA events were triggered by Interference from Rogue APs. After we disabled Foreign AP Avoidance the number of channel changes dropped by an entire order of magnitude (1000s to 100s). We disabled Cisco AP Load Avoidance and this reduced the number of DCAs within an order of magnitude (100s less).

DTPC will power-up APs to max levels to provide a 3-neighbor -65 RSSI coverage "grid" and 7921s will power up to follow suit (up to their max Tx Power). Other clients with higher Tx power may send the APs to max power causing a mismatch with IP phones.

You can decrease the tx-power-threshold so the "grid" won't be as hot (default is -65, change to -71 or -74):

config advanced 802.11a tx-power-control-thresh <-50 to -80>
config advanced 802.11b tx-power-control-thresh <-50 to -80>

and reduce the coverage hole detection threshold (reduce Min SNR level in RRM Thresholds) to suppress the power-up activity."
Bruce seemed on track with this fix. The problem is that it isn't a fix. It shuts off parts of RRM and DCA so that the WLAN will remain stable. So where is the benefit of a controller-based system?
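
If you do want to try the threshold change Bruce describes, a small helper like the one below can keep the value inside the range he quotes (-50 to -80) before you paste anything into a controller. This is only a sketch: the function name is made up, and the command text is copied from his post rather than checked against any particular WLC release.

```python
def tx_power_thresh_commands(threshold_dbm):
    """Build the two threshold commands quoted in the forum post.

    threshold_dbm must fall in the -50 to -80 range given in the quote.
    The command syntax is taken verbatim from that quote and is not
    verified here against a specific controller software version.
    """
    if not -80 <= threshold_dbm <= -50:
        raise ValueError("threshold must be between -80 and -50")
    return [
        f"config advanced {band} tx-power-control-thresh {threshold_dbm}"
        for band in ("802.11a", "802.11b")
    ]

# Example: one of the values suggested in the post
commands = tx_power_thresh_commands(-71)
```

Generating the lines this way just guards against typos; it does not change the fact that you are turning the knobs down rather than fixing the algorithm.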

He does note that a fix is forthcoming from Cisco, "They are revamping the behavior of RRM in the WLC 4.1 Maintenance release." Which is later confirmed by a Cisco employee, Saurabh Bhasin a TME,

"With the 4.1 Maintenance Release (MR) due out on cisco.com shortly, many improvements based on such feedback have been brought into RRM's algorithms - improvements aimed at allowing administrators to fine-tune their RRM-run WLANs where desired. These enhancements will allow for greater control over both the channel and power output selection algorithms, so administrators may assist RRM in being either more or less aggressive in such decisions, depending on application and network needs. Additionally, enhancements have been made to the management and reporting of all RRM information and configuration alterations to allow for better tracking of RF environmental fluctuations and to assist in keeping track of RRM activity. Further technical detail on the inner workings of these enhancements will be available very soon in an update to the above-mentioned RRM Whitepaper."
The paper he references is found here http://www.cisco.com/warp/public/114/rrm.html and explains a lot of what we are all seeing. (here is the PDF version)

So here is hoping that the WLC 4.1 Maintenance Release fixes it. As an aside, Bruce Johnson is skeptical,
"Its all well and good to make things work for Intel and the CCX/CCKM compliant crew, but if you have any of the other brands of WLAN NICs (like those made by medical device manufacturers, who won't subscribe to fast roaming features until they're adopted by the IEEE) you are best keeping RRM disabled until it delivers on its promise as stated in the following 802.11TGv Objectives draft:

Service and Function Objectives

Solutions shall define mechanisms to provide the service listed below.

[Req2000] TGv shall support Dynamic Channel Selection, to allow STAs to avoid interference. Solution shall be able to change the operating channel (and/or band) for the entire BSS during live system operation and be done seamlessly with no intermittent loss of connectivity from the perspective of an associated STA. Solution shall not define algorithm for channel selection."



Friday, March 9, 2007

Building a Voice Capable WiFi Network

Building a wireless network that supports data traffic is hard enough, but trying to support VoIP over your WLAN (also known as VoFi) can be a nightmare. To make matters worse, when you ask your vendor how to make voice work on your WLAN they explain you will need 2X-3X as many APs as you needed for data. "Sure I do," you respond, confident that the sales person from your vendor just wants to sell you more APs. Or, better yet, you turn to your trusted VAR and he suggests another site survey. "Right, another one," you say, with that knowing look in your eye and a sinking feeling that you are being strung along. You feel like the guy who brings his car in for a tune-up and gets told he needs a complete overhaul.

Well, I have nothing to sell you and no agenda that I will benefit from by saying this, but your infrastructure vendor and your VAR are absolutely correct. You probably will need more APs and you sure as heck will need another survey. Let's find out why, shall we?

Unlike email and web access, voice is unforgiving: slight lags or delays in traffic or small losses in connectivity will completely destroy calls. A person who has access to the Internet during a meeting in a conference room is far less likely to lose his cool over small delays than when he is on the phone with an important client.

You see, wireless handsets are much lower powered compared to the access points they talk through. A typical AP is usually set to transmit at 100 milliwatts (mW) whereas the typical handset is roughly 5 mW. This makes it very easy for the handset to hear the AP but very hard for the AP to hear the handset when it is far away. Voice calls are also far less resilient to fragmented packets, retries, packet loss, etc.
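
To put a number on that mismatch, it helps to convert both power levels to dBm. The snippet below is just the standard milliwatt-to-dBm conversion applied to the two figures above; it ignores antenna gain and receiver sensitivity, which are assumptions you would need to revisit for real hardware.

```python
import math

def mw_to_dbm(milliwatts):
    """Convert transmit power in milliwatts to dBm (10 * log10 of mW)."""
    return 10 * math.log10(milliwatts)

ap_dbm = mw_to_dbm(100)         # AP at 100 mW
handset_dbm = mw_to_dbm(5)      # handset at roughly 5 mW
gap_db = ap_dbm - handset_dbm   # how much "louder" the AP is than the handset

print(f"AP: {ap_dbm:.1f} dBm, handset: {handset_dbm:.1f} dBm, gap: {gap_db:.1f} dB")
```

The AP comes out at 20 dBm and the handset at about 7 dBm, so the AP is shouting roughly 13 dB louder than the handset can answer.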

So what can I do? Well the simplest thing to do would be to ensure that the handset is always at the same power as the AP. That means either increasing the power on the handset or, more likely, lowering the power on the AP. This will mean, of course, that you will need more APs to cover the same area.
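
For a rough feel of why the AP count climbs, here is a back-of-the-envelope sketch using a simple log-distance path-loss model. The model and the exponent values (2 for free space, 3.5 as a guess for a cluttered office) are my assumptions, not measurements, and the result is an idealized picture: real data networks are rarely planned right at the edge of coverage, so your actual multiplier will differ. It is no substitute for the survey.

```python
def coverage_scaling(power_ratio, path_loss_exponent):
    """Estimate how cell range and AP count scale when transmit power changes.

    Assumes received power falls off as distance ** -path_loss_exponent
    (log-distance model) and that cell area scales with range squared.
    Returns (range_factor, ap_count_factor).
    """
    range_factor = power_ratio ** (1.0 / path_loss_exponent)
    area_factor = range_factor ** 2
    return range_factor, 1.0 / area_factor

ratio = 5 / 100  # dropping the AP from 100 mW to 5 mW

free_space = coverage_scaling(ratio, 2.0)   # idealized open air
office = coverage_scaling(ratio, 3.5)       # assumed indoor exponent

print(f"free space: range x{free_space[0]:.2f}, APs x{free_space[1]:.1f}")
print(f"office:     range x{office[0]:.2f}, APs x{office[1]:.1f}")
```

Notice that the more walls and clutter you have (a higher exponent), the less the range shrinks for the same power cut, which is one reason the answer varies so much from building to building.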

For example, here are 4 APs at 100 mW:


Here are the same APs but now set to 5 mW instead. Notice the gaps in coverage:


In order to compensate, we must add many more APs to fill in the holes, all configured to run at 5 mW:


As you can see, much better. Now, though, our main issue is channels. APs whose signals overlap on the same channel take away from the usable bandwidth. We want to ensure we do not trample the signal from another AP, so we must adjust the channel plan.
Also, remember we only have 3 non-overlapping channels to work from.
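
To illustrate the channel-planning problem, here is a small sketch that hands out three channels so that APs with overlapping coverage avoid sharing one where possible. It assumes the usual 2.4 GHz trio of 1, 6, and 11, and the AP names and neighbor map are invented for the example. It is a simple greedy pass, not what any controller actually runs.

```python
CHANNELS = (1, 6, 11)  # assumed non-overlapping 2.4 GHz channels

def assign_channels(neighbors):
    """Greedily assign a channel to each AP.

    neighbors: dict mapping AP name -> set of AP names whose coverage
    overlaps it. APs with the most neighbors are handled first.
    """
    plan = {}
    for ap in sorted(neighbors, key=lambda a: (-len(neighbors[a]), a)):
        used = {plan[n] for n in neighbors[ap] if n in plan}
        free = [ch for ch in CHANNELS if ch not in used]
        if free:
            plan[ap] = free[0]
        else:
            # No clean channel left: pick the one fewest neighbors use.
            counts = {
                ch: sum(1 for n in neighbors[ap] if plan.get(n) == ch)
                for ch in CHANNELS
            }
            plan[ap] = min(CHANNELS, key=lambda ch: counts[ch])
    return plan

def conflicts(neighbors, plan):
    """List neighboring AP pairs that ended up on the same channel."""
    return sorted(
        {tuple(sorted((a, b))) for a in neighbors for b in neighbors[a]
         if plan[a] == plan[b]}
    )

# Hypothetical floor: AP2 and AP3 sit in the middle and hear everyone
layout = {
    "AP1": {"AP2", "AP3"},
    "AP2": {"AP1", "AP3", "AP4"},
    "AP3": {"AP1", "AP2", "AP4"},
    "AP4": {"AP2", "AP3"},
}
plan = assign_channels(layout)
clashes = conflicts(layout, plan)
```

With only three channels, a dense 5 mW deployment will eventually force some reuse between neighbors, which is exactly why the survey matters.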

Cisco, at this point recommends the following:


That explains why I limited the displayed signal to -67 dBm, making all weaker signal fall off and appear grey.

In a week or two, we will discuss debugging Voice issues and setting MOS scores.
