“The Network” is Always “Guilty”

By William F. Carr

There is no presumption of innocence for Network Managers…

Are you an IT professional responsible for your users’ network?  In your users’ eyes, you are not protected by the legal system’s presumption of “innocent until proven guilty”.  This is not a new phenomenon; it has been experienced for years, from the days of sub-rate DDS circuits and asynchronous terminals to today’s high-speed internet connections and wireless LANs.

When an outage or degradation occurs, it becomes very easy to put the word “network” in front of it.

Perhaps that is because of the nebulous nature of today’s compute and application architectures, and the invisible nature of the now-prominent wireless access layer.  When a user clicks an app icon, the application could be local, within a regional data center, in the cloud or multi-tiered.  Poor response or no response is always “a network problem”, even if other applications or network access work flawlessly.

Visibility becomes the first line of defense in “disproving” the network’s culpability in a particular event; network managers spend an inordinate amount of time defending their network’s performance.

Visibility is not always Network Management, but many Network Management products can enhance or facilitate better network visibility.

Visibility is the ability to detect who, what, where, when, how and, in most cases the most important item, “how long”.  Validating that traffic can actually reach the application resources from the client comes first, followed immediately by a measurement of round-trip response time between the two.  Armed with this information, the Network Manager can normally mount a defense when response times aren’t reasonable.
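To make the “reach” and “how long” measurements concrete, here is a minimal Python sketch that times a TCP handshake from the client side to an application endpoint.  It is only one way to take such a measurement, not a description of any particular product; the function names, the number of attempts and the timeout are assumptions chosen for illustration.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the seconds needed to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return None


def probe(host: str, port: int, attempts: int = 5, timeout: float = 3.0) -> dict:
    """Probe an endpoint several times and report reachability and average time."""
    results = [tcp_connect_time(host, port, timeout) for _ in range(attempts)]
    ok = [r for r in results if r is not None]
    return {
        "reachable": bool(ok),
        "attempts": attempts,
        "failures": attempts - len(ok),
        "avg_ms": (sum(ok) / len(ok) * 1000.0) if ok else None,
    }
```

Running `probe()` against the application server’s listening port from the affected client’s subnet gives a quick “can we get there, and how long does it take” answer.  A handshake time reflects the network path only; it says nothing about how long the application takes to respond once connected.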

This is not always just a ploy for the Network Manager to point a finger at another department; rather it is a mechanism to ensure that troubleshooting efforts are as productive as possible, by steering resources to investigate other parts of the application footprint, rather than wasting unnecessary effort on “network” troubleshooting.

The goal is to reduce the MTTR (Mean Time To Repair) for any event, and Network Managers are uniquely placed in this “guilty until proven innocent, but still guilty because we had to ask you…” role that other departments are not held to.

So what is a Network Manager to do?

  • Build your toolset
    • Visibility tools for the Access Layer and above
    • Network reliability and connectivity measurement tools
    • End-to-end diagnostic tools
  • Educate your personnel on how to use those tools
    • The faster they can identify issues, the more effective they will be
  • Understand the application components for your critical business applications
    • Connectivity components (routers, firewalls, load balancers, UTM devices, WAN, VPN, etc.)
    • Compute Resources (servers, storage, databases, multi-tier)

Once the Network Manager has their basic access-layer toolset, we discuss Application Performance Monitoring (APM) tools to gather application level data, but these may not always be in the Network Manager’s cost center or area of responsibility.  Stay tuned to the Comm Solutions Blog site for more details on APM tools.

Let’s start with some examples of the visibility toolkits we assemble for many of our clients.

Inventory and Location of Endpoints

Frequently when troubleshooting begins, we start with details on the user endpoint and the application endpoint.  This may be the IP addresses of the devices, or if dynamic, we may need Layer-2 MAC information.

Having an inventory of all assets, including device fingerprinting, that is updated in real time without human intervention is a first line of defense in reducing MTTR.  If we can quickly look up a MAC or IP (via an Inventory Repository or DHCP log correlation) and determine ownership, last seen location and device type, we can begin troubleshooting from a much more informed position.
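As a rough illustration of DHCP log correlation, the Python sketch below builds a lookup table keyed by both IP and MAC address.  The four-field log format (`timestamp ip mac hostname`) is hypothetical; real DHCP servers each log differently, so the parsing would need to be adapted.  It also assumes timestamps are ISO 8601 strings in a single time zone, so that plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Lease:
    seen: str       # ISO 8601 timestamp of the lease event
    ip: str
    mac: str        # normalized: lower-case hex digits, no separators
    hostname: str


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and drop separators so any notation matches."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")


def build_index(lines: Iterable[str]) -> Dict[str, Lease]:
    """Index lease events by IP and by MAC, keeping the most recent event."""
    index: Dict[str, Lease] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        seen, ip, mac, hostname = parts
        lease = Lease(seen, ip, normalize_mac(mac), hostname)
        for key in (lease.ip, lease.mac):
            current = index.get(key)
            if current is None or lease.seen >= current.seen:
                index[key] = lease
    return index


def lookup(index: Dict[str, Lease], key: str) -> Optional[Lease]:
    """Find the latest lease for an IP address or a MAC in any notation."""
    return index.get(key) or index.get(normalize_mac(key))
```

With an index like this, a help desk ticket that mentions only an IP address can be tied back to a hostname and MAC in one step, and a MAC seen on a switch port can be tied to its most recent IP.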

Many customers confuse “inventory” and fingerprinting with NAC (Network Access Control) and are fearful due to the complexity or administrative burden sometimes associated with that term.

We have unique solutions which can pull information from your DHCP server, snoop DHCP exchanges/HTTP headers and allow ALL devices on your network without the heavy lift.  This can be a precursor to a NAC deployment, or simply allowed to continually collect and update your access layer inventory automatically.

RF/WLAN and Access Layer Visibility

Access Layer Visibility can include data on switch port statistics, uplink utilization, error rates and end-to-end link monitoring.
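Uplink utilization and error rates are usually derived from two readings of an interface’s cumulative counters (for example, the octet, packet and error counters exposed through SNMP).  The Python sketch below shows the arithmetic only; how the counters are collected is left out, and the function names are assumptions made for illustration.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two counter readings, allowing for a single wrap."""
    if current >= previous:
        return current - previous
    # The counter rolled over its maximum value between the two readings.
    return (1 << bits) - previous + current


def link_utilization(prev_octets: int, curr_octets: int,
                     interval_s: float, speed_bps: float, bits: int = 64) -> float:
    """Percent utilization of a link over the polling interval, one direction."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    octets = counter_delta(prev_octets, curr_octets, bits)
    return octets * 8 * 100.0 / (interval_s * speed_bps)


def error_rate(prev_errors: int, curr_errors: int,
               prev_packets: int, curr_packets: int, bits: int = 64) -> float:
    """Errors as a percentage of packets seen during the polling interval."""
    packets = counter_delta(prev_packets, curr_packets, bits)
    if packets == 0:
        return 0.0
    return counter_delta(prev_errors, curr_errors, bits) * 100.0 / packets
```

For example, 3,750,000,000 octets received in a 300-second interval on a 1 Gbps uplink works out to 10% utilization.  With 32-bit counters on fast links, more than one wrap can occur between polls, which this sketch cannot detect; shorter polling intervals or 64-bit counters avoid that.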


When we introduce Wireless and RF into an environment, it may create a huge hole in statistical and error rate troubleshooting capabilities for Network Managers.

We leverage tools such as the Aruba Networks AirWave management platform to provide cohesive wired and wireless Access Layer visibility.  We can gather wired port and uplink data from a multi-vendor network, as well as expand WLAN visibility in a multi-vendor wireless environment.


The ability to visualize users and devices on a live floor plan or heat map is an amazingly valuable way to identify areas of poor coverage, interference, or a misconfigured or defective client device.

Network Reliability / Diagnostic Tools

Finding and isolating WLAN coverage holes, as well as performing site surveys, is critical as today’s access layer becomes primarily wireless.  Tools such as AirMagnet Survey Pro and Spectrum XT provide a vendor-neutral way to map coverage and identify interference sources.  They are critical tools in our toolkit, used to provide post-installation validations and on-demand troubleshooting for Wireless LANs, regardless of the vendor.

For end-to-end testing and diagnostics, a number of free and commercial tools exist to provide data on round-trip timing and packet loss.  The simple iPerf/JPerf tools are the most commonly used tools to gather quick end-to-end performance statistics.
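Whichever tool produces the raw samples, the figures worth reporting are usually packet loss plus minimum, average and maximum round-trip time, and often jitter.  The Python sketch below summarizes a list of round-trip samples; it does not reproduce the output of iPerf or any other tool, and the convention of recording a lost probe as `None` is an assumption made for illustration.

```python
from statistics import mean
from typing import List, Optional


def summarize(samples_ms: List[Optional[float]]) -> dict:
    """Summarize round-trip samples in milliseconds; None marks a lost probe."""
    received = [s for s in samples_ms if s is not None]
    sent = len(samples_ms)
    loss_pct = (sent - len(received)) * 100.0 / sent if sent else 0.0
    summary = {"sent": sent, "received": len(received), "loss_pct": loss_pct}
    if received:
        summary["min_ms"] = min(received)
        summary["avg_ms"] = mean(received)
        summary["max_ms"] = max(received)
        # Simple jitter: mean absolute difference between consecutive replies.
        diffs = [abs(b - a) for a, b in zip(received, received[1:])]
        summary["jitter_ms"] = mean(diffs) if diffs else 0.0
    return summary
```

Keeping a baseline of these numbers from a known-good day makes the comparison during an incident far more persuasive than a single reading.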

As a Network Manager, the network is always blamed.  Sometimes, rightfully so.  If it is the network, you want to quickly resolve it, and if it is not, you want to confidently defend your position with valuable data to help other departments resolve the issues and restore services as quickly as possible.  Having insight into your network’s dynamic resources, their locations and their diagnostics is a way to shorten the troubleshooting lifecycle.