
Most IT operations folks are familiar with vaguely worded statements like “My application performance is bad,” “The network is slow,” and “Sometimes it works but sometimes it doesn’t.” Often, there’s very little concrete information to work with when attempting to diagnose network performance problems. Any number of possible culprits across a variety of different devices could be wholly or partially contributing to an issue. That leads to the question: “How do you quickly and definitively identify the issue?”

After ruling out the obvious, and some not-so-obvious, signs of a network issue (things like drop counters massively incrementing on an interface, incorrectly applied QoS or security policies, or insidious microbursts), the IT operator is often left with nothing to do but delve deep into the guts of the network fabric to ensure that all devices and paths between a source and a destination have proper network state, in both the control plane and the data plane. How does an IT operator approach this problem? The “Old Way” involves a tedious, error-prone, multi-step workflow to validate that the network is behaving as intended.

First, the leaf switches that have the problematic source and destination devices attached must be identified. In this era of multi-tenancy, virtual machine mobility, and dynamic workload placement, identifying the edge devices is not as straightforward as it might seem. The approach typically involves logging into a random leaf switch and checking the local Address Resolution Protocol (ARP) table to see if the target IP happens to be directly attached in one of the virtual routing and forwarding (VRF) instances active on the switch.

Failing that, we can check the routing table in the appropriate VRF, hopefully identifying which remote switch has the IP attached. Next, log in to that switch, check the local ARP table, identify the virtual LAN (VLAN) of the endpoint, and then check the media access control (MAC) table to find the physical interface. Repeat the process for the destination IP address. It’s tedious work, but necessary for ultimately identifying the leaf switches involved.

Figure 1 illustrates the typical workflow for identifying where a given endpoint attaches to the network fabric. In this case, we’re looking for host 172.16.112.96 in VRF “tenant1”.

Once we’ve identified the edge switches with the target endpoints attached, the next task is to identify all the possible paths through the fabric that can be used by those endpoints to communicate and which devices sit in each of those paths. We’ll take a simple spine-leaf topology where each leaf switch connects to four spine switches that provide the leaf-to-leaf interconnection.
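To make the path-identification step concrete, here is a minimal Python sketch of how the candidate paths between two leaf switches could be listed in a two-tier spine-leaf fabric. It assumes every leaf connects to the same four spines and that all leaf-spine-leaf paths are usable. The device names (`leaf3`, `leaf7`, `spine1` through `spine4`) and both function names are hypothetical, chosen only for illustration.

```python
def candidate_paths(src_leaf, dst_leaf, spines):
    """List every path between two leaves in a two-tier spine-leaf fabric.

    Assumption: each leaf is connected to every spine in `spines`.
    """
    if src_leaf == dst_leaf:
        # Both endpoints hang off the same leaf, so no spine is involved.
        return [(src_leaf,)]
    return [(src_leaf, spine, dst_leaf) for spine in spines]


def devices_in_paths(paths):
    """Return the distinct devices across all paths, in first-seen order."""
    seen = []
    for path in paths:
        for device in path:
            if device not in seen:
                seen.append(device)
    return seen


# Hypothetical fabric: four spines, endpoints attached to leaf3 and leaf7.
SPINES = ["spine1", "spine2", "spine3", "spine4"]
paths = candidate_paths("leaf3", "leaf7", SPINES)
devices = devices_in_paths(paths)

for path in paths:
    print(" -> ".join(path))
```

With four spines there are four candidate paths and six devices whose state would need checking. That count grows quickly in fabrics with more tiers or more spines.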

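The endpoint-location steps described earlier (check the local ARP table, fall back to the routing table, then resolve the VLAN and MAC address to a physical interface) can also be sketched in Python. This is a simplified model, not a real switch API. The `Switch` class, the table layouts, the MAC address, VLAN number, and interface name are all assumptions made for illustration, and the routing table is reduced to a direct mapping from host to remote leaf name.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Hypothetical, heavily simplified view of a leaf switch's tables."""
    name: str
    arp: dict = field(default_factory=dict)        # (vrf, ip) -> (mac, vlan)
    routes: dict = field(default_factory=dict)     # (vrf, ip) -> remote leaf name
    mac_table: dict = field(default_factory=dict)  # (vlan, mac) -> interface


def locate_endpoint(fabric, start, vrf, ip):
    """Follow the manual workflow: ARP, then route, then VLAN and MAC lookup."""
    switch = fabric[start]
    key = (vrf, ip)
    if key not in switch.arp:
        # Not locally attached: ask the routing table which leaf owns the IP.
        remote = switch.routes.get(key)
        if remote is None:
            return None
        switch = fabric[remote]
        if key not in switch.arp:
            return None
    mac, vlan = switch.arp[key]
    interface = switch.mac_table.get((vlan, mac))
    return switch.name, vlan, mac, interface


# Hypothetical data: the host is attached to leaf4, and we start on leaf1.
MAC = "00:50:56:aa:bb:cc"
fabric = {
    "leaf1": Switch("leaf1", routes={("tenant1", "172.16.112.96"): "leaf4"}),
    "leaf4": Switch(
        "leaf4",
        arp={("tenant1", "172.16.112.96"): (MAC, 112)},
        mac_table={(112, MAC): "Ethernet1/12"},
    ),
}

result = locate_endpoint(fabric, "leaf1", "tenant1", "172.16.112.96")
print(result)
```

The same function would be called a second time for the destination IP address, mirroring the "repeat the process" step of the manual workflow.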