I recently performed some network troubleshooting on a newly racked vCloud management cluster comprising two ESXi 5.5 hosts. Management connectivity was fine for one host but not the other, even though the ESXi and switch configuration appeared to be correct and consistent across both hosts.

The lack of connectivity to one of the hosts was due to an easy-to-make mistake in identifying port numbers when cabling. The configured ports were “off by one”, but this wasn’t immediately obvious as media had been connected to adjacent ports.

The diagnosis was aided by LLDP, which is notable because vCenter was not yet available and a vSphere Distributed Switch therefore could not be used.

Discovery protocols

There are a few protocols for discovering details about connected network peers. Most are vendor-specific; the Link Layer Discovery Protocol (LLDP) is a formally adopted standard (IEEE 802.1AB) implemented in a number of vendors’ network switches (as well as other devices and open source software).

Current releases of VMware vSphere support Cisco Discovery Protocol (CDP) and LLDP. Support for LLDP was added in vSphere 5, but only for distributed virtual switches:

vSphere 5.0 supports Cisco Discovery Protocol (CDP) and Link Layer Discovery Protocol (LLDP). CDP is available for vSphere standard switches and vSphere distributed switches connected to Cisco physical switches. LLDP is available for vSphere distributed switches version 5.0.0 and later. – vSphere Networking
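
As an aside, CDP behaviour on a standard vSwitch can, as far as I know, be inspected and changed from the ESXi Shell with esxcfg-vswitch; the modes it accepts are down, listen, advertise and both. A minimal illustration:

esxcfg-vswitch -b vSwitch0        # show the current CDP status for vSwitch0
esxcfg-vswitch -B both vSwitch0   # set CDP to both listen and advertise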

Working with LLDP in ESXi 5.5 shell

Inspecting advertisements

The pktcap-uw ESXi Shell utility, newly available in vSphere 5.5, is capable, among other things, of capturing ingress traffic on a vSwitch uplink port (the tcpdump-uw utility, by contrast, captures traffic at a vmkernel interface).

If an ESXi host NIC port is connected to a device that advertises information via LLDP, it is possible to inspect that information from the ESXi Shell:

~ # pktcap-uw --uplink vmnic1 --ethtype 0x88cc -c 1 -o /tmp/lldp.pcap > /dev/null && hexdump -C /tmp/lldp.pcap
00000000  d4 c3 b2 a1 02 00 04 00  00 00 00 00 00 00 00 00  |................|
00000010  ff ff 00 00 01 00 00 00  33 92 c5 52 6e 3d 0d 00  |........3..Rn=..|
00000020  63 00 00 00 63 00 00 00  01 80 c2 00 00 0e f8 66  |c...c..........f|
00000030  f2 cf 45 7e 88 cc 02 07  04 f8 66 f2 cf 45 71 04  |..E~......f..Eq.|
00000040  05 05 67 69 31 33 06 02  00 78 0a 0c 73 77 69 74  |..gi13...x..swit|
00000050  63 68 63 66 34 35 37 31  0c 27 53 47 33 30 30 2d  |chcf4571.'SG300-|
00000060  32 38 20 32 38 2d 50 6f  72 74 20 47 69 67 61 62  |28 28-Port Gigab|
00000070  69 74 20 4d 61 6e 61 67  65 64 20 53 77 69 74 63  |it Managed Switc|
00000080  68 fe 06 00 80 c2 01 00  01 00 00                 |h..........|
0000008b
~ #

The basic form of the command is pktcap-uw --uplink <vmnic> --ethtype 0x88cc -c 1 -o <tmpfile> && hexdump -C <tmpfile>

The capture filter employed here matches EtherType 0x88cc, which is used specifically for LLDP frames.
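
If staring at hexdump output isn’t appealing, the capture file can also be read back with tcpdump-uw, which should decode at least the common LLDP TLVs (or the pcap can be copied off the host and opened in Wireshark):

tcpdump-uw -e -vv -r /tmp/lldp.pcap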

This provides a fairly raw view of data in the LLDP Protocol Data Unit. This example, which comes from my lab Cisco SG-300 series switch, contains:

  • A switch MAC address as the Chassis ID: f8 66 f2 cf 45 71
  • An interface name as the Port ID: gi13

both of which are mandatory for LLDP.

It also contains the switch hostname (System Name) switchcf4571 and model number (System Description) SG300-28 28-Port Gigabit Managed Switch.

It’s unsophisticated, but easy enough to enter such a command via an out-of-band management session.

Enabling LLDP on a standard vSwitch

Important: As far as I’m aware, this is completely undocumented. I don’t expect that VMware would provide support for this functionality, and I wouldn’t recommend relying on its correct operation.

It is also possible to enable LLDP on a standard virtual switch uplink using vsish. There’s a handful of LLDP vmkernel data structures exposed for virtual network ports.

To enable LLDP on a standard vSwitch, set lldp/enable on each uplink port. There are usually a number of ports associated with a vSwitch (you may have learned that, while you can configure the number of ports on a vSwitch, a couple are implicitly consumed internally). In my limited experience, the first uplink port has been the second port in numerical ID order – at least in a conventional management network configuration.

In the following example, one uplink and one vmkernel interface have been configured:

~ # vsish
/> cd /net/portsets/vSwitch0
/net/portsets/vSwitch0/> ls uplinks
vmnic1/
/net/portsets/vSwitch0/> ls ports
33554433/
33554434/
33554435/
33554436/
/net/portsets/vSwitch0/>

The port associated with the uplink is 33554434. There are a few giveaways in the information provided by the status structure:

/net/portsets/vSwitch0/> get ports/33554434/status
port {
   port index:2
   portCfg:
   dvPortId:
   clientName:vmnic1
   clientType:port types: 4 -> Physical NIC
   clientSubType:port types: 0 -> NONE
   world leader:0
   flags:port flags: 0x10c043 -> IN_USE ENABLED UPLINK DISPATCH_STATS_IN DISPATCH_STATS_OUT DISPATCH_STATS CONNECTED
   Passthru status:: 0x1 -> WRONG_VNIC
   fixed Hw Id:ec:a8:6b:f1:df:16:
   ethFRP:frame routing {
      requested:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
      accepted:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
   }
   filter supported features:features: 0 -> NONE
   filter properties:properties: 0 -> NONE
   rx mode:properties: 0 -> INLINE
}
/net/portsets/vSwitch0/>

In particular, clientName vmnic1, clientType 4 -> Physical NIC and the UPLINK flag indicate that this is the uplink port.
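
If a vSwitch has more ports than this, a quick (and equally unsupported) way to pick out the uplink port IDs non-interactively is to loop over the port list and look for the UPLINK flag. A rough sketch:

# print the IDs of vSwitch0 ports whose flags include UPLINK
for P in $(vsish -e ls /net/portsets/vSwitch0/ports); do
   P=${P%/}   # the listing includes a trailing slash
   vsish -e get /net/portsets/vSwitch0/ports/${P}/status | grep -q UPLINK && echo ${P}
done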

Having identified the appropriate uplink, we can now change the value of its lldp/enable node:

/net/portsets/vSwitch0/ports/33554434/lldp/> ls
clients
rcache
lcache
config
stats
enable
/net/portsets/vSwitch0/ports/33554434/lldp/> typels enable
VSI_BOOL
/net/portsets/vSwitch0/ports/33554434/lldp/> get enable
0
/net/portsets/vSwitch0/ports/33554434/lldp/> set enable 1
/net/portsets/vSwitch0/ports/33554434/lldp/> get enable
1
/net/portsets/vSwitch0/ports/33554434/lldp/>
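
The same change can be made non-interactively, which is handier for scripting (with the same caveats about supportability, and noting that the port ID is specific to this host):

vsish -e set /net/portsets/vSwitch0/ports/33554434/lldp/enable 1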

Gathering LLDP information

In my testing so far, once LLDP is enabled on a port, ESXi 5.5 starts advertising information about it by sending LLDPDUs to the connected peer port (i.e. the port on the switch):

switchcf4571#show lldp neigh

System capability legend:
B - Bridge; R - Router; W - Wlan Access Point; T - telephone;
D - DOCSIS Cable Device; H - Host; r - Repeater;
TP - Two Ports MAC Relay; S - S-VLAN; C - C-VLAN; O - Other

  Port        Device ID        Port ID       System Name    Capabilities  TTL
--------- ----------------- ------------- ----------------- ------------ -----
gi13      76 6d 6e 69 63    ec:a8:6b:f1:d   esxi55a.local        B        109
          31 00             f:16
gi25      00:90:fb:02:02:a4     fxp0      terminator2.local      H        95
                                          .
switchcf4571#show lldp neighbors gi13

Device ID: 76:6d:6e:69:63:31
Port ID: 65633a61383a36623a66313a64663a313600
Capabilities: Bridge
System Name: esxi55a.local
System description: VMware ESX Releasebuild-1331820
Port description: port 33554434 on vSwitch vSwitch0
Time To Live: 91


802.1 PVID: None
802.1 PPVID:
802.1 VLAN:
802.1 Protocol:



switchcf4571#

The method for displaying LLDP information, and its presentation, will differ depending on the network switch. The example above uses a CLI, but the information may also be available via SNMP, a web page or a management application.
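
For instance, the standard LLDP-MIB sits under OID .1.0.8802.1.1.2, so on a switch with SNMP enabled something like the following (the community string and name resolution are assumptions) should dump its LLDP tables from any machine with net-snmp installed:

snmpwalk -v 2c -c public switchcf4571 .1.0.8802.1.1.2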

I’ve observed that ESXi 5.1 doesn’t advertise by default when LLDP is enabled. I haven’t tested ESXi 5.0 yet. In my testing on 5.1, the LLDP configured mode is set to 0x05 by default, whereas it is 0x07 in 5.5:

/net/portsets/vSwitch0/ports/33554434/lldp/> cat config
LLDP broker config {
   configured mode:0x07
   configured transmit hold:4
   configured transmit interval:30
   configured transmit delay:2
   configured secs for fast transmit:5
   # secs to initialize lldp on a port:1
   max consecutive frames sent:1
   # frames tx in fast transmit:5
   bitmap of tlvs transmitted:0x0f
   lldp dest mac addr:01:80:c2:00:00:0e:
}
/net/portsets/vSwitch0/ports/33554434/lldp/>

It may be possible to change the configured mode with vsish… if I were to test this further, however, I would probably look at interfacing with VSI directly from a programming or scripting language.

Information advertised by the switch is also available from the ESXi Shell. It is stored in the rcache node (‘r’ for remote; the LLDP information advertised by ESXi itself is in lcache, the local cache). The information is structured but not necessarily easy to read, as ASCII-encoded text is represented in hexadecimal format:

/net/portsets/vSwitch0/ports/33554434/lldp/> cat rcache
-
type: 1
dataLen: 7
data:
0x4 0xf8 0x66 0xf2 0xcf 0x45 0x71
-
type: 2
dataLen: 5
data:
0x5 0x67 0x69 0x31 0x33
-
type: 3
dataLen: 2
data:
0x0 0x78
-
type: 5
dataLen: 12
data:
0x73 0x77 0x69 0x74 0x63 0x68 0x63 0x66 0x34 0x35 0x37 0x31
-
type: 6
dataLen: 39
data:
0x53 0x47 0x33 0x30 0x30 0x2d 0x32 0x38 0x20 0x32 0x38 0x2d 0x50 0x6f 0x72 0x74 0x20 0x47 0x69 0x67 0x61
0x62 0x69 0x74 0x20 0x4d 0x61 0x6e 0x61 0x67 0x65 0x64 0x20 0x53 0x77 0x69 0x74 0x63 0x68
-
type: 127
orgUI: 0x0 0x80 0xc2
orgType: 1
dataLen: 2
data:
0x0 0x1

/net/portsets/vSwitch0/ports/33554434/lldp/>

This is probably for the best, as the information here includes unprintable data from the LLDPDU. The TLV (Type, Length, Value) structures have been partially decoded. The first two are the Chassis ID:

type: 1 < Chassis ID
0x4 0xf8 0x66 0xf2 0xcf 0x45 0x71
^   ^
|   +--- Switch MAC address
+-- Type of ID = MAC

and Port ID:

type: 2 < Port ID
0x5 0x67 0x69 0x31 0x33
^   ^
|   +--- "gi13"
+-- Type of ID = Interface name (Port ID subtype 5 in IEEE 802.1AB)

We can confirm the value of Port ID from ESXi Shell:

~ # PORTID=`vsish -e get /net/portsets/vSwitch0/ports/33554434/lldp/rcache | grep -A5 '^type: 2' | grep -A3 '^data:' | grep '^0x'`
~ # printf `echo $PORTID | sed 's/ 0x/\\\\x/g'`\\n
0x5gi13
~ #
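
A slight variation on the same idea – untested beyond the Port ID case above – should extract the System Name TLV (type 5); the extra sed expression also converts the leading byte, since this TLV has no subtype prefix, and it assumes the name fits on a single line of the rcache dump:

SYSNAME=`vsish -e get /net/portsets/vSwitch0/ports/33554434/lldp/rcache | grep -A5 '^type: 5' | grep -A3 '^data:' | grep '^0x'`
printf `echo $SYSNAME | sed 's/ 0x/\\\\x/g; s/^0x/\\\\x/'`\\n

For the switch in this example that should print switchcf4571.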

Limitations

Unfortunately this information does not appear to flow through to SDK clients:

~ # vim-cmd hostsvc/net/query_networkhint --pnic-name=vmnic1
(vim.host.PhysicalNic.NetworkHint) [
   (vim.host.PhysicalNic.NetworkHint) {
      dynamicType = <unset>,
      device = "vmnic1",
      connectedSwitchPort = (vim.host.PhysicalNic.CdpInfo) null,
      lldpInfo = (vim.host.PhysicalNic.LldpInfo) null,
   }
]
~ #

(note: I disabled CDP on the switch and vSwitch by setting cdp-status to ‘down’ while performing this test)

This means that the information won’t appear in the vSphere clients (neither the Web Client nor the legacy “C#” client).

Also, the setting change is not persistent and will be lost on reboot.
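
If persistence mattered in a lab, one unsupported workaround might be to reapply the setting at boot from /etc/rc.local.d/local.sh, re-discovering the uplink ports each time since the numeric port IDs aren’t guaranteed to be stable across reboots. A sketch, with no guarantees about how it interacts with boot ordering:

# /etc/rc.local.d/local.sh excerpt (unsupported; lab use only)
# re-enable LLDP on every uplink port of vSwitch0 after boot
for P in $(vsish -e ls /net/portsets/vSwitch0/ports); do
   P=${P%/}
   if vsish -e get /net/portsets/vSwitch0/ports/${P}/status | grep -q UPLINK; then
      vsish -e set /net/portsets/vSwitch0/ports/${P}/lldp/enable 1
   fi
done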

Other information

There is some interesting and possibly useful information in the lldp/stats node (although given LLDP on standard vSwitches is undocumented and vsish is barely documented at all, its use is probably best left to VMware support):

/net/portsets/vSwitch0/ports/33554434/lldp/> cat stats
LLDP broker stats {
   transmit interval counter:15
   transmit delay counter:0
   secs left for fast transmit mode:0
   Is a frame waiting for tx delay to expire?:0
   # of transmit frames:194
   secs since last transmit:15
   # of transmit errors:0
   # of received frames:189
   secs since last receive:23
   # of dropped received frames:0
   # of malformed received frames:0
   # of rx TLVs not recognized:0
   # of remote cache flushes due to ttl expiry:0
   link status:1
}
/net/portsets/vSwitch0/ports/33554434/lldp/>

Delving any further into LLDP (for example, changing the information advertised by ESXi) is probably non-trivial with vsish.

Further thoughts

VMware’s explicit lack of support for LLDP on standard vSwitches is a bit curious. I see value in exploiting this undocumented functionality during troubleshooting. The particular circumstances I recently encountered (that is, port misidentification during initial cabling and configuration) put a useful vSphere feature out of reach because of a bootstrapping problem.

As distributed virtual switches require vCenter, and vCenter requires management network connectivity to hosts (initially through a standard vSwitch), I can understand the temptation not to migrate management uplinks and vmkernel interfaces to distributed switches after joining a host to vCenter. I can also understand the argument that it is incongruous for CDP to be supported on both standard and distributed virtual switches while LLDP is not (of course, there is a history of new features only being supported by newer or more advanced vSphere technologies, and CDP was an earlier feature – perhaps from a time when Cisco and VMware considered their relationship more complementary than competitive?).

For lab environments, and for those without access to licensed features or third-party vSwitch implementations with LLDP support, there does at least appear to be an unsupported way to get some mileage out of LLDP with standard vSwitches.

Having access to network discovery information from Layer 2 protocols won’t always obviate the need for physical cable tracing or “cable pulls”. Indeed, it may be sensible to “down” ports (at either the switch or the ESXi end) during troubleshooting to help confirm which is which. In this case it was reassuring and helpful to have access to LLDP information during the build, before the host could be connected to vCenter.

Bonus: It is also possible to block traffic on a port (set blocked 1). I discovered this during vSphere 4 Advanced Fast Track training after finishing an exercise early. That discovery is partly what prompted me to check vsish for LLDP controls. Port blocking is another feature that is limited to distributed virtual switches in the documentation.
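
For completeness, a non-interactive equivalent of that port-blocking trick (same caveats, and reusing the hypothetical port ID from the earlier examples) would look something like:

vsish -e set /net/portsets/vSwitch0/ports/33554434/blocked 1   # block the port
vsish -e set /net/portsets/vSwitch0/ports/33554434/blocked 0   # unblock it again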