Wednesday, December 5, 2012

Clearing Phantom Paths on a Server with PowerPath Installed


Overview

Problem: Extra paths are visible in powermt display dev=all output. We are still working on determining the root cause. What seems to happen is:
1) Initially, the number of paths is correct. Assuming that each LUN should be using four paths, you can determine the correct number as follows:
echo $(( $(powermt display dev=all | grep emcpower | wc -l) * 4 ))

And the number of paths PowerPath currently reports:
powermt display dev=all | grep qla | wc -l

The current number of SCSI devices:
lsscsi | wc -l

The HBA reports the correct number of paths:
echo $(( $(scli -l 0 | grep ^LUN | wc -l) - 4 + $(scli -l 1 | grep ^LUN | wc -l) ))
ls -l /dev/sg*

2) A LIP is issued:
echo 1 |tee /sys/class/fc_host/host?/issue_lip
# This operation performs a Loop Initialization Protocol (LIP)
#    and then scans the interconnect and causes the SCSI layer to be updated
#    to reflect the devices currently on the bus. A LIP is, essentially, a bus reset,
#    and will cause device addition and removal. This procedure is necessary to configure
#    a new SCSI target on a Fibre Channel interconnect. Bear in mind that issue_lip is
#    an asynchronous operation. The command may complete before the entire scan has completed.
#    You must monitor /var/log/messages to determine when it is done.
#    The lpfc and qla2xxx drivers support issue_lip.
#    For more information about the API capabilities supported by each driver in Red Hat Enterprise Linux,
#    refer to Table 1, "Fibre-Channel API Capabilities".

3) Several new paths appear.
The extra /dev/sg* devices are created when the LIP reports that new devices have been discovered on the SCSI bus. Since the HBA driver is responsible for reporting the paths to the system, we currently believe that the HBA driver and/or storage frame is at fault. EMC specifically mentions that the SPC2 bit must be set to "enabled", which is not an online change; the hosts must be rebooted to pick up the change. This doesn't seem to be related, however, since the paths spontaneously appear even though we can query the HBA with scli shortly afterward and it reports the correct number of paths.
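Because issue_lip is asynchronous, any count taken right after step 2 may be stale. The note above recommends watching /var/log/messages; as a rough alternative, the sketch below polls a count until two consecutive readings agree. This is a swapped-in heuristic, not a documented way to detect LIP completion, and wait_until_stable and count_scsi_devices are hypothetical helper names.

```shell
#!/bin/sh
# wait_until_stable INTERVAL MAX_TRIES CMD [ARGS...]
# Runs CMD repeatedly and prints its output once two consecutive runs agree.
# Returns 1 if the output is still changing after MAX_TRIES comparisons.
wait_until_stable() {
    wus_interval=$1
    wus_tries=$2
    shift 2
    wus_prev=$("$@")
    while [ "$wus_tries" -gt 0 ]; do
        sleep "$wus_interval"
        wus_cur=$("$@")
        if [ "$wus_cur" = "$wus_prev" ]; then
            echo "$wus_cur"
            return 0
        fi
        wus_prev=$wus_cur
        wus_tries=$(( wus_tries - 1 ))
    done
    return 1
}

# Hypothetical helper: number of SCSI devices the kernel currently knows about.
count_scsi_devices() {
    ls /sys/class/scsi_device 2>/dev/null | wc -l
}
```

After issuing the LIP, something like "wait_until_stable 5 12 count_scsi_devices" would poll every five seconds for up to a minute. The interval and retry limit are arbitrary illustrative values, and a stable count does not prove the scan has finished.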

Example and Mitigation

This was happening on [server name redacted].
# The number of paths that "scli -l" shows that the HBA is reporting to the server.
# Subtracting 4 because the last 4 devices are LUNZ/VRAID devices instead of Symmetrix or Clariion.
root@xxxxx:TEST:scsi_device> echo $(( $(scli -l 0 | grep ^LUN | wc -l) - 4 + $(scli -l 1 | grep ^LUN | wc -l) ))
440
# The number of paths that should exist
root@xxxxx:TEST:scsi_device> echo $(( $(powermt display dev=all | grep emcpower | wc -l) * 4 ))
440
# The number of paths that PowerPath is reporting
root@xxxxx:TEST:scsi_device> powermt display dev=all | grep qla | wc -l
1419
# The number of SCSI devices that the system sees
# Subtracting 4 because the last 4 devices are LUNZ/VRAID devices
root@xxxxx:TEST:scsi_device> echo $(( $(lsscsi | wc -l) - 4 ))
1419
And another "healthy" server in the same cluster:
# The number of paths that "scli -l" shows that the HBA is reporting to the server.
# Subtracting 4 because the last 4 devices are LUNZ/VRAID devices instead of Symmetrix or Clariion.
root@xxxxx:TEST:~> echo $(( $(scli -l 0 | grep ^LUN | wc -l) - 4 + $(scli -l 1 | grep ^LUN | wc -l) ))
440
# The number of paths that should exist
root@xxxxx:TEST:~> echo $(( $(powermt display dev=all | grep emcpower | wc -l) * 4 ))
440
# The number of paths that PowerPath is reporting
root@xxxxx:TEST:~> powermt display dev=all | grep qla | wc -l
440
# The number of SCSI devices that the system sees
# Subtracting 4 because the last 4 devices are LUNZ/VRAID devices
root@xxxxx:TEST:~> echo $(( $(lsscsi | wc -l) - 4 ))
440
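The comparison between the expected count and the PowerPath count can be wrapped into one check. The sketch below is only an illustration: the function names are made up, the four-paths-per-LUN figure is the assumption stated earlier, and the grep patterns are the same ones used in the commands above.

```shell
#!/bin/sh
# Assumption from the text: every LUN should have four paths.
PATHS_PER_LUN=4

# Expected path count; reads `powermt display dev=all` output on stdin.
expected_paths() {
    ep_luns=$(grep -c 'emcpower' || true)
    echo $(( ep_luns * PATHS_PER_LUN ))
}

# Path count PowerPath reports; reads the same output on stdin.
reported_paths() {
    grep -c 'qla' || true
}

# check_paths EXPECTED REPORTED: prints a one-line verdict.
check_paths() {
    if [ "$2" -gt "$1" ]; then
        echo "PHANTOM: $(( $2 - $1 )) extra path(s)"
    elif [ "$2" -lt "$1" ]; then
        echo "MISSING: $(( $1 - $2 )) path(s)"
    else
        echo "OK"
    fi
}

# Only run against the real tool when it is installed.
if command -v powermt >/dev/null 2>&1; then
    pp_out=$(powermt display dev=all)
    check_paths "$(printf '%s\n' "$pp_out" | expected_paths)" \
                "$(printf '%s\n' "$pp_out" | reported_paths)"
fi
```

With the numbers from the affected server, check_paths 440 1419 reports 979 extra paths; on the healthy server, check_paths 440 440 prints OK.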
The first 440 /dev/sg* devices were created on system bootup. The extraneous 900+ were created on November 15th, 2012, at the same time some LUNs were added:
xxxxx,xxxxx,2012-11-15 21:29:57.398563,"Added LUN(s) [redacted]..."
The script we use to add storage issues a LIP, which is what caused the extra /dev/sg* paths to be discovered. Here's an example emcpower device with too many paths:
root@xxxxx:TEST:device> powermt display dev=emcpowercz
Pseudo name=emcpowercz
Symmetrix ID=xxxxx
Logical device ID=YYYY
state=alive; policy=SymmOpt; priority=0; queued-IOs=0
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---
###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors
==============================================================================
   1 qla2xxx                   sdabg     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdabt     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdajq     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdakd     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdasa     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdasn     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdbak     FA 15gB   active  alive      0      0
   1 qla2xxx                   sdbax     FA 15gB   active  alive      0      0
   0 qla2xxx                   sdgu      FA  5eB   active  alive      0      0
   1 qla2xxx                   sdgv      FA 12eB   active  alive      0      0
   0 qla2xxx                   sdor      FA  2gB   active  alive      0      0
   1 qla2xxx                   sdox      FA 15gB   active  alive      0      0
   1 qla2xxx                   sdtj      FA 15gB   active  alive      0      0
Let's figure out how those paths correlate to /dev/sg* devices:
root@sxxxxx:TEST:device> lsscsi -g | egrep "sdabg|sdabt|sdajq|sdakd|sdasa|sdasn|sdbak|sdbax|sdgu|sdgv|sdor|sdox|sdtj"
[0:0:0:101]  disk    EMC      SYMMETRIX        5874  /dev/sdgu  /dev/sg202
[0:0:1:101]  disk    EMC      SYMMETRIX        5874  /dev/sdor  /dev/sg407
[1:0:0:101]  disk    EMC      SYMMETRIX        5874  /dev/sdgv  /dev/sg203
[1:0:1:101]  disk    EMC      SYMMETRIX        5874  /dev/sdox  /dev/sg413
[1:0:1:12389]disk    EMC      SYMMETRIX        5874  /dev/sdtj  /dev/sg529
[1:0:1:34309]disk    EMC      SYMMETRIX        5874  /dev/sdabg  /dev/sg734
[1:0:1:34325]disk    EMC      SYMMETRIX        5874  /dev/sdabt  /dev/sg747
[1:0:1:38405]disk    EMC      SYMMETRIX        5874  /dev/sdajq  /dev/sg952
[1:0:1:38421]disk    EMC      SYMMETRIX        5874  /dev/sdakd  /dev/sg965
[1:0:1:42501]disk    EMC      SYMMETRIX        5874  /dev/sdasa  /dev/sg1170
[1:0:1:42517]disk    EMC      SYMMETRIX        5874  /dev/sdasn  /dev/sg1183
[1:0:1:46597]disk    EMC      SYMMETRIX        5874  /dev/sdbak  /dev/sg1388
[1:0:1:46613]disk    EMC      SYMMETRIX        5874  /dev/sdbax  /dev/sg1401
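One rough way to pick late-created devices out of a listing like the one above is to compare each sg number with the expected path count (440 on this server). The sketch below does that with awk. flag_late_sg is a hypothetical helper, and the comparison is only a heuristic: it assumes sg numbers were handed out in discovery order with the legitimate paths found first.

```shell
#!/bin/sh
# flag_late_sg THRESHOLD
# Reads `lsscsi -g` output on stdin and prints the sd/sg pair of every line
# whose sg number is not below THRESHOLD.
flag_late_sg() {
    awk -v max="$1" '
        {
            sg = $NF                 # last column, e.g. /dev/sg734
            sd = $(NF - 1)           # second-to-last column, e.g. /dev/sdabg
            n = sg
            sub(/^\/dev\/sg/, "", n) # keep only the numeric suffix
            if (n ~ /^[0-9]+$/ && n + 0 >= max) print sd, sg
        }'
}
```

Fed the thirteen lines above with a threshold of 440, it prints the nine devices from /dev/sg529 upward.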
Look suspicious? There are four /dev/sg* devices numbered under 440, which is the number of total SCSI paths that should be on the system. Let's make sure those are valid:
for dev in sdgu sdor sdgv sdox; do oracleasm querydisk /dev/${dev}1; done
Device "/dev/sdgu1" is marked an ASM disk with the label "XXXXXXXXX"
Device "/dev/sdor1" is marked an ASM disk with the label "XXXXXXXXX"
Device "/dev/sdgv1" is marked an ASM disk with the label "XXXXXXXXX"
Device "/dev/sdox1" is marked an ASM disk with the label "XXXXXXXXX"
That's just an example for RAC. The point is to make sure that the /dev/sd* devices are accessible before blowing away the extraneous ones. Something like this works as well:
for dev in sdgu sdor sdgv sdox; do od -c /dev/${dev}1 | head -10; done
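The same readability check can be turned into a pass/fail test instead of eyeballing od output. This is a minimal sketch with hypothetical helper names (can_read, check_devices); it only proves that the first block can be read, and a read against a truly dead path may hang rather than fail.

```shell
#!/bin/sh
# can_read DEVICE: succeed if one 512-byte block can be read from DEVICE.
can_read() {
    dd if="$1" of=/dev/null bs=512 count=1 2>/dev/null
}

# check_devices DEVICE...: report each device as readable or not.
# The return status is non-zero if any device failed.
check_devices() {
    cd_failed=0
    for cd_dev in "$@"; do
        if can_read "$cd_dev"; then
            echo "OK   $cd_dev"
        else
            echo "FAIL $cd_dev"
            cd_failed=1
        fi
    done
    return "$cd_failed"
}
```

For the four paths above, that would be run as root with: check_devices /dev/sdgu1 /dev/sdor1 /dev/sdgv1 /dev/sdox1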
All four paths valid? Good. Now that we know how to suss out invalid paths manually, let's do it the easy way:
Note: This will delete extra SCSI paths; the actual 'delete' command has been commented out below. Make sure you're okay with the possibility of a server crash before uncommenting and running it.
VALID_TMP="$(mktemp /tmp/.valid_devices.XXXXXX)"
ALL_TMP="$(mktemp /tmp/.all_devices.XXXXXX)"

# Discover the paths that the HBA is reporting
for hba in 0 1; do
    sudo scli -l "${hba}" | grep -Po "sd\w+"
done > "${VALID_TMP}"

# Discover the paths that the OS is reporting
sudo powermt display dev=all | grep -Po "sd\w+" > "${ALL_TMP}"

if [ ! -s "${VALID_TMP}" ]; then
    # An empty list would make every path look invalid, so refuse to continue
    echo "scli reported no paths; not deleting anything." >&2
else
    while read -r device; do
        # -x matches the whole line, -F treats the name as a fixed string
        if ! grep -qxF "${device}" "${VALID_TMP}"; then
            echo "Device ${device} is invalid.  Deleting..."
            #echo 1 | sudo tee /sys/block/${device}/device/delete &>/dev/null
        fi
    done < "${ALL_TMP}"
fi

rm -f "${VALID_TMP}" "${ALL_TMP}"
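The loop above is effectively a set difference: the devices the OS reports minus the devices the HBA reports. As a dry run that deletes nothing, the same list can be produced with sort and comm. This is a sketch only; invalid_devices is a hypothetical helper that takes two files of device names, one name per line.

```shell
#!/bin/sh
# invalid_devices VALID_FILE ALL_FILE
# Prints every device name that appears in ALL_FILE but not in VALID_FILE.
invalid_devices() {
    id_valid=$(mktemp) || return 1
    id_all=$(mktemp) || { rm -f "$id_valid"; return 1; }
    # comm needs both inputs sorted under the same collation rules
    LC_ALL=C sort -u "$1" > "$id_valid"
    LC_ALL=C sort -u "$2" > "$id_all"
    # -1 drops names only in the valid list, -3 drops names in both lists
    LC_ALL=C comm -13 "$id_valid" "$id_all"
    rm -f "$id_valid" "$id_all"
}
```

Reviewing this list before uncommenting the delete line is a cheap sanity check: on the server in this example it should contain roughly the 979 extra paths and none of the original 440.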
And finally, issue another LIP:
echo 1 |sudo tee /sys/class/fc_host/host?/issue_lip
Our device should be back to normal:
root@xxxxxx:TEST:~> powermt display dev=emcpowercz
Pseudo name=emcpowercz
Symmetrix ID=xxxxxx
Logical device ID=YYYY
state=alive; policy=SymmOpt; priority=0; queued-IOs=0
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---
###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors
==============================================================================
   0 qla2xxx                   sdgu      FA  5eB   active  alive      0      0
   1 qla2xxx                   sdgv      FA 12eB   active  alive      0      0
   0 qla2xxx                   sdor      FA  2gB   active  alive      0      0
   1 qla2xxx                   sdox      FA 15gB   active  alive      0      0
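To confirm that every pseudo device is back to four paths, not just emcpowercz, the powermt output can be tallied per device. The sketch below assumes the output layout shown above (a "Pseudo name=" line followed by one qla line per path); paths_per_device and wrong_path_count are hypothetical helper names.

```shell
#!/bin/sh
# paths_per_device: reads `powermt display dev=all` output on stdin and
# prints "<pseudo name> <path count>" for each device.
paths_per_device() {
    awk '
        /^Pseudo name=/ {
            if (name != "") print name, n
            split($0, parts, "=")
            name = parts[2]
            n = 0
        }
        / qla/ { n++ }
        END    { if (name != "") print name, n }'
}

# wrong_path_count WANT: prints only the devices whose count differs from WANT.
wrong_path_count() {
    paths_per_device | awk -v want="$1" '$2 != want'
}

# Only run against the real tool when it is installed.
if command -v powermt >/dev/null 2>&1; then
    powermt display dev=all | wrong_path_count 4
fi
```

No output means every device has exactly four paths.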