Problem: Extra paths are visible in powermt display dev=all output. We have not yet determined the root cause, but the sequence of events appears to be:
1) Initially, the number of paths is correct. Assuming that each LUN should be using four paths, you can determine the correct number as follows:
And the current number:
The current number of SCSI devices:
The HBA reports the correct number of paths:
2) A LIP is issued:
3) Several new paths appear.
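The comparison in step 1 can be sketched in shell. This is a minimal sketch, not necessarily the exact commands used on the affected hosts: expected_paths and count_sg_devices are hypothetical helper names, and the first one assumes that each LUN in the powermt display dev=all output has exactly one line beginning with "Pseudo name=".

```shell
# Hypothetical helper: expected path count = number of LUNs x paths per LUN.
# Reads `powermt display dev=all` output on stdin.
# Assumption: every LUN block contains one line starting with "Pseudo name=".
expected_paths() {
    paths_per_lun="${1:-4}"
    luns=$(grep -c '^Pseudo name=' || true)
    echo $(( luns * paths_per_lun ))
}

# Hypothetical helper: count the sg nodes in a directory (default /dev).
count_sg_devices() {
    dev_dir="${1:-/dev}"
    count=0
    for node in "$dev_dir"/sg*; do
        if [ -e "$node" ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# Usage (as root, on the affected host):
#   powermt display dev=all | expected_paths 4
#   count_sg_devices
```

If the second number is far above the first, extra paths have been registered. A small difference is not necessarily a problem, since sg nodes also exist for SCSI devices that are not PowerPath LUNs.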
The extra /dev/sg* devices are created when the LIP reports that new devices have been discovered on the SCSI bus. Since the HBA driver is responsible for reporting paths to the system, we currently believe that the HBA driver and/or the storage frame is at fault. EMC specifically mentions that the SPC2 bit must be set to "enabled", which is not an online change: the hosts must be rebooted to pick it up. That doesn't seem to be related, however, because the extra paths appear spontaneously, yet when we query the HBA with scli shortly afterward it reports the correct number of paths.
This was happening on [server name redacted].
And another "healthy" server in the same cluster:
The first 440 /dev/sg* devices were created on system bootup. The extraneous 900+ were created on November 15th, 2012, at the same time some LUNs were added:
The script we use to add storage issues a LIP, so that caused the extra /dev/sg* paths to be discovered. Here's an example emcpower device with too many paths:
Let's figure out how those paths correlate to /dev/sg* devices:
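One way to look up the mapping is through sysfs. The sketch below assumes a kernel that exposes the sg node as a directory under /sys/block/sdX/device/scsi_generic/ (older kernels lay this out differently); sd_to_sg is a hypothetical helper name, and its second argument exists only so it can be pointed at a test tree. Tools such as sg_map from sg3_utils or lsscsi -g report the same mapping.

```shell
# Hypothetical helper: print the sg node name that corresponds to an sd device.
# Assumption: the kernel exposes /sys/block/<sd>/device/scsi_generic/<sgN>.
sd_to_sg() {
    sd_name="$1"
    sys_block="${2:-/sys/block}"
    for entry in "$sys_block/$sd_name"/device/scsi_generic/sg*; do
        if [ -e "$entry" ]; then
            echo "${entry##*/}"
            return 0
        fi
    done
    # No sg entry found for this sd device.
    return 1
}

# Usage: sd_to_sg sdb
```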
Look suspicious? There are four /dev/sg* devices numbered under 440, which is the total number of SCSI paths that should be on the system. Let's make sure those are valid:
That's just an example for RAC. The point is to make sure that the /dev/sd* devices are accessible before we blow away the extraneous ones. Something like this would work as well:
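A generic accessibility check, sketched here as an assumption rather than the exact command used on these hosts, is to read a single block from each device with dd. check_readable is a hypothetical helper name; it only reads and never writes to the device.

```shell
# Hypothetical helper: try to read one 512-byte block from each given device.
# Prints OK/FAIL per device and returns non-zero if any read failed.
check_readable() {
    failed=0
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=512 count=1 2>/dev/null; then
            echo "OK $dev"
        else
            echo "FAIL $dev"
            failed=1
        fi
    done
    return "$failed"
}

# Usage: check_readable /dev/sdb /dev/sdc /dev/sdd /dev/sde
```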
All four paths valid? Good. Now that we know how to suss out invalid paths manually, let's do it the easy way:
Note: This will delete extra SCSI paths; the actual 'delete' command has been commented out below. Make sure you're okay with the possibility of a server crash before uncommenting and running it.
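A minimal sketch of such a cleanup, not necessarily the script used here, is shown below. It assumes the heuristic described above, namely that every sg node whose number is at or above the boot-time count (440 on this system) is extraneous, and it assumes that each sg node can be removed through the delete attribute under /sys/class/scsi_generic/sgN/device/. list_extra_sg is a hypothetical helper name. The function only prints what it would remove; the real delete stays commented out.

```shell
# Hypothetical helper: list sg nodes numbered at or above the boot-time count.
# Assumption: nodes created after boot are the extraneous ones.
list_extra_sg() {
    boot_count="$1"
    sg_class="${2:-/sys/class/scsi_generic}"
    for entry in "$sg_class"/sg*; do
        [ -e "$entry" ] || continue
        name="${entry##*/}"
        num="${name#sg}"
        # Skip anything that is not sg followed by digits only.
        case "$num" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$num" -ge "$boot_count" ]; then
            echo "would delete $name"
            # echo 1 > "$entry/device/delete"   # uncomment to really delete
        fi
    done
}

# Usage: list_extra_sg 440
```

Reviewing the printed list against the valid paths identified earlier is worth doing before enabling the delete line.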
And finally, issue another LIP:
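One common way to issue a LIP on Linux is to write to the issue_lip attribute of each Fibre Channel host in sysfs; whether the storage script mentioned above uses this interface is an assumption. issue_lip_all is a hypothetical helper name, and its optional argument exists only so the function can be pointed at a test directory. It must be run as root, and a LIP briefly disrupts I/O on the loop.

```shell
# Hypothetical helper: issue a LIP on every FC host that exposes issue_lip.
issue_lip_all() {
    fc_host_dir="${1:-/sys/class/fc_host}"
    for host in "$fc_host_dir"/host*; do
        if [ -e "$host/issue_lip" ]; then
            echo "Issuing LIP on ${host##*/}"
            echo 1 > "$host/issue_lip"
        fi
    done
}

# Usage (as root): issue_lip_all
```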
Our device should be back to normal: