The role of SCSI diagnostic test tools in the iSCSI manufacturer environment
Mike Jones, PTI
Lakewood, CO - April 9, 2002
As iSCSI begins moving from
designs to real-world products, diagnostic tools are needed for a
variety of purposes. This article describes a few real-world
experiences gathered while working in this new environment.
What are the issues?
At the highest level, the question is
"is this storage subsystem working?" Does the computer
system recognize the disk on the other end of the wire? Is the
capacity of the disk readable? Can the inquiry data be shown? Can the
disk write and read data reliably?
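Two of these checks map onto small, well-known SCSI commands: INQUIRY
(opcode 0x12) and READ CAPACITY(10) (opcode 0x25). As a minimal sketch,
assuming a test harness that accepts raw command descriptor blocks (CDBs),
the command bytes could be built and the capacity reply decoded roughly as
follows. The function names are illustrative only and do not belong to any
particular tool.

```python
import struct


def build_inquiry_cdb(allocation_length: int) -> bytes:
    """Build a 6-byte standard INQUIRY CDB (opcode 0x12).

    The allocation length is kept to one byte (CDB byte 4) so the CDB is
    valid under both SCSI-2 and later revisions of the command set.
    """
    if not 0 <= allocation_length <= 0xFF:
        raise ValueError("allocation length must fit in one byte")
    # opcode, EVPD=0, page code=0, byte 3 = 0, allocation length, control
    return bytes([0x12, 0x00, 0x00, 0x00, allocation_length, 0x00])


def build_read_capacity10_cdb() -> bytes:
    """Build a 10-byte READ CAPACITY(10) CDB (opcode 0x25, other bytes zero)."""
    return bytes([0x25]) + bytes(9)


def parse_read_capacity10(data: bytes) -> tuple:
    """Decode the 8-byte READ CAPACITY(10) reply.

    Returns (last_lba, block_length, capacity_in_bytes). Both fields in the
    reply are 32-bit big-endian integers.
    """
    if len(data) < 8:
        raise ValueError("READ CAPACITY(10) data must be at least 8 bytes")
    last_lba, block_length = struct.unpack(">II", data[:8])
    return last_lba, block_length, (last_lba + 1) * block_length
```

Note that the reply carries the address of the last block, not the block
count, which is why the capacity calculation adds one.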
Once communication is established and
verified, lower-level functions need to be confirmed. Can the write
cache on the drive be turned on/off? Can new firmware be downloaded
into the drive? Does the entire storage subsystem respond in a
reliable way when an error occurs? These errors could be drive-related
(a drive failure) or system-related (an illegal command sent from a
software application).
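On SCSI disks the write cache is normally controlled through the WCE
(write cache enable) bit of the caching mode page (page code 0x08). The
sketch below assumes the caller has already fetched that page with MODE
SENSE and stripped the mode parameter header and any block descriptors;
it only shows the bit manipulation needed before sending the page back
with MODE SELECT. The helper name is hypothetical.

```python
CACHING_PAGE_CODE = 0x08
WCE_BIT = 0x04  # bit 2 of byte 2 in the caching mode page


def set_write_cache(page: bytes, enabled: bool) -> bytes:
    """Return a copy of a caching mode page with WCE set or cleared.

    `page` is assumed to start at the page itself (byte 0 = page code),
    not at the mode parameter header.
    """
    if len(page) < 3 or (page[0] & 0x3F) != CACHING_PAGE_CODE:
        raise ValueError("not a caching mode page")
    out = bytearray(page)
    if enabled:
        out[2] |= WCE_BIT
    else:
        out[2] &= ~WCE_BIT & 0xFF
    # The PS bit (bit 7 of byte 0) is reserved in MODE SELECT data,
    # so clear it before the page is sent back to the drive.
    out[0] &= 0x7F
    return bytes(out)
```

A bridge that alters or truncates mode pages in either direction would
show up here as a setting that does not "stick" when read back.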
Device / Firmware development
In the real world, disk drives do not
always operate strictly according to standards. Will your storage
system crash or misbehave if a drive has a peculiarity? For instance,
in experimenting with an iSCSI->Fibre Channel bridge/router this
week, I discovered a particular brand of disk drive that did not
support the SCSI command that the iSCSI bridge relied on for
Fibre Channel device discovery. The drive failed this command, and
the bridge decided that the rack of drives was not there. Invisible
drives!
By using a controlled-environment
SCSI design tool (PTI’s SCSI toolbox32), we were able to quickly
identify the offending SCSI command, duplicate that command,
and collect detailed information about how the drive was failing. We
then took this information to the software engineers at the iSCSI
bridge company. They made changes to their software, and voila:
within 30 minutes our bridge could use Hitachi fibre channel
drives!
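Much of the "detailed information" about a failing command comes from the
sense data the drive returns with a CHECK CONDITION status. As an
illustration only (the article does not record which sense data the drive
in this example actually returned), fixed-format sense data can be decoded
along these lines. A drive that rejects an unsupported opcode would
typically report sense key ILLEGAL REQUEST with additional sense code
0x20, "invalid command operation code".

```python
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}


def decode_fixed_sense(sense: bytes) -> dict:
    """Extract sense key, ASC and ASCQ from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F  # sense key is the low nibble of byte 2
    return {
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "OTHER"),
        "asc": sense[12],   # additional sense code
        "ascq": sense[13],  # additional sense code qualifier
    }
```

Comparing the decoded values seen through the bridge with those seen on a
direct connection is one way to tell a drive peculiarity from a conversion
error.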
Functional and performance testing
The SCSI toolbox32 provides several
"layers" of testing needed for iSCSI work. Its hot bus
scanning allows discovering devices added to or removed from the iSCSI
connection. Once a drive is discovered, any SCSI command can be tested.
In theory, any SCSI command, legal or illegal, should be supported in the
iSCSI environment. In today’s reality, we are dealing with bridges
that perform the protocol conversion between iSCSI and SCSI/FC. Any
time there is protocol conversion, there is a possibility of errors,
and the SCSI toolbox32 helps identify those errors. Since it generates
known good (or known bad) SCSI commands, the bridge conversion process
can be completely tested and understood. SCSI compliance tests can be
used to ensure that all SCSI-2 and SCSI-3 commands are supported
correctly.
Once command compliance is assured,
testing can move into a performance phase. Writes and reads of varying
blocks per transfer can be sent to one or more drives, from one or
more source computers. Raw "best case" performance can be
measured to one drive. "Real world" performance can be
measured using multiple synchronized computers sending multiple data
streams to one or more drives or volumes. Tests that queue commands
128 deep to multiple drives can easily generate enough data to
completely swamp the iSCSI subsystem for "torture"-type
testing.
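The idea of sweeping "blocks per transfer" can be sketched in a few lines.
The example below is a simplified stand-in, not the tool's method: it uses
ordinary operating system file reads rather than SCSI pass-through, issues
one read at a time (no command queuing), and may be served from the
operating system's cache, so the rates it reports are only indicative. The
path argument and default values are assumptions chosen for illustration.

```python
import time


def sweep_read_sizes(path, block_size=512,
                     blocks_per_transfer=(1, 8, 64, 128),
                     total_bytes=1 << 20):
    """Time sequential reads at several transfer sizes.

    Returns a list of (blocks_per_transfer, bytes_read, megabytes_per_second).
    """
    results = []
    for blocks in blocks_per_transfer:
        chunk = blocks * block_size
        done = 0
        start = time.perf_counter()
        with open(path, "rb", buffering=0) as f:
            while done < total_bytes:
                data = f.read(min(chunk, total_bytes - done))
                if not data:
                    break  # reached the end of the file or device
                done += len(data)
        elapsed = time.perf_counter() - start
        rate = done / elapsed / 1e6 if elapsed > 0 else float("inf")
        results.append((blocks, done, rate))
    return results
```

Running the same sweep against a directly attached drive and then through
the iSCSI path gives a rough picture of how much the conversion layer
costs at each transfer size.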
Surround your unknowns with knowns
In summary, testing an iSCSI HBA or
an iSCSI->SCSI/FC bridge or router is easily accomplished with the
following pieces:
1. A test tool that can
generate known good SCSI traffic, and can gracefully handle
and report all data gathered during any error condition.
2. A known good SCSI or
fibre channel disk drive.
In between these two "knowns"
is placed the iSCSI HBA or iSCSI->SCSI/FC bridge or router. The
theory is then "if something doesn’t work right, it’s the HBA
or the bridge or the cables". As the "invisible drives" example
above showed, this test setup allows very
fast identification and correction of bugs.
One more example
In closing, another example came up
when we used the SCSI toolbox32 to send an INQUIRY command that asked
for 6 bytes of data to be returned (a perfectly legal thing to do).
The iSCSI bridge received our iSCSI command, converted it to fibre
channel, sent it to the drive, and got the data back. But then,
instead of sending back the 6 bytes that we asked for, the bridge sent
back 32 bytes of data. This made certain layers of the operating
system device drivers very unhappy: they were trying to stuff 32 bytes of
data into a 6-byte sack! The good news was that the error was very easy to
reproduce, the error information obtained was everything
needed, and once again the firmware in the bridge was fixed in a very
short time.
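A check for this class of bug is simple to express: the data returned for
an INQUIRY must never exceed the allocation length carried in the CDB. The
sketch below assumes the test harness exposes both the CDB it sent and the
raw data it received; the function names are hypothetical.

```python
def inquiry_allocation_length(cdb: bytes) -> int:
    """Read the allocation length from a 6-byte INQUIRY CDB.

    Bytes 3-4 are read as a 16-bit big-endian value; SCSI-2 initiators
    leave byte 3 at zero, so the result is the same for them.
    """
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return (cdb[3] << 8) | cdb[4]


def check_data_in_length(cdb: bytes, data_in: bytes):
    """Return an error string if more data came back than was requested."""
    allowed = inquiry_allocation_length(cdb)
    if len(data_in) > allowed:
        return (f"overrun: requested at most {allowed} bytes, "
                f"received {len(data_in)}")
    return None


# The case described above: 6 bytes requested, 32 bytes returned.
cdb = bytes([0x12, 0x00, 0x00, 0x00, 0x06, 0x00])
print(check_data_in_length(cdb, bytes(32)))
```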