Oct 1, 2011

Adaptec ARCCONF getconfig - check Adaptec RAID array status

/usr/StorMan/arcconf getconfig 1 al - lists information about the controllers, logical drives, and physical devices.

To check the health of an Adaptec RAID array on a CentOS 5 server (RHEL 5 based; this also applies to Fedora), I modified this script to use ARCCONF and run it from CRON, so that I get an email about the status of the RAID array (I have an Adaptec RAID 3405 controller).

Unlike the 3Ware 3DM2 manager, the Adaptec Storage Manager software for managing, monitoring and checking Adaptec RAID arrays will not run as a background service on a CentOS / RHEL 5 based system.

You will need to download and install the Adaptec utilities.
I also use sendEmail, a great e-mailer script that lets me send email with an attachment from the CLI.

Once the Adaptec utilities are installed, you can get your RAID array information with the command:
# /usr/StorMan/arcconf getconfig 1 al
The output should look like this:
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Channel description                      : SAS/SATA
   Controller Model                         : Adaptec 3405
   Controller Serial Number                 : 7C2110BD455
   Physical Slot                            : 3
   Temperature                              : 49 C/ 120 F (Normal)
   Installed memory                         : 128 MB
   Copyback                                 : Disabled
   Background consistency check             : Disabled
   Automatic Failover                       : Enabled
   Defunct disk drive count                 : 0
   Logical devices/Failed/Degraded          : 1/0/0
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (15753)
   Firmware                                 : 5.2-0 (15753)
   Driver                                   : 1.1-5 (2453)
   Boot Flash                               : 5.2-0 (15753)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Optimal
   Over temperature                         : No
   Capacity remaining                       : 99 percent
   Time remaining (at current draw)         : 3 days, 0 hours, 52 minutes

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
   Logical device name                      : RAID10
   RAID level                               : 10
   Status of logical device                 : Optimal
   Size                                     : 279800 MB
   Stripe-unit size                         : 256 KB
   Read-cache mode                          : Enabled
   Write-cache mode                         : Enabled (write-back)
   Write-cache setting                      : Enabled (write-back) when protected by battery
   Partitioned                              : Yes
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : No
   --------------------------------------------------------
   Logical device segment information
   --------------------------------------------------------
   Group 0, Segment 0                       : Present (0,0) 3LN3BY8Q00009823KDMV
   Group 0, Segment 1                       : Present (0,1) 3LN3V6AQ00009829MMLC
   Group 1, Segment 0                       : Present (0,2) 3LN1AYYD00009747RGSB
   Group 1, Segment 1                       : Present (0,3) 3LN2GAEC00009813AQW6

----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
      Device #0
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,0
         Reported Location                  : Enclosure 0, Slot 0
         Reported ESD                       : 2,0
         Vendor                             : SEAGATE
         Model                              : ST3146855SS
         Firmware                           : 0002
         Serial number                      : 3LN3BY8Q00009823KDMV
         World-wide name                    : 5000C50007BCFA20
         Size                               : 140014 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #1
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,1
         Reported Location                  : Enclosure 0, Slot 1
         Reported ESD                       : 2,0
         Vendor                             : SEAGATE
         Model                              : ST3146855SS
         Firmware                           : 0002
         Serial number                      : 3LN3V6AQ00009829MMLC
         World-wide name                    : 5000C50002F017B8
         Size                               : 140014 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #2
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,2
         Reported Location                  : Enclosure 0, Slot 2
         Reported ESD                       : 2,0
         Vendor                             : SEAGATE
         Model                              : ST3146855SS
         Firmware                           : 0002
         Serial number                      : 3LN1AYYD00009747RGSB
         World-wide name                    : 5000C50005020B14
         Size                               : 140014 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #3
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,3
         Reported Location                  : Enclosure 0, Slot 3
         Reported ESD                       : 2,0
         Vendor                             : SEAGATE
         Model                              : ST3146855SS
         Firmware                           : 0002
         Serial number                      : 3LN2GAEC00009813AQW6
         World-wide name                    : 5000C50007BD43C0
         Size                               : 140014 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #4
         Device is an Enclosure services device
         Reported Channel,Device            : 2,0
         Enclosure ID                       : 0
         Type                               : SES2
         Vendor                             : ADAPTEC
         Model                              : Virtual SGPIO  0
         Firmware                           : 0001
         Status of Enclosure services device
            Temperature                     : Normal

Command completed successfully.
Examine the output of the arcconf command on your own system before you use the script, and edit the script if the field names or layout differ from the output above.
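Since everything in the script depends on picking values out of this "label : value" layout, here is a minimal sketch of a field lookup to experiment with before trusting the script. The function name get_field and the temporary sample file are my own illustration, not part of arcconf; the sample lines are copied from the output above.

```shell
#!/bin/sh
# get_field LABEL FILE: print the value of the first line whose label
# (the text before the first colon, with padding trimmed) equals LABEL.
# Hypothetical helper for illustration; it is not part of the Adaptec tools.
get_field() {
    awk -F: -v label="$1" '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
        }
        key == label {
            # take everything after the first colon, so values that
            # contain a colon themselves are not cut short
            val = substr($0, index($0, ":") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print val
            exit
        }
    ' "$2"
}

# Demo on a small excerpt saved to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
   Controller Status                        : Optimal
   Temperature                              : 49 C/ 120 F (Normal)
   Over temperature                         : No
   Status of logical device                 : Optimal
EOF

get_field 'Controller Status' "$sample"
get_field 'Temperature' "$sample"
rm -f "$sample"
```

Because the label is compared exactly, 'Temperature' does not pick up the battery's "Over temperature" line. On a real system you would point it at a file saved with arcconf getconfig 1 al > file.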

That's all fine, but if something goes wrong you will not know about it until you check again manually.
So I wrote a script that runs from CRON every hour (# crontab -l to view the crontab, # crontab -e to edit it) and emails me the status if something is wrong, or just a status report on Wednesday and Saturday (you can change the schedule as you like).
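If you prefer not to edit the crontab by hand, the entry can be added with a small filter. This is only a sketch: the function name add_raid_check is made up, and the schedule line is the one I use for this script (minute 15 of every hour), so adjust the path if you install the script somewhere else.

```shell
#!/bin/sh
# The cron entry used for this script: run at minute 15 of every hour.
CRON_LINE='15 * * * * /usr/local/bin/arctest_status.sh >/dev/null'

# add_raid_check: read an existing crontab on stdin and print it with
# CRON_LINE appended, unless an identical line is already there.
# Hypothetical helper for illustration.
add_raid_check() {
    awk -v line="$CRON_LINE" '
        { print }
        $0 == line { found = 1 }
        END { if (!found) print line }
    '
}
```

It would be used as crontab -l 2>/dev/null | add_raid_check | crontab - and running it twice does not duplicate the entry.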
arctest_status.sh
#!/bin/sh
DATE=$(date +"%F (%H:%M:%Sh)")

RAID=/var/tmp/adaptec/adaptec3405check_$(date +"%F_%H-%M-%Sh").txt
RAIDSTATUSFILE=/var/tmp/adaptec/adaptec3405status.txt

# make sure the working directory exists before writing into it
mkdir -p /var/tmp/adaptec
/usr/StorMan/arcconf getconfig 1 al > $RAID

CTRLSTAT=$(grep 'Controller Status' $RAID| cut -d\: -f2 | cut -d' ' -f2)
## Optimal
echo "Adaptec Status $DATE :" >$RAIDSTATUSFILE
echo "----------------------------------------" >>$RAIDSTATUSFILE
echo "Controller status : $CTRLSTAT" >>$RAIDSTATUSFILE
## CTRLBATINFO=$(grep -A 2 'Controller Battery' $RAID|grep 'Status'| cut -d\: -f2)
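# health word from the controller temperature line, e.g. "Normal" (not used below)
CTRTEMP=$(grep -m 1 'Temperature' $RAID| awk '{print $7}' | sed -e 's/^.*(\(.*\)),*/\1/')
# first match only: the enclosure section has its own "Temperature" line
CTRTEMPERATURE=$(grep -m 1 'Temperature' $RAID)
# unquoted on purpose, so the shell collapses the column padding
echo $CTRTEMPERATURE >>$RAIDSTATUSFILE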
LOGICSTAT=$(grep 'Status of logical device' $RAID| cut -d\: -f2 | cut -d' ' -f2)
## Optimal
echo "Status of logical device : $LOGICSTAT" >>$RAIDSTATUSFILE
LOGICSTR=$(grep 'Failed stripes' $RAID| cut -d\: -f2 | cut -d' ' -f2)
## No
echo "Failed stripes : $LOGICSTR" >>$RAIDSTATUSFILE


# number of drives
DRIVESNO=$(grep -B 1 -A 1 'Device is a Hard' $RAID | grep -c 'Device #')

echo "Devices found : $DRIVESNO" >>$RAIDSTATUSFILE
if [ "$CTRLSTAT" = "Optimal" ]
then
# when everything is OK, send the status message on Wednesday and Saturday (Wed / Sat) at 02:00; the script itself is set to run from CRON every hour (15 * * * * /usr/local/bin/arctest_status.sh >/dev/null )
# if you don't want emails when nothing is wrong, remove this inner if ... fi block
# date +%a prints Wed / Sat only in an English or C locale
# this should be all in 1 line
if [ "$(date +"%H")" = "02" ] && { [ "$(date +"%a")" = "Wed" ] || [ "$(date +"%a")" = "Sat" ]; }
then
i="0"
while [ $i -lt "$DRIVESNO" ]
do
CURDRIVE=DRIVE$i
# this should be all in 1 line
echo "$CURDRIVE : $(grep -A 2 "Device #$i" $RAID | grep 'State' | cut -d\: -f2 | cut -d' ' -f2)" >>$RAIDSTATUSFILE
i=$((i+1))
done
# this should be all in 1 line
/usr/local/bin/sendEmail -f "adaptec@example.com" -t "youremail@example.com" -u "Adaptec RAID status $DATE " -o message-file=$RAIDSTATUSFILE >/dev/null
fi
# everything is OK, so the full arcconf dump is no longer needed
rm -f $RAID


else
## SENDTHEMAIL: the controller is not Optimal, so mail the status with the full arcconf output attached
cat $RAID >>$RAIDSTATUSFILE
# add -cc "another@example.com" to copy a second recipient
# this should be all in 1 line
/usr/local/bin/sendEmail -f "adaptec@example.com" -t "youremail@example.com" -u "RAID FAILURE - Adaptec RAID error $DATE !" -o message-file=$RAIDSTATUSFILE -a $RAID >/dev/null

fi
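Note that the script decides between "OK" and "FAILURE" from the Controller Status line alone. If you also want a degraded logical device or a drive that has dropped out to trigger the failure mail, something like the following could be used. It is a sketch under assumptions: the function name raid_problems is made up, the sample values "Degraded" and "Failed" are illustrative, and it treats any drive state other than Online as a problem, which would also flag a hot spare.

```shell
#!/bin/sh
# raid_problems FILE: print every checked status line that is not in its
# expected healthy state; prints nothing when all checks pass.
# Hypothetical helper for illustration.
raid_problems() {
    awk -F: '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            key = trim($1)
            val = trim(substr($0, index($0, ":") + 1))
        }
        key == "Controller Status"        && val != "Optimal" { print key ": " val }
        key == "Status of logical device" && val != "Optimal" { print key ": " val }
        key == "Failed stripes"           && val != "No"      { print key ": " val }
        # remember which device the following State line belongs to
        key ~ /^Device #/                                     { dev = key }
        key == "State"                    && val != "Online"  { print dev " state: " val }
    ' "$1"
}

# Demo on a made-up excerpt with one degraded array and one failed drive.
sample=$(mktemp)
cat > "$sample" <<'EOF'
   Controller Status                        : Optimal
   Status of logical device                 : Degraded
   Failed stripes                           : No
      Device #0
         State                              : Online
      Device #1
         State                              : Failed
EOF

raid_problems "$sample"
rm -f "$sample"
```

In the script, a test such as [ -z "$(raid_problems $RAID)" ] could then take the place of the comparison against $CTRLSTAT.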


Now that's what I wanted!
On Wednesday and Saturday I get an email with a status check like this:
Adaptec Status 2011-10-01 (02:20:01h) :
----------------------------------------
Controller status : Optimal
Temperature : 51 C/ 123 F (Normal)
Status of logical device : Optimal
Failed stripes : No
Devices found : 4
DRIVE0 : Online
DRIVE1 : Online
DRIVE2 : Online
DRIVE3 : Online
