To check the health of an Adaptec RAID array on a CentOS 5 server (RHEL 5 based; the same applies to Fedora), I modified this script, which uses ARCCONF, and run it from CRON so that I get e-mailed the status of the RAID array (I have an Adaptec RAID 3405 controller).
Unlike the 3Ware 3DM2 manager, the Adaptec Storage Manager software for managing, monitoring and checking Adaptec RAID arrays will not run as a background service on a CentOS / RHEL 5 based system.
You will need to download and install the Adaptec utilities.
The other thing I use is sendEmail, a great e-mailer script that lets me send e-mail with attachments from the CLI.
When Adaptec utilities are installed, you can get your RAID array information with the command:
# /usr/StorMan/arcconf getconfig 1 al
The output should look like this:
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status : Optimal
   Channel description : SAS/SATA
   Controller Model : Adaptec 3405
   Controller Serial Number : 7C2110BD455
   Physical Slot : 3
   Temperature : 49 C/ 120 F (Normal)
   Installed memory : 128 MB
   Copyback : Disabled
   Background consistency check : Disabled
   Automatic Failover : Enabled
   Defunct disk drive count : 0
   Logical devices/Failed/Degraded : 1/0/0
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS : 5.2-0 (15753)
   Firmware : 5.2-0 (15753)
   Driver : 1.1-5 (2453)
   Boot Flash : 5.2-0 (15753)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status : Optimal
   Over temperature : No
   Capacity remaining : 99 percent
   Time remaining (at current draw) : 3 days, 0 hours, 52 minutes
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
   Logical device name : RAID10
   RAID level : 10
   Status of logical device : Optimal
   Size : 279800 MB
   Stripe-unit size : 256 KB
   Read-cache mode : Enabled
   Write-cache mode : Enabled (write-back)
   Write-cache setting : Enabled (write-back) when protected by battery
   Partitioned : Yes
   Protected by Hot-Spare : No
   Bootable : Yes
   Failed stripes : No
   --------------------------------------------------------
   Logical device segment information
   --------------------------------------------------------
   Group 0, Segment 0 : Present (0,0) 3LN3BY8Q00009823KDMV
   Group 0, Segment 1 : Present (0,1) 3LN3V6AQ00009829MMLC
   Group 1, Segment 0 : Present (0,2) 3LN1AYYD00009747RGSB
   Group 1, Segment 1 : Present (0,3) 3LN2GAEC00009813AQW6
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
   Device #0
      Device is a Hard drive
      State : Online
      Supported : Yes
      Transfer Speed : SAS 3.0 Gb/s
      Reported Channel,Device : 0,0
      Reported Location : Enclosure 0, Slot 0
      Reported ESD : 2,0
      Vendor : SEAGATE
      Model : ST3146855SS
      Firmware : 0002
      Serial number : 3LN3BY8Q00009823KDMV
      World-wide name : 5000C50007BCFA20
      Size : 140014 MB
      Write Cache : Enabled (write-back)
      FRU : None
      S.M.A.R.T. : No
   Device #1
      Device is a Hard drive
      State : Online
      Supported : Yes
      Transfer Speed : SAS 3.0 Gb/s
      Reported Channel,Device : 0,1
      Reported Location : Enclosure 0, Slot 1
      Reported ESD : 2,0
      Vendor : SEAGATE
      Model : ST3146855SS
      Firmware : 0002
      Serial number : 3LN3V6AQ00009829MMLC
      World-wide name : 5000C50002F017B8
      Size : 140014 MB
      Write Cache : Enabled (write-back)
      FRU : None
      S.M.A.R.T. : No
   Device #2
      Device is a Hard drive
      State : Online
      Supported : Yes
      Transfer Speed : SAS 3.0 Gb/s
      Reported Channel,Device : 0,2
      Reported Location : Enclosure 0, Slot 2
      Reported ESD : 2,0
      Vendor : SEAGATE
      Model : ST3146855SS
      Firmware : 0002
      Serial number : 3LN1AYYD00009747RGSB
      World-wide name : 5000C50005020B14
      Size : 140014 MB
      Write Cache : Enabled (write-back)
      FRU : None
      S.M.A.R.T. : No
   Device #3
      Device is a Hard drive
      State : Online
      Supported : Yes
      Transfer Speed : SAS 3.0 Gb/s
      Reported Channel,Device : 0,3
      Reported Location : Enclosure 0, Slot 3
      Reported ESD : 2,0
      Vendor : SEAGATE
      Model : ST3146855SS
      Firmware : 0002
      Serial number : 3LN2GAEC00009813AQW6
      World-wide name : 5000C50007BD43C0
      Size : 140014 MB
      Write Cache : Enabled (write-back)
      FRU : None
      S.M.A.R.T. : No
   Device #4
      Device is an Enclosure services device
      Reported Channel,Device : 2,0
      Enclosure ID : 0
      Type : SES2
      Vendor : ADAPTEC
      Model : Virtual SGPIO 0
      Firmware : 0001
      Status of Enclosure services device
         Temperature : Normal

Command completed successfully.
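Since every value in that output sits on a "Name : Value" line, a single field can be pulled out with grep, cut and sed. The following is only a minimal sketch: get_field is a hypothetical helper name, and the sample file stands in for real arcconf output so the snippet can be tried without a controller.

```shell
# Sketch: read one "Name : Value" field from saved arcconf output.
# get_field is a hypothetical helper, not part of arcconf.

# Print the value of the first line in file $2 whose name matches $1.
get_field() {
    grep -m 1 "$1" "$2" | cut -d: -f2- | sed -e 's/^ *//' -e 's/ *$//'
}

# A few lines copied from the output above, used instead of a live controller.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
   Controller Status : Optimal
   Temperature : 49 C/ 120 F (Normal)
   Status of logical device : Optimal
   Failed stripes : No
EOF

get_field 'Controller Status' "$SAMPLE"
get_field 'Temperature' "$SAMPLE"
```

On a real system the second argument would be a file written by /usr/StorMan/arcconf getconfig 1 al. Taking only the first match matters for Temperature, because the enclosure section further down has a Temperature line of its own.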
That's all fine, but if something goes wrong you will not know about it until you check again manually.
So I wrote a script that runs from CRON every hour (# crontab -l to view the crontab, # crontab -e to edit it) and e-mails me the status if something is wrong, or just a status report on Wednesday and Saturday (you can change that as you like).
#!/bin/bash
# Check the Adaptec RAID status with arcconf and e-mail a report with sendEmail.
DATE=$(date +"%F (%H:%M:%Sh)")
RAID=/var/tmp/adaptec/adaptec3405check_$(date +"%F_%H-%M-%Sh").txt
# NOTE: the definition of RAIDSTATUSFILE was missing from the listing; this path is an assumption
RAIDSTATUSFILE=/var/tmp/adaptec/adaptec3405status.txt
mkdir -p /var/tmp/adaptec
/usr/StorMan/arcconf getconfig 1 al > "$RAID"
CTRLSTAT=$(grep 'Controller Status' "$RAID" | cut -d: -f2 | cut -d' ' -f2)
## Optimal
echo "Adaptec Status $DATE :" > "$RAIDSTATUSFILE"
echo "----------------------------------------" >> "$RAIDSTATUSFILE"
echo "Controller status : $CTRLSTAT" >> "$RAIDSTATUSFILE"
## CTRLBATINFO=$(grep -A 2 'Controller Battery' $RAID|grep 'Status'| cut -d\: -f2)
# first Temperature line only (the enclosure section has one too); the echo was
# missing from the listing and is reconstructed from the sample e-mail
CTRTEMP=$(grep 'Temperature' "$RAID" | head -n 1 | cut -d: -f2 | sed -e 's/^ *//')
## 49 C/ 120 F (Normal)
echo "Temperature : $CTRTEMP" >> "$RAIDSTATUSFILE"
LOGICSTAT=$(grep 'Status of logical device' "$RAID" | cut -d: -f2 | cut -d' ' -f2)
## Optimal
echo "Status of logical device : $LOGICSTAT" >> "$RAIDSTATUSFILE"
LOGICSTR=$(grep 'Failed stripes' "$RAID" | cut -d: -f2 | cut -d' ' -f2)
## No
echo "Failed stripes : $LOGICSTR" >> "$RAIDSTATUSFILE"
# number of drives
DRIVESNO=$(grep -B 1 -A 1 'Device is a Hard' "$RAID" | grep -c 'Device #')
echo "Devices found : $DRIVESNO" >> "$RAIDSTATUSFILE"
if [ "$CTRLSTAT" = "Optimal" ]
then
    # when everything is OK, send the status message on Wednesday and Saturday (Wed / Sat) at 02.00 hrs;
    # the script itself is set to run in CRON every hour (15 * * * * /usr/local/bin/ >/dev/null )
    # if you don't want e-mails when nothing is wrong, leave out the inner if ... fi block
    if [ "$(date +"%H")" = "02" ] && { [ "$(date +"%a")" = "Wed" ] || [ "$(date +"%a")" = "Sat" ]; }
    then
        i=0
        while [ "$i" -lt "$DRIVESNO" ]
        do
            CURDRIVE="DRIVE$i"
            echo "$CURDRIVE : $(grep -A 2 "Device #$i" "$RAID" | grep 'State' | cut -d: -f2 | cut -d' ' -f2)" >> "$RAIDSTATUSFILE"
            i=$((i + 1))
        done
        /usr/local/bin/sendEmail -f "" -t "" -u "Adaptec RAID status $DATE " -o message-file="$RAIDSTATUSFILE" >/dev/null
    fi
    # the full arcconf output is only kept when something is wrong
    rm "$RAID"
else
    # controller is not Optimal: send the status and attach the full arcconf output
    /usr/local/bin/sendEmail -f "" -t "" -u "RAID FAILURE - Adaptec RAID error $DATE !" -o message-file="$RAIDSTATUSFILE" -a "$RAID" >/dev/null
    # same alert, with a CC recipient
    /usr/local/bin/sendEmail -f "" -t "" -cc "" -u "RAID FAILURE - Adaptec RAID error $DATE !" -o message-file="$RAIDSTATUSFILE" -a "$RAID" >/dev/null
fi
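The Wednesday/Saturday at 02 hrs test is the easiest part of the script to get wrong, so here it is pulled out into a small function that can be tried with fixed values. This is just a sketch: is_report_time is a hypothetical name, and forcing LC_ALL=C is my assumption for servers where date would otherwise print day names in another language.

```shell
# Sketch: the "Wednesday or Saturday at 02 hrs" test as a function.
# is_report_time is a hypothetical name; $1 is the day abbreviation, $2 the two-digit hour.
is_report_time() {
    day=$1
    hour=$2
    [ "$hour" = "02" ] && { [ "$day" = "Wed" ] || [ "$day" = "Sat" ]; }
}

# LC_ALL=C keeps the day abbreviation in English regardless of the system locale.
if is_report_time "$(LC_ALL=C date +%a)" "$(date +%H)"
then
    echo "scheduled report is due"
else
    echo "no scheduled report this hour"
fi
```

Passing the day and hour as arguments means the condition can be checked by hand, for example is_report_time Sat 02, without waiting for the right time of the week.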
Now that's what I wanted!
On Wednesday and Saturday I get an e-mail with a status check like this:
Adaptec Status 2011-10-01 (02:20:01h) :
----------------------------------------
Controller status : Optimal
Temperature : 51 C/ 123 F (Normal)
Status of logical device : Optimal
Failed stripes : No
Devices found : 4
DRIVE0 : Online
DRIVE1 : Online
DRIVE2 : Online
DRIVE3 : Online
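The DRIVEn lines in that report come from running grep once per drive. As an alternative sketch (not what the script above does), the same lines can be produced in a single awk pass over the saved output. list_drive_states is a hypothetical helper name, and the sample below, including the Failed state on the second drive, is made up for illustration.

```shell
# Sketch: print "DRIVEn : State" for every hard drive in saved arcconf output.
# list_drive_states is a hypothetical helper name.
list_drive_states() {
    awk '
        /Device #[0-9]+/         { n = $2; sub(/^#/, "", n) }   # remember the device number
        /Device is a Hard drive/ { hard = 1 }                   # only hard drives are reported
        /Device is an Enclosure/ { hard = 0 }
        hard && /State/ {
            s = $0
            sub(/^[^:]*: */, "", s)                             # keep the text after the colon
            print "DRIVE" n " : " s
            hard = 0
        }
    ' "$1"
}

# Made-up sample in the layout of the physical device section.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
   Device #0
      Device is a Hard drive
      State : Online
   Device #1
      Device is a Hard drive
      State : Failed
   Device #2
      Device is an Enclosure services device
      Reported Channel,Device : 2,0
EOF

list_drive_states "$SAMPLE"
```

For this sample it prints DRIVE0 : Online and DRIVE1 : Failed, and skips the enclosure device. The drive number is taken from the Device # line itself, so the result does not depend on the hard drives being numbered before the enclosure.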