Graceful shutdown of an ESXi 5.0 server with a USB-connected APC UPS – Revisited

Why “revisited”?

It’s been over a year since I posted the original articles and, at the time, I had thought about posting the complete scripts.  However, I was only just starting to use WordPress and hadn’t found an easy way to post code so that it appeared in a scrollable box.  That’s fixed now and you can click inside the code boxes below to copy the text.

More importantly, some time ago I found there was a problem with the doshutdown.bat file. I only noticed it after changing the root password on the ESXi host. The symptom was that the parameters for plink suddenly wouldn’t expand properly and the batch job would fail. Luckily, I hadn’t seen anything but a couple of very brief power interruptions, neither of which had triggered the shutdown sequence.

There were also a couple of things that I really should have done more neatly.

Apologies to anyone who followed the description in those earlier posts and couldn’t make it work.

upsshutdown.sh

First, my (very slightly) modified version of helux’s and spike’s script, which I keep in a datastore on the ESXi server. The minor mods are described in Part 1.

#!/bin/ash
# title: powerdown-esxi4.sh
# version: 0.4
# date: january 30, 2011
# author: herwarth heitmann <herwarth@helux.nl>
# edit by: massimo vannucci <massimo.vannucci@gmail.com>
#
#Stored in /vmfs/volumes/datastore2/UPScontrol
#

#variables
PATH=/bin:/sbin:/usr/bin:/usr/sbin
VIMSH_WRAPPER=vim-cmd
VM_FILE=vm_list
INTERVAL=60
MAXLOOP=3
DATE=`date "+%Y-%m-%d   %H:%M:%S"`
# To enable logging, set the following variable to 1
LOG_ENABLED=1
# (Biggsy) remember to set the location and name of the log file!
LOG_FILE=/vmfs/volumes/datastore2/UPScontrol/powerdown-esxi4.log
# Remember that >> after echo, redirect and append to file

# Set the log file
if [ $LOG_ENABLED -eq 1 ]; then
  echo -e "\n\n\n"`date "+%Y-%m-%d   %H:%M:%S"` "\t\tExecuting powerdown-esxi4.sh" >> $LOG_FILE
fi

#check if parameter given
case "$1" in
    "") LASTACTION=shutdown
        ;;
reboot) LASTACTION=reboot
        ;;
vmonly) LASTACTION=vmonly
        ;;
     *) echo "usage $0 <|vmonly|reboot>"
        exit 1
        ;;
esac

#retrieve all VmId for VM(s) registered under ESXi host and dump them in the log file
#(only when logging is enabled)
if [ $LOG_ENABLED -eq 1 ]; then
  ${VIMSH_WRAPPER} vmsvc/getallvms >> $LOG_FILE
fi
#retrieve all VmId for VM(s) registered under ESXi host
${VIMSH_WRAPPER} vmsvc/getallvms | awk '{print $1}' | grep -v 'Annotation' | grep -v 'Vmid' > $VM_FILE

#first time initialisation
ERROR=0
FIRSTRUN=1
LOOP=0

#we want to run the loop at least 1 time! and loop until no more errors occur
while [ $ERROR -ne 0 -o $FIRSTRUN -eq 1 ]; do
  LOOP=$(($LOOP+1))
  if [ $FIRSTRUN -eq 0 ]; then
    if [ $LOG_ENABLED -eq 1 ]; then
      echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tGive virtual machines time to shutdown..." >> $LOG_FILE
    else
      echo "Give virtual machines time to shutdown..."
    fi
    sleep $INTERVAL
  fi
  #exit loop if $LOOP gets bigger than $MAXLOOP
  if [ $LOOP -gt $MAXLOOP ]; then
    echo "Maximum loops reached!"
    break
  fi

  FIRSTRUN=0
  ERROR=0
  for VM_LINE in $(cat ${VM_FILE}); do
    STATE=$(${VIMSH_WRAPPER} vmsvc/power.getstate ${VM_LINE} | grep -v 'runtime')
    if [ "$STATE" = "Powered off" -o "$STATE" = "Suspended" ]; then
      if [ $LOG_ENABLED -eq 1 ]; then
        echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tVM with ID: ${VM_LINE} is: $STATE, skipping..." >> $LOG_FILE
      else
        echo "VM with ID: ${VM_LINE} is: $STATE, skipping..."
      fi
    else
      #try to do proper shutdown if VMware Tools are installed
      if [ $LOG_ENABLED -eq 1 ]; then
        echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tVM with ID: ${VM_LINE} is: $STATE, trying guest shutdown..." >> $LOG_FILE
      else
        echo "VM with ID: ${VM_LINE} is: $STATE, trying guest shutdown..."
      fi
      ${VIMSH_WRAPPER} vmsvc/power.shutdown "${VM_LINE}" > /dev/null 2>&1
      #if it fails to shutdown, we know there are no VMware Tools installed
      if [ $? -ne 0 ]; then
        #hard power off
        if [ $LOG_ENABLED -eq 1 ]; then
          echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tGuest shutdown not working, hard powering off" >> $LOG_FILE
        else
          echo -e "\tGuest shutdown not working, hard powering off"
        fi
        ${VIMSH_WRAPPER} vmsvc/power.off "${VM_LINE}" > /dev/null 2>&1
      else
        if [ $LOG_ENABLED -eq 1 ]; then
          echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tSuccessfully initiated shutdown of ${VM_LINE}" >> $LOG_FILE
        else
          echo -e "\t\tSuccessfully initiated shutdown of ${VM_LINE}"
        fi
      fi
      ERROR=$(($ERROR+1))
    fi
  done
done

# clean up temporary file
rm -f $VM_FILE

#execute last action
case "$LASTACTION" in
shutdown) #shutdown ESXi host
          if [ $LOG_ENABLED -eq 1 ]; then
            echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tShutting down ESXi host..." >> $LOG_FILE
          fi
          /sbin/poweroff
          ;;
  reboot) #reboot ESXi host
          if [ $LOG_ENABLED -eq 1 ]; then
            echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tRebooting ESXi host..." >> $LOG_FILE
          fi
          /sbin/reboot
          ;;
  vmonly) #do nothing! only VMs needed to be shutdown
          if [ $LOG_ENABLED -eq 1 ]; then
            echo -e `date "+%Y-%m-%d   %H:%M:%S"` "\t\tDo nothing with ESXi host..." >> $LOG_FILE
          fi
          ;;
esac
exit 0
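
The script above wraps nearly every message in the same `if [ $LOG_ENABLED -eq 1 ]` test. A minimal sketch of how that repetition could be folded into one function is shown here. Note that `log_msg` is a hypothetical name that does not appear in the original script, and the /tmp log path is only an assumed location for trying it out away from the datastore.

```shell
#!/bin/sh
# Sketch only: log_msg is a hypothetical helper, not part of the original script.
LOG_ENABLED=${LOG_ENABLED:-1}
LOG_FILE=${LOG_FILE:-/tmp/powerdown-test.log}   # assumed test location

# Append a timestamped message to the log when logging is enabled,
# otherwise print the bare message to the console.
log_msg() {
  if [ "$LOG_ENABLED" -eq 1 ]; then
    printf '%s\t\t%s\n' "$(date '+%Y-%m-%d   %H:%M:%S')" "$1" >> "$LOG_FILE"
  else
    printf '%s\n' "$1"
  fi
}

log_msg "Give virtual machines time to shutdown..."
```

With something like this in place, each five-line if/else block in the loop would shrink to a single `log_msg "..."` call. I have not run this variant on the ESXi host itself, so treat it as a starting point rather than a drop-in replacement.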

APCCONTROL.BAT

I made quite a few changes to the file provided with apcupsd for Windows. That is not recommended, as the file will be overwritten by any update or re-install of apcupsd. I don’t understand why apccontrol.bat still needs to support those atrocities known as Windows 95, 98, ME and SE. A good chunk of the script is about working around deficiencies in the command interpreter (CI) on those versions, and I chose to bypass (rem) a lot of that stuff.

@echo off
setlocal

rem
rem  This is the Windows apccontrol file.
rem

rem Assign parameters to named variables
SET command=%1
rem (Biggsy) Next line is the "easily accomplished on NT" solution.
SET sbindir=%~5

rem Strip leading and trailing quotation marks from paths.
rem This is easily accomplished on NT, but Win95/98/ME
rem require an evil little trick with 'FOR'.
rem SET sbindir=%sbindir:"=%
rem IF "%sbindir%" == "" FOR %%A IN (%5) DO SET sbindir=%%A

rem Paths to important executables
SET APCUPSD="%sbindir%\apcupsd"
SET SHUTDOWN="%sbindir%\shutdown"
SET BACKGROUND="%sbindir%\background"

rem (Biggsy) Running on Windows 7 here so we don't need all this stuff either

rem Only do popups on Win95/98/ME/NT. All other platforms support 
rem balloon notifications which are provided by apctray.
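rem POPUP has to stay defined: without it the %POPUP% lines further down
rem try to run their message text as a command.
SET POPUP=echo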
rem VER | FIND /I "Windows 95" > NUL
rem IF NOT ERRORLEVEL 1 SET POPUP=%BACKGROUND% "%sbindir%\popup"
rem VER | FIND /I "Windows 98" > NUL
rem IF NOT ERRORLEVEL 1 SET POPUP=%BACKGROUND% "%sbindir%\popup"
rem VER | FIND /I "Windows ME" > NUL
rem IF NOT ERRORLEVEL 1 SET POPUP=%BACKGROUND% "%sbindir%\popup"
rem VER | FIND /I "Windows NT" > NUL
rem IF NOT ERRORLEVEL 1 SET POPUP=%BACKGROUND% "%sbindir%\popup"

rem
rem This piece is to substitute the default behaviour with your own script,
rem   perl, C program, etc.
rem
rem You can customize any command by creating an executable file (may be a
rem   script or a compiled program) and naming it the same as the %1 parameter
rem   passed by apcupsd to this script. We will accept files with any extension
rem   included in PATHEXT (*.exe, *.bat, *.cmd, etc).
rem
rem After executing your script, apccontrol continues with the default action.
rem   If you do not want apccontrol to continue, exit your script with exit 
rem   code 99. E.g. "exit /b 99".
rem
rem WARNING: please be aware that if you add any commands before the shutdown
rem   in the downshutdown) case and your command errors or stalls, it will
rem   prevent your machine from being shutdown, so test, test, test to
rem   make sure it works correctly.
rem
rem The apccontrol.bat file will be replaced every time apcupsd is installed,
rem   so do NOT make event modifications in this file. Instead, override the
rem   event actions using event scripts as described above.
rem

rem Use CALL here because event script might be a batch file itself
rem (Biggsy) removed the "./" from immediately before %command%
rem (Biggsy) doshutdown.bat is in the C:\apcupsd\etc\apcupsd directory. 
CALL "%command%" 2> NUL

rem This is retarded. "IF ERRORLEVEL 99" means greater-than-or-
rem equal-to 99, so we have to synthesize an == using two IFs. 
rem Ahh, the glory of Windows batch programming. At least they 
rem gave us a NOT op.
IF NOT ERRORLEVEL 99 GOTO :events
IF NOT ERRORLEVEL 100 GOTO :done

:events

rem
rem powerout, onbattery, offbattery, mainsback events occur
rem   in that order.
rem

IF "%command%" == "commfailure"   GOTO :commfailure
IF "%command%" == "commok"        GOTO :commok
IF "%command%" == "powerout"      GOTO :powerout
IF "%command%" == "onbattery"     GOTO :onbattery
IF "%command%" == "offbattery"    GOTO :offbattery
IF "%command%" == "mainsback"     GOTO :mainsback
IF "%command%" == "failing"       GOTO :failing
IF "%command%" == "timeout"       GOTO :timeout
IF "%command%" == "loadlimit"     GOTO :loadlimit
IF "%command%" == "runlimit"      GOTO :runlimit
IF "%command%" == "doshutdown"    GOTO :doshutdown
IF "%command%" == "annoyme"       GOTO :annoyme
IF "%command%" == "emergency"     GOTO :emergency
IF "%command%" == "changeme"      GOTO :changeme
IF "%command%" == "remotedown"    GOTO :remotedown
IF "%command%" == "startselftest" GOTO :startselftest
IF "%command%" == "endselftest"   GOTO :endselftest
IF "%command%" == "battdetach"    GOTO :battdetach
IF "%command%" == "battattach"    GOTO :battattach

echo Unknown command '%command%'
echo.
echo Usage: %0 command
echo.
echo Warning: this script is intended to be launched by
echo apcupsd and should never be launched by users.
GOTO :done

:commfailure
   %POPUP% "Communications with UPS lost."
   GOTO :done

:commok
   %POPUP% "Communications with UPS restored."
   GOTO :done

:powerout
   GOTO :done

:onbattery
   %POPUP% "Power failure. Running on UPS batteries."
   GOTO :done

:offbattery
   %POPUP% "Power has returned. No longer running on UPS batteries."
   GOTO :done

:mainsback
   GOTO :done

:failing
   %POPUP% "UPS battery power exhausted. Doing shutdown."
   GOTO :done

:timeout
   %POPUP% "UPS battery runtime limit exceeded. Doing shutdown."
   GOTO :done

:loadlimit
   %POPUP% "UPS battery discharge limit reached. Doing shutdown."
   GOTO :done

:runlimit
   %POPUP% "UPS battery runtime percent reached. Doing shutdown."
   GOTO :done

:doshutdown
rem
rem  If you want to try to power down your UPS, uncomment
rem    out the following lines, but be warned that if the
rem    following shutdown -h now doesn't work, you may find
rem    the power being shut off to a running computer :-(
rem  Also note, we do this in the doshutdown case, because
rem    there is no way to get control when the machine is
rem    shutdown to call this script with --killpower. As
rem    a consequence, we do both killpower and shutdown
rem    here.
rem  Note that Win32 lacks a portable way to delay for a
rem    given time, so we use the trick of pinging a
rem    non-existent IP address with a given timeout.
rem
rem   %APCUPSD% /kill
rem   ping -n 1 -w 5000 10.255.255.254 > NUL
rem   %POPUP% "Doing %APCUPSD% --killpower"
rem   %APCUPSD% --killpower
rem   ping -n 1 -w 12000 10.255.255.254 > NUL
rem
rem (Biggsy) local system will be closed down by ESXi.  No need for this:
rem %SHUTDOWN% -h now
   GOTO :done

:annoyme
   %POPUP% "Power problems: please logoff."
   GOTO :done

:emergency
   %POPUP% "Emergency shutdown initiated."
   GOTO :done

:changeme
   %POPUP% "Emergency! UPS batteries have failed: Change them NOW"
   GOTO :done

:remotedown
   %POPUP% "Shutdown due to master state or comms lost."
   GOTO :done

:startselftest
   %POPUP% "Self-test starting"
   GOTO :done

:endselftest
   %POPUP% "Self-test completed"
   GOTO :done

:battdetach
   %POPUP% "Battery disconnected"
   GOTO :done

:battattach
   %POPUP% "Battery reattached"
   GOTO :done

:done
rem That's all, folks

DOSHUTDOWN.BAT

This is the much revised file that corrects the shortcomings of the original.  It also uses a separate file (GUS.TXT) to store the command to be sent to the ESXi server by plink.

A couple of other things about this version:

  • Added @echo off and setlocal at the start
  • Moved the parameters for plink to a variable to work around a problem with parameter expansion. Putting the opening quote before the variable name (set "plinkparms=...") keeps the quotes out of the stored value, so there is no need to strip them later.

@echo off
setlocal
set "plinkparms=-v -pw <password> -m GUS.TXT <ESXi host>"
plink.exe %plinkparms%
exit /b 99

GUS.TXT

The only purpose for this file was to remove the command sent by plink (to execute the upsshutdown.sh script) from the body of the doshutdown.bat file.  The same thing might have been achievable by the method used for setting plinkparms but, using Notepad++, I also made sure that there was just a linefeed (and no carriage return) at the end of the line.

nohup /vmfs/volumes/datastore2/UPScontrol/upsshutdown.sh &
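
If you would rather not rely on an editor's status bar to confirm the line ending, the check can also be scripted. This is a minimal sketch, assuming a POSIX shell is to hand (Cygwin, Git Bash or any Linux box); `has_cr` is a hypothetical helper name, and the two throwaway files stand in for GUS.TXT.

```shell
#!/bin/sh
# Sketch only: has_cr is a hypothetical helper name.
# Succeeds (exit status 0) if the file contains at least one carriage return.
has_cr() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Demonstration on two throwaway files standing in for GUS.TXT
tmp_unix=$(mktemp)
tmp_dos=$(mktemp)
printf 'nohup some-command &\n'   > "$tmp_unix"
printf 'nohup some-command &\r\n' > "$tmp_dos"

if has_cr "$tmp_unix"; then echo "LF file: CR found";   else echo "LF file: clean";   fi
if has_cr "$tmp_dos";  then echo "CRLF file: CR found"; else echo "CRLF file: clean"; fi

rm -f "$tmp_unix" "$tmp_dos"
```

If a carriage return does turn up, `tr -d '\r' < GUS.TXT > GUS.fixed` is one way to strip it; GUS.fixed is just an illustrative output name.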

Acknowledgements

Thank you to all the many people who created and/or maintain the various programs, scripts, tutorials and blog/forum posts that allowed me to get this all working just the way I wanted.
