HOWTO: HTTP boot the HPE Proliant Service Pack ISO DVD using RESTfulAPI to update firmware without messing with WDS or PXE

Most of my customer sites consist of one to four HPE Proliant DL3xx servers running VMware ESXi and an additional HPE Proliant DL3xx running Windows 2012 R2 / 2016. HPE offers some great tools for managing their servers, but unfortunately for smaller organizations, most of HPE’s management tools (and I’m looking squarely at you Insight Control and OneView) take more time to setup and get running correctly then the time you’ll save by installing / updating a small handful of servers manually.  Therefore, I usually don’t deploy these tools to help install OSes or update firmware at my smaller client sites.  I generally just rely on booting the HPE Support Pack for Proliant (SPP) to update firmware, use a USB key with a scripted ESXi install on it for installing ESXi, and utilize WDS to install Windows directly on my Proliants when required.

Prior to HPE Proliant Gen 9 servers, I would PXE boot the Proliant Service Pack using PXELINUX and mount the ISO via NFS.  Then along came Gen 9 with UEFI.  Unfortunately, PXELINUX suffers from a complete lack of support for UEFI.  A couple of times I pestered some of the HPE SPP developers and managers in person while at HPE’s campus in Houston, but they never really showed much interest in explaining or documenting how to get network booting working with the SPP when the server utilized UEFI, so I had pretty much given up on ever getting it to work.

The other day I was playing with the HPE RESTful Interface Tool and decided to try configuring HTTP boot on DL380 Gen10 with the current SPP ISO image (P11740_001_spp-2018.11.0-SPP2018110.2018_1114.38.iso).  Much to my surprise, after modifying only a single configuration file on the ISO image, I was able to successfully boot the current SPP ISO image via HTTP and run a full firmware update on the Gen10 I was playing with.

The nice thing about this method is that because it is all done via HTTP, you don’t have to mess with or disable your WDS (Windows Deployment Services) server to add Linux support (which is what the SPP ISO is based on).  So this is great news for pure Windows shops!  And as a bonus, these steps works with Gen 9 servers too.

So how did I do it?  Before I share that, as always:

Use any tips, tricks, or scripts I post at your own risk.

First, you need to slightly modify the SPP ISO image.  Copy the original SPP ISO image to your web server (i.e. c:\inetpub\wwwroot).

Open the ISO image with your favorite ISO editor and extract \efi\boot\grub.cfg, then open the grub.cfg with a decent text editor (i.e. Notepad++, but definitely not the built-in Windows Notepad).  Scroll down the first menuentry, which will be “Automatic Firmware Update”.  Then copy and paste the following just above that menuentry:

menuentry "HTTP Firmware Update Version 2018.11.0" {
set gfxpayload=keep
echo "Loading kernel..."
linux /pxe/spp2018110/vmlinuz media=net root=/dev/ram0 ramdisk_size=10485760 init=/bin/init  iso1=http://xxx.xxx.xxx.xxx/spp.iso iso1mnt=/mnt/bootdevice hp_fibre cdcache TYPE=MANUAL AUTOPOWEROFFONSUCCESS=no modprobe.blacklist=aacraid,mpt3sas  ${linuxconsole}
echo "Loading initial ramdisk..."
initrd /pxe/spp2018110/initrd.img
}

So your grub.cfg will look like this when you are done:

2018.12.20 - 17.45.17 - SNAGIT - 0027

Adjust the http address (xxx.xxx.xxx.xxx), path, and ISO image name as required for your network, then save the updated grub.cfg and inject it back into the ISO image, over-writing the existing \efi\boot\grub.cfg, and then save the updated ISO image.

Be sure to add the .ISO mime type to your web server so that the ISO file type can be handled correctly.  The command below will work with IIS 8.5 and above to add a new mime type to IIS for .ISO.

C:\Windows\System32\inetsrv\appcmd.exe set config -section:system.webServer/staticContent /+"[fileExtension='iso',mimeType='application/iso']"

Now, you need to install the HPE RESTful Interface Tool on your machine.  The current version at the time of this writing is 2.3.4.0.  Go to the Hewlett Packard Enterprise Support Center and search for “RESTful Interface Tool for Windows”, then download and install the .msi (there is a Linux version available as well there).

Once the HPE RESTful Interface Tool is installed, run it as an Administrator.  Next, you need to connect to your server’s ILO, select the Bios object, set the UrlBootfile Entry and commit the changes.

*** NOTE: Make sure the UrlBootFile entry matches the url of your ISO image that your put on your webserver and specified as the iso1 switch in the grub.cfg entry.

ilorest
login ilo_ip_address -u admin -p password
select Bios.v1_0.0
set UrlBootFile=http://xxx.xxx.xxx.xxx/spp.iso
commit

2018.12.19 - 13.56.41 - SNAGIT - 0003

This takes care of the changes you must make to your Proliant server (keep in mind each server that you want to HTTP boot needs to have this this done).

The next time your server boots, the UrlBootFile change will be applied at the end of POST, then server will automatically reboot and start to POST again.

2018.12.19 - 14.18.08 - SNAGIT - 0005

That’s it – your configuration is all done.  Now when you reboot your server, if you hit F11 for the Boot Menu, you’ll have an entry for HTTP there – select it.

2018.12.19 - 14.20.01 - SNAGIT - 0006

After maybe 30 to 45 seconds (depending on your network speed – I’m using 10GbE), you’ll see the familiar SPP boot menu, but with an extra entry which is set as the default entry.

2018.12.19 - 14.21.25 - SNAGIT - 0009

Select it, and after about a minute (again – I’m using 10GbE) you’ll see the ISO image get mounted.

2018-12-20_17-54-25

If the image fails to mount, verify you are able to download the image you specified as the UrlBootFile from your PC.  If that works, then verify that the grub.cfg is correctly updated, with no typos.  Also – verify your server has 16GB+ of RAM in it, as the grub entry creates a 10GB RAM disk.  You may also need to upgrade the ILO firmware and drivers to current builds (such as 2.61 for ILO4 or 1.39 for ILO5) before using the iLOrest tool.

If you so desire, you could also set the new grub entry to be totally automatic by grabbing the proper switches out of the “Automatic Firmware Update” entry.  I suspect it may also be possible to split the ISO and boot one ISO without the packages folder (so it boots quicker) and mount a second the ISO with the packages folders still there to run the upgrades from.  Just to be clear, I haven’t tested that yet – it’s just a theory at this point.

I have tested this by HTTP booting over a branch office VPN tunnel which tops out at 100Mbps – it took a while for the image to load (I didn’t time it as I was working on other things at the time), but it did eventually load and it successfully updated the remote server.

When the next Support Pack for Proliant is released, all you need to do is update the grub.cfg with the correct paths and copy the updated ISO to your webserver with the same file name you used here.  You shouldn’t need to adjust the UrlBootFile on your servers.

Happy updating!

 

 

HOWTO: Fix Windows Server 2016 BSOD Stop 0x00000133 after a failed Cumulative Update installation (#WindowsServer2016 #BSOD #Microsoft)

This morning I logged into my HPE Proliant DL60 which is running Windows Server 2016 (1607) and noticed it wanted to install KB4093119, which is the “2018-04 Cumulative Update for Windows Server 2016 for x64-based Systems (KB4093119)”.  When I was done doing what I had originally logged in for, I told the update to install and reboot.  After many reboots (I don’t have an LCD on the DL60, but it’s here in my office so I can tell every time it reboots by the fans), I figured something was wrong and hopped on the ILO to see what was going on where I was greeted by a BSOD loop – STOP 0x00000133 (DPC WATCHDOG VIOLATION).  After troubleshooting (I couldn’t even get it to boot into the current build of Microsft DaRT as it too would cause a BSOD), I decided to make a Windows Server 2016 installation USB key from the setup DVD using Rufus.  (I could have booted off the DVD ISO image via the ILO, but the ILO emulates USB 2 as opposed to the physical USB 3 ports in the server, so it would have been much slower).  I copied the boot.wim to my C: drive and injected the most recent driver pack into it from the current HPE Support Pack for Proliant (in the root of the HPSPP DVD, called \WIN_DRV) before copying the wim back to the USB key.

I then booted off the USB key and when the Windows Server 2016 Setup Window opened, I hit Shift + F10 to open a command prompt.  I deleted C:\Windows\winsxs\pending.xml then ran “wpeutil reboot“.  As soon as the F9 to F12 function keys became available in the Proliant POST screen, I hit F11, which eventually brought me to the Proliant boot menu.  Here I selected “Windows Boot Manager” and then immediately started hitting F8 to get to the Windows boot options.  This allowed me to select “Last Known Good Configuration”, which allowed the server to boot into Windows without a Stop 0x00000133 (note – selecting just “Last Known Good Configuration”, or deleting just C:\Windows\winsxs\pending.xml won’t help – I found you must do both).

I retried the Windows Update a couple of times only to have the same thing happen again and again, so I followed the above steps again and again to get Windows Server 2016 running again.  Then I Googled the KB number (KB4093119) and went to the Microsoft Support article about it.  At the bottom of the support article, under “How to get this update” is a link to the stand-alone installation package on the Microsoft Update Catalog website.  This allowed me to download the .msu for KB4093119, which I saved to C:\DL\UPDATES as KB4093119.msu.

I rebooted once again off the Windows Server 2016 installation USB key.  When the Windows Server 2016 Setup Window opened, I hit Shift + F10 to open a command prompt.  In the command prompt, I ran the following two commands:

md c:\temp
dism /image:c:\ /add-package /packagepath:c:\dl\updates\KB4093119.msu /scratchdir:c:\temp

Once dism finished successfully, I rebooted the server with “wpeutil reboot

Windows Server 2016 finally booted successfully with the Cumulative Update installed, and I did another check for updates – there were no updates left to install.  I suspect these same steps would work on a Windows 10 machine that is having similar issues – although I don’t know if “Last Known Good Configuration” is an option with the most current version of Windows 10.

As always – Use any tips, tricks, or scripts I post at your own risk.

HOWTO: Turn on a HDD UID on a HPE Proliant in VMware with HPSSACLI

This morning we needed to replace a hard drive in a HPE Proliant running VMware ESXi at a remote site that had a PFA on it.  Unfortunately, while ILO is great at identifying the defective drive, it has no ability to enable the UID on the drive, and given that this unit is at a remote site, we had no way of knowing in advanced if the fault light was actually turn on for this drive before the HPE field engineering arrived to swap the drive.  So after digging through the help documentation, I found the necessary HPSSACLI command to enable the drive’s UID.

First, to get a list of all the physical drives in an ESXi host, SSH the host and run this command:

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 physicaldrive all show

This should output a list of all the drives in the system as shown below.

2016.05.19 - 10.14.13 - SNAGIT -  0005

Next, to enable the blue UID LED for 1 hour on port 2I, box 1, bay 2, run this command:

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 physicaldrive 2I:1:8 modify led=on duration=3600

The blue UID should now come on for 1 hour and then shut off on it’s own.  If you want want to manually shut if off before the 1 hour is up, run the same command again, but change the “led=on” to “led=off”.

As always – Use any tips, tricks, or scripts I post at your own risk.

Upgrade a stuck ILO firmware via SSH

We have had a rash of issues where by upgrading ILO firmware via the WebUI has been failing.  It looks like it finishes, but when you log back in, it is still the original firmware from when you started the upgraded.  And no matter what you do via the WebUI, it just will not upgrade.  So to upgrade the stubborn firmware, the simplest thing to do is SSH the ILO directly and upload the firmware via the console interface.  Below are the steps to do this.

First, you need a running web server to pull the firmware from.  IIS is usually the handiest, so it is simply a matter of adding a mime-type for the binary firmware file.  Open an administrative command prompt and run:

c:\windows\system32\inetsrv\appcmd.exe set config /section:staticContent /+"[fileExtension='.bin',mimeType='application/x-bin']"
iisreset /restart

Extract the ILO firmware bin with 7-Zip and put the bin somewhere within IIS that you can download it.   Next – to save myself extra grief, I also make sure I can actually download the firmware to a regular PC with a browser before continuing.  So open the browser of your choice and make sure you can download the bin to your PC before continuing.

Putty the ILO interface, accepting the SSH key (if prompted), and login.  Once logged in, check, then download the new firmware with the following commands.

*** Note – the ILO will automatically reboot once it successfully downloads the firmware and does not give any indication of the reboot.  As a result, you might want to start a continuous ping to the ILO to see once it has rebooted and is back up ***

show /map1/firmware1
cd /map1/firmware1
load -source http://http_server_ip/ilox_xxx.bin

Once the ILO reboots, you should have a working ILO with the firmware version you want / need.

As always – Use any tips, tricks, or scripts I post at your own risk.

2016.05.12 - 19.34.52 - SNAGIT -  0097

HOWTO: Monitor the rebuild status of a HPE SmartArray in ESXi 5.5

To monitor the rebuild status of a HP SmartArray controller in VMware ESXi 5.5, you need to have the HP VMware tools bundle installed (which is installed if the server was installed from the HP VMware media / ISO).  Once the tools bundle has been installed, simply SSH the server (or go right on the console, either physically or via ILO), login and run:

/opt/hp/hpssacli/bin/hpssacli ctrl all show status

This will provide you a list of all the SmartArray controllers in the server.  From this list, find the slow number of the controller that contains the logical drive you need to check the status on and run the following command (substitute slot=XX for the slot value you determined with the previous command):

/opt/hp/hpssacli/bin/hpssacli ctrl slot=XX ld all show

2016.04.14 - 09.12.11 - SNAGIT -  0000

If you happen to running an older version of ESXi 5.x, or your HP VMware Tools bundle is not somewhat recent, then the commands are somewhat different.  In this case the correct commands are:

/opt/hp/hpacucli/bin/hpacucli
ctrl all show
ctrl slot=0 ld all show