HOWTO: Fix Windows Server 2016 BSOD Stop 0x00000133 after a failed Cumulative Update installation (#WindowsServer2016 #BSOD #Microsoft)

This morning I logged into my HPE Proliant DL60 which is running Windows Server 2016 (1607) and noticed it wanted to install KB4093119, which is the “2018-04 Cumulative Update for Windows Server 2016 for x64-based Systems (KB4093119)”.  When I was done doing what I had originally logged in for, I told the update to install and reboot.  After many reboots (I don’t have an LCD on the DL60, but it’s here in my office so I can tell every time it reboots by the fans), I figured something was wrong and hopped on the ILO to see what was going on where I was greeted by a BSOD loop – STOP 0x00000133 (DPC WATCHDOG VIOLATION).  After troubleshooting (I couldn’t even get it to boot into the current build of Microsft DaRT as it too would cause a BSOD), I decided to make a Windows Server 2016 installation USB key from the setup DVD using Rufus.  (I could have booted off the DVD ISO image via the ILO, but the ILO emulates USB 2 as opposed to the physical USB 3 ports in the server, so it would have been much slower).  I copied the boot.wim to my C: drive and injected the most recent driver pack into it from the current HPE Support Pack for Proliant (in the root of the HPSPP DVD, called \WIN_DRV) before copying the wim back to the USB key.

I then booted off the USB key and when the Windows Server 2016 Setup Window opened, I hit Shift + F10 to open a command prompt.  I deleted C:\Windows\winsxs\pending.xml then ran “wpeutil reboot“.  As soon as the F9 to F12 function keys became available in the Proliant POST screen, I hit F11, which eventually brought me to the Proliant boot menu.  Here I selected “Windows Boot Manager” and then immediately started hitting F8 to get to the Windows boot options.  This allowed me to select “Last Known Good Configuration”, which allowed the server to boot into Windows without a Stop 0x00000133 (note – selecting just “Last Known Good Configuration”, or deleting just C:\Windows\winsxs\pending.xml won’t help – I found you must do both).

I retried the Windows Update a couple of times only to have the same thing happen again and again, so I followed the above steps again and again to get Windows Server 2016 running again.  Then I Googled the KB number (KB4093119) and went to the Microsoft Support article about it.  At the bottom of the support article, under “How to get this update” is a link to the stand-alone installation package on the Microsoft Update Catalog website.  This allowed me to download the .msu for KB4093119, which I saved to C:\DL\UPDATES as KB4093119.msu.

I rebooted once again off the Windows Server 2016 installation USB key.  When the Windows Server 2016 Setup Window opened, I hit Shift + F10 to open a command prompt.  In the command prompt, I ran the following two commands:

md c:\temp
dism /image:c:\ /add-package /packagepath:c:\dl\updates\KB4093119.msu /scratchdir:c:\temp

Once dism finished successfully, I rebooted the server with “wpeutil reboot

Windows Server 2016 finally booted successfully with the Cumulative Update installed, and I did another check for updates – there were no updates left to install.  I suspect these same steps would work on a Windows 10 machine that is having similar issues – although I don’t know if “Last Known Good Configuration” is an option with the most current version of Windows 10.

As always – Use any tips, tricks, or scripts I post at your own risk.

HOWTO: Turn on a HDD UID on a HPE Proliant in VMware with HPSSACLI

This morning we needed to replace a hard drive in a HPE Proliant running VMware ESXi at a remote site that had a PFA on it.  Unfortunately, while ILO is great at identifying the defective drive, it has no ability to enable the UID on the drive, and given that this unit is at a remote site, we had no way of knowing in advanced if the fault light was actually turn on for this drive before the HPE field engineering arrived to swap the drive.  So after digging through the help documentation, I found the necessary HPSSACLI command to enable the drive’s UID.

First, to get a list of all the physical drives in an ESXi host, SSH the host and run this command:

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 physicaldrive all show

This should output a list of all the drives in the system as shown below.

2016.05.19 - 10.14.13 - SNAGIT -  0005

Next, to enable the blue UID LED for 1 hour on port 2I, box 1, bay 2, run this command:

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 physicaldrive 2I:1:8 modify led=on duration=3600

The blue UID should now come on for 1 hour and then shut off on it’s own.  If you want want to manually shut if off before the 1 hour is up, run the same command again, but change the “led=on” to “led=off”.

As always – Use any tips, tricks, or scripts I post at your own risk.

Upgrade a stuck ILO firmware via SSH

We have had a rash of issues where by upgrading ILO firmware via the WebUI has been failing.  It looks like it finishes, but when you log back in, it is still the original firmware from when you started the upgraded.  And no matter what you do via the WebUI, it just will not upgrade.  So to upgrade the stubborn firmware, the simplest thing to do is SSH the ILO directly and upload the firmware via the console interface.  Below are the steps to do this.

First, you need a running web server to pull the firmware from.  IIS is usually the handiest, so it is simply a matter of adding a mime-type for the binary firmware file.  Open an administrative command prompt and run:

c:\windows\system32\inetsrv\appcmd.exe set config /section:staticContent /+"[fileExtension='.bin',mimeType='application/x-bin']"
iisreset /restart

Extract the ILO firmware bin with 7-Zip and put the bin somewhere within IIS that you can download it.   Next – to save myself extra grief, I also make sure I can actually download the firmware to a regular PC with a browser before continuing.  So open the browser of your choice and make sure you can download the bin to your PC before continuing.

Putty the ILO interface, accepting the SSH key (if prompted), and login.  Once logged in, check, then download the new firmware with the following commands.

*** Note – the ILO will automatically reboot once it successfully downloads the firmware and does not give any indication of the reboot.  As a result, you might want to start a continuous ping to the ILO to see once it has rebooted and is back up ***

show /map1/firmware1
cd /map1/firmware1
load -source http://http_server_ip/ilox_xxx.bin

Once the ILO reboots, you should have a working ILO with the firmware version you want / need.

As always – Use any tips, tricks, or scripts I post at your own risk.

2016.05.12 - 19.34.52 - SNAGIT -  0097

HOWTO: Monitor the rebuild status of a HPE SmartArray in ESXi 5.5

To monitor the rebuild status of a HP SmartArray controller in VMware ESXi 5.5, you need to have the HP VMware tools bundle installed (which is installed if the server was installed from the HP VMware media / ISO).  Once the tools bundle has been installed, simply SSH the server (or go right on the console, either physically or via ILO), login and run:

/opt/hp/hpssacli/bin/hpssacli ctrl all show status

This will provide you a list of all the SmartArray controllers in the server.  From this list, find the slow number of the controller that contains the logical drive you need to check the status on and run the following command (substitute slot=XX for the slot value you determined with the previous command):

/opt/hp/hpssacli/bin/hpssacli ctrl slot=XX ld all show

2016.04.14 - 09.12.11 - SNAGIT -  0000

If you happen to running an older version of ESXi 5.x, or your HP VMware Tools bundle is not somewhat recent, then the commands are somewhat different.  In this case the correct commands are:

/opt/hp/hpacucli/bin/hpacucli
ctrl all show
ctrl slot=0 ld all show