HOWTO: Install QLogic QConvergeConsole beside HPE 3Par SSMC and Veeam Backup & Replication

Ok – I’ll admit it – I’m something of a vendor snob…  And my vendor of choice when it comes to Ethernet and fibre channel host connectivity is QLogic and HPE’s OEM products made by QLogic.  You just can’t beat the price or performance of the offerings, and the support that QLogic’s HPE OEM team gives you – they are second to none (a huge shout out to @ToddOwens_QLGC & Jim Burton – if you guys are reading this, thanks for all the amazing support over the years!).

One of the interesting things about QLogic is their branded applications generally work hand in hand with the OEM products they offer to various system manufacturers such as HPE, Dell, and Lenovo.  While I was attending a storage conference last week, I sat in on a presentation Jim and Todd were hosting.  During the presentation the talk turned to QLogic’s comprehensive adapter management tools, including the Web-based QCC (QConvergeConsole), which is supported on Windows, Linux, and Solaris.  QCC allows you to modify and configure your adapters (Ethernet, iSCSI, FCoE, and FC), upgrade the flash on them, perform FC ping and traceroute, and view reports, statistics, and diagnostics of all the QLogic devices in your equipment – either locally or remotely.

Given that QLogic devices are generally so bulletproof, and that the HPE Support Pack for Proliant takes care of my firmware updates, I rarely need to install and use QCC.  But today was a little different – I had a VMware host that suffered a Purple Screen of Death (PSOD) overnight, and while I was in the ILO power cycling it and looking for the cause of the PSOD, I noticed that ILO was complaining that the 534FLR-SFP+ adapter was degraded because it was in FCoE mode and not connected (we don’t use FCoE).  Since I didn’t want to waste any more time playing around with the host before I brought it back online, I decided to load QCC on my management server at the site and see if I could disable FCoE mode remotely.

I never did find a way to disable the FCoE function via QCC – I only spent 3 minutes looking at it, so there may well be a way if I actually RTFM (that isn’t my style though), but this post isn’t about that.  This post is all about getting QCC to co-exist (temporarily anyways) on a server that already has HPE’s 3Par SSMC and / or Veeam Backup & Replication installed on it.  QCC has been around a long time, longer than both SSMC and VBR, and as such it has a few port conflicts that the guys at HPE and Veeam never took into consideration.  As a result, you can’t just fire up the QCC installer and expect it to co-exist and run 7/24 right out of the box alongside SSMC and VBR.

Once you have the QCC installer downloaded and extracted, there are a few things we need to do before firing up the installer.

First, let’s check to make sure TCP ports 8080, 8443, and 111 are not in use.  We can do this by opening an elevated command prompt and running the following for each port (replace #### with the port number):   netstat -ano | find "0.0.0.0:####"

In the example above, you can see that two of the three ports are in use.  Port 8443 is used by the application with PID 38692, while port 111 is used by the application with PID 30000.  Using Task Manager, or better yet my favorite tool for the job – Process Explorer, we can easily identify the applications hogging these ports if we enable the PID and Path columns and then sort on the PID.
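
If you’d rather not eyeball the netstat output each time, here is a minimal Python sketch of the same check. It assumes English-language netstat -ano output with the usual Proto, Local Address, Foreign Address, State and PID columns, and, like the find filter above, it only looks at TCP listeners bound to 0.0.0.0. The function names are my own illustration, not part of any tool.

```python
import subprocess

QCC_PORTS = (8080, 8443, 111)

def listening_pids(netstat_text, ports=QCC_PORTS):
    """Return {port: pid} for TCP ports in `ports` that are LISTENING on 0.0.0.0."""
    found = {}
    for line in netstat_text.splitlines():
        parts = line.split()
        # TCP rows have five columns: Proto, Local Address, Foreign Address, State, PID
        if len(parts) != 5 or parts[0] != "TCP":
            continue
        local, state, pid = parts[1], parts[3], parts[4]
        host, _, port = local.rpartition(":")
        if host == "0.0.0.0" and state == "LISTENING" and port.isdigit() and pid.isdigit():
            if int(port) in ports:
                found[int(port)] = int(pid)
    return found

def run_netstat():
    """Capture netstat -ano output (Windows, run from an elevated prompt)."""
    return subprocess.run(["netstat", "-ano"], capture_output=True, text=True, check=True).stdout
```

On the management server, print(listening_pids(run_netstat())) gives you the PIDs to look up in Process Explorer; an empty dict means all three ports are free.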

So to get started, we need to stop (temporarily) SSMC and VBR’s vPower NFS service.

Now that we have stopped these two services, let’s double check to make sure TCP ports 8080, 8443, and 111 are no longer in use.

So with all three ports now free, we can launch the QCC installer as Administrator (note – all screen snapshots are based on Windows 2012 R2 with QCC v5.4.0.41).  Click Next a couple of times until you get to the “Please enter desired port number” prompt.  This defaults to TCP 8080, which, as we already checked above, is free to use, so go ahead and click Install.

Eventually the installer will ask whether you wish to restrict access to localhost.  No one else at my sites requires access to QCC, so I’m OK with restricting access – I clicked Yes (note it defaults to No, so if you just hit Enter you’ve answered No…)

Eventually, you’ll be prompted if you wish to enable security login.

Since this application is only going to be enabled temporarily, when I actually need it, on the management server on the management VLAN, and because I am restricting access to localhost only, I left the checkbox cleared.  That said, you may wish to enable security – and if you do, make sure you make a note of the credentials you set!  If you didn’t change them, the default credentials are a login ID of “QCC” with a password of “config”.  Click Next to continue.

Now you are asked whether you wish to enable SSL.  That is likely a good idea, even if you are restricting access to localhost – so click Yes.  This will automatically set the Tomcat7 engine to use TCP 8443, and you cannot change this from the installer.

Finally you will be presented with the Done button.

Take note of the URL as you will need it shortly…  https://localhost:8443/QConvergeConsole/ or http://localhost:8080/QConvergeConsole/

Now we can go ahead and install the necessary management agents.  In my case I am going to install all of the management agents.

After we click Next, you’ll notice that the installer is installing the ONCPortmap service.  This runs on TCP 111.  If TCP 111 is already in use, the installer will hang, and hang, and hang…  This is why we stopped the Veeam vPower NFS service earlier.

Eventually the management agent will complete the install process.  When we install the next management agent, you’ll notice a warning about the ONCPortmap service – this is good!  It means the ONCPortmap installed and started successfully.

After we have all the management agents installed that we want or require, we can go back in the command prompt and check our port status again.

Now you can see that all three ports are in use – which means QCC is likely ready to go.  Sort of…  As I mentioned previously, because the ports used by QCC conflict with SSMC and Veeam vPower NFS, we can’t just leave things alone and expect all three apps to work after a future reboot.  In my environment SSMC and Veeam are more important than QCC, and I always want them to start after a server reboot.  So we need to set the following services to manual start instead of automatic (which they are by default) so they don’t prevent SSMC or Veeam from starting.

  • ONC/RPC Portmapper
  • QLManagementAgentJava
  • QLogic Management Suite FastLinQ
  • QLogic Management Suite Java iQAgent
  • QLogicManagementSuitenQLRemote
  • Tomcat7
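
If you do this on more than one management server, a minimal Python sketch like the one below can generate (and optionally run) the PowerShell to flip these six services to Manual. I’m assuming the names above are exactly what the Services console shows; since some look like service names and some like display names, each generated command matches on either. It only prints the commands unless you pass dry_run=False.

```python
import subprocess

# Names as listed above (assumed to match the Services console exactly).
QCC_SERVICES = [
    "ONC/RPC Portmapper",
    "QLManagementAgentJava",
    "QLogic Management Suite FastLinQ",
    "QLogic Management Suite Java iQAgent",
    "QLogicManagementSuitenQLRemote",
    "Tomcat7",
]

def manual_start_commands(names=QCC_SERVICES):
    """One PowerShell command per service, matching on Name or DisplayName."""
    cmds = []
    for name in names:
        q = name.replace("'", "''")  # escape single quotes for a PowerShell literal
        cmds.append(
            f"Get-Service | Where-Object {{ $_.Name -eq '{q}' -or $_.DisplayName -eq '{q}' }}"
            " | Set-Service -StartupType Manual"
        )
    return cmds

def apply(commands, dry_run=True):
    """Print the commands, or run each through powershell.exe when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(cmd)
        else:
            subprocess.run(["powershell.exe", "-NoProfile", "-Command", cmd], check=True)
```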

Once we have changed the startup type of these services to manual, let’s log in using the URL we were shown above.

Now – in the Host Selection dialog box, type in localhost and hit the connect button.  You should be able to safely ignore any errors you may see.

Finally – the console is open!  Let’s make a simple cosmetic change to see if it works (something that does not affect the performance or anything else about the adapters).  Highlight one of the ports on one of your adapters (in my example below, Port 0 of the HP 533FLR-T) and click the MBA Boot Cfg tab in the right-hand pane.  In the Hide Setup Prompt drop-down box, pick the opposite of whatever is there (it is probably already disabled, so select enabled), then click the Apply button.

You’ll be prompted for a password.  Assuming you made no changes to the default setup, this password will be “config”.  If you aren’t sure whether this is correct, clear the checkbox that says save password.  If you leave it checked and the password you enter is wrong, you will need to log out of QCC and back in to be able to try a different password.

If you had the correct password, you’ll see a green banner advising you of a successful update!

Now all that is left is to make the changes you actually set out to make!  Of course, once you are finished, you have two choices – reboot the server to apply the changes and have SSMC and VBR start up on reboot, or skip the reboot, manually stop all the QCC services (see the list above) and manually start the SSMC and VBR services.

Now that you have QCC installed, if you need to access it in the future, you can just stop SSMC and VBR, then start the necessary QCC services.  While it isn’t a perfect solution, it will allow QCC to coexist alongside both SSMC and Veeam’s vPower NFS service.
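
The swap itself can be scripted too. Below is a minimal Python sketch that builds the net stop / net start sequence in both directions (net accepts either a service name or a display name). The SSMC entry is a placeholder, and “Veeam vPower NFS Service” is my assumption for the vPower NFS display name, so confirm both in services.msc first. Nothing runs unless you pass dry_run=False.

```python
import subprocess

# Placeholder and assumption: confirm the real names in services.msc on your server.
CONFLICTING = ["<SSMC service name>", "Veeam vPower NFS Service"]
QCC_SERVICES = [
    "ONC/RPC Portmapper",
    "QLManagementAgentJava",
    "QLogic Management Suite FastLinQ",
    "QLogic Management Suite Java iQAgent",
    "QLogicManagementSuitenQLRemote",
    "Tomcat7",
]

def swap_plan(stop, start):
    """Stop everything in `stop` first so its ports are freed, then start `start`."""
    return [["net", "stop", name] for name in stop] + [["net", "start", name] for name in start]

def qcc_on():
    # Free TCP 8443 and 111 from SSMC / vPower NFS, then bring up QCC.
    return swap_plan(CONFLICTING, QCC_SERVICES)

def qcc_off():
    # Stop QCC in reverse order, then bring SSMC and vPower NFS back.
    return swap_plan(list(reversed(QCC_SERVICES)), CONFLICTING)

def run(plan, dry_run=True):
    for cmd in plan:
        if dry_run:
            print(subprocess.list2cmdline(cmd))
        else:
            # check=False: net returns non-zero if, say, a service is already stopped.
            subprocess.run(cmd, check=False)
```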

As always – Use any tips, tricks, or scripts I post at your own risk.

HOWTO: Replace a failed 3Par drive

HPE 3Pars are great arrays, but just like any other storage system, they do occasionally end up suffering a failed hard drive.  Replacing a failed 3Par drive isn’t quite the same as replacing a failed drive on a Proliant Smart Array controller – there are a few manual steps needed to facilitate the replacement process, which I detail below (note – I’m using a StoreServ 7200 running OS 3.2.1 MU2 as my reference in this post).

First, SSH (via PuTTY) to the 3PAR’s management IP and log in as 3paradm (remember the username and password are case-sensitive).

At the 3PAR_SN# cli% prompt, type:    showpd -failed -degraded

This should show you the failed drive and its ID (in the example below, the drive hasn’t totally failed, but rather is just degraded due to an internal loop error in the drive, so it needs to be replaced).

Next, see if servicemag has been issued or is running with:   servicemag status

If servicemag is not running, you will see:   No servicemag operations logged.

Now we want to see whether the data has already been evacuated off the drive by running this command:   showpd -space 15   (where 15 is the ID of the drive that needs to be replaced).   Using the output shown below, double-check that there is no data left on the drive.  You need to check that all columns other than Size and Failed are zero.  As you can see from the example, this drive still has data on it (again because the drive in this example is only degraded, not failed – in my experience, failed drives typically show 0, 0, 0, 0 for volume, spare, free, and unavailable, while failed is usually equal to the size).
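
If you want to script the “is it empty yet” check, here is a minimal Python sketch. It assumes the drive’s row in the showpd -space output ends with the six numeric columns in the order Size, Volume, Spare, Free, Unavail, Failed, matching the columns described above; I haven’t checked that layout against every 3PAR OS release, so treat it as an assumption.

```python
def drive_is_empty(showpd_space_text, pd_id):
    """Return True if Volume, Spare, Free and Unavail are all zero for drive pd_id.

    Assumes the drive's row ends with: Size Volume Spare Free Unavail Failed.
    """
    for line in showpd_space_text.splitlines():
        parts = line.split()
        # Skip headers, dashed separators and the trailing "N total" line.
        if len(parts) < 7 or parts[0] != str(pd_id) or parts[1] == "total":
            continue
        _size, volume, spare, free, unavail, _failed = (int(x) for x in parts[-6:])
        return volume == spare == free == unavail == 0
    raise ValueError(f"drive {pd_id} not found in showpd -space output")
```

Paste the CLI output into a string and call drive_is_empty(output, 15); False means servicemag still has work to do.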

To evacuate the data, run this command:    servicemag start -pdid 15     and answer yes when prompted if you are sure you want to run it.

To check the status / progress of the servicemag command, run:    servicemag status

As you can see above, 4 chunklets (1GB blocks of disk space) have been moved off the drive so far, with another 107 chunklets (107 GB) to evacuate.  Below is what you will see once the servicemag process has finished.
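
The progress math is easy enough to do in your head, but as a quick sketch (assuming 1 GB chunklets, as on this 7200):

```python
def evacuation_progress(moved, remaining, chunklet_gb=1):
    """Percent of chunklets evacuated and GB still to move (1 GB chunklets assumed)."""
    total = moved + remaining
    percent = 100.0 * moved / total if total else 100.0
    return percent, remaining * chunklet_gb

# Example from the status output above: 4 chunklets moved, 107 to go.
percent, gb_left = evacuation_progress(4, 107)
print(f"{percent:.1f}% evacuated, {gb_left} GB remaining")  # 3.6% evacuated, 107 GB remaining
```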

Before continuing, verify there is no data left on the drive by running:  showpd -space 15

When the HPE field engineer arrives onsite with the replacement disk, you may need to turn on the locate light on the failed drive for them.  To do this, run:      locatecage -t TT cageY ZZ    where TT is the time in seconds (e.g. 300), Y in cageY is the cage number shown above, and ZZ is the magazine number to locate (e.g.  locatecage -t 300 cage0 15 enables the flashing locate light for 5 minutes on the failed drive referenced in this HOWTO).

Once the drive has been replaced, the 3Par **should in the background** run an admitpd automatically for you.  To verify this, run:   showpd -p -mg ZZ -c Y     to see if the new drive is listed (note it will most likely have a different drive ID than the dead drive).

When you have verified the new drive has been seen and admitted, you can check the rebuild status with servicemag status.  You can see below the rebuild process, followed by the status message once servicemag has successfully finished.

If you go back to the HP 3PAR Management Console and refresh the console, you should find the failed drive no longer appears (it will keep appearing as failed, even after it has been removed from the cage, until the rebuild process is completed, at which point it will go away).

If the HP 3PAR Management Console indicates a firmware update needs to be performed on the replacement drive, run:   upgradepd ZZ    and answer yes when prompted.  Refresh the HP 3PAR Management Console when the upgrade is complete to check for any other errors.

If no further errors appear, the drive replacement process is completed.  If there are errors, then escalate back to HPE with your original case number.

As always – Use any tips, tricks, or scripts I post at your own risk.

HOWTO: Turn on an HDD UID on an HPE Proliant in VMware with HPSSACLI

This morning we needed to replace a hard drive with a PFA on it in an HPE Proliant running VMware ESXi at a remote site.  Unfortunately, while ILO is great at identifying the defective drive, it has no ability to enable the UID on the drive, and given that this unit is at a remote site, we had no way of knowing in advance whether the fault light was actually turned on for this drive before the HPE field engineer arrived to swap it.  So after digging through the help documentation, I found the necessary HPSSACLI command to enable the drive’s UID.

First, to get a list of all the physical drives in an ESXi host, SSH the host and run this command:

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 physicaldrive all show

This should output a list of all the drives in the system as shown below.

Next, to enable the blue UID LED for 1 hour on port 2I, box 1, bay 8 (drive address 2I:1:8), run this command:

/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 physicaldrive 2I:1:8 modify led=on duration=3600

The blue UID should now come on for 1 hour and then shut off on its own.  If you want to manually shut it off before the hour is up, run the same command again, but change “led=on” to “led=off”.
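
On a host with a lot of drives, picking the bad one out of the list gets tedious. Below is a minimal Python sketch you could run on a workstation against saved output: it pulls out any drive whose status isn’t OK and builds the matching LED command. It assumes the physicaldrive line format shown in the comment (address, then port/box/bay, type, size and status in parentheses) and a controller in slot 0 as in this example.

```python
import re

HPSSACLI = "/opt/hp/hpssacli/bin/hpssacli"
# Assumed line format: physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, Predictive Failure)
DRIVE_RE = re.compile(r"physicaldrive\s+(\S+)\s+\(([^)]*)\)")

def non_ok_drives(show_text):
    """Return [(address, status)] for drives whose last field isn't OK."""
    drives = []
    for line in show_text.splitlines():
        m = DRIVE_RE.search(line)
        if m:
            status = m.group(2).split(",")[-1].strip()
            if status != "OK":
                drives.append((m.group(1), status))
    return drives

def led_command(address, on=True, duration=3600, slot=0):
    """Build the LED command used above (the same command with led=off turns it off)."""
    state = "on" if on else "off"
    return [HPSSACLI, "ctrl", f"slot={slot}", "physicaldrive", address,
            "modify", f"led={state}", f"duration={duration}"]
```

For each address returned, " ".join(led_command(address)) gives a line you can paste into the SSH session on the host.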

As always – Use any tips, tricks, or scripts I post at your own risk.

Upgrade a stuck ILO firmware via SSH

We have had a rash of issues where upgrading ILO firmware via the WebUI has been failing.  It looks like it finishes, but when you log back in, it is still on the original firmware from when you started the upgrade.  And no matter what you do via the WebUI, it just will not upgrade.  So to upgrade the stubborn firmware, the simplest thing to do is SSH to the ILO directly and load the firmware via the console interface.  Below are the steps to do this.

First, you need a running web server to pull the firmware from.  IIS is usually the handiest, so it is simply a matter of adding a mime-type for the binary firmware file.  Open an administrative command prompt and run:

c:\windows\system32\inetsrv\appcmd.exe set config /section:staticContent /+"[fileExtension='.bin',mimeType='application/x-bin']"
iisreset /restart

Extract the ILO firmware bin with 7-Zip and put the bin somewhere within IIS that you can download it from.   Next – to save myself extra grief, I also make sure I can actually download the firmware to a regular PC with a browser before continuing.  So open the browser of your choice and make sure you can download the bin to your PC.

PuTTY to the ILO interface, accept the SSH key (if prompted), and log in.  Once logged in, check the current firmware, then load the new firmware with the following commands.

*** Note – the ILO will automatically reboot once it successfully downloads the firmware and does not give any indication of the reboot.  As a result, you might want to start a continuous ping to the ILO to see once it has rebooted and is back up ***

show /map1/firmware1
cd /map1/firmware1
load -source http://http_server_ip/ilox_xxx.bin
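
For the “watch for it to come back” part, here is a minimal Python sketch as an alternative to a continuous ping. Instead of ICMP (which needs raw-socket privileges from Python), it assumes the ILO answers HTTPS on TCP 443 and treats a successful TCP connect as “up”; it waits until the ILO drops off and then comes back.

```python
import socket
import time

def tcp_probe(host, port=443, timeout=2.0):
    """True if a TCP connection to host:port succeeds (assumes the ILO listens on 443)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_reboot(probe, interval=5.0, max_checks=120, sleep=time.sleep):
    """Wait for probe() to go False (rebooting) and then True again.

    Returns the number of checks used, or raises TimeoutError.
    """
    seen_down = False
    for check in range(1, max_checks + 1):
        up = probe()
        if not up:
            seen_down = True
        elif seen_down:
            return check
        sleep(interval)
    raise TimeoutError("ILO did not drop off and come back within max_checks")
```

Start wait_for_reboot(lambda: tcp_probe("ilo_ip")) right after issuing the load command (ilo_ip is a placeholder). If the ILO has already rebooted before the first check, the sketch never sees the down phase and times out, so start it promptly.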

Once the ILO reboots, you should have a working ILO with the firmware version you want / need.

As always – Use any tips, tricks, or scripts I post at your own risk.

Factory Reset an HPE FlexFabric 5700 to defaults

Not too long ago, we received a new HPE FlexFabric 5700 switch and proceeded to muck around with the configuration settings, trying a few things that we normally would never do to a production switch.  When we were done having fun and learning, we needed to reset the unit back to defaults so we could actually deploy it into production.  Of course, resetting a switch to factory defaults is not something you do very often, so we had to actually RTFM.  I’ll save you the time of that here…

From the serial console, execute these commands:

restore factory-default
yes
save
yes
{hit enter}
reboot

When the switch reboots, it will be at defaults.

Below is a screen snapshot of what you’ll see during this process.

HOWTO: Monitor the rebuild status of an HPE SmartArray in ESXi 5.5

To monitor the rebuild status of an HP SmartArray controller in VMware ESXi 5.5, you need to have the HP VMware tools bundle installed (it is included if the server was installed from the HP VMware media / ISO).  Once the tools bundle has been installed, simply SSH the server (or work directly at the console, either physically or via ILO), log in and run:

/opt/hp/hpssacli/bin/hpssacli ctrl all show status

This will provide you a list of all the SmartArray controllers in the server.  From this list, find the slot number of the controller that contains the logical drive you need to check the status of, and run the following command (replace XX in slot=XX with the slot value you determined with the previous command):

/opt/hp/hpssacli/bin/hpssacli ctrl slot=XX ld all show
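
If you’d like to poll the rebuild rather than re-run the command by hand, here’s a minimal Python sketch that parses the text from both commands above. The formats it expects (a “... in Slot N” controller line, and logicaldrive lines ending in a status such as “Recovering, 45% complete”) are assumptions based on typical hpssacli output, so compare them against your own before relying on it.

```python
import re

SLOT_RE = re.compile(r"in Slot (\d+)")
LD_RE = re.compile(r"logicaldrive\s+(\d+)\s+\(([^)]*)\)")

def controller_slots(status_text):
    """Slot numbers from 'ctrl all show status' output."""
    return [int(m.group(1)) for m in SLOT_RE.finditer(status_text)]

def rebuild_progress(ld_text):
    """Map logical drive number -> (status, percent complete or None)."""
    result = {}
    for m in LD_RE.finditer(ld_text):
        fields = [f.strip() for f in m.group(2).split(",")]
        status, percent = fields[-1], None
        pm = re.match(r"(\d+)% complete", status)
        if pm:
            percent = int(pm.group(1))
            status = fields[-2] if len(fields) > 1 else "Unknown"
        result[int(m.group(1))] = (status, percent)
    return result
```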

If you happen to be running an older version of ESXi 5.x, or your HP VMware Tools bundle is not fairly recent, then the commands are somewhat different.  In this case, launch hpacucli and then run the two ctrl commands at its prompt:

/opt/hp/hpacucli/bin/hpacucli
ctrl all show
ctrl slot=0 ld all show

HPE Insight Remote Support 7.6 auto-upgrade fails

As some of you may have noticed, HPE released Insight Remote Support (IRS) version 7.6 this week.  Among other things, the interface is now rebranded with the new HPE logo and icon, it has better security logging, and it adds support for a bunch of new HPE Networking and HPE StoreEasy products.

If you have already set the “Automatic Update Level” in IRS to “Automatically Download and Install”, you may already have 7.6 successfully deployed to your server.  It’ll be pretty obvious too – if you see the new HPE logo on the login page or as the desktop shortcut icon, you are already at version 7.6.

For some reason, however, a couple of my IRS 7.5 servers have failed to auto-update to 7.6.  Trying to install the 7.6 update from the Software Tab in IRS by clicking the Start Update button also fails.  Normally at this point, I’d simply go to the Software Depot, download 7.6 and manually run the setup – except that 7.6 isn’t available there, as the Software Depot download page generates an error message as of this writing (2016.04.02).

So – after some troubleshooting and poking around the log files, I determined you can download the 7.6 package update from the same spot that IRS downloads it:

https://services.isee.hp.com/SWM/packages/ProdUpgPkg/2016-03-31T154720/ProdUpgPkg+7.6.0.27.zip

Unzip this archive to C:\TEMP and then from a command prompt run:

msiexec /i "C:\TEMP\ProdUpgPkg+7.6.0.27\lib\hprs7kit.msi" /lv "%HP_RS_LOG%\hprs_7.6.0_install.log"

Now – if your servers are like mine, this will fail too.  Taking a look at “%HP_RS_LOG%\hprs_7.6.0_install.log”, you’ll find that pg_dumpall.exe couldn’t connect to the database because the connection was refused.  This results in database.sql being missing, which causes the install to puke with error code 1603.  database.sql is the Postgres dump of your production IRS database that the installer attempts to make.  Just above the 1603 error in “%HP_RS_LOG%\hprs_7.6.0_install.log”, you’ll find the actual command line for pg_dumpall.exe, which should be (depending on the vintage of your original IRS install) either:

"C:\Program Files\HP\RS\postgresql_9_win32\bin\pg_dumpall.exe" --host=localhost --port=7950 --username=postgres --file="C:\ProgramData\HP\RS\DATA\PERSISTENCE\UPGRADE\database.sql"
-- or --
"C:\Program Files (x86)\HP\RS\postgresql_9_win32\bin\pg_dumpall.exe" --host=localhost --port=7950 --username=postgres --file="C:\ProgramData\HP\RS\DATA\PERSISTENCE\UPGRADE\database.sql"

Manually running the appropriate command line from above will prompt you for the postgres user password 6 times.  Unfortunately, this password is undocumented, but by doing some detective work (I won’t be sharing how I found what it was), I’ve determined it to be “edit – removed 2016.04.05 as per a request from HPE”.  So enter this password when prompted each of those 6 times, and you’ll find C:\ProgramData\HP\RS\DATA\PERSISTENCE\UPGRADE\database.sql is created.  Now you can go back and run the installer again from the command prompt:

msiexec /i "C:\TEMP\ProdUpgPkg+7.6.0.27\lib\hprs7kit.msi" /lv "%HP_RS_LOG%\hprs_7.6.0_install.log"

Your upgrade should now complete successfully, and all that is left is to log into IRS, go to the Software Tab and check for updates, and install any remaining updates.
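
If you end up doing the manual dump on several IRS servers, typing the password six times gets old. pg_dumpall is a standard PostgreSQL client, so it honors the PGPASSWORD environment variable and won’t prompt when it is set. Below is a minimal Python sketch that picks whichever of the two pg_dumpall.exe paths above exists and runs the same command line with PGPASSWORD set; the password is deliberately left as a parameter.

```python
import os
import subprocess

# The two pg_dumpall.exe locations quoted above (which exists depends on install vintage).
CANDIDATES = [
    r"C:\Program Files\HP\RS\postgresql_9_win32\bin\pg_dumpall.exe",
    r"C:\Program Files (x86)\HP\RS\postgresql_9_win32\bin\pg_dumpall.exe",
]
DUMP_FILE = r"C:\ProgramData\HP\RS\DATA\PERSISTENCE\UPGRADE\database.sql"

def find_pg_dumpall(candidates=CANDIDATES, exists=os.path.exists):
    """Return the first candidate path that exists."""
    for path in candidates:
        if exists(path):
            return path
    raise FileNotFoundError("pg_dumpall.exe not found in either IRS location")

def dump_command(exe, password):
    """Same arguments the installer uses, plus an environment carrying PGPASSWORD."""
    args = [exe, "--host=localhost", "--port=7950", "--username=postgres", f"--file={DUMP_FILE}"]
    env = dict(os.environ, PGPASSWORD=password)  # libpq reads this instead of prompting
    return args, env

def run_dump(password):
    args, env = dump_command(find_pg_dumpall(), password)
    subprocess.run(args, env=env, check=True)
```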

As always – Use any tips, tricks, or scripts I post at your own risk.

 

Setup hourly HPE Insight Remote Support Service checking

In a previous post, I mentioned we utilize HPE Insight Remote Support (IRS) at all our client sites, and that we discovered the lovely undocumented “feature” IRS has: a tendency not to start after the first Windows server reboot following an IRS update. This great undocumented feature defeats the entire purpose of IRS – monitoring and alerting on your HPE hardware. After getting burned by this feature three or four times in a month, where customers noticed hardware faults (via amber alert lights on the equipment) before we did because IRS was not running to alert us, I decided it was time to write a script to check IRS hourly and alert us if it wasn’t running.

To configure Windows to send an alert if the HP IRS Service is stopped, create the following two files (file contents are at the end of this post) on the IRS server:

  • check_irs_service_status.cmd – which is the wrapper that will call PowerShell from Task Scheduler
  • check_irs_service_status.ps1 – which is the actual PowerShell script that executes the service status check

Lastly, we need to schedule check_irs_service_status.cmd to run hourly. I’ve set 2 minutes after the hour in the example shown below, but you can adjust as required.

schtasks /create /tn "Hourly IRS Service Check" /tr c:\Windows\check_irs_service_status.cmd /sc minute /mo 60 /st 00:02:00 /rp "*" /ru "%userdomain%\%username%"

By default, the SMTP from address will be the netbios computer name of the IRS server @ the User’s DNS Domain FQDN (i.e. IRS-SERVER@JBGEEK.NET).  The SMTP to address will be support @ the User’s DNS Domain FQDN (i.e. SUPPORT@JBGEEK.NET), and the SMTP server will be mail @ the User’s DNS Domain FQDN (i.e. MAIL.JBGEEK.NET).  You can determine what these will be by checking the system’s environment variables with SET from a command prompt.  You can customize these settings in the “Send-MailMessage” command if necessary.
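
To preview the three addresses before you test, here’s a small Python sketch that mirrors the defaults the PowerShell script builds from COMPUTERNAME and USERDNSDOMAIN. It is only a preview aid; the PowerShell script is what actually sends the mail. If USERDNSDOMAIN isn’t set (it generally isn’t for a local, non-domain account), it raises rather than guessing.

```python
import os

def irs_alert_addresses(env=None):
    """Mirror the From / To / SmtpServer defaults used by check_irs_service_status.ps1."""
    env = os.environ if env is None else env
    computer = env.get("COMPUTERNAME", "")
    domain = env.get("USERDNSDOMAIN", "")
    if not computer or not domain:
        raise KeyError("COMPUTERNAME and USERDNSDOMAIN must both be set")
    return {
        "from": f"{computer}@{domain}",
        "to": f"support@{domain}",
        "smtp_server": f"mail.{domain}",
    }
```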

All that is left to do is to stop the service and test run check_irs_service_status.cmd to verify the Send-MailMessage works properly in your environment.

 

check_irs_service_status.cmd

rem --- begin cut and paste of notepad c:\windows\check_irs_service_status.cmd
@echo off
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -ExecutionPolicy RemoteSigned -noprofile -File C:\Windows\check_irs_service_status.ps1
exit /b
rem --- end cut and paste of c:\windows\check_irs_service_status.cmd ---

 

check_irs_service_status.ps1

###--- begin cut and paste of notepad c:\windows\check_irs_service_status.ps1
### Check_irs_service_status.ps1
### @deancolpitts – http://blog.jbgeek.net
### 2016.01.27
### This script will check the status of the service HPRSMAIN and alert via email if the service is stopped.

$Service = Get-Service -name HPRSMAIN
$Service.Status
if ($Service.Status -eq "Stopped") {
 $CurrentTime = Get-Date
 Send-MailMessage -From "$env:computername@$env:userdnsdomain" -To "support@$env:userdnsdomain" -Subject "$env:computername - HP IRS Service is stopped!!!" -Body "The HP IRS Service is stopped on $env:computername.$env:userdnsdomain at approximately $CurrentTime." -Priority High -DNO onSuccess, onFailure -SmtpServer "mail.$env:userdnsdomain"
}

###--- end cut and paste of notepad c:\windows\check_irs_service_status.ps1

 

HPE Insight Remote Support fails to start after reboot

We utilize HPE Insight Remote Support (IRS) at all our client sites, and typically have it running on either Windows 2008 R2 or Windows 2012 R2.  To simplify administration, we typically enable auto-update of IRS, which means IRS will download updates from HPE as they become available and self-update.  One of the lovely “features” that we discovered is that upon the next Windows server reboot after an IRS update (typically at 3am on the first Wednesday after the 2nd Tuesday of every month – thanks Microsoft), the HPRSMain service fails to start.  No amount of poking, prodding or swearing will convince the service to start either.

The solution is to run a repair – except the HPE team doesn’t make that easy either as the only option in Add/Remove programs is to uninstall.  Fortunately, you should find the .msi for IRS in C:\ProgramData\HP\RS\DATA\SWM\LANDINGZONE\ProdUpgPkg\unzipped\lib.

So the quickest way to fix IRS at this point is to open an Administrative Command Prompt and run:

msiexec /f "C:\ProgramData\HP\RS\DATA\SWM\LANDINGZONE\ProdUpgPkg\unzipped\lib\hprs7kit.msi" /lv "%HP_RS_LOG%\hprs_recovery.log"

After a few minutes, the HPRSMain service should start and stay good until at least the next IRS update.

Windows 2012 R2 is unable to connect to HP StoreOnce CIFS shares

I ran into this issue the other day with a new HPE StoreOnce deployment. When attempting to connect to a CIFS share on a StoreOnce appliance (it does not matter whether it is a physical appliance or a VSA) from a Windows 2012 R2 server, the following error is received: “The account is not authorized to log in from this station”.

To fix this, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters and set RequireSecuritySignature = 0  and then reboot Windows.  When Windows comes back up, you should now be able to browse the CIFS share on the StoreOnce appliance.
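
If you need to make this change on more than one server, the same value can be set with reg.exe instead of regedit. Below is a minimal Python sketch that builds that reg add command (REG_DWORD, value 0) and only prints it unless you pass dry_run=False; you still need to reboot afterwards, as above.

```python
import subprocess

KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"

def signature_fix_command(key=KEY):
    """reg.exe command that sets RequireSecuritySignature to 0 (REG_DWORD), overwriting (/f)."""
    return ["reg", "add", key, "/v", "RequireSecuritySignature", "/t", "REG_DWORD", "/d", "0", "/f"]

def apply_fix(dry_run=True):
    cmd = signature_fix_command()
    if dry_run:
        print(subprocess.list2cmdline(cmd))
    else:
        subprocess.run(cmd, check=True)  # run from an elevated prompt
```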
