HOWTO: Replace a failed 3Par drive

HPE 3Pars are great arrays, but just like any other storage system, they do occasionally end up suffering a failed hard drive.  Replacing a failed 3Par drive isn’t quite the same as replacing a failed Proliant Smart Array controller drive – there are a few manual steps that need done to facilitate the replacement process, which I am going to detail below (note – I’m using a StoreServ 7200, based on OS 3.2.1 MU2 as my reference in this post).

First, SSH (via Putty) the 3PAR’s management IP and login as 3paradm (remember the username and password are case-sensitive).

At the 3PAR_SN# cli% prompt, type:    showpd -failed -degraded

This should show you the failed drive and it’s ID (in the example below, the drive hasn’t totally failed, but rather is just degraded due to an internal loop error in the drive, so it needs replaced).

2016.05.24 - 09.15.46 - SNAGIT -  0026

Next, see if servicemag has been issued or is running with:   servicemag status

If servicemag is not running, you will see:   No servicemag operations logged.

Now we want to see if the data has been evacuated off the drive already by running this command:   showpd -space 15   (where 15 is the drive ID that needs replaced).   Using the output shown below, double check there is no data left on the drive. You need to check that all columns other than size and failed are zero.  As you can see from the example , this drive still has data on it (again because the drive in this example is only degraded, not failed – my experience is that typically failed drives have 0, 0, 0, 0 for volume, spare, free, and unavailable, while failed is usually equal to the size).

2016.05.24 - 09.15.54 - SNAGIT -  0027

To evacuate the data, run this command:    servicemag start -pdid 15     and answer yes when prompted if you are sure you want to run it.

2016.05.24 - 09.56.05 - SNAGIT -  0033

To check the status / progress of the servicemag command, run:    servicemag status

2016.05.24 - 09.16.14 - SNAGIT -  0029

As you can see above, 4 chunklets (1GB blocks of disk space) have been moved off the drive so far, with another 107 chunklets (107 GB) to evacuate.  Below is what you will see once the servicemag process has finished.

2016.05.24 - 09.16.23 - SNAGIT -  0030

Before continuing, verify there is no data left on the drive by running:  showpd -space 15

2016.05.24 - 09.16.28 - SNAGIT -  0031

When the HPE field engineer arrives onsite with the replacement disk, you may need to turn on the locate light on the failed drive for him.  To do this, run:      locatecage -t XX cageY ZZ    where TT is time in seconds (i.e. 300), and Y in cageY is the cage number shown above, and ZZ is the magazine number to locate (i.e.  locatecage -t 300 cage0 15 enables the flashing locate light for 5 minutes for the failed drive that is being referenced in this HOWTO).

Once the drive has been replaced, the 3Par **should in the background** run an admitpd automatically for you.  To verify this, run:   showpd -p -mg ZZ -c Y     to see if the new drive is listed (note it will most likely have different drive ID than the dead drive)

When you have verified the new drive has been seen and admitted, you can check the rebuild status with servicemag statusYou can see below the rebuild process, followed by the status message once servicemag as successfully finished.

2016.05.24 - 09.16.52 - SNAGIT -  0032

If you go back to the HP 3PAR Management Console and refresh the console, you should find the fail drive no longer appears (it will stay there appearing as failed even after it has been removed from the cage until the rebuild process is completed, at which point it will go away).

If the HP 3PAR Management Console indicates a firmware update needs performed on the replacement drive, run:   upgradepd ZZ    and answer yes when prompted.  Refresh the HP 3PAR Management Console when the upgrade is complete to check for any other errors.

If no further errors appear, the drive replacement process is completed.  If there are errors, then escalate back to HPE with your original case number.

As always – Use any tips, tricks, or scripts I post at your own risk.

HOWTO: Make a 3Par RJ45 serial console cable for an OpenGear ACM5004 Remote Site Manager

On all the 3Par StoreServ 7000/8000 nodes we manage for our clients, we utilize OpenGear ACM5004-F-E Remote Site Manager (basically a serial console switch) so that we can remotely access the serial console on the nodes (in case a node is shutdown to the SMI Whack prompt or something else silly is going on).  The ACM5004 has 4 x RJ45 serial ports on it, which are a Cisco Straight pinout.

Just plugging a standard patch cable into the ACM5004 and the back on the StoreServ node isn’t going to work.  And using the RJ45/DB9 couplers that the 3Par ships with is both a pain in the butt, an eye sore, and bulky.  So I set out to build my own custom RJ45 cable.

On the 3Par end, use pins 2 (orange), pin 3 (blue), and pin 5 (green). Be sure to label this end as the 3Par end!!! I attempt to always put a small tiewrap (which is white in the photo below and hard to see thanks to a blurry photo) around this end of the cable to mark it as the 3Par 3nd.

On the ACM5004 end, use pins 3 (blue), pin 5 (green), and pin 6 (orange). Be sure to label this end as ACM5004!!!

Plug the 3Par end into the node’s serial console port, and the other end into the ACM5004.  After configuring the ACM5004, SSH it and connect the appropriate port on the ACM5004 and you should now see your 3Par serial console.

3Par
3Par End
ACM5004
ACM5004 End