HOWTO: Replace a failed 3Par drive

HPE 3Pars are great arrays, but just like any other storage system, they do occasionally end up suffering a failed hard drive.  Replacing a failed 3Par drive isn’t quite the same as replacing a failed Proliant Smart Array controller drive – there are a few manual steps that need done to facilitate the replacement process, which I am going to detail below (note – I’m using a StoreServ 7200, based on OS 3.2.1 MU2 as my reference in this post).

First, SSH (via Putty) the 3PAR’s management IP and login as 3paradm (remember the username and password are case-sensitive).

At the 3PAR_SN# cli% prompt, type:    showpd -failed -degraded

This should show you the failed drive and it’s ID (in the example below, the drive hasn’t totally failed, but rather is just degraded due to an internal loop error in the drive, so it needs replaced).

2016.05.24 - 09.15.46 - SNAGIT -  0026

Next, see if servicemag has been issued or is running with:   servicemag status

If servicemag is not running, you will see:   No servicemag operations logged.

Now we want to see if the data has been evacuated off the drive already by running this command:   showpd -space 15   (where 15 is the drive ID that needs replaced).   Using the output shown below, double check there is no data left on the drive. You need to check that all columns other than size and failed are zero.  As you can see from the example , this drive still has data on it (again because the drive in this example is only degraded, not failed – my experience is that typically failed drives have 0, 0, 0, 0 for volume, spare, free, and unavailable, while failed is usually equal to the size).

2016.05.24 - 09.15.54 - SNAGIT -  0027

To evacuate the data, run this command:    servicemag start -pdid 15     and answer yes when prompted if you are sure you want to run it.

2016.05.24 - 09.56.05 - SNAGIT -  0033

To check the status / progress of the servicemag command, run:    servicemag status

2016.05.24 - 09.16.14 - SNAGIT -  0029

As you can see above, 4 chunklets (1GB blocks of disk space) have been moved off the drive so far, with another 107 chunklets (107 GB) to evacuate.  Below is what you will see once the servicemag process has finished.

2016.05.24 - 09.16.23 - SNAGIT -  0030

Before continuing, verify there is no data left on the drive by running:  showpd -space 15

2016.05.24 - 09.16.28 - SNAGIT -  0031

When the HPE field engineer arrives onsite with the replacement disk, you may need to turn on the locate light on the failed drive for him.  To do this, run:      locatecage -t XX cageY ZZ    where TT is time in seconds (i.e. 300), and Y in cageY is the cage number shown above, and ZZ is the magazine number to locate (i.e.  locatecage -t 300 cage0 15 enables the flashing locate light for 5 minutes for the failed drive that is being referenced in this HOWTO).

Once the drive has been replaced, the 3Par **should in the background** run an admitpd automatically for you.  To verify this, run:   showpd -p -mg ZZ -c Y     to see if the new drive is listed (note it will most likely have different drive ID than the dead drive)

When you have verified the new drive has been seen and admitted, you can check the rebuild status with servicemag statusYou can see below the rebuild process, followed by the status message once servicemag as successfully finished.

2016.05.24 - 09.16.52 - SNAGIT -  0032

If you go back to the HP 3PAR Management Console and refresh the console, you should find the fail drive no longer appears (it will stay there appearing as failed even after it has been removed from the cage until the rebuild process is completed, at which point it will go away).

If the HP 3PAR Management Console indicates a firmware update needs performed on the replacement drive, run:   upgradepd ZZ    and answer yes when prompted.  Refresh the HP 3PAR Management Console when the upgrade is complete to check for any other errors.

If no further errors appear, the drive replacement process is completed.  If there are errors, then escalate back to HPE with your original case number.

As always – Use any tips, tricks, or scripts I post at your own risk.

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s