The thing

Back in August 2015, after my external drive failed and I lost about 8 months of photographs, I decided that I needed some external storage with some redundancy, plus somewhere to store all my shiznit, without resorting to more external drives that fail or get lost. So after talking with some friends, the best solution at the time was an HP ProLiant MicroServer Gen8. Powered by an Intel® Pentium® G2020T 2.50GHz processor and having 2GB of RAM, this should have been more than enough for my needs. To store everything, I bought two Western Digital Black 3TB, 7200rpm, 64MB, SATA 3 hard drives and one Samsung 850 EVO, 2.5", 120GB SSD to hold the operating system. It’s a nifty little machine and it does all the stuff I need it to do. There are probably better home storage solutions at the moment, but this one does its job just fine. My tor website is hosted on it, alongside a Plex Media Server and some other definitely not nefarious things. But sometimes…

The problem

Living in a recently finished building, you encounter issues you don’t see in regular housing. Electricity interruptions were pretty frequent, and they used to mess up my server configuration. My home-grade UPS can last about 15 minutes, but the interruptions were often longer. After all, it’s just a glorified power filter with a chunky, inefficient battery.

So, what happens? Well, it seems that if the server loses power for more than a few minutes, the HP Dynamic Smart Array loses all drive configuration and the system cannot boot, as it doesn’t find a working OS. I’ve searched for various fixes for this over the years, but the closest I got to an answer was that it’s a known bug when the server has less than 4GB of RAM. The issue still persists today, and while I live in a different building now, power outages do happen from time to time.

The solution

Based on those old findings, I upgraded the 2GB of RAM that shipped with the server to a 2x8GB configuration. All ECC server memory, working fine. The issue persists. However, the fix is pretty simple. It involves a bit of preparation in advance, but these are steps you should already take when setting up the server in the first place.

1. Set up Integrated Lights-Out (iLO)

Without any buzzwords, iLO is basically server-side software that allows you to control and manage your server from a distance. You can access server controls (power, thermal, storage, etc). You can also use iLO as an “external monitor” for your server, which is pretty cool, because you can control it from your phone/tablet/desktop. I use my iPhone for this and it’s more than enough. You need to make sure that your iLO interface is connected to your home network, and ideally give it a fixed IP via a MAC address reservation on your DHCP router. Inside the server’s box there is a piece of cardboard with some stickers that have the iLO password on them.

2. Identify that this is the problem

Usually after a longer power outage, I wait for about 10 minutes to make sure there are no more power surges, then I check my server’s IP (using ping or HTTP on one of the services the server is running). If they’re down, then most likely the server hasn’t booted properly, so I fire up my iLO iPhone app and check the server’s virtual screen. If it’s in a boot loop, then we need to fix it.
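The "is it down?" check can also be scripted. Below is a minimal Python sketch, not what I actually run: it only tests whether a TCP port accepts connections. The function names are made up for illustration, and the address and ports in the usage note are placeholders for whatever your own server exposes.

```python
import socket
from typing import Iterable


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def server_looks_down(host: str, ports: Iterable[int], timeout: float = 3.0) -> bool:
    """Return True when none of the given service ports accept a connection."""
    return not any(is_port_open(host, port, timeout) for port in ports)
```

For example, `server_looks_down("192.168.1.50", [22, 32400])` would return True if neither SSH nor Plex answers on that (hypothetical) address, assuming those services run on their usual ports. A True result only means the services are unreachable; the iLO virtual screen is still what tells you whether it is really the boot loop.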

3. Fixing the issue

When the Dynamic Smart Array is busted, there’s no need to panic and wipe the drive (as I moronically did once); you just need to re-create the array. Start by rebooting the server, and when the Array prompts you to Run Array Configuration Utility, press F5 (my mnemonic is “refresh the array”). Then just let it boot into the HPE Smart Storage Administrator, a small tool built on top of a small Linux distribution. Here you have to choose the desired Array Controller (B120i in my case) and rebuild the array following the steps below:

  • Create Arrays with RAID 0
  • Choose the SSD (we’re having a one-drive array, it still counts as an array)
  • Set Bootable Logical Drive/Volume
  • Tick the newly created Logical Drive 1 as both Primary and Secondary Boot Drive
  • Exit the application and restart the server

4. Double check the fix worked

Monitor the boot process and check that the system boots properly to the login screen.
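If you’d rather not stare at the virtual screen, a small script can poll one of the server’s services until it answers. This is only a sketch under assumptions: the function name, the polling interval and the deadline are arbitrary choices of mine, and a port accepting connections is a reasonable hint that the OS came up, not proof that everything is healthy.

```python
import socket
import time


def wait_for_service(host: str, port: int, deadline_s: float = 600.0,
                     interval_s: float = 10.0, timeout: float = 3.0) -> bool:
    """Poll host:port until it accepts a TCP connection or `deadline_s` elapses.

    Returns True as soon as a connection succeeds, False if the deadline passes first.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            pass  # not up yet (refused, unreachable, or timed out)
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

Something like `wait_for_service("192.168.1.50", 22)` (placeholder address, SSH port assumed) would then block for up to ten minutes and tell you whether the machine came back.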

Addendum

I could probably remove the entire RAID array functionality and boot in legacy mode, but I’m too lazy to do that and test it to make sure it works flawlessly. At least the method described above works, and I only have to do it a few times a year. Also, I don’t know if the data on the SSD can be read outside of a RAID 0. The shortest path is the one you know.


This post is a part of Agora Road’s Travelogue for the month of January, an effort to promote blogging.