To keep discussion to a minimum, I'm writing this only for those interested in protecting themselves from horrific events like these.
This story is not suitable for individuals under the age of 30. You've been warned.
We were without internet for around 29 hours. After switching our DSL ISP from MSN to Qwest.net and an hour-long phone call, I was finally told late last night that I needed a new modem. Funny how they never told me that during the call to switch ISPs. They told me to call the billing department, so I woke up much earlier than I usually do to call at 8 AM sharp. But then it started sounding like we would have to order a modem, and that would mean Monday or Tuesday. That would not work. Lots of our stuff (email) is hosted here, so I couldn't go that long. Luckily, Best Buy had the modem we needed and I was able to pick one up only minutes after they opened this morning.
One final call to get info on setting it up and we were finally back up! Yes, email can start coming in and I'm all done! Or can it?
So I open my email and see three Promise Controller Messages. Yes, they said the RAID array was in a Critical state! First, that means that TWO drives have failed. Second, I go to the server and it only shows Degrade, which is much less concerning. So I immediately grab my external backup drives to start updating them. Stupidly, I've been meaning to update them for over a week but kept forgetting! So here I am with a few hundred gigabytes that need emergency backups. I get every computer going, copying anything I can fit, and wait. Unfortunately, I'm two hours away from having to be in class and the copies are estimating 3+ hours. I let them run until the last minute.
So I finally decide I have to stop and try to get it rebuilding. I find a drive...one that failed a month ago but works fine in an external enclosure (and I don't have a brand new one). I get it ready and stick it in the empty drive slot. All the drives immediately shut down. Every backup around the house stops and I get about 25 messages through Promise WebPam telling me that each drive was unplugged and what status that took the array to, which was well beyond Offline. Then the drives start turning back on and the array is in Critical state. It starts rebuilding drive 4 and says the drive I added is offline.
It's about 25% done rebuilding now, and luckily RAID5 rebuilding goes much faster than rebuilding the second parity. When it's done, I'll try to get the 16th drive going and hopefully everything will be back to normal.
Whatever they call hot-swap, this ain't it. Last time I tried hot swapping a drive I had the same problem. It's really about as annoying as it gets. Honestly, I don't think my next card will be a Promise. As solid as it has been for 9 months, the seemingly continuous and unnecessary drive "failing" plus this hot-swap issue destroy its credibility. So far I haven't lost anything, but when I plugged the drive in, the drive was immediately lost on the server. Luckily it came back. I'm not a fan of luckily.
The only way I can avoid this issue is to shut the server off, take out and replace the drive, then boot up. It will then start rebuilding. But isn't the whole point of hot-swap that you don't have to do that?! This is crazy!
Oh yeah, and the email wasn't going through because the new modem is sort of a router and I had to DMZ Smoothwall.
So, several compounding issues, all in the same horrific day or two.