Sunday, May 6, 2007

Typical Jason work

Sometimes I have a hard time explaining what I do. So while I wait for Jessica to play Dominoes with me in a few minutes, I'll explain something I just did for one of our larger clients. All without leaving the comfort of our apartment.

Any amateur computer geek knows that hard drives are what hold the data on your computer. If your hard drive dies, well, you're out of luck, unless you've been backing up. On a higher level, servers aren't much different. They usually have bigger hard drives, but the principle is similar. The only difference is, a dead hard drive could mean bad news for a business.

One of our larger clients (100+ users) has a mail server with three drives in it. A few months ago, I happened to notice that one of the drives wasn't showing any activity. By that, I mean that it wasn't blinking any pretty Christmas lights on the front of it in the server room. Two were, one wasn't.

The purpose of having three drives is a tricky concept called RAID. RAID stands for "redundant array of independent disks." There are lots of RAID configurations, for which I usually have to refer to this Wikipedia article, but the bottom line is that most of them store your data redundantly across multiple disks, either as mirrored copies or with parity information, so that if one disk dies, no data is lost. In fact, the server continues to run in most cases. Nice feature, eh?

Anyway, one of the three drives had failed, and the server was now running on two. Our previous IT consultant had apparently discovered this little nugget of information, and had already ordered a replacement drive from HP, which was sitting in the server room. I popped it into the server, but to my surprise, the array did not rebuild. They usually rebuild automatically. In this case, no blinking Christmas lights.

Well, after some more investigation, I discovered that the array was configured for RAID 1+0. Confusing, I know. I had to look it up myself. RAID 1+0 stripes data across mirrored pairs of drives, so you normally need at least four hard drives to do it. This server had only three, and one was dead. The mystery continues.

After much debate with some fellow engineers, we decided to migrate the entire array to RAID 5. Now, if you have a technical background, you might object here: there are pros and cons to using RAID 5 on an Exchange server. The bottom line is, it's only a medium-sized business, and we wanted to get the server back into a truly redundant state as soon as possible so we could wash our hands of the problem. When it comes to IT, I get nervous when one net has broken and there's nothing left to catch you before hitting the ground.
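For the technically curious, here is a rough Python sketch of why three drives work for RAID 5 but not for RAID 1+0. It only encodes the simplified textbook rules (RAID 1+0 wants an even number of drives, at least four; RAID 5 wants at least three and gives up one drive's worth of space to parity). The function name `usable_drives` is made up for illustration, and a real controller may apply different rules.

```python
def usable_drives(level: str, drives: int) -> int:
    """Return how many drives' worth of space is usable for data.

    Simplified textbook rules only; this is an illustration, not the
    logic of any particular RAID controller.
    """
    if level == "1+0":
        # Striping across mirrored pairs: every drive needs a partner.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 1+0 needs an even number of drives, at least four")
        return drives // 2
    if level == "5":
        # One drive's worth of space is spent on parity.
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return drives - 1
    raise ValueError(f"unknown RAID level: {level}")


# Three drives are enough for RAID 5, leaving two drives' worth of space.
print(usable_drives("5", 3))

# The same three drives are rejected for RAID 1+0.
try:
    usable_drives("1+0", 3)
except ValueError as err:
    print(err)
```

With three drives, RAID 5 is the layout that fits, at the cost of one drive's worth of capacity.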

So this morning, I started a full backup of the mail server using Symantec Ghost. We actually already back the server up on a daily basis using Symantec Backup Exec, but I've never had to restore a server from scratch using Backup Exec. Ghost, on the other hand, allows you to take a complete snapshot of the server and compress it into one file. A big file, perhaps, but I knew it would be the best way to recover from disaster if this RAID change went bad.

I had already cleared it with the client that there would be a mail outage Sunday, but I didn't specify an exact time. Hopefully nobody missed their email, because it took 10 hours to complete the Ghost backup. I didn't think it would take that long.

At 9:00, the backup was complete, so I made the migration from RAID 1+0 (which, remember, was set up improperly to begin with) to RAID 5. The switch was painless and did not even require a reboot, and at this point the array is rebuilding happily. I restarted the Exchange services and tested mail access, and all seems to be well. I'll check again tomorrow morning before leaving for the day to see if the rebuild is complete. Even if it isn't, the only noticeable side effect should be some minor slowness for people using Outlook. A small price to pay to know that their data is now redundant again in case of a disk failure.
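If you're wondering what "redundant" means under RAID 5, the idea is parity: the controller keeps the XOR of the data blocks, and any one missing block can be recomputed from the others. Here is a toy Python sketch of that idea. The helper `xor_blocks` and the sample data are made up for illustration; a real controller works on whole disk stripes and rotates parity across the drives, which this sketch ignores.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks, as if they lived on two different drives.
data_a = b"mail"
data_b = b"box!"

# The parity block lives on a third drive.
parity = xor_blocks([data_a, data_b])

# Pretend the drive holding data_b died: rebuild it from what is left.
rebuilt_b = xor_blocks([data_a, parity])
assert rebuilt_b == data_b
```

The same trick works no matter which single block goes missing, which is why the array can lose any one drive and keep running.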

So there you have it, that's a sample of some of the more technical work that I do. I'd say 50% of my job is this and the other 50% is managing client expectations and planning strategies with our office.


Arthur said...

It's like a mystery novel you can't put down. Seriously you do a very good explanation. Congratulations on your new abode.

your loyal fans

Liz said...

Arthur reminded me to check your blog - something about doing laundry. When I did I saw I had missed this one, too. Great account of the work you do and a good example of why IT people can be blamed for things going wrong and probably don't get the credit they deserve for things going right!