My Server Died :(
17 07 2006

Last week we had some really bad thunderstorms in our area. This led to power interruptions everywhere, including my house, and the thing that sucks is that the outages kept coming. During thunderstorms, I usually don't bother to turn my computers off, because in my experience I've had no problems. I know many people think it's very stupid of me to keep them on, but my computers have always come back to life when power is restored. Last week, this wasn't exactly the case.

The only computer that got hurt during this storm was basically the most important computer in the house. I've been working on it for a year or so, gradually configuring things. I had it perfectly set up for LDAP authentication from other clients, NFS, Samba using LDAP, network printing, Ravencore (a cool web hosting control panel), and a bunch of other things. All of that took a good bit of time.

The worst part of the storm was that power would go out...then on...then out...then on. My brothers depend on that computer for LDAP authentication and NFS for the computers in their rooms, so the computer has to stay on. Each time the power went out, I turned the server back on as soon as power was available. At the end of all that mess, the computer started acting funny. SSH no longer worked, the commands "shutdown -h now" and "reboot" did nothing, and other weird things kept happening.

To "fix" this, I decided to try fsck. A few things were apparently fixed, but this didn't solve any of my issues. I did a hard reboot and the problems persisted. My Debian machine has a grub entry for "recovery mode." I entered this mode and tried fsck again. It didn't look good. I got errors all over the place. I knew it wasn't a good sign, and I pretty much knew from then on that if I finished with fsck, my hard drive would be finished as well. Something made me continue on. After a million or so prompts (I had to put a weight on the 'y' key), it was done.
The ext3 filesystem was turned into ext2 because the journal was deleted. I tried a reboot, and that was as far as I got: Grub wouldn't load, and everything from there on out was a mess. I have since formatted that hard drive and installed a fresh copy of Debian Sarge. I'll have to spend countless hours bringing it back to what it was before. Luckily, no user folders were damaged because they were on a separate partition. I wonder if these fsck errors were because the root partition was mounted, but if so, why didn't fsck produce as many errors when I ran it outside of "recovery mode"?
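On the mounted-root question: the e2fsck manual does warn that it is not safe to run on a mounted filesystem, so it's worth checking how a partition is mounted before letting fsck write to it. Here is a minimal Python sketch of that check. It assumes a Linux-style mounts table (the format of /proc/mounts: device, mount point, filesystem type, options, and so on), and the function names are made up for illustration. It doesn't explain what happened on this particular machine.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not mounted.

    mounts_text is the content of a /proc/mounts-style table, where each
    line reads: device mount_point fstype options dump pass
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep going: a later entry for the same mount point
            # (e.g. the real root after "rootfs") takes precedence.
            options = fields[3].split(",")
    return options


def is_mounted_read_write(mounts_text, mount_point="/"):
    """True if mount_point is mounted with the 'rw' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "rw" in options
```

On a live system you would pass in the text of /proc/mounts, e.g. `is_mounted_read_write(open("/proc/mounts").read(), "/")`. If that comes back True, the safer route is to run fsck from a rescue CD or with the partition unmounted or mounted read-only.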
Moral of the story: get a UPS and keep regular backups.
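The "regular backups" part can be as simple as a script run from cron. The following is a minimal Python sketch, not a full backup strategy: the function name and the example paths are hypothetical, and it just writes a timestamped .tar.gz of one directory (think /etc, where most of the painstaking configuration lives) into a destination directory that should sit on a different disk.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source_dir, dest_dir):
    """Write a timestamped .tar.gz of source_dir into dest_dir.

    Returns the path of the archive that was created.
    """
    source = Path(source_dir).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # e.g. etc-20060717-213000.tar.gz
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(source, arcname=source.name)
    return archive
```

A call such as `backup_directory("/etc", "/mnt/backup/configs")` (both paths are only examples) would produce one archive per run. Old archives are never deleted here, so some cleanup would be needed for long-term use.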
P.S. When I say my server "died," it didn't really die. Only the data on the hard disk was hurt. The computer's hardware seems perfectly fine.
Categories : General