edoceo: Latin "to inform fully, instruct thoroughly"

Edoceo's Blog: Deleted Database Recovery - Open File Descriptors Save Lives

Tuesday, June 16, 2009

Deleted Database Recovery - Open File Descriptors Save Lives

Yesterday I was cleaning up carbon (a database server), removing files from a mounted RAID device that held a clone of the live file system. Cleaning means removing unused or unneeded directories and files.

Well, the location I'm cleaning has a copy of / so naturally there are directories in there like /usr, /etc, /var and others.

The cleaning involves my favourite command, rm -fr, which can be dangerous, as the following sequence shows:

# cd /mnt/raid/root-copy/
# rm -fr ./etc
# rm -fr ./usr
# rm -fr /var

Shit! I forgot the . prefix on that last command. I hit CTRL+C as fast as I could! Time to see what I killed!

The most important thing was my database in /var/lib/postgresql. So I was very angry at myself when this happened:

# stat /var/lib/postgresql
stat: cannot stat `/var/lib/postgresql': No such file or directory

shit shit shit

There went my database! All my client records, histories, book-keeping & accounting back to 2004. My last conscious backup was from the first of the month - 15 days ago.

Fortunately I had an automated backup script running that had yesterday's data. But what about all the work I did today? I hate repeating myself.

Well, here's a cool trick. On UNIX-style OSes a file is not actually destroyed when you say rm. rm only removes the name (the directory entry) and decrements the link count in the file's inode. As long as some process still has the file open, the inode and its data blocks stay on disk. Only once the link count is zero and every process has closed the file does the file system treat that space as free for reuse.
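To see this behaviour for yourself, here is a minimal sketch, assuming a POSIX shell. The scratch file and descriptor number 3 are arbitrary choices for illustration, not anything from my server.

```shell
#!/bin/sh
# Demo: a file's data survives rm while a descriptor is still open.
tmp=$(mktemp)                 # scratch file, stands in for a database file
echo "client records" > "$tmp"

exec 3< "$tmp"                # open descriptor 3 for reading
rm -f "$tmp"                  # remove the only name; the inode stays alive

[ -e "$tmp" ] || echo "name is gone"
recovered=$(cat <&3)          # the data is still readable through fd 3
echo "recovered: $recovered"

exec 3<&-                     # closing the last descriptor frees the space
```

After the final exec the kernel is free to reuse the blocks, which is why a process holding the files open is the only thing keeping them alive.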

So I had deleted my /var/lib/postgresql directory, but guess what! PostgreSQL had 100s of open files in there! Hooray! The postgres process still had access to all those files in that directory!

So I looked at the process and its open files (/proc/[pid]/fd/) and could see a lot of them. Hoping for the best, I re-ran my postgres backup script. It had full access to all data in all 14 databases and dumped out my 2.5GiB worth of book-keeping and company financial records. There is an interesting document on the proc file descriptors and undeleting from finalcog.
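For poking around in /proc yourself, something like the following sketch may help. It assumes Linux, where the kernel appends " (deleted)" to the link target of an unlinked file under /proc/[pid]/fd. The helper names list_deleted and rescue are made up for this example.

```shell
#!/bin/sh
# list_deleted PID: print "fd<TAB>original path" for each deleted-but-open file
list_deleted() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd") || continue
        case $target in
            *' (deleted)')
                printf '%s\t%s\n' "${fd##*/}" "${target% (deleted)}" ;;
        esac
    done
}

# rescue PID FD DEST: copy the still-open file's contents to DEST
rescue() {
    cp "/proc/$1/fd/$2" "$3"
}
```

You would call it as list_deleted [pid] and then rescue [pid] [fd] /some/safe/place. For a running database, a raw copy like this may not be consistent, which is why going through the database's own dump tool, as I did, is the safer route when the server is still up.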

When I stopped the PostgreSQL server it was dead: it wouldn't restart and had to have its whole data area re-initialized. With the fresh dump I was able to recover the system right back to the point (4am) when I had issued the errant rm -fr command.

For the record, I know that rm -fr is dangerous. I frequently find myself telling others to be careful with it. Remember, rm -fr is shorthand for rm --fuck-it --really.
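One way to take the edge off is a small wrapper that refuses to delete anything outside the tree you meant to clean. This is only a sketch, assuming a POSIX shell. The function name safe_rm is hypothetical and not something I had in place at the time.

```shell
#!/bin/sh
# safe_rm BASE TARGET...: delete only targets that resolve to paths inside BASE
safe_rm() {
    base=$(cd "$1" && pwd -P) || return 1     # physical path of the allowed tree
    shift
    for target in "$@"; do
        # resolve the directory that holds the target, following symlinks
        parent=$(cd "$(dirname "$target")" && pwd -P) || return 1
        case "$parent/" in
            "$base"/*) rm -rf -- "$target" ;;
            *) echo "refusing: $target is outside $base" >&2; return 1 ;;
        esac
    done
}
```

With this, safe_rm /mnt/raid/root-copy ./var run from inside the copy removes the copy's var directory, while safe_rm /mnt/raid/root-copy /var is refused because /var resolves outside the base.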

So yeah, I had backups, but they were more than 12 hours old, and who wants that? Thanks to open file descriptors I was very, very lucky and able to recover the data to the point in time of the failure.

BACKUP YOUR DATA!!!

2 comments:

Mau said...

Great to know that someone else is also taking advantage of open file descriptors :)

database in recovery said...

Once all processes close that file then the inode structure (with reference count zero) says's that space on the disc is free and can then be used.