This week at work we’ve been experimenting with MogileFS (link), a “filing system” capable of storing and retrieving many files across many hosts. It’s not a true file system, meaning you can’t mount it, run “ls” on it, or “cat” files, but with a custom Perl module you can put files in and take them back out.
Some interesting facts about it: it’s based on HTTP GET/PUT, so I believe it would be a good complement to other systems we have at work; without going into too much detail, it’s no surprise to anyone that Shutterfly receives lots of picture files from users and stores them for later printing :) Also, the MogileFS tracker module takes care of replicating files the way you want; for example, if you want 2 copies of each file living on different nodes, the file will be duplicated appropriately after the initial upload. If one of the nodes fails, the other nodes holding the same data as the lost one will duplicate those items again so the replication level is maintained.
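To make that replication idea concrete, here’s a toy sketch (in Python, and entirely my own; nothing here is MogileFS code) of what the tracker has to do: keep a minimum number of copies of each key on distinct nodes, and re-replicate after a node dies.

```python
# Toy model of a replication policy: keep `mindevcount` copies of each
# key on distinct live nodes, and re-replicate when a node disappears.

def rebalance(placement, live_nodes, mindevcount=2):
    """placement: dict mapping key -> set of node names holding a copy."""
    for key, nodes in placement.items():
        # Forget copies that lived on dead nodes.
        nodes &= live_nodes
        # Copy onto additional live nodes until we're back at mindevcount.
        for candidate in sorted(live_nodes - nodes):
            if len(nodes) >= mindevcount:
                break
            nodes.add(candidate)  # in real life: GET from a survivor, PUT here
        placement[key] = nodes
    return placement

files = {"photo1.jpg": {"nodeA", "nodeB"}, "photo2.jpg": {"nodeB", "nodeC"}}
# nodeB dies; the survivors re-replicate so every key is back at 2 copies:
rebalance(files, live_nodes={"nodeA", "nodeC"})
```

The real tracker obviously does much more (device weighting, class-based policies, and so on), but the invariant it maintains is roughly this one.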
What’s good about it: it’s based on HTTP (there’s also an NFS option, but that seems unhelpful for us), and it has been tested at very large installations (including LiveJournal) with good results. It’s written in Perl with MySQL underneath, and it doesn’t appear too complex to understand, so if it behaves badly we can probably fix it ourselves.
What’s not good about it: the documentation is pretty sketchy. On trying to get it to build properly for the first time, I changed the Perl code in two or three places just to get the automated Makefile tests to pass (1. a warn statement referred to @self instead of $self, 2. “socket my $sock, arg, arg” instead of “my $sock; socket $sock, …”, and possibly one other I’m forgetting). These are probably not so much statements about the underlying quality as an indicator that the maintainers don’t care much about providing “complete packages” that are plug-and-play. That means if you’re not comfortable fixing things in Perl, don’t build your site on it; wait for someone *else* to fix things and cut a new shrink-wrapped release.
Open questions I need to resolve include how to deal with large files and how to deploy the system across a large farm. As for file size: I tried a 2G file (tar -cf - /usr), which spun for a while and then killed the client with “Out of memory”. That suggests the data I’m stuffing into its file handle isn’t going straight to a file, and may mean a buffer is being copied in memory. So it’s probably fine for jpg images, but larger things like Oracle dbf files (or a larger tar of the Oracle data directory) would need a split/recombine strategy that the filing system doesn’t provide.
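For illustration, a split/recombine strategy can be pretty small. This Python sketch is entirely my own (the “name,N” key scheme and the in-memory store dict are stand-ins for real HTTP PUT/GET calls against storage nodes), just to show the shape of it:

```python
import io

CHUNK = 64 * 1024 * 1024  # 64M per chunk; the size I think the mailing list suggested

def split(stream, store, name, chunk=CHUNK):
    """Read `stream` in fixed-size pieces and store them as "name,0", "name,1", ..."""
    n = 0
    while True:
        buf = stream.read(chunk)
        if not buf:
            break
        store[f"{name},{n}"] = buf  # stand-in for an HTTP PUT of one chunk
        n += 1
    return n  # chunk count, needed later to recombine

def recombine(store, name, count, out):
    """Fetch the chunks in order and write them back out as one stream."""
    for n in range(count):
        out.write(store[f"{name},{n}"])  # stand-in for an HTTP GET

# Tiny demo: a 10-byte "file" stored as 4-byte chunks.
store = {}
count = split(io.BytesIO(b"0123456789"), store, "demo", chunk=4)
out = io.BytesIO()
recombine(store, "demo", count, out)
```

The point is that the client only ever holds one chunk in memory at a time, which is exactly what the 2G tar experiment above was missing.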
Edit: After writing this I found a script in the “utils” package that takes large files and stores them as “chunks”, and it also does cool stuff like 1. gzipping data on the way in and gunzipping on the way out, and 2. storing directories (using tar) and even entire disks (using a direct read similar to dd). So, I have to track down and test this “mogtool” beast.
How to back up certain files once they’re in is an open question too. Maybe in addition to the “N copies” replication system we could create a modified “N copies plus one on a dedicated backup node”, but that’s probably harder than just pulling files out, putting them in a normal FS, and running the backup from there. I want the backup system to work with Mogile, but I also want it to work with other filing systems that come along, so I’d rather build a separate module that can back up either files or URLs than modify Mogile too much.
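A rough sketch of what that separate module might look like (Python; the function names are mine and hypothetical): one helper that opens either a plain file path or a URL, and a backup step that streams from whichever it got. The URL branch is what would let it pull straight from Mogile’s HTTP side.

```python
import os
import shutil
import tempfile
import urllib.request

def open_source(src):
    """Open either a URL (e.g. Mogile's HTTP interface) or a plain file path for reading."""
    if src.startswith(("http://", "https://")):
        return urllib.request.urlopen(src)  # GET straight from a storage node
    return open(src, "rb")                  # ordinary filing system

def backup(src, dest_path):
    with open_source(src) as r, open(dest_path, "wb") as w:
        shutil.copyfileobj(r, w)            # stream in blocks, don't slurp into memory

# Demo against a plain file:
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "photo.jpg")
with open(src, "wb") as f:
    f.write(b"fake jpeg bytes")
backup(src, os.path.join(workdir, "photo.bak"))
```

Because both branches hand back a file-like object, the rest of the backup logic never has to know which filing system the bytes came from.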
Where to go from here: I want to find or quickly write a split/recombine strategy, hopefully one that talks to multiple nodes at once so I can do things like “use 10 threads, each pulling a different file.” If I can do that, I can immediately start putting Oracle backups into it. The useful limit for a single file/chunk is probably between 20M and 200M; I think the mailing list suggested 64M chunks. If there isn’t a clean, clear way to do chunking, I’ll check out the NFS option as an alternative.
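Here’s how the 10-threads-pulling-different-chunks idea could look. Again, this is a Python sketch of my own, with an in-memory dict standing in for the HTTP GETs against the storage nodes:

```python
import io
from concurrent.futures import ThreadPoolExecutor

def parallel_recombine(fetch, name, count, out, threads=10):
    """Pull chunks "name,0".."name,count-1" with a thread pool; write them in order."""
    keys = [f"{name},{n}" for n in range(count)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in submission order, so the output stays
        # correctly ordered even though the fetches run concurrently.
        for buf in pool.map(fetch, keys):
            out.write(buf)

# Demo: 10 chunks in an in-memory "store"; fetch would really be one HTTP GET each.
store = {f"backup,{n}": bytes([n]) * 4 for n in range(10)}
out = io.BytesIO()
parallel_recombine(store.__getitem__, "backup", 10, out)
```

With chunks spread across nodes the way the tracker spreads them, each thread would naturally end up talking to a different host, which is the whole point of parallelizing the pull.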
If you read all that and understood most of it, I’d be interested in your comments and feedback. I may or may not continue to post about it, depending on interest level.