Distributed storage cluster


 
yonkeltron 17 posts

Ok, here’s a hypothetical question: if I were going to build a reliable distributed storage cluster, what would be the best way? Considering how many distributed filesystems there are (GlusterFS, GFS, Lustre, OCFS2), I wonder which is the best to use. One of the big features for me would be reliability. I read that the Google File System stores data in a seriously redundant fashion (3 copies?), and though some of the others seem able to do that too, I was wondering if anyone has experience using any of these options.

Has anyone even built a storage cluster like this before?

 
sub 2 posts

My first thought is that the ‘best way’ is likely to be dependent on what type of data, how much data, and how this data is going to be used.

GFS and GFS2 only really allow for multiple systems accessing the same (shared) storage. Your data will need to be redundant at the block level, i.e. on a good SAN with RAID, or potentially using something like DRBD, which replicates block-level data between multiple nodes over the network. I’ve tried the latter, but having more than 2 DRBD nodes requires some trickery (I believe they refer to this as “stacking”), and GFS/GFS2 is less than optimal with fewer than 3 nodes due to fencing and possible split-brain scenarios.

In my implementation, I did not find DRBD+GFS2 to be reliable with only 2 nodes (DRBD would sync fine, GFS2 had fencing problems) and had to manually fsck with the stack to get my nodes to rejoin the cluster.
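
To make the split-brain point concrete, here is a minimal sketch of majority-quorum arithmetic. It is not how GFS2 or DRBD actually make decisions (real cluster stacks add fencing and sometimes special two-node modes); the function names `has_quorum` and `surviving_partitions` are made up for illustration.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster."""
    # Strict majority: more than half of all configured nodes are reachable.
    return reachable_nodes * 2 > total_nodes


def surviving_partitions(partition_sizes):
    """Given the sizes of the partitions after a network split,
    return the sizes of the partitions that still have quorum."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


if __name__ == "__main__":
    # 2-node cluster splits 1/1: neither side has a majority,
    # so neither side can safely keep writing on its own.
    print(surviving_partitions([1, 1]))  # []

    # 3-node cluster splits 2/1: exactly one side keeps quorum.
    print(surviving_partitions([2, 1]))  # [2]
```

With only two nodes a clean split leaves no majority at all, which is one reason a third node (or some other tie-breaker) tends to make fencing decisions less painful.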

Another possibility is something like MogileFS (http://www.danga.com/mogilefs/ – made by the makers of memcached). It is important to point out, though, that your application would need to be written to use MogileFS, as it is not a filesystem layer. A FUSE plugin for it would be interesting to see.
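
As a rough illustration of what "written to use it" means in practice, the sketch below shows an application talking to a key-based store through explicit put/get calls instead of opening paths on a mounted filesystem. This is a toy in-memory model, not the MogileFS API: `ToyObjectStore`, its methods, the node names, and the `copies` parameter are all hypothetical.

```python
import hashlib


class ToyObjectStore:
    """Hypothetical key-based store that keeps several copies of each object.

    Illustration only; this is NOT the MogileFS client API.
    """

    def __init__(self, node_names, copies=2):
        if copies > len(node_names):
            raise ValueError("copies cannot exceed the number of nodes")
        # Each "node" is just a dict standing in for a storage host.
        self.nodes = {name: {} for name in node_names}
        self.copies = copies

    def _pick_nodes(self, key):
        # Deterministically choose `copies` distinct nodes for this key.
        names = sorted(self.nodes)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.copies)]

    def put(self, key, data):
        # Write the object to every node chosen for this key.
        for name in self._pick_nodes(key):
            self.nodes[name][key] = data

    def get(self, key, down=()):
        # Try each replica in turn, skipping nodes reported as down.
        for name in self._pick_nodes(key):
            if name not in down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


if __name__ == "__main__":
    store = ToyObjectStore(["node-a", "node-b", "node-c"], copies=2)
    store.put("photo:42", b"jpeg bytes")
    first_replica = store._pick_nodes("photo:42")[0]
    # The object is still readable when its first replica is unavailable.
    print(store.get("photo:42", down={first_replica}))
```

The point is the shape of the interface: the application addresses objects by key and the store decides where the copies live, so existing code that expects ordinary file paths would need changes.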

 
Al Gordon 97 posts

Not sure if this is the kind of thing you’re looking for, but Eucalyptus provides an open source Amazon S3 and EBS clone.

About EBS:
  • Amazon EBS allows you to create storage volumes from 1 GB to 1 TB that can be mounted as devices by Amazon EC2 instances. Multiple volumes can be mounted to the same instance.
  • Storage volumes behave like raw, unformatted block devices, with user supplied device names and a block device interface. You can create a file system on top of Amazon EBS volumes, or use them in any other way you would use a block device (like a hard drive).
  • Amazon EBS volumes are placed in a specific Availability Zone, and can then be attached to instances also in that same Availability Zone.
  • Each storage volume is automatically replicated within the same Availability Zone. This prevents data loss due to failure of any single hardware component.
  • Amazon EBS also provides the ability to create point-in-time snapshots of volumes, which are persisted to Amazon S3. These snapshots can be used as the starting point for new Amazon EBS volumes, and protect data for long-term durability. The same snapshot can be used to instantiate as many volumes as you wish.
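
The snapshot behaviour described in the last bullet can be sketched with a toy in-memory model. This is not the AWS or Eucalyptus API; `ToyVolume` and `ToySnapshot` are hypothetical names, and real snapshots are persisted to S3 rather than held in memory.

```python
class ToySnapshot:
    """Frozen point-in-time copy of a volume's blocks (illustration only)."""

    def __init__(self, size_blocks, blocks):
        self.size_blocks = size_blocks
        # Copy the mapping so later writes to the source volume are not seen.
        self._blocks = dict(blocks)

    def create_volume(self):
        # Every new volume starts from the same snapshot contents.
        return ToyVolume(self.size_blocks, self._blocks)


class ToyVolume:
    """Sparse block device: block index -> bytes (illustration only)."""

    def __init__(self, size_blocks, blocks=None):
        self.size_blocks = size_blocks
        self.blocks = dict(blocks or {})

    def _check(self, index):
        if not 0 <= index < self.size_blocks:
            raise IndexError("block index out of range")

    def write(self, index, data):
        self._check(index)
        self.blocks[index] = data

    def read(self, index):
        self._check(index)
        # Blocks that were never written read back as empty.
        return self.blocks.get(index, b"")

    def snapshot(self):
        return ToySnapshot(self.size_blocks, self.blocks)


if __name__ == "__main__":
    original = ToyVolume(size_blocks=8)
    original.write(0, b"v1")
    snap = original.snapshot()

    original.write(0, b"v2")       # change made after the snapshot
    clone_a = snap.create_volume()
    clone_b = snap.create_volume()
    clone_a.write(1, b"only in a")

    print(clone_a.read(0))  # b'v1'
    print(clone_b.read(1))  # b''
```

Volumes created from the same snapshot start out identical and then diverge independently, which is the property the bullet describes.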

Eucalyptus: http://open.eucalyptus.com/

Getting Started with Ubuntu Enterprise Cloud powered by Eucalyptus in 9.04: https://help.ubuntu.com/community/Eucalyptus

Amazon Elastic Block Store (EBS): http://www.johnmwillis.com/amazon/amazon-elastic-block-store-ebs/

copyright © 2009 scosug - all rights reserved