Tuesday, April 12, 2011

Using loop devices to test GlusterFS

I came across an issue when one of our users had questions like: does X happen when one of the nodes in GlusterFS is almost full? Does Y happen if one of the nodes is full? Does GlusterFS work at all if a couple of nodes are full?

Though the answer was straightforward, I thought it would be better to test the functionality under those conditions before answering the obvious.

Initially I thought about launching a few VMs to do a quick test. But the partition sizes were far too big for my tests; it was going to be a long wait before I filled up the nodes. The alternative was to create smaller partitions, which involves fdisk et al., and then working back to restore the original disk layout (if necessary).

A better solution for this type of test is to create a few large files with the `dd' command and use them as Gluster exports.

For example:

sac@odin:/data/disks $ for i in {1..4}; do
> dd if=/dev/zero of=disk$i bs=256k count=1000
> done
sac@odin:/data/disks $
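
Note that dd from /dev/zero actually writes out every block. If you only need large backing files and do not care about pre-allocating the space, sparse files work just as well; a sketch (the 3G size is an arbitrary example, not taken from the session above):

# create sparse backing files instead of writing zeros
for i in {1..4}; do
    truncate -s 3G /data/disks/disk$i
done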

Create a filesystem on each of the data files.

root@odin:/root # for i in {1..4}; do
> mkfs.ext3 /data/disks/disk$i
> done
root@odin:/root #
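
Since the targets are regular files rather than block devices, mkfs.ext3 normally stops to ask for confirmation on each one; if you want the loop to run unattended, the -F (force) flag of mke2fs skips that prompt:

# force mkfs so it does not prompt about the target being a regular file
for i in {1..4}; do
    mkfs.ext3 -F /data/disks/disk$i
done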

Mount the filesystems via loop devices.

root@odin:/root # mkdir /mnt/{1..4}
root@odin:/root # for i in {1..4}; do
> mount /data/disks/disk$i /mnt/$i -o loop
> done
root@odin:/root #
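
The -o loop option makes mount pick a free loop device automatically; if you want to see which loop devices are backing which files, or clean them up later, losetup helps:

# list the loop devices currently in use and their backing files
losetup -a

# unmounting releases the loop devices that mount set up
for i in {1..4}; do umount /mnt/$i; done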

Now we have four filesystems of the sizes we want, pretty cheaply, without needing multiple servers or real disk partitions.

root@odin:/root # df -h /mnt/*
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 2.9G 69M 2.7G 3% /mnt/1
/dev/loop1 2.9G 70M 2.7G 3% /mnt/2
/dev/loop2 2.9G 70M 2.7G 3% /mnt/3
/dev/loop3 2.9G 70M 2.7G 3% /mnt/4
root@odin:/root #

These mount points are then used as export directories and can be played around with to understand Gluster behavior when one of the partitions fills up, or to observe performance when the filesystems are built with various different flags.
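
For example, with the 3.1-style CLI the four mount points can be used as the bricks of a plain distribute volume; the volume name "testvol" and the /mnt/testvol mount point below are placeholders, not something from the session above:

# create and start a distribute volume backed by the loop-mounted filesystems
gluster volume create testvol transport tcp \
    odin:/mnt/1 odin:/mnt/2 odin:/mnt/3 odin:/mnt/4
gluster volume start testvol

# mount the volume and, to simulate the "one node almost full" scenario,
# fill one brick directly until it runs out of space
mkdir -p /mnt/testvol
mount -t glusterfs odin:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/1/filler bs=1M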

Conclusion:

This is a fast and cheap way to test GlusterFS functionality under various filesystems without having to bother about getting disks and creating partitions. The advantage is that we do not need to repartition the disks to get different-sized partitions; we can simply delete a file and create a new one with a different size. It is great for functionality testing, though the performance is poor. Gluster behavior can be quickly tested over various filesystems before setting up dedicated disks for extensive testing, and building filesystems with various options and tunings to observe GlusterFS behavior is very easy.
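
To try a different size, filesystem or set of mkfs options, the backing files can simply be unmounted, recreated and reformatted; a rough sketch (the 5G size and the ext4 options are arbitrary examples):

# rebuild the backing files with a different size and filesystem
# (stop any Gluster volume using these bricks first)
for i in {1..4}; do
    umount /mnt/$i
    rm -f /data/disks/disk$i
    truncate -s 5G /data/disks/disk$i
    mkfs.ext4 -F -b 4096 /data/disks/disk$i
    mount /data/disks/disk$i /mnt/$i -o loop
done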

Sunday, April 10, 2011

Gluster 3.0.x to 3.1.x migration

Migrating from GlusterFS 3.0.x to 3.1.x is explained at http://bit.ly/ibgF6K; however, this migration process leaves room for errors, and extra precaution is necessary while migrating between these major versions.

I have listed a few steps which have to be followed during the 3.0.x to 3.1.x migration, the errors one might encounter due to a faulty migration, and the steps to recover from them.

One of the new things that came in 3.1 is a concept called gfid. A gfid is an extended attribute that gets set on every file and directory on a GlusterFS file system. So, essentially, after migrating to 3.1, every file that is accessed from the mount point thereafter is assigned this new extended attribute.
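
You can see this attribute directly on a backend export with getfattr, the same way it is done later in this post; the path below is just an example:

# dump the trusted.gfid attribute of a file on one of the bricks
getfattr -d -m trusted.gfid -e hex /export/brick1/somefile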

The first step after migrating from 3.0.x to 3.1.x is to mount the cluster from a single client and run stat on the mount point recursively, for example `ls -lR >/dev/null'. Double-check that no other clients are accessing the cluster, and shut them down if they are.
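
A sketch of that first step, with a placeholder server name, volume name and mount point:

# mount the upgraded volume from one (and only one) client
mount -t glusterfs server1:/myvolume /mnt/myvolume

# walk the whole tree once so every file and directory gets its gfid assigned
ls -lR /mnt/myvolume >/dev/null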

After the upgrade, if more than one client accesses the cluster, there is a possibility that directories on the backends end up with different gfids; I have illustrated this in the example below. In such cases directory or file removal fails and you might see some unexpected behavior. Below is an error due to a gfid mismatch:

root@odin:/mnt/distribute1# rm -rf glusterfs-3.*
rm: cannot remove `glusterfs-3.0.5/extras/volgen': Directory not empty


The fix for this is to identify such directories, remove the extended attribute trusted.gfid on the backends, and run stat on them from the mount point. Make sure no other clients are accessing these directories at the same time.

An illustration of how it looks:

root@odin:/mnt/distribute1# rm -rf glusterfs-3.2.0qa8/
rm: cannot remove `glusterfs-3.2.0qa8/': Directory not empty
root@odin:/mnt/distribute1#

Examining the backend, I see that libglusterfs is a directory within glusterfs-3.2.0qa8/. Examining further, you see:

root@odin:/media# find /media/ -type d -name 'libglusterfs' | \
xargs -d'\n' getfattr -d -m trusted.gfid -e hex

getfattr: Removing leading '/' from absolute path names
# file: media/5/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0x9c3986db772d413a97ba79549b57370f

# file: media/4/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0x8ae60902d0894c7ea52ad1061ee1e158

# file: media/1/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0x8ae60902d0894c7ea52ad1061ee1e158

# file: media/3/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0x8ae60902d0894c7ea52ad1061ee1e158

# file: media/2/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0x9c3986db772d413a97ba79549b57370f


Notice that the gfids are not the same on all the backends, though they are required to be identical.

Solution:

On the backends, remove the extended attribute trusted.gfid from the problem directory:

root@odin:/media# find /media/ -type d -name 'libglusterfs' | \
xargs -d'\n' setfattr -x trusted.gfid
root@odin:/media# find /media/ -type d -name 'libglusterfs' | \
xargs -d'\n' getfattr -d -m trusted.gfid -e hex
root@odin:/media#

No attributes left, as expected. Now run a stat on the mount point to fix the gfid.

root@odin:/mnt/distribute1# stat glusterfs-3.2.0qa8/libglusterfs

File: `glusterfs-3.2.0qa8/libglusterfs'
Size: 20480 Blocks: 80 IO Block: 131072 directory
Device: 16h/22d Inode: 39460 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2011-04-11 17:25:52.000000000 +0530
Modify: 2011-04-11 16:50:33.000000000 +0530
Change: 2011-04-11 17:27:00.000000000 +0530
root@odin:/mnt/distribute1#

On the backends, the directory should now have the same gfid on all the nodes.

root@odin:/media# find /media/ -type d -name 'libglusterfs' | \
xargs -d'\n' getfattr -d -m trusted.gfid -e hex
getfattr: Removing leading '/' from absolute path names
# file: media/5/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0xcfeeacae15b54738b8fc6d60bd1ff05c

# file: media/4/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0xcfeeacae15b54738b8fc6d60bd1ff05c

# file: media/1/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0xcfeeacae15b54738b8fc6d60bd1ff05c

# file: media/3/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0xcfeeacae15b54738b8fc6d60bd1ff05c

# file: media/2/glusterfs-3.2.0qa8/libglusterfs
trusted.gfid=0xcfeeacae15b54738b8fc6d60bd1ff05c

root@odin:/mnt/distribute1# rm -rf glusterfs-3.2.0qa8/
root@odin:/mnt/distribute1#

The directory can then be removed from the mount point, provided we have fixed the layouts of the directory and its sub-directories.

This problem was first spotted at one of our customers' sites; thanks to Avati for the extensive debugging and for figuring out the root cause and the solution.