I have a cluster of servers with a shared disk holding a GFS (Global File System) volume that all nodes access simultaneously.
Each node in the cluster runs the same program (a shell script at its core). The system processes files that appear in a couple of input directories, and it works like this:
- the program loops through the input directories.
- for each file found, it checks whether a "lock file" exists; if it does, it skips to the next file.
- if no lock file is found, it creates one. If lock file creation fails (race lost), it skips to the next file.
- if "we" own the lock, it processes the file and moves it out of the way when finished.
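To make the steps above concrete, here is a minimal sketch of one pass of the loop. It is not my actual script: the names try_lock, scan_once, process_file, lockdir and donedir are made up for illustration, and unid is assumed to be unique per node and process.

```shell
#!/bin/sh
# Sketch only: try_lock, scan_once, process_file, lockdir and donedir
# are hypothetical names, not the real script.

unid="$(uname -n).$$"   # assumption: node name + PID is unique enough

process_file() {        # placeholder for the real work
    echo "processing $1"
}

# Try to take the lock for one file; returns 0 only if we won the race.
try_lock() {
    currlock=$1
    [ -e "$currlock" ] && return 1              # lock exists: skip
    date "+%s:${unid}" > "${currlock}.${unid}" || return 1
    ln "${currlock}.${unid}" "$currlock" 2>/dev/null
    lockrc=$?
    rm -f "${currlock}.${unid}"
    return "$lockrc"
}

# One pass: scan_once LOCKDIR DONEDIR INPUTDIR...
scan_once() {
    lockdir=$1; donedir=$2; shift 2
    for dir in "$@"; do
        for f in "$dir"/*; do
            [ -f "$f" ] || continue             # also skips an unmatched glob
            name=$(basename "$f")
            try_lock "$lockdir/lock.$name" || continue
            process_file "$f" && mv "$f" "$donedir/"
        done
    done
}
```

Something like scan_once /gfs/tmp /some/done/dir /some/input/dir would run one pass (the last two paths are placeholders). The sketch leaves the lock file in place after processing; cleaning up old locks is a separate matter not covered here.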
This all works very well, but I wonder if there are cheaper (less complex) solutions that would also work. I'm thinking NFS or SMB perhaps.
There are two reasons for my use of GFS:
- each file is stored in one place only (on redundant underlying hardware of course)
- file locking works reliably
I create the lockfile like this:
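# write a timestamp and our id to a private temp file
date "+%s:${unid}" > "${currlock}.${unid}"
# hard-link it to the lock name; ln fails if the lock already exists
ln "${currlock}.${unid}" "${currlock}"
lockrc=$?
# the temp name is no longer needed either way
rm -f "${currlock}.${unid}"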
where $unid is a unique session identifier and $currlock is /gfs/tmp/lock.${file_to_process}
The beauty of ln is that it is atomic: if several processes attempt to create the same link at the same time, it succeeds for exactly one of them and fails for all the others.
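A rough way to check this claim on a given filesystem is to race a handful of background processes for the same lock and count the winners. The sketch below does that within a single machine only; race_for_lock and the won.N marker files are names invented for this test, and a clean result on one host says nothing definite about several NFS clients competing.

```shell
#!/bin/sh
# Sketch only: race_for_lock and the won.N marker files are hypothetical.
# Usage: race_for_lock WORKDIR CONTENDERS   (prints the number of winners)

race_for_lock() {
    workdir=$1; contenders=$2
    currlock="$workdir/lock.demo"
    i=1
    while [ "$i" -le "$contenders" ]; do
        (
            unid="demo.$i"
            date "+%s:${unid}" > "${currlock}.${unid}"
            # Only the contender whose ln succeeds records a win.
            if ln "${currlock}.${unid}" "$currlock" 2>/dev/null; then
                : > "$workdir/won.$i"
            fi
            rm -f "${currlock}.${unid}"
        ) &
        i=$((i + 1))
    done
    wait    # let every contender finish before counting
    ls "$workdir" | grep -c '^won\.'
}
```

Running race_for_lock "$(mktemp -d)" 8 on a local disk should report a single winner; pointing the work directory at the shared filesystem repeats the test there.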
So, I guess what I'm asking is: will NFS meet my needs? Does ln work as reliably on NFS as it does on GFS?