
I have two folders on the same partition (ext2). If I `mv folder1/file folder2` and some interruption occurs (e.g. a power failure), could the file system ever end up being inconsistent?

Isn't the mv operation atomic?

Update: So far on IRC I got the following perspectives:

  1. it is atomic so inconsistencies cannot happen
  2. first you copy the dir entry in the new dir and then erase entry on previous dir, so you may have the inconsistency of having a file referenced twice, but the ref count is 1
  3. it first erases the pointer and then copy the pointer so the inconsistency is that the file has reference 0

Can someone clarify?

Gilles 'SO- stop being evil'
graphtheory92

3 Answers

The rename operation is very fast on any filesystem, so it is unlikely to be interrupted, but on a classical filesystem it certainly can be. If the implementation creates the destination link first, an interruption could leave two links to the file: that is legal in itself, but the inode's link count would still say 1, which causes problems if one of the names is deleted later (the inode is freed while the other name still points to it). If it removes the source link first instead, the file could be lost entirely. Running fsck will usually detect and correct either condition, though a lost file will be placed in the lost+found directory with an arbitrary name rather than at the desired location, and a doubly linked file will simply have its link count corrected, so it will exist in both locations if the filesystem permits that.
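The create-then-remove ordering described above can be sketched as a deliberately non-atomic emulation of rename. This is an illustration only (the function name is made up, and a real filesystem does this inside the kernel, where the transient state does not update the on-disk link count, hence the inconsistency after a crash):

```python
import os

def unsafe_rename(src, dst):
    """Non-atomic emulation of rename(2), for illustration only:
    create the destination entry first, then remove the source entry.
    In the kernel's in-place directory update, a crash between the
    two steps leaves two directory entries pointing at an inode whose
    link count still says 1."""
    os.link(src, dst)    # step 1: destination entry now exists
    # -- a power failure here leaves both entries on disk --
    os.unlink(src)       # step 2: source entry removed
```

The opposite ordering (unlink first, link second) has the mirror-image failure: a crash in between leaves the inode with no directory entry at all, which is the "lost file" case fsck moves to lost+found.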

If you need a filesystem to be robust in the face of power failures, you should use a journaling filesystem, such as NTFS, ext3, or XFS. Most modern systems use a journaling filesystem by default, but be aware that FAT is not one, which matters if you use it for external drives.

A journaling filesystem uses a "double entry" system: it first writes a record to the journal saying that it intends to perform the rename, then performs the rename itself. When the filesystem is checked at startup after an interruption, it will notice that the move was not completed and redo it then.
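That double-entry idea can be sketched at application level with a hypothetical intent journal (the record format, file layout, and function names are all invented for illustration; a real filesystem journal works on disk blocks, not JSON lines):

```python
import json
import os

def log_intent(journal, src, dst):
    """First entry: record the intended rename and flush it to
    stable storage before touching any directory."""
    with open(journal, "a") as j:
        j.write(json.dumps({"src": src, "dst": dst}) + "\n")
        j.flush()
        os.fsync(j.fileno())

def journaled_rename(journal, src, dst):
    log_intent(journal, src, dst)
    os.rename(src, dst)        # the operation itself, now redoable

def replay(journal):
    """Startup check: any logged rename whose source still exists
    was interrupted, so redo it; then discard the journal."""
    if not os.path.exists(journal):
        return
    with open(journal) as j:
        for line in j:
            rec = json.loads(line)
            if os.path.exists(rec["src"]):
                os.rename(rec["src"], rec["dst"])
    os.unlink(journal)
```

The key property is that the intent record reaches stable storage before the operation begins, so recovery can always tell the difference between "never started" and "started but unfinished".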

There are two types of journaling filesystems: metadata journaling and full journaling. With metadata journaling, changes to file contents are not tracked in the journal (so you could lose data that was being written to a file at the moment of the crash), but important filesystem structures such as directory contents and file properties are still protected.


When people talk about the rename operation being atomic, they mean that it can't be observed mid-transition by another process on the system, and it can't be left half-completed by e.g. interrupting the mv command itself with ^C. The physical process of writing to each directory, whose storage space may be in widely different locations on the disk, cannot possibly be a truly atomic operation at the hardware level.
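This observable-atomicity guarantee is exactly what the common replace-by-rename idiom relies on. A minimal sketch (the `.tmp` naming convention and function name are just illustrative):

```python
import os

def publish(path, data):
    """Replace-by-rename: other processes opening `path` concurrently
    see either the complete old contents or the complete new contents,
    never a missing or half-written file, because the final swap is a
    single rename() call."""
    tmp = path + ".tmp"        # illustrative temporary-name convention
    with open(tmp, "w") as f:
        f.write(data)
    os.rename(tmp, path)       # atomic with respect to observers
```

Note that this only promises atomic *visibility* to other processes; as the paragraph above says, it makes no promise about what is on the platters if the power fails mid-operation.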


For completeness, I'll note that there are also some incidental I/O operations associated with a rename beyond creating the new link in the destination directory and removing the entry from the old one: updating the mtime of both directories, possibly extending the allocation size of the destination directory, and, if the file being moved is itself a directory, changing its .. link and the link counts of both parent directories. Also, I am not sure whether the atime of the file itself is affected.

Random832
  • 10,476
  • 1
  • 34
  • 40
  • A journal doesn't guarantee atomicity wrt power failures. I think that ext3 and ext4 do guarantee that `rename` is atomic, but btrfs doesn't according to the wiki (see my answer). It's also possible to guarantee atomicity without a journal (I don't know of examples on Linux but there may be some). Do you have reliable information about ext2? – Gilles 'SO- stop being evil' May 10 '15 at 00:15
  • @Gilles do you have any information about how it can even theoretically be guaranteed without a journal? I mean, at the basic level, we're talking about synchronizing writes to two different files to guarantee that you never get the result that only one of them was performed. – Random832 May 10 '15 at 03:59
  • [Log-structured filesystems](http://en.wikipedia.org/wiki/Log-structured_file_system) maintain consistency by not overwriting blocks that are in use. This is well-suited to flash media where overwriting existing data is costly. The log isn't really like a journal because nothing is replayed when mounting — though you could say that the whole filesystem is the journal (except that mounting never involves replaying the whole thing in memory as it would be too slow). The description of [LogFS](http://en.wikipedia.org/wiki/LogFS) in Wikipedia is a good overview. – Gilles 'SO- stop being evil' May 10 '15 at 12:02

First, let's dispel some myths.

it is atomic so inconsistencies cannot happen

Moving a file inside the same filesystem (i.e. the `rename` system call) is atomic with respect to the software environment. Atomicity means that any process that looks for the file will see it either at its old location or at its new location; no process will be able to observe an intermediate state, such as the file having a different link count, the file still being present in the source directory after it has appeared in the destination directory, or the file being absent from both directories at once.

However, if the system crashes due to a bug, a disk error or a power loss, there is no guarantee that the filesystem is left in a consistent state, let alone that the move isn't left half-done. Linux does not in general offer a guarantee of atomicity with respect to hardware events.

first you copy the dir entry in the new dir and then erase entry on previous dir, so you may have the inconsistency of having a file referenced twice, but the ref count is 1

This refers to a specific implementation technique. There are others.

It so happens that ext2 on Linux (as of kernel 3.16) uses this particular technique. However, this does not imply that the disk content goes through the sequence [old location] → [both locations] → [new location], because the two operations (add the new entry, remove the old entry) are not atomic at the hardware level either: either one can be interrupted partway through, leaving the filesystem in an inconsistent state. (Hopefully fsck will repair it.) Furthermore, the block layer can reorder writes, so the removal of the old entry could be committed to disk just before the crash while the addition of the new entry never was, leaving the file with no directory entry at all.

The reference count will never be observed to be different from 1 as long as the system doesn't crash (see above) but that guarantee does not extend to a system crash.

it first erases the pointer and then copy the pointer so the inconsistency is that the file has reference 0

Once again, this refers to a particular implementation technique. A dangling file cannot be observed if the system doesn't crash, but it is a possible consequence of a system crash, at least in some configurations.


According to a blog post by Alexander Larsson, ext2 gives no guarantee of consistency on a system crash, but ext3 does in the data=ordered mode. (Note that this blog post is not about rename itself, but about the combination of writing to a file and calling rename on that file.)

Theodore Ts'o, a principal developer of the ext2, ext3 and ext4 filesystems, wrote a blog post on the same issue. This blog post discusses atomicity (with respect to the software environment only) and durability (which is atomicity with respect to crashes plus a guarantee of commitment, i.e. knowing that the operation has been performed). Unfortunately I can't find information about atomicity with respect to crashes alone. However, the durability guarantees given for ext4 require that rename be atomic. The kernel documentation for ext4 states that ext4 with the auto_da_alloc option (which is the default in modern kernels) provides a durability guarantee for a write followed by a rename, which implies that rename is atomic with respect to hardware crashes.
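The write-followed-by-rename pattern those durability discussions revolve around looks like this when spelled out with explicit syncs. This is a sketch with illustrative names; the fsync calls are what make it durable by contract rather than by filesystem-specific heuristics such as auto_da_alloc:

```python
import os

def durable_replace(path, data):
    """Write-then-rename with explicit fsync calls, so the new
    contents survive a power failure regardless of the filesystem's
    delayed-allocation heuristics."""
    tmp = path + ".tmp"                  # illustrative temp name
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())             # data blocks reach the disk
    os.rename(tmp, path)                 # atomic swap of the name
    dirfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
    try:
        os.fsync(dirfd)                  # directory entry reaches the disk
    finally:
        os.close(dirfd)
```

Without the first fsync, a crash shortly after the rename could leave `path` pointing at a zero-length or partially written file on filesystems with delayed allocation; auto_da_alloc exists precisely to paper over applications that skip it.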

For btrfs, a rename that overwrites an existing file is guaranteed to be atomic with respect to crashes, but a rename that does not overwrite a file can end with neither file existing, or with both files existing.


In summary, the answer to your question is that not only is moving a file not atomic with respect to crashes on ext2, but it isn't even guaranteed to leave the file in a consistent state (though failures that fsck cannot repair are rare) — pretty much nothing is, which is why better filesystems have been invented. Ext3, ext4 and btrfs do provide limited guarantees.

Gilles 'SO- stop being evil'

This question has been asked in a slightly different manner on Super User. The Wikipedia page on the mv command also explains it quite well:

Moving files within the same file system is generally implemented differently than copying the file and then removing the original. On platforms that do not support the rename syscall, a new link is added to the new directory and the original one is deleted. The data of the file is not accessed.

Linux has the rename syscall and will therefore rename the file as an atomic, i.e. uninterruptible, operation. So no, the filesystem cannot become inconsistent in the situation that you described.

Benjamin B.
  • is the rename sys call an os abstraction? Since hardware wise, I could always be able to interrupt a series of operations since rename must be a series of operations – graphtheory92 May 09 '15 at 13:12
  • No, it's not an OS abstraction, but I thought stating "therefore it is highly unlikely that the filesystem will become inconsistent..." would make things overly complicated. I agree with you though. – Benjamin B. May 09 '15 at 13:16
  • so, is this implemented in hardware? – graphtheory92 May 09 '15 at 17:01
  • This answer leaves me wondering *why* the `rename` system call cannot result in the filesystem being in an inconsistent state even if there's a power failure. I felt like this was the core of @graphtheory92's question. – Tanner Swett May 09 '15 at 17:15
  • @graphtheory92: If a system call is atomic it does not mean at all that the resulting disk operation (or a series of disk operations!) will be atomic too. ------ I can imagine that by moving a file (hard link count 1) and cutting the power, hard disk connection or crashing the kernel at right time you can end up with two hard links (the original and the new one) to the file with the hard link count still being 1. ------ I think there are two basic solutions to the problem: a) software - journaling FS which can automatically recover from inconsistent states. b) HW supported transactions. – pabouk - Ukraine stay strong May 09 '15 at 18:27
  • The guarantee of atomicity that you refer to is with respect to observation by other processes. It does not hold if the system crashes. – Gilles 'SO- stop being evil' May 10 '15 at 00:16