I was reading the rmlint manual, and two of the duplicate handlers are clone and reflink:

· clone: btrfs only. Try to clone both files with the BTRFS_IOC_FILE_EXTENT_SAME ioctl(3p). This will physically delete duplicate extents. Needs at least kernel 4.2.

· reflink: Try to reflink the duplicate file to the original. See also --reflink in man 1 cp. Fails if the filesystem does not support it.

What exactly does this clone do, and how is it different from a reflink? What does the BTRFS_IOC_FILE_EXTENT_SAME ioctl do?

Dan

1 Answer

The differences are somewhat subtle.

Reflink deletes the duplicate file and creates a new file in its place which is a clone of the original file. The metadata of the duplicate is lost, although rmlint does its best to preserve the metadata via some trickery with touch -mr.
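
To make the "new file in its place" behaviour concrete, here is a minimal Python sketch of a reflink-style replacement. It is not rmlint's code: it assumes Linux, uses the generic `FICLONE` ioctl from `<linux/fs.h>` rather than `cp`, and the function name `reflink_replace` and the `.reflink-tmp` suffix are made up for illustration. Note that nothing here checks that the two files still have the same contents.

```python
import fcntl
import os

# _IOW(0x94, 9, int) from <linux/fs.h>; Linux-specific.
FICLONE = 0x40049409


def reflink_replace(original, duplicate):
    """Replace `duplicate` with a reflink copy of `original`.

    Returns True on success, False if the clone could not be made
    (for example, the filesystem has no reflink support).
    """
    st = os.stat(duplicate)            # remember the duplicate's timestamps
    tmp = duplicate + ".reflink-tmp"   # hypothetical temporary name
    try:
        with open(original, "rb") as src, open(tmp, "wb") as dst:
            # Ask the kernel to make dst share all of src's data extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
    except OSError:
        if os.path.exists(tmp):
            os.unlink(tmp)
        return False
    # Roughly what `touch -mr` achieves: carry the old timestamps over.
    os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
    # The duplicate's old inode is dropped here; a new file takes its name.
    os.replace(tmp, duplicate)
    return True
```

Owner, permissions and extended attributes of the duplicate are not carried over in this sketch, which is the metadata loss described above.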

Clone uses the BTRFS_IOC_FILE_EXTENT_SAME ioctl (or, in the latest version, the FIDEDUPERANGE ioctl), which asks the kernel to check whether the files are identical and, if so, to make them share the same data extents. Both files keep their original metadata. It's arguably safer than reflink because it's done atomically by the kernel, and because the kernel verifies that the files are still identical at the moment of deduplication.
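
For comparison, here is a rough Python sketch of a single FIDEDUPERANGE call over a whole file. Again, this is not rmlint's code: it assumes Linux, the struct layouts are transcribed from `struct file_dedupe_range` and `struct file_dedupe_range_info` in `<linux/fs.h>`, and the function name `dedupe_whole_file` is invented for the example.

```python
import fcntl
import os
import struct

# _IOWR(0x94, 54, struct file_dedupe_range) from <linux/fs.h>; Linux-specific.
FIDEDUPERANGE = 0xC0189436
# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def dedupe_whole_file(original, duplicate):
    """Ask the kernel to share extents between two files if they match.

    Returns (status, bytes_deduped), or None if the ioctl itself failed
    (for example, the filesystem does not support deduplication).
    status 0 means the ranges were identical, 1 means they differed,
    and a negative value is a negated errno for this destination.
    """
    src = os.open(original, os.O_RDONLY)
    dst = os.open(duplicate, os.O_RDWR)
    try:
        length = os.fstat(src).st_size
        # One header followed by one destination entry.
        buf = bytearray(HEADER.pack(0, length, 1, 0, 0)
                        + INFO.pack(dst, 0, 0, 0, 0))
        try:
            fcntl.ioctl(src, FIDEDUPERANGE, buf)  # kernel fills in the results
        except OSError:
            return None
        _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
        return status, bytes_deduped
    finally:
        os.close(src)
        os.close(dst)
```

Neither file is replaced, so both inodes and their metadata stay as they were. A filesystem may dedupe fewer bytes than requested in one call, so a real tool would presumably loop over the file in chunks and compare `bytes_deduped` with what it asked for.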

thomas_d_j
  • `rmlint`'s `reflink` uses `cp --archive --reflink=always` which does *not* delete the duplicate file in the normal case, so the original inode is not lost. `--archive` will replace all metadata (e.g. owner, permissions) of the duplicate file with that of the source file, but the timestamp is restored via `touch -mr`. However, if the duplicate file has multiple hardlinks, then the first ones *will* be replaced by `cp --archive`, although the last one will not, due to its having only one hardlink. I would think you would get more useful results by removing `--archive` in the `rmlint` output. – jrw32982 Apr 27 '22 at 17:18