
How to move directories that share hardlinked files from one partition to another?

Let's assume we have a partition mounted on /mnt/X with directories that share files via hardlinks. How can such directories be moved to another partition, say /mnt/Y, while preserving those hardlinks?

To illustrate what I mean by "directories sharing files via hardlinks", here is an example:

# create a tree of directories and files
mkdir -p a/{b,c,d}/{x,y,z}
touch a/{b,c,d}/{x,y,z}/f{1,2,3,4,5}
# and copy it with hardlinks
cp -r -l a hardlinks_of_a

To be more specific, let's assume the total size of the files is 10G and each file has 10 hardlinks. The question is how to move the tree to the destination using only 10G. (Someone might suggest copying it with 100G and then running deduplication; that is not what I am asking about.)
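Before moving anything, it can help to confirm which names really are hardlinks of one another. Below is a minimal POSIX sh sketch that rebuilds the example tree in a scratch directory from `mktemp -d` (an assumption, so nothing under /mnt is touched), written with loops instead of bash brace expansion. It uses `test -ef`, which common shells such as bash and dash support and which is true when two paths refer to the same inode on the same device, plus `find -links` to count files that have exactly two names.

```shell
#!/bin/sh
set -eu
# Scratch directory so the demo never touches real data
work=$(mktemp -d)
cd "$work"

# Same tree as in the question, without bash-only brace expansion
for d in b c d; do
  for s in x y z; do
    mkdir -p "a/$d/$s"
    for n in 1 2 3 4 5; do
      touch "a/$d/$s/f$n"
    done
  done
done
cp -r -l a hardlinks_of_a    # GNU cp: -l makes hard links instead of copies

# -ef: both names refer to the same inode on the same device
if [ a/b/x/f1 -ef hardlinks_of_a/b/x/f1 ]; then
  echo "a/b/x/f1 and hardlinks_of_a/b/x/f1 are the same file"
fi

# Every regular file under a should now have exactly 2 links (3*3*5 = 45 files)
linked=$(find a -type f -links 2 | wc -l | tr -d ' ')
echo "files in a with exactly 2 links: $linked"
```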

Gilles 'SO- stop being evil'
Grzegorz Wierzowiecki

6 Answers


rsync has a -H (--hard-links) option for this. It also has the usual rsync benefits: it can be stopped and restarted, and re-run to efficiently deal with any files that changed during or after the previous run.

-H, --hard-links
    This tells rsync to look for hard-linked files in
    the source and link together the corresponding
    files on the destination. Without this option,
    hard-linked files in the source are treated as
    though they were separate files. [...]

Read the rsync man page and search for -H; it has a lot more detail about particular caveats.

cas
  • I've checked - it works. – Grzegorz Wierzowiecki Jul 31 '12 at 18:16
  • yep, i know. I've been using it for years in my backup scripts. also to move files between filesystems as in your question. – cas Jul 31 '12 at 22:03
  • rsync uses gobs of memory when building its file list. For me after many hours of "Building file list..." it filled up my 16GB of memory and bailed having copied nothing. YMMV. – msc Feb 02 '18 at 01:57
  • From `man rsync`: *Beginning with rsync 3.0.0, the recursive algorithm used is now an incremental scan that uses much less memory than before and begins the transfer after the scanning of the first few directories have been completed. This incremental scan only affects our recursion algorithm, and does not change a non-recursive transfer. It is also only possible when both ends of the transfer are at least version 3.0.0.* Note that both `--delete-before` and `--delete-after` disable this improved algorithm. – cas Feb 02 '18 at 02:18
  • Also, while `rsync` is an incredibly useful tool, it isn't always the best tool for every job. These days, I prefer to use ZFS datasets so I can snapshot and `zfs send` them - I mostly use rsync on non-ZFS filesystems. `btrfs` has a similar snapshot + send capability. – cas Feb 02 '18 at 02:22
  • Thank you @cas. The rsync in macOS High Sierra is 2.6.9. I'll see if I can get 3.0+ via MacPorts or some other way. – msc Feb 03 '18 at 00:52
  • @cas I don't see why rsync doesn't think `-H` requires knowing the entire file list. The fact that it doesn't means `-H` simply doesn't work as expected in most cases! – Michael Apr 22 '18 at 17:05
  • @Michael You don't need the file list ahead of time for -H to work -- proceeding incrementally is fine. You only need the list of _files transferred so far_ to know when to use a hardlink at the receiving end. – Matt Jul 30 '20 at 14:31
  • @Matt duh, i was such a pearhead back in 2018! – Michael Jul 30 '20 at 17:12
  • MacOS _rsync_ will probably never go above 2.6.9 (without an Apple rewrite). Starting in [version 3.0](https://download.samba.org/pub/rsync/NEWS#3.0.0) it went to GPL v3. – Brian B Feb 18 '21 at 22:45

First answer: The GNU Way

GNU cp -a copies recursively preserving as much structure and metadata as possible. Hard links between files in the source directory are included in that. To select hard link preservation specifically without all the other features of -a, use --preserve=links.

mkdir src
cd src
mkdir -p a/{b,c,d}/{x,y,z}
touch a/{b,c,d}/{x,y,z}/f{1,2,3,4,5}
cp -r -l a hardlinks_of_a
cd ..
cp -a src dst
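A sketch comparing `cp -a`, `cp -R --preserve=links` and a plain `cp -R` on scratch directories. GNU cp is assumed, since `--preserve=links` is a GNU option, and the directory names are hypothetical; `-ef` is true when both paths refer to the same inode.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical source: one file plus a second hard link to it
mkdir -p src/a src/hardlinks_of_a
echo data > src/a/f1
ln src/a/f1 src/hardlinks_of_a/f1

# Archive copy (GNU: -dR --preserve=all), which includes hard links
cp -a src dst_a
# Recursive copy that preserves only hard links on top of the defaults
cp -R --preserve=links src dst_links
# Plain recursive copy for comparison: hard links are not preserved
cp -R src dst_plain

for d in dst_a dst_links dst_plain; do
  if [ "$d/a/f1" -ef "$d/hardlinks_of_a/f1" ]; then
    echo "$d: hard link preserved"
  else
    echo "$d: separate copies"
  fi
done
```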
Alan Curry
  • +1 on tar, -1 for using gnu-specific arguments for cp. – WhyNotHugo Jul 30 '12 at 15:11
  • You gave three answers in one. Could you split them into three so they can be commented on and evaluated separately? (Tip: You can edit this to leave only one, for example "cp -a". Later add two more, for "tar" and "pax".) – Grzegorz Wierzowiecki Jul 31 '12 at 13:03
  • I've checked - `cp -a` works! (@AlanCurry, please separate the answers into three.) – Grzegorz Wierzowiecki Jul 31 '12 at 18:17
  • @GrzegorzWierzowiecki split accomplished – Alan Curry Jul 31 '12 at 18:42
  • @Hugo: there's nothing wrong with using GNU-specific args to standard tools. GNU versions are the de-facto standard these days, and even when they weren't pre-installed, it was common practice to install GNU tools (I know I always did - they were simply better than, e.g., Solaris and *BSD versions, and they provided consistency between different *nixes). It's probably good practice to point out GNUisms when you use them, but it's not required. Also, Grzegorz didn't say "not on Linux", so it's reasonable to assume that that's the environment he's talking about. – cas Jul 31 '12 at 21:57
  • It's not reasonable to assume he uses the same OS as you, and it's not common practice to install GNU base tools on non-GNU systems. At a minimum, you should always clarify this. Using GNUisms DECREASES portability; POSIX is way more standard. – WhyNotHugo Aug 01 '12 at 02:46
  • So, I am happy to see non-GNU answers in this topic as well :). (Please remember that this answer was edited: it previously had both GNU and non-GNU answers, and now it's split into three, so you can up-vote whichever you want.) – Grzegorz Wierzowiecki Aug 01 '12 at 20:30
  • GNU is far from standard on the desktop, what with Mac OS X shipping BSD tools. This won't work on Mac. – hraban Feb 19 '18 at 17:39
  • @WhyNotHugo: How is POSIX "way more standard"? POSIX is the stuff which brought us where we are. Did you know that all Windows versions since Windows NT are fully POSIX compliant? They have a path length limitation of 255 characters when using the POSIX file I/O functions, which renders them useless. Did you know that Solaris, Irix, HP-UX are all POSIX compliant, and yet all the arguments to their tools differ (e.g. tar)? cp -a is a minimum requirement for any cp version which wants to replace GNU copy. – Johannes Overmann Feb 27 '19 at 22:24
  • @hraban: Who is using the BSD tools on MacOS? :-) (SCNR) – Johannes Overmann Feb 27 '19 at 22:26

Third answer: The POSIX Way

POSIX hasn't standardized the tar utility, although they have standardized the tar archive format. The POSIX utility for manipulating tar archives is called pax and it has the bonus feature of being able to do the pack and unpack operation in a single process.

mkdir dst
pax -rw src dst
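A sketch that checks the result of `pax -rw src dst` on scratch directories (names hypothetical). In copy mode pax recreates each operand's path under the destination directory, so the copy ends up in dst/src/... rather than directly in dst. pax is not installed by default on every Linux system, so the sketch only runs it when `command -v pax` finds it.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical source: one file plus a second hard link to it
mkdir -p src/a src/hardlinks_of_a
echo data > src/a/f1
ln src/a/f1 src/hardlinks_of_a/f1

if command -v pax >/dev/null 2>&1; then
  mkdir dst
  # Copy mode (-r -w): the operand "src" is recreated inside dst
  pax -rw src dst
  if [ dst/src/a/f1 -ef dst/src/hardlinks_of_a/f1 ]; then
    echo "hard link preserved under dst/src"
  fi
else
  echo "pax is not installed on this system" >&2
fi
```

To put the contents of src directly into dst instead, one option is to run pax from inside the source, e.g. `(cd src && pax -rw . ../dst)`.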
Alan Curry

Second answer: The Ancient UNIX Way

Create a tar archive in the source directory, send it over a pipe, and unpack it in the destination directory.

# create src as before
(cd src && tar cf - .) | (mkdir dst && cd dst && tar xf -)
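A self-contained version of the pipeline with a check at the end, using hypothetical scratch directories. The `&&` chaining means that if a `cd` fails, tar does not run in the wrong directory. tar records the second name of a linked file as a hard-link entry that points at the first name, and `tar x` recreates the link instead of writing the data twice.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical source: one file plus a second hard link to it
mkdir -p src/a src/hardlinks_of_a
echo data > src/a/f1
ln src/a/f1 src/hardlinks_of_a/f1

# Pack in src, stream through the pipe, unpack in dst
(cd src && tar cf - .) | (mkdir dst && cd dst && tar xf -)

if [ dst/a/f1 -ef dst/hardlinks_of_a/f1 ]; then
  echo "hard link preserved"
fi
```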
Alan Curry
  • checked -> works. Hardlinks preserved. – Grzegorz Wierzowiecki Jul 31 '12 at 20:33
  • Any insight into why this actually does preserve hardlinks? – peterph Aug 05 '15 at 21:02
  • Because `tar` preserves hard-links. In GNU tar, at least, you can disable this behaviour with `--hard-dereference` – cas Sep 02 '15 at 08:11
  • In my case, attempting to copy a large directory hierarchy (a TimeMachine backup), tar preserved some hard links but replicated the file in some cases. I think this is because the `tar x` does not have the full file list as files are still being piped in from the `tar c`. Probably if you saved the entire archive before extracting it, it would be okay. I'd be very happy if someone could confirm that theory. – msc Feb 02 '18 at 02:04

Source: http://www.cyberciti.biz/faq/linux-unix-apple-osx-bsd-rsync-copy-hard-links/

What you need to make an exact copy is:

rsync -az -H --delete --numeric-ids /path/to/source/ /path/to/dest/
Pykler
  • See my comment about rsync above. – msc Feb 02 '18 at 01:57
  • I suspect this won't copy ACLs, extended attributes, and so forth. The Linux version also has the -A and -X options to preserve these, but I think you're out of luck on MacOS. – Edward Falk Dec 07 '18 at 00:02
  • Can you explain your rationale of using the -z option (compress) when using rsync to copy between two mounted folders (since this is what was asked) ? – jmr May 16 '20 at 00:01

The command you can use to copy files while preserving hardlinks in the macOS CLI:

cp -r -l
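With -l, cp makes each destination name a hard link to the corresponding source file rather than copying its data. This is the same mechanism the question uses to build hardlinks_of_a. Because a hard link cannot span two filesystems, this only applies when source and destination are on the same partition. The sketch below uses GNU cp on hypothetical scratch directories; whether a given macOS cp accepts -l depends on its version, so checking `man cp` there is worthwhile.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical source file
mkdir -p src/a
echo data > src/a/f1

# -l: create hard links instead of copying file data
cp -r -l src dst

# dst/a/f1 is not a copy: it is the very same inode as src/a/f1
if [ src/a/f1 -ef dst/a/f1 ]; then
  echo "dst/a/f1 is another name for src/a/f1"
fi
```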