
I have two directory trees (actually macOS volumes) with similar files and want to create a third directory tree containing only the files that differ between the two source trees.

How can I do this with rsync? Or should I use another tool for this?

halloleo
  • Are special actions required for files that are only in the first or second directory? – SergA Apr 20 '22 at 09:32
  • @SergA No, I just want to create a “delta directory tree” that is much smaller in total size than the second full tree (tree 1 and tree 2 are very similar), so that in the end I can delete the second tree and use a lot less space. – halloleo Apr 20 '22 at 12:05
  • See [here](https://unix.stackexchange.com/questions/25195/how-do-i-save-changed-files?) – JRFerguson Apr 20 '22 at 12:14
  • Does this answer your question? [How do I save changed files?](https://unix.stackexchange.com/questions/25195/how-do-i-save-changed-files) – JRFerguson Apr 20 '22 at 12:15
  • @JRFerguson I tried `rsync` with `--compare-dest`, but the result was completely wrong; it didn't seem to work in my case. – halloleo Apr 21 '22 at 09:22
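For comparison, here is a minimal sketch of how `--compare-dest` is usually applied to this problem; it is not from the thread, and `delta_with_compare_dest` is a hypothetical helper name. Two documented details may explain surprising results: a relative `--compare-dest` directory is resolved relative to the destination directory, and when attributes such as modification times are preserved (as with `--archive`), a file whose contents match but whose attributes differ is still copied. The sketch therefore passes an absolute path and uses only `--recursive --checksum`, so the copies do not keep their original timestamps or permissions. It assumes bash and rsync 3.x. Unlike the answer below, it also copies files that exist only in tree #2.

```shell
#!/usr/bin/env bash
# Sketch: copy into DELTA every file of TREE2 that is missing from TREE1 or
# whose contents differ from the TREE1 copy (compared by checksum).
# delta_with_compare_dest is a hypothetical helper name.
set -euo pipefail

delta_with_compare_dest() {
  local tree1 tree2=$2 delta=$3
  # A relative --compare-dest path would be resolved against $delta,
  # so turn tree 1 into an absolute path first.
  tree1=$(cd "$1" && pwd)
  mkdir -p "$delta"
  # No --archive: preserving times/perms would make rsync copy files whose
  # contents match but whose attributes differ.
  rsync --recursive --checksum \
        --compare-dest="$tree1/" \
        "$tree2/" "$delta/"
  # rsync still creates directories whose files were all skipped;
  # delete the ones left empty (but keep $delta itself).
  find "$delta" -mindepth 1 -type d -empty -delete
}
```

Usage: `delta_with_compare_dest /some/where/tree_1 /some/where/tree_2 /some/where/diff`.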

1 Answer

  • Using rsync (based on this answer):
    SOURCE_TREE_1="/some/where/tree_1"
    SOURCE_TREE_2="/some/where/tree_2"
    DIFF_DIR="/some/where/diff"
    
    rsync --dry-run \
          --recursive \
          --checksum \
          --itemize-changes \
          "$SOURCE_TREE_1/" "$SOURCE_TREE_2" | # trailing '/' in source directory is necessary!
      grep -E '^>fc' | # grep files with different checksums
      cut -d ' ' -f 2- |
    while IFS= read -r file; do   # IFS= and -r keep spaces and backslashes in names
      subdir="${file%/*}"
      [ "$file" != "$subdir" ] && mkdir -p "$DIFF_DIR/$subdir"
    
      # This copies the tree #1 version. If you plan to delete tree #2 afterwards,
      # change `"$SOURCE_TREE_1/$file"` to `"$SOURCE_TREE_2/$file"` in the next line.
      cp -a "$SOURCE_TREE_1/$file" "$DIFF_DIR/$file"
    done
    
  • Using git:
    cd "$SOURCE_TREE_1"
    git init .
    git add .
    git commit -m 'Init' # Note: Here git may ask you to set name and email
    
    # replace all files with files from source tree #2
    # (-z / -0 keep file names with spaces intact)
    git ls-tree -z --name-only HEAD | xargs -0 rm -rf --
    rsync --archive "$SOURCE_TREE_2/" .
    
    # show changes briefly:
    git status -uno
    # show changes for some file:
    git diff "path/to/file"
    
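    # restore source tree #1 state (`git restore` needs Git 2.23 or later)
    git restore .
    # remove files that were copied in from tree #2 but don't exist in tree #1
    git clean -fd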
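If parsing rsync's itemized output is awkward (for example, with unusual file names), the same delta can be built by comparing the trees file by file with `cmp`. This is a minimal sketch, not part of the answer above; `make_delta_tree` is a hypothetical helper name, and it assumes bash (for `read -d ''` and process substitution). Like the rsync pipeline, it only considers regular files that exist in both trees, but it copies the tree #2 version, which is what you need if tree #2 is to be deleted afterwards.

```shell
#!/usr/bin/env bash
# Sketch: copy the tree-2 version of every regular file that exists in both
# trees but whose contents differ into a separate delta tree.
# make_delta_tree is a hypothetical helper name, not an existing command.
set -euo pipefail

make_delta_tree() {
  local tree1=$1 tree2=$2 delta=$3 rel
  mkdir -p "$delta"
  # find -print0 and read -d '' keep names with spaces or newlines intact.
  while IFS= read -r -d '' rel; do
    rel=${rel#./}
    # Like the rsync pipeline, ignore files that exist only in tree 2.
    [ -f "$tree1/$rel" ] || continue
    # cmp -s is silent and exits 0 only if both files are byte-identical.
    if ! cmp -s "$tree1/$rel" "$tree2/$rel"; then
      mkdir -p "$delta/$(dirname "$rel")"
      cp -p "$tree2/$rel" "$delta/$rel"
    fi
  done < <(cd "$tree2" && find . -type f -print0)
}

# Usage example with two tiny throw-away trees:
work=$(mktemp -d)
mkdir -p "$work/t1/sub" "$work/t2/sub"
echo same    > "$work/t1/same.txt"
echo same    > "$work/t2/same.txt"
echo old     > "$work/t1/sub/a b.txt"
echo new     > "$work/t2/sub/a b.txt"
echo only-t2 > "$work/t2/extra.txt"
make_delta_tree "$work/t1" "$work/t2" "$work/delta"
find "$work/delta" -type f   # should print only .../delta/sub/a b.txt
```

`cmp` stops at the first differing byte, so changed files are usually detected quickly; identical files are read in full, as with `rsync --checksum`. Remove `$work` when you are done with the example.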
    
SergA
  • Thanks for this. I've tried the rsync solution, and it certainly is a great starting point. (I have a strange issue, though: according to `du`, the directory sizes don't add up: du(FS1) + du(FSDIFF) < du(FS2). The difference is not negligible: 25 GB for an FS1 of 450 GB.) – halloleo Apr 21 '22 at 09:20
  • In fact, any relationship between these sizes is possible, because files that exist only in the first directory tree, or only in the second, are deliberately ignored. There can also be more exotic causes: [hard links](https://en.wikipedia.org/wiki/Hard_link), and [sparse files](https://en.wikipedia.org/wiki/Sparse_file) that occupy fewer disk blocks than their file size suggests (the `du` utility reports disk usage, not file size). – SergA Apr 26 '22 at 11:35
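Since files that exist in only one tree are outside the delta, they are a natural first suspect for the `du` gap; the rsync pipeline in the answer never copies files that exist only in tree #2. A minimal sketch for listing them, assuming bash (for process substitution) and the hypothetical helper name `only_in`, compares the sorted file lists of both trees with `comm`:

```shell
#!/usr/bin/env bash
# Sketch: print the relative paths of regular files that exist under tree A
# but not under tree B. only_in is a hypothetical helper name; it assumes no
# file name contains a newline, because comm compares line by line.
set -euo pipefail

only_in() {
  local a=$1 b=$2
  # Use byte-wise (C locale) ordering for both sort and comm so they agree.
  LC_ALL=C comm -23 \
    <(cd "$a" && find . -type f | LC_ALL=C sort) \
    <(cd "$b" && find . -type f | LC_ALL=C sort)
}

# Usage example with two tiny throw-away trees:
work=$(mktemp -d)
mkdir -p "$work/t1" "$work/t2/sub"
touch "$work/t1/common" "$work/t2/common" "$work/t1/old-only" "$work/t2/sub/new-only"
only_in "$work/t2" "$work/t1"   # files only in tree 2: ./sub/new-only
only_in "$work/t1" "$work/t2"   # files only in tree 1: ./old-only
```

Running `du` over the paths listed for tree #2 gives a rough idea of how much of the gap they account for.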