51

I would like to copy a set of files from directory A to directory B, with the caveat that if a file in directory A is identical to a file in directory B, that file should not be copied (and thus its modification time should not be updated). Is there a way to do that with existing tools, without writing my own script to do it?

To elaborate a bit on my use-case: I am autogenerating a bunch of .c files in a temporary directory (by a method that has to generate all of them unconditionally), and when I re-generate them, I'd like to copy only the ones that have changed into the actual source directory, leaving the unchanged ones untouched (with their old creation times) so that make will know that it doesn't need to recompile them. (Not all the generated files are .c files, though, so I need to do binary comparisons rather than text comparisons.)

(As a note: This grew out of the question I asked on https://stackoverflow.com/questions/8981552/speeding-up-file-comparions-with-cmp-on-cygwin/8981762#8981762, where I was trying to speed up the script file I was using to do this operation, but it occurs to me that I really should ask if there's a better way to do this than writing my own script -- especially since any simple way of doing this in a shell script will invoke something like cmp on every pair of files, and starting all those processes takes too long.)
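For context, the straightforward per-file script the question alludes to looks something like the following sketch (directory names `gen` and `src` are placeholders). It spawns one `cmp` process per file pair, which is exactly the process-startup cost that is too slow on Cygwin:

```shell
# Naive approach: one cmp per file pair; copy only when contents differ,
# so identical files keep their old modification times.
mkdir -p gen src
printf 'int a;\n' > gen/a.c
printf 'int a;\n' > src/a.c     # identical: must be left untouched
printf 'int b;\n' > gen/b.c     # changed: must be copied
for f in gen/*; do
    dest="src/$(basename "$f")"
    cmp -s -- "$f" "$dest" || cp -- "$f" "$dest"
done
```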

Brooks Moses


7 Answers

39

You can use the -u switch to cp like so:

$ cp -u [source] [destination]

From the man page:

   -u, --update
       copy only when the SOURCE file is newer than the destination file or 
       when the destination file is missing
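As the comments note, though, `-u` compares timestamps, not contents: an identical source file that is merely newer still gets copied, stamping the destination with a fresh mtime. A quick sketch of that failure mode (directory names are placeholders):

```shell
# cp -u copies an identical-but-newer file, updating the destination's
# mtime and thereby triggering an unnecessary rebuild under make.
mkdir -p A B
echo same > A/f
echo same > B/f
touch -d '2020-01-01' B/f   # destination is older but identical
cp -u A/f B/f               # copies anyway, because A/f is newer
```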
slm
gu1
    From a comment on a similar A that was already deleted: "This will not work since it would copy also identical files, if timestamp of source is newer (and so update timestamp of destination, against the OP request)." – slm Jun 27 '14 at 17:48
  • 2
    Doesn't answer the question at all, but I still found it useful. – user31389 Oct 21 '16 at 11:19
  • 1
    Not available in macOS (11.2) Bash, unfortunately. – akauppi Apr 30 '21 at 09:06
37

rsync is probably the best tool for this. It has a lot of options, so read the man page. I think you want either the --checksum option or --ignore-times.
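A minimal sketch of what that looks like (directory names are placeholders; per the comment thread, `--checksum` without `-t` is the combination that leaves identical destination files, and their mtimes, untouched):

```shell
# Copy by content: identical files are skipped entirely, so their
# mtimes in destdir stay as they were; only changed files transfer.
mkdir -p srcdir destdir
echo unchanged > srcdir/a.c
echo unchanged > destdir/a.c
echo new       > srcdir/b.c
echo old       > destdir/b.c
rsync --checksum srcdir/ destdir/
```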

Coren
Adam Terrey
  • I should have noted that I already tried that, with no success. Both of those options only affect _whether_ rsync does a copy -- but, even when it does not do a copy, it either updates the target file's modification time to the same as the source (if the `-t` option is specified) or to the synchronization time (if `-t` is not specified). – Brooks Moses Jan 24 '12 at 05:52
  • 4
    @Brooks Moses: It doesn't. At least my version of `rsync` doesn't. If I do this: `mkdir src dest; echo a>src/a; rsync -c src/* dest; sleep 5; touch src/a; rsync -c src/* dest`, then `stat dest/a` shows its mtime and ctime are 5 secs older than the ones of `src/a`. – angus Jan 24 '12 at 08:48
  • @angus: Huh. Okay, you're right. The key seems to be the `--checksum` option, and although http://linux.die.net/man/1/rsync contains absolutely *nothing* that would imply that it has any effect on whether the modification date is updated, it nonetheless causes the destination modification date to be left untouched. (On the other hand, the `--ignore-times` option does not have this effect; with it the modification date is still updated.) Given that this seems to be entirely undocumented, though, can I rely on it? – Brooks Moses Jan 24 '12 at 09:32
  • 2
    @BrooksMoses: I think you can rely on it: `rsync`'s workflow is: 1) check whether the file needs to be updated; 2) if so, update it. The `--checksum` option says it should not be updated, so `rsync` does not proceed to step 2). – enzotib Jan 24 '12 at 10:25
  • @enzotib: Your theory fails to explain why the timestamps get updated in the other cases (when, for instance, `--ignore-times` is used but the file is unchanged). Or are you saying that, in the other cases, `rsync` treats the time-difference as an update with no difference? – Brooks Moses Jan 24 '12 at 21:07
  • 2
    @BrooksMoses: `--ignore-times` without `--checksum` would copy every file, and so also update the timestamp, even if the files are identical. – enzotib Jan 24 '12 at 21:22
  • @user2436 +1 for `rsync --checksum` as a good general way to accomplish this, but in this particular case it would be better to use [ccache](http://unix.stackexchange.com/a/30432/11052). – aculich Jan 31 '12 at 00:05
  • 5
    This answer's high rating is unwarranted. Saying "read the man page" is barely more helpful than saying "go google the answer". – Dennis Aug 22 '19 at 17:22
9

While using rsync --checksum is a good general way to "copy if changed", in your particular case there is an even better solution!

If you want to avoid unnecessarily recompiling files you should use ccache which was built for exactly this purpose! In fact, not only will it avoid unnecessary recompiles of your auto-generated files, it will also speed things up whenever you do make clean and re-compile from scratch.

Next I'm sure you'll ask, "Is it safe?" Well, yes, as the website points out:

Is it safe?

Yes. The most important aspect of a compiler cache is to always produce exactly the same output that the real compiler would produce. This includes providing exactly the same object files and exactly the same compiler warnings that would be produced if you use the real compiler. The only way you should be able to tell that you are using ccache is the speed.

And it's easy to use it by just adding it as a prefix in the CC= line of your makefile (or you can use symlinks, but the makefile way is probably better).
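For example, an illustrative Makefile fragment (the compiler name depends on your toolchain):

```makefile
# Route compilations through ccache; cache hits skip the real compile.
CC = ccache gcc
CXX = ccache g++
```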

aculich
  • 1
    I initially misunderstood and thought you were suggesting I use ccache to do part of the generating, but now I understand -- your suggestion was that I simply copy all the files, and then use ccache in the build process, thereby avoiding rebuilding the ones that hadn't changed. It's a good idea, but it won't do well in my case -- I have hundreds of files, usually only change one or two at a time, and am running under Cygwin where simply starting the hundreds of ccache processes to look at each file would take several minutes. Nonetheless, upvoted because it's a good answer for most people! – Brooks Moses Jan 31 '12 at 06:14
  • No, I was not suggesting that you copy all the files, rather you can just autogenerate your .c files in-place (remove the copy step and write to them directly). And then just use ccache. I don't know what you mean by starting hundreds of ccache processes... it is just a light-weight wrapper around gcc that is quite fast and will speed up re-building other parts of your project, too. Have you tried using it? I would like to see a comparison of the timing between using your copy-method vs ccache. You could, in fact, combine the two methods to get the benefits of both. – aculich Jan 31 '12 at 16:07
  • 1
    Right, ok, I understand now about the copying. To clarify, what I mean is this: If I generate the files in place, I have to then call `ccache file.c -o file.o` or the equivalent, several hundreds of times because there are several hundred `file.c` files. When I was doing that with `cmp`, rather than `ccache`, it took several minutes -- and `cmp` is as lightweight as `ccache`. The problem is that, on Cygwin, _starting a process_ takes non-negligible time, even for a completely trivial process. – Brooks Moses Feb 01 '12 at 03:57
  • Cygwin is all kinds of problematic, so that doesn't surprise me. I am curious, though, what keeps you tied to Cygwin? Seems to me it would be easier to spin up a Linux VM with [VirtualBox](https://www.virtualbox.org/) and do the primary development there, but then test it on Cygwin when you need to... you can still have Cygwin as a target platform (though with VM tech what it is, I don't see why Cygwin is still around anymore), but develop elsewhere. – aculich Feb 01 '12 at 04:35
  • 1
    As a datapoint, `for f in src/*; do /bin/true.exe; done` takes 30 seconds, so yeah. Anyway, I prefer my Windows-based editor, and aside from this sort of timing issue Cygwin works quite well with my workflow as the lightweight place to test things locally if I'm not uploading to the build servers. It's useful to have my shell and my editor in the same OS. :) – Brooks Moses Feb 01 '12 at 04:43
  • 1
    If you want to use your Windows-based editor you can do that quite easily with [Shared Folders if you install Guest Additions](http://www.virtualbox.org/manual/ch04.html#sharedfolders)... but hey, if Cygwin suits you, then who am I to say any different? It just seems a shame to have to jump through weird hoops like this... and compilation in general would be faster in a VM, too. – aculich Feb 01 '12 at 04:55
6

This should do what you need

diff -qr ./x ./y | awk '{print $2}' | xargs -n1 -J% cp % ./y/

Where:

  • x is your updated/new folder
  • y is the destination you want to copy to
  • awk takes the second field of each line of the diff output (you may need some extra handling for filenames with spaces -- I can't test that now)
  • xargs -J% inserts the file name at the proper place in the cp command
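As a caveat (echoed in the comments), `-J` is BSD-specific; with GNU xargs the equivalent is `-I`. The pipeline still mishandles filenames with spaces and files that exist on only one side:

```shell
# Same pipeline with GNU xargs (-I instead of BSD's -J).
mkdir -p x y
echo new  > x/a
echo old  > y/a
echo same > x/b
echo same > y/b
# diff -q prints "Files ./x/a and ./y/a differ"; field 2 is the source path.
diff -qr ./x ./y | awk '{print $2}' | xargs -n1 -I% cp % ./y/
```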
Patkos Csaba
  • 2
    -1 because this is overly-complicated, non-portable (`-J` is bsd-specific; with GNU xargs it is `-I`), and does not work correctly if the same set of files do not exist in both locations already (if I `touch x/boo` then grep gives me `Only in ./x: boo` which causes errors in the pipeline). Use a tool built for the job, like `rsync --checksum`. – aculich Jan 30 '12 at 23:24
  • Or better yet, for this specific case use [ccache](http://unix.stackexchange.com/a/30432/11052). – aculich Jan 31 '12 at 00:06
  • +1 because its a set of well known commands that I can break to use on similar tasks (came here for doing a diff), still rsync may be better for this particular task – ntg Nov 20 '17 at 08:22
3

I prefer unison over rsync because it supports multiple masters, and I have already set up my ssh keys and vpn separately.

So in my crontab of only one host I let them synchronize every 15 minutes:

*/15 * * * * [ -z "$(pidof unison)" ] && (timeout 25m unison -sortbysize -ui text -batch -times /home/master ssh://192.168.1.12//home/master -path dev -logfile /tmp/sync.master.dev.log) &> /tmp/sync.master.dev.log

Then I can be developing on either side and the changes will propagate. In fact, for important projects I have up to 4 servers mirroring the same tree (3 run unison from cron, pointing to the one that doesn't), with Linux and Cygwin hosts mixed -- just don't expect soft links to make sense in win32 outside the Cygwin environment.

If you go this route, make the initial mirror on the empty side without the -batch, i.e.

unison -ui text  -times /home/master ssh://192.168.1.12//home/master -path dev

Of course there is a config to ignore backup files, archives, etc.:

~/.unison/default.prf:

# Unison preferences file
ignore = Name {,.}*{.sh~}
ignore = Name {,.}*{.rb~}
ignore = Name {,.}*{.bak}
ignore = Name {,.}*{.tmp}
ignore = Name {,.}*{.txt~}
ignore = Name {,.}*{.pl~}
ignore = Name {.unison.}*
ignore = Name {,.}*{.zip}

# Use this command for displaying diffs
diff = diff -y -W 79 --suppress-common-lines

ignore = Name *~
ignore = Name .*~
ignore = Path */pilot/backup/Archive_*
ignore = Name *.o
Marcos
  • I looked at that, but I couldn't find a `unison` option that means "don't update file-last-modified dates". Is there one? Otherwise, this is a great answer to an entirely different problem. – Brooks Moses Feb 02 '12 at 20:46
  • 1
    `-times` does that for me. Unison has a dry-run mode too, me thinks. – Marcos Feb 02 '12 at 22:29
  • Well, setting `times=false` (or leaving off `-times`) would do that. I don't know how I missed that in the documentation before. Thanks! – Brooks Moses Feb 02 '12 at 23:56
  • Glad to help. I'm a stickler when it comes to preserving things like modtimes, permissions and soft links. Often overlooked – Marcos Feb 03 '12 at 16:44
1

While rsync --checksum is the correct answer, note that this option is incompatible with --times, and that --archive includes --times, so if you want to rsync -a --checksum, you really need to rsync -a --no-times --checksum.

Val Kornea
0

This makes a numbered backup of the destination file and copies only if the source (which can be multiple files, including directories) and destination don't match. You can choose whether to back up recursively or not (to a depth level of your choice, or unlimited; default: not recursive); in any case the (no-clobber) copy itself is always recursive, to unlimited depth:

export NULLGLOB="$(shopt -p nullglob)"

copy () {
    command cp -a --no-preserve=mode,ownership --remove-destination "$@"
    return $?
}

backup-and-copy () {
    $NULLGLOB
    local exit_code
    local error_message
    local i
    local maxdepth=0
    local submaxdepth=-1
    local abs
    for (( i=1; i<=$#; i++ )); do
        if [[ ${!i} == -- ]]; then
            command set -- "${@:1:i-1}" "${@:i+1}"
            break
        fi
        if [[ ${!i,,} == -r ]]; then
            command set -- "${@:1:i-1}" "${@:i+1}"
            if [[ ${!i} =~ ^-?[0-9]+$ ]]; then
                maxdepth=${!i}
                abs=${maxdepth/#-}
                if [[ ${#abs} -gt 9 ]]; then
                    echo "backup-and-copy: $maxdepth: Numerical result out of range=[-999999999, 999999999]"
                    return 1
                fi
                command set -- "${@:1:i-1}" "${@:i+1}"
            else
                maxdepth=-1
            fi
            i=$((i-1))
        fi
    done
    if [[ $maxdepth -gt 0 ]]; then submaxdepth=$((maxdepth-1)); fi
    error_message="$(cp -n -- "$@" 2>&1)"
    exit_code=$?
    if [[ $exit_code -ne 0 ]]; then
        echo "backup-and-copy: ${error_message}"
        return $exit_code
    fi
    if [[ -d ${!#} ]]; then
        for (( i=1; i<$#; i++ )); do
            if [[ -d ${!i} ]]; then
                if [[ $maxdepth -ne 0 ]]; then
                    shopt -s nullglob
                    backup-and-copy -R $submaxdepth -- "${!i}/"{,.[^.],..?}* "${!#}/$(command basename "${!i}")"
                fi
            else
                command cmp -s -- "${!i}" "${!#}/$(command basename "${!i}")" 2> /dev/null || cp --backup=numbered -- "${!i}" "${!#}"
            fi
        done
    else
        command cmp -s -- "$@" 2> /dev/null || cp --backup=numbered -- "$@"
    fi
    return $exit_code
}
Mario Palumbo