Syntactic differences in cp -r and how to overcome them

Question

Let's say we are in a blank directory. Then, the following commands:

mkdir dir1
cp -r dir1 dir2

Yield two (blank) directories, dir1 and dir2, where dir2 has been created as a copy of dir1. However, if we do this:

mkdir dir1
mkdir dir2
cp -r dir1 dir2

Then we instead find that dir1 has now been put inside dir2. This means that the exact same cp command behaves differently depending on whether the destination directory exists. If it does, then the cp command is doing the same as this:

mkdir dir1
mkdir dir2
cp -r dir1 dir2/.

This seems extremely counter-intuitive to me. I would have expected that cp -r dir1 dir2 (when dir2 already exists) would remove the existing dir2 (and any contents) and replace it with dir1, since this is the behavior when cp is used for two files. I understand that recursive copies are themselves a bit different because of how directories exist in Linux (and more broadly in Unix-like systems), but I'm looking for some more explanation on why this behavior was chosen. Bonus points if you can point me to a way to ensure cp behaves as I had expected (without having to, say, test for and remove the destination directory beforehand). I tried a few cp options without any luck. And I suppose I'll accept rsync solutions for the sake of others that happen upon this question who don't know that command.

In case this behavior is not universal, I'm on CentOS, using bash.

What is counter-intuitive about that? If I shoot an arrow at a tree something happens; if I shoot the same arrow in the same direction but someone is standing in front of the tree, something else happens. Same program, different data, different outcome. — Anthon, Dec 08 '14 at 18:50
"I would have expected that cp -r dir1 dir2 (when dir2 already exists) would remove the existing dir2 (and any contents).." **What?** Why? I can understand overwritting files, but removing any pre-existing files as well? O.o — muru, Dec 08 '14 at 18:52
@anthon But I haven't provided different inputs. In your example, you're shooting an arrow _in a constant direction_, not at a tree. @muru I wouldn't expect `cp file1 file2` to append if `file2` exists, I expect it to overwrite. My basis for anticipated behavior is on a literal interpretation of the syntax and on what is done with files, though other users may expect differently. — TTT, Dec 08 '14 at 19:29
Could the down-voter please provide some feedback on how I could improve the question? — TTT, Dec 08 '14 at 19:49
@TTT you can only notify one user in a comment. By that logic, since `> file` truncates a file, shouldn't `>directory` be equivalent to `rm -r directory; mkdir directory`? — muru, Dec 08 '14 at 20:04
@muru thanks. That's a tricky one, since `>` redirection is placing contents and not files/directories themselves, I wouldn't expect that behavior to be possible (it isn't) since you'd be writing data directly to a directory, rather than a file in a directory. My idea of `cp -r` overwriting isn't that I think that behavior is _better_, but just that it's consistent. I'd be equally content if `cp file1 file2` and `cp -r dir1 dir2` both appended. But, neither do. Instead, one (over)writes while the other's behavior depends on the situation. — TTT, Dec 08 '14 at 20:28
@TTT The problem is directories and files are treated differently in enough commands (e.g., `ls` by default, `rm`, `touch` when given a non-existent directory as argument, etc.) that that argument doesn't hold water. — muru, Dec 08 '14 at 20:33
Similar question I asked a bit later but got more visibility: https://unix.stackexchange.com/q/228597 — jakub.g, Jan 16 '23 at 19:54

muru · Accepted Answer · 2016-04-06T22:45:32.790

The behaviour you're looking for is a special case:

cp -R [-H|-L|-P] [-fip] source_file... target
[This] form is denoted by two or more operands where the -R option is specified. The cp utility shall copy each file in the file hierarchy rooted in each source_file to a destination path named as follows:

If target exists and names an existing directory, the name of the corresponding destination path for each file in the file hierarchy shall be the concatenation of target, a single <slash> character if target did not end in a <slash>, and the pathname of the file relative to the directory containing source_file.

If target does not exist and two operands are specified, the name of the corresponding destination path for source_file shall be target; the name of the corresponding destination path for all other files in the file hierarchy shall be the concatenation of target, a <slash> character, and the pathname of the file relative to source_file.

It shall be an error if target does not exist and more than two operands are specified ...
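The three quoted rules can be checked in a scratch directory. The following is a minimal sketch, assuming a cp that follows the POSIX text above and a system that provides mktemp -d; the names src, other, existing, fresh and missing are placeholders of mine, not from the question.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir src other existing
touch src/a.txt

# Rule 1: target exists and is a directory -> src is copied *into* it.
cp -R src existing
ls existing        # lists: src

# Rule 2: target does not exist, two operands -> target itself becomes the copy.
cp -R src fresh
ls fresh           # lists: a.txt

# Rule 3: target does not exist, more than two operands -> cp reports an error.
if cp -R src other missing 2>/dev/null; then
    rule3=succeeded
else
    rule3=error
fi
echo "three operands, missing target: $rule3"
```

The same command line (cp -R src TARGET) therefore lands in rule 1 or rule 2 purely depending on whether TARGET already exists.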

Therefore I'd say it's not possible to make cp do what you want.
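That said, GNU cp (which is what CentOS ships) has a -T / --no-target-directory option that comes close: it makes cp always treat the last operand as the copy itself rather than as a directory to copy into. This is a GNU extension, not part of POSIX, and it merges into an existing dir2 instead of removing what is already there, so it only partly matches what you asked for. A minimal sketch, with file names I made up for illustration:

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir dir1 dir2
touch dir1/new.txt dir2/old.txt

# GNU extension: never treat dir2 as "a directory to copy into".
cp -rT dir1 dir2

# dir2 now holds new.txt alongside old.txt; there is no nested dir2/dir1.
ls dir2
```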

Since your expected behaviour is "cp -r dir1 dir2 (when dir2 already exists) would remove the existing dir2 (and any contents) and replace it with dir1":

rm -rf dir2 && cp -r dir1 dir2

You don't even need to check if dir2 exists.
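If this comes up often, the two commands can be wrapped in a small shell function. The sketch below is one way to do it; replace_dir is a hypothetical name, and the argument checks are my own addition, since rm -rf on an unintended path is destructive.

```shell
#!/bin/sh
# replace_dir SRC DST: make DST a fresh copy of SRC (hypothetical helper).
replace_dir() {
    src=$1
    dst=$2
    # Refuse to run with missing arguments or a source that is not a directory.
    [ -n "$src" ] && [ -n "$dst" ] || { echo "usage: replace_dir SRC DST" >&2; return 2; }
    [ -d "$src" ] || { echo "replace_dir: $src is not a directory" >&2; return 1; }
    # rm -rf succeeds even when DST does not exist, so no existence test is needed.
    rm -rf -- "$dst" && cp -R -- "$src" "$dst"
}

# Example run in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir1 dir2
touch dir1/new.txt dir2/old.txt

replace_dir dir1 dir2
ls dir2    # lists: new.txt
```

Note that this is not atomic: between the rm and the cp, dir2 briefly does not exist.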

With rsync, add a trailing / to the source so that it copies the contents of dir1 into dir2 rather than copying dir1 itself into dir2 (files already in dir2 are still kept):

$ tree dir*
dir1
└── test.txt
dir2
└── test2.txt

0 directories, 2 files
$ rsync -a dir1/ dir2
$ tree dir*
dir1
└── test.txt
dir2
├── test.txt
└── test2.txt

0 directories, 3 files
$ rm -r dir2          
$ rsync -a dir1/ dir2
$ tree dir*           
dir1
└── test.txt
dir2
└── test.txt

0 directories, 2 files
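To get the "replace" semantics from the question with rsync alone, the --delete option removes files in the destination that are not present in the source, so the rm -r step is not needed. A hedged sketch, assuming rsync is installed; the file names mirror the transcript above.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir dir1 dir2
touch dir1/test.txt dir2/test2.txt

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash on dir1/ copies its contents; --delete prunes extras in dir2.
    rsync -a --delete dir1/ dir2
    ls dir2    # only test.txt remains
else
    echo "rsync not found; skipping" >&2
fi
```

Because --delete removes anything in dir2 that is not in dir1, it is worth double-checking the destination path before running it.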

This answer, combined with your comments on inconsistent behaviors between directories and files for other commands answers my question in full. — TTT, Dec 08 '14 at 20:51
