14

I'm running Mac OSX and trying to use the command line to find the number of files I have with the same name.

I tried to use the following command:

find ~ -type f -name "*" -print | basename | sort | uniq -d > duplicate_files

It doesn't work! When I do the following:

find ~ -type f -name "*" -print > duplicate_files

Then duplicate_files does contain the paths of all my files. So I think the issue is with basename - it doesn't accept standard input. I then tried the following:

basename $(find ~ -type f -name "*" -print) > duplicate_files

but again that doesn't seem to work. Searching the internet hasn't yielded much joy. Any thoughts most welcome.

Gilles 'SO- stop being evil'
JohnB

6 Answers

20

basename operates on its command-line arguments; it doesn't read from standard input.
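To illustrate the point, here is a minimal sketch using made-up paths (`/tmp/demo/report.txt` and friends are placeholders and need not exist). It shows basename taking its input from an argument, and one way to feed it piped paths by reading them line by line, assuming no path contains a newline:

```shell
# basename takes the path from its argument list, not from a pipe.
from_arg=$(basename /tmp/demo/report.txt)
printf '%s\n' "$from_arg"

# To use it on piped paths, read each line and pass it as an argument
# (assumes no newlines in the paths; the paths here are hypothetical).
from_pipe=$(printf '%s\n' /tmp/a/x.txt /tmp/b/y.txt |
    while IFS= read -r path; do basename "$path"; done)
printf '%s\n' "$from_pipe"
```

This still starts one basename process per path, which is why a text-processing tool is usually the faster choice.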

You don't need to call the basename utility, and you'd better not: all it would do is strip off the part before the last /, and calling an external command for each entry would be slow. You can use a text processing utility instead.

find ~ -type f | sed 's!.*/!!' | sort | uniq -d

It may be more useful to keep track of the location of the files. Sorting by name makes it easier to locate duplicates, but sort doesn't have an option to use the last field. What you can do is copy the last /-separated field to the beginning, then sort, and then use a bit of ad hoc awk processing to extract and present the duplicates.

find ~ -type f |
sed 's!.*/\(.*\)!\1/&!' |   # copy the last field to the beginning
sort -t/ -k1,1 |
cut -d/ -f2- |   # remove the extra first field (could be combined with awk below)
awk -F / '{
    if ($NF == name) {
        if (previous != "") {print previous; previous = ""}
        print
    } else {
        previous = $0
        name = $NF
    }
}
'

(Note that I assume that none of your file names contain newline characters.)
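As a small worked example of the "copy the last field to the beginning" step, the sketch below runs the sed expression on a single made-up path (`/home/u/a/notes.txt` is a placeholder) and then undoes it with the same cut call used in the pipeline:

```shell
path=/home/u/a/notes.txt   # hypothetical path, for illustration only

# sed prefixes the line with its last /-separated field plus an extra slash.
keyed=$(printf '%s\n' "$path" | sed 's!.*/\(.*\)!\1/&!')
printf '%s\n' "$keyed"     # notes.txt//home/u/a/notes.txt

# cut drops that extra first field again, restoring the original path.
restored=$(printf '%s\n' "$keyed" | cut -d/ -f2-)
printf '%s\n' "$restored"  # /home/u/a/notes.txt
```

Because the base name becomes field 1, `sort -t/ -k1,1` can group files by name without sort needing a "last field" option.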

Stéphane Chazelas
Gilles 'SO- stop being evil'
9

Why not use find's built-in features to output just the file name:

find ~ -type f -printf '%f\n' | sort | uniq -c

(assumes GNU find) or at least something like this:

find ~ -exec basename {} \; | sort | uniq -c

basename can't read from a pipe, and in its standard form it can't process multiple files at once.

P.S. There is no need to specify -name '*' if you want to list all the files; find matches every name by default.

Stéphane Chazelas
rush
  • Thanks -- '-printf' doesn't work for OS X UNIX – JohnB Mar 09 '14 at 12:12
  • And when I try the second version I get `basename: unknown primary or operator`. Thanks for the tip on `-name "*"` – JohnB Mar 09 '14 at 12:19
  • That's strange. I can see `-printf` even in the POSIX man page. As for the error with the second way, it was caused by a typo in my answer. Fixed. Could you please try it one more time? – rush Mar 09 '14 at 12:23
  • Also with `-printf` I get the `-printf: unknown primary or operator`. Also when I checked the Unix in a Nutshell reference book it lists it as a GNU/Linux option - doesn't say anything about OSX – JohnB Mar 09 '14 at 12:36
  • Actually the best source would be `man find` in your console :) – rush Mar 09 '14 at 12:41
5

This seems to work for me on OSX:

find ~ -type f -exec basename -a {} + | sort | uniq -d
rahmu
  • Yes - this is great thanks -- out of interest what does the `+` signify in the command? – JohnB Mar 09 '14 at 12:31
  • If this is useful, please consider up-voting it. – suspectus Mar 09 '14 at 13:52
  • It is -- I cannot vote up because I need 15 reputation :-( – JohnB Mar 09 '14 at 15:09
  • @StephaneChazelas: According to the [man page for BSD basename](http://www.freebsd.org/cgi/man.cgi?query=basename), the executable can take multiple strings as arguments. I double checked on OSX, it works. – rahmu Mar 10 '14 at 11:31
  • All right sorry, I stand corrected. I wasn't aware of that BSD extension. However, that still fails if there are exactly two files. You'd need to add the [`-a` option](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/basename.1.html) to cover for that case as well. – Stéphane Chazelas Mar 10 '14 at 12:04
  • @StephaneChazelas: Good point. I updated the answer to reflect that. – rahmu Mar 10 '14 at 13:27
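On the question of what `+` signifies: with `-exec ... {} +`, find appends as many paths as fit onto one command line, whereas `-exec ... {} \;` runs the command once per file. The following is a rough sketch in a throwaway directory (the file names `one`, `two`, `three` are arbitrary) that prints how many arguments each invocation receives:

```shell
demo=$(mktemp -d)
touch "$demo/one" "$demo/two" "$demo/three"

# With \; the command runs once per file: three runs, one argument each.
per_file=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} \;)

# With + the paths are batched onto a single command line.
batched=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'per file:\n%s\nbatched: %s\n' "$per_file" "$batched"
rm -r "$demo"
```

Batching is why `basename -a` matters here: basename receives many paths in one call and has to treat every one of them as a name to strip.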
2

Alternative (assumes no newlines in file names):

find ~ -type f | awk -F/ '{print $NF}' | sort | uniq -d
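If the count per name is also wanted, a possible variation (a sketch, not part of the answer above) is to let awk tally the last field itself and print only names seen more than once. It runs in a temporary directory with made-up files so it can be tried safely, and it shares the assumption that no file name contains a newline:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/notes.txt" "$demo/b/notes.txt" "$demo/a/unique.txt"

# Count each base name ($NF) and report those that occur more than once.
counts=$(find "$demo" -type f |
    awk -F/ '{ seen[$NF]++ }
             END { for (n in seen) if (seen[n] > 1) print seen[n], n }')
printf '%s\n' "$counts"
rm -r "$demo"
```

The order in which awk walks the array is unspecified, so pipe the result through sort if a stable order matters.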
Stéphane Chazelas
2

You can use xargs with basename to get the desired output, like this:

find ~ -type f -name "*" -print | xargs -L 1 basename | sort | uniq -d > duplicate_files
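One caveat: by default xargs splits its input on whitespace, so a path containing spaces is broken into several arguments. A hedged sketch of a workaround, assuming your find supports `-print0` and your xargs supports `-0` (both are common extensions in the GNU and BSD tools), run here in a temporary directory with made-up file names:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/my notes.txt" "$demo/b/my notes.txt" "$demo/a/other.txt"

# NUL-delimited paths survive spaces; -n 1 passes one path per basename call.
names=$(find "$demo" -type f -print0 | xargs -0 -n 1 basename | sort | uniq -d)
printf '%s\n' "$names"
rm -r "$demo"
```

This still launches basename once per file, so it will be slow on a large home directory.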
Seff
0

With a recent version of bash that handles associative arrays, the following would additionally handle pathnames with embedded newlines:

#!/bin/bash

topdir=$HOME

shopt -s globstar  # enable the ** glob

declare -A count

# count the number of times each filename (base name) occurs
for pathname in "$topdir"/**; do
    # skip names that are not regular files (or not symbolic links to such files)
    [ ! -f "$pathname" ] && continue

    # get the base name
    filename=${pathname##*/}

    # add one to this base name's count
    count[$filename]=$(( ${count[$filename]} + 1 ))
done

# go through the collected names and print any name that
# has a count greater than one
for filename in "${!count[@]}"; do
    if [ "${count[$filename]}" -gt 1 ]; then
        printf 'Duplicate filename: %s\n' "$filename"
    fi
done

This uses no external utility.

Kusalananda