I'm trying to transfer files on my NAS, but I get this error: "The name of a file or a folder within an encrypted shared folder cannot exceed 143 English characters or 47 Asian (CJK) characters". Is there a shell command to find every file that exceeds that limit?
Is it 130 (as per subject) or 143 (as per body)? – Stéphane Chazelas Sep 07 '21 at 05:37
Please [do not cross-post](https://meta.stackexchange.com/a/64069/355310). For the record, another copy is [on Super User](https://superuser.com/q/1674690/432690). – Kamil Maciorowski Sep 07 '21 at 05:44
2 Answers
find path | grep -P '\/[^\/]{130,}[^\/]$'
Based on this source on stackoverflow.com: Find files that are too long for Synology encrypted shares
I added the $ at the end to just capture files and not folders.
You could perhaps match the CJK characters with a Unicode range. I don't think grep can do that; ugrep might.
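If the limit is really 143 bytes (as the error message in the question suggests) rather than 130, a variant of the same approach would be the following — a sketch, assuming GNU grep and that no path contains a newline:

```shell
# Match path components longer than 143 bytes; LC_ALL=C makes [^/]
# match single bytes rather than locale-defined characters.
find . -print | LC_ALL=C grep -E '/[^/]{144,}$'
```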
That assumes file paths don't contain newline characters. That also assumes you're in the C locale. Otherwise `grep`'s `[^/]` would match on characters as defined in the user's locale, not necessarily bytes. Also note that `/` should not be escaped (with `-P`, those backslashes are thankfully harmless, but they wouldn't be without). – Stéphane Chazelas Sep 07 '21 at 16:01
I think they're trying to say that file names, when UTF-8 encoded, cannot contain more than 143 bytes on the assumption that many Asian characters (whatever they meant) are encoded on 3 bytes in UTF-8 (you'll notice that 48 x 3 is 144) and most English characters are encoded on one byte¹.
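You can verify the byte counts yourself — a quick illustration, assuming a UTF-8 locale:

```shell
# Character count vs byte count under UTF-8
printf %s 'abc' | wc -c    # 3 bytes: ASCII characters take 1 byte each
printf %s '漢字' | wc -c   # 6 bytes: each CJK character takes 3 bytes
```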
So to find those:
In zsh:
set +o multibyte -o extendedglob
print -rC1 -- **/?(#c144,)(ND)
For those that are above the limit and
set +o multibyte -o extendedglob
print -rC1 -- **/?(#c1,143)(ND)
For those that are under the limit.
To see more easily the ones that are of type directory, you can add the M (for Mark) glob qualifier (print -rC1 -- **/?(#c144,)(NDM)), which will append a / to directories.
Or with find:
LC_ALL=C find . -name '????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????*'
For those above the limit.
That's 144 ?s above; you can also construct the pattern with:
pattern=$(printf %145s '*' | tr ' ' '?')
LC_ALL=C find . -name "$pattern"
Or in zsh:
pattern=${(l[145][?]):-*}
Replace -name with ! -name to get those filenames that are under the limit (are made of 143 bytes or less).
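To sanity-check the generated pattern (a quick check, not part of the method itself):

```shell
# printf %145s right-pads '*' to width 145, giving 144 spaces plus '*';
# tr then turns each space into a '?'.
pattern=$(printf %145s '*' | tr ' ' '?')
echo "${#pattern}"    # 145 characters: 144 '?'s followed by one '*'
```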
With the GNU implementation of find, you can also do:
LC_ALL=C find . -regextype posix-extended -regex '.*/[^/]{144,}'
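A quick way to try this out — a demo in a throwaway directory, assuming GNU find (the `-regextype` option is a GNU extension):

```shell
tmp=$(mktemp -d)
long=$(printf '%144s' '' | tr ' ' a)   # 144-byte name, one byte over the limit
touch "$tmp/$long"
found=$(LC_ALL=C find "$tmp" -regextype posix-extended -regex '.*/[^/]{144,}')
printf '%s\n' "$found"                 # prints the path of the long-named file
rm -rf "$tmp"
```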
¹ The UTF-8 encoding encodes characters on 1 to 4 bytes (initially the algorithm was designed to encode code points up to U+7FFFFFFF on up to 6 bytes, but Unicode code points have since been restricted to U+10FFFF); only the characters from the US-ASCII set (code points U+0000 to U+007F) are encoded on one byte. The ones encoded on 3 bytes are those from U+0800 to U+FFFF.