How can I find out how many bytes the name of a file takes up? Just the file name, not the full path. I've tried this:

echo 'filename.extension' | wc -c

Is this right?

Smeterlink

2 Answers

Looks fine. However, echo adds a trailing newline by default, so echo -n or printf are your friends. If you want to reduce /path/to/files/like/this/filename.extension to filename.extension before counting, you'd have something like

filepath='/path/to/files/like/this/filename.extension'
namelength=$(printf "%s" "$(basename "${filepath}")" | wc -c)
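
To see the difference the trailing newline makes, here is a small sketch comparing the two approaches. The `count_bytes` helper is a name made up for this illustration, and the `$(( ... ))` wrapper is only there to strip the padding that some `wc` implementations put in front of the number.

```shell
#!/bin/sh
# count_bytes: hypothetical helper, prints the number of bytes in its first argument.
count_bytes() {
    # printf '%s' emits the string as-is, with no trailing newline.
    # $(( ... )) normalizes wc output, which may be space-padded on some systems.
    echo "$(( $(printf '%s' "$1" | wc -c) ))"
}

filepath='/path/to/files/like/this/filename.extension'

# echo appends a newline, so this count is one byte too high.
with_newline=$(( $(echo 'filename.extension' | wc -c) ))
# basename strips the directory part; count_bytes measures what is left.
name_only=$(count_bytes "$(basename "$filepath")")

echo "echo | wc -c:   $with_newline"
echo "printf | wc -c: $name_only"
```

With this example name, the echo pipeline should report 19 and the printf pipeline 18.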

If you want the length in characters (or something similar) rather than the byte count, there's a much easier way in POSIX-compatible shells (like bash and zsh, so you're probably using one!):

filename="${filepath##*/}"
namelength=${#filename}

The ${#varname} expansion directly gives you the length of the variable's value, counted in characters according to the current locale rather than in bytes.
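
As a rough illustration of how the `${#varname}` length and the byte count can differ, the sketch below builds a name containing one two-byte UTF-8 character and measures it three ways. The `filename` value is made up for the example, and forcing `LC_ALL=C` inside a subshell to get a byte count out of `${#...}` is an assumption that holds in bash; other shells may behave differently.

```shell
#!/bin/sh
# Example name "naïve.txt": the ï is written as its two UTF-8 bytes (octal 303 257).
filename=$(printf 'na\303\257ve.txt')

# Length as the shell counts it in the current locale
# (characters when a UTF-8 locale is active).
char_len=${#filename}

# Byte count via wc, independent of how the shell counts.
byte_len=$(( $(printf '%s' "$filename" | wc -c) ))

# Byte count via ${#...} with the C locale forced inside a subshell.
c_len=$(LC_ALL=C; echo "${#filename}")

echo "current locale: $char_len"
echo "wc -c:          $byte_len"
echo "LC_ALL=C:       $c_len"
```

The last two values should both be 10 for this name. The first depends on the active locale and is typically 9 under UTF-8.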

Marcus Müller
  • Does `${#var}` always return bytes? Here in bash 5.1.12, with a UTF-8 locale active, it seems to count characters and not raw UTF-8 bytes, e.g. `var="ąčęėįšųūž"; echo ${#var}` returns 9 while `printf %s "$var" | wc -c` is 18. (I also recall this having changed from 'bytes' to 'characters' somewhere in a minor 5.0.x release and breaking several of my scripts...) – u1686_grawity Jan 09 '22 at 10:49
  • @user1686 you raise an interesting point. I was *assuming* bytes... hm, but for my last name, zsh and bash agree that it's glyphs (maybe?!). – Marcus Müller Jan 09 '22 at 11:35
  • @MarcusMüller, codepoints, most likely. With `var=$'a\xcc\x88\xc3\xa4'`, `${#var}` gives 3. That's the two ways to represent the letter ä, so three code points: letter a, combining diaeresis (U+0308), and the a with diaeresis in a single code point (U+00E4). In the C locale, it should give you the number of bytes, e.g. `(LC_CTYPE=C; echo ${#var})` prints 5. – ilkkachu Jan 09 '22 at 12:10
  • @user1686, oh, and I was on Bash 4.4 still, and it counts code points here. Not sure if one of the 5.0.x releases had a slip there, but locales would definitely affect it – ilkkachu Jan 09 '22 at 12:14
  • @ilkkachu: Aha, I went through my Git logs and it looks like the actual change wasn't about ${#var} itself, but rather about READLINE_POINT in 5.0 _expecting_ offsets in codepoints rather than bytes. – u1686_grawity Jan 09 '22 at 12:19
  • @ilkkachu I *wish* it was that easy, see [my new question about exactly this](https://unix.stackexchange.com/questions/685644/what-is-length-of-a-string-in-bourne-shell-compatibles-string) – Marcus Müller Jan 09 '22 at 12:37

You're not testing a file name, just a string. What about this quick and dirty hack?

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    struct stat fstat_details;

    /* Check each argument: report its length in bytes if the file exists. */
    for (int fcount = 1; fcount < argc; fcount++) {
        /* stat() returns 0 when the path exists and is accessible. */
        if (stat(argv[fcount], &fstat_details) == 0) {
            /* strlen() returns size_t, so print it with %zu. */
            printf("file: %s, length: %zu\n", argv[fcount], strlen(argv[fcount]));
        } else {
            printf("file %s not found\n", argv[fcount]);
        }
    }
    return 0;
}
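
For comparison, roughly the same check can be sketched in the shell without compiling anything. The `report_name_length` function name is hypothetical, and like the C program it measures the argument exactly as passed, directory part included, rather than the bare file name.

```shell
#!/bin/sh
# report_name_length: hypothetical helper mirroring the C program above.
# For each argument, print its length in bytes if the path exists.
report_name_length() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            # printf '%s' avoids the trailing newline that echo would add.
            len=$(( $(printf '%s' "$path" | wc -c) ))
            echo "file: $path, length: $len"
        else
            echo "file $path not found"
        fi
    done
}

# Process whatever paths were given on the command line (none means no output).
report_name_length "$@"
```

Saved as a script, it would be called with one or more paths as arguments, in the same way as the compiled program.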
Bib
  • When posting C code, it'd be a good idea to post the complete source code, so people don't need to hunt for the header where `struct stat` is defined. Not that I'm sure what you'd need `stat()` for anyway, if all you need for getting the string length is to call `strlen()` (which also needs to be declared). – ilkkachu Jan 09 '22 at 12:16
  • How is this C code a _quick and dirty hack_? – Arkadiusz Drabczyk Jan 09 '22 at 12:18
  • @ilkkachu, Sorry, misplaced cut'n'paste. The reason for stat is that the OP was concerned about a file name. I assume this file already exists, and this is just a check that they have it correct. Further, I cannot assume that what is on the disk would be the same length as it is internally in the environment. Everything given is only concerned with vars. @Arkadiusz Drabczyk, because it took me a few minutes and could be made more complete. – Bib Jan 09 '22 at 13:53
  • @Bib, yes, the `stat()` could be used to check if such a file exists. But what they're doing in the example in the question doesn't check for that, and a _filename_ can be a valid filename with a particular length even without existing. But you're not just checking if the filename is valid, but also if it exists. If testing for validity is the point, you might do well to elaborate on that, and on the systems where that's relevant. (Apart from obvious cases like the zero-length string, anyway.) – ilkkachu Jan 09 '22 at 14:55
  • In any case, I can't see the point about the filename being of a different length on disk: you're getting the length of the same string that was passed as an argument to the program, the same one the user entered. How could it change length just before `strlen()` is called? – ilkkachu Jan 09 '22 at 14:55
  • @Bib Perfect, it works. I see it does not count the trailing `0x0a`, which I guess is in every filename... Is this right, and should it be counted towards the filename's byte size or not? – Smeterlink Jan 10 '22 at 03:06
  • @Smeterlink, like the other answer says, it's `echo` that adds the newline when you do `echo 'filename.extension' | wc -c`. Most filenames don't have newlines in them. – ilkkachu Jan 10 '22 at 06:41
  • @ilkkachu then how do filesystems know what the length of the filename is? – Smeterlink Jan 10 '22 at 08:57
  • @Smeterlink, filenames are passed as NUL-terminated strings through the system calls, that is, there's a byte with the numerical value zero as a terminator, not a newline. What the filesystem uses internally might be different, and they might just mark the length separately. – ilkkachu Jan 10 '22 at 11:19
  • @ilkkachu I had the idea that filenames were stored in filesystems with NUL termination, but it probably depends on the filesystem. – Smeterlink Jan 10 '22 at 17:05
  • @Smeterlink, maybe they are. Maybe they aren't. There's no way to know from userspace, and no reason to care. (Well, unless you're implementing `fsck` or `debugfs`.) FWIW, the ext2/3/4 structure for directory entries has an explicit field for the name length (`name_len` in `struct ext4_dir_entry` https://www.kernel.org/doc/html/latest/filesystems/ext4/dynamic.html#directory-entries) – ilkkachu Jan 10 '22 at 21:19