I keep some notes on useful regular expressions, and one that I use all the time is the following:

echo '/home/user/folder/file.txt' | sed -E 's/[\\\/][^\\\/]*$//g'

This command prints the path of the parent folder, /home/user/folder. I understand the basics of regular expressions:

\s          # any whitespace character
\S          # any non-whitespace character
.           # any single character
\.          # a literal period
+           # one or more of the preceding item
{5}         # exactly five of the preceding item
*           # zero or more of the preceding item
?           # zero or one of the preceding item
[0-9]       # any single digit
[a-z]       # any single lowercase letter
[^x-y]      # any single character not in the range x-y
^           # beginning of the line
$           # end of the line

However, I haven't managed to figure out what [\\\/] and [^\\\/] mean in the regular expression from my example. How does it work?

raylight
  • Why don't you use what you understand and instead say `sed 's#\(.*\)/.*#\1#'` or `sed 's#/[^/]*$##'`, or use the `dirname` utility? – Kusalananda Apr 15 '22 at 05:51
  • @Kusalananda Why are `sed 's#\(.*\)/.*#\1#'` and `sed 's#/[^/]*$##'` better than `sed 's/[/][^/]*$//'`? I asked this question because I was struggling to understand that regular expression. It wasn't so much about solving the problem of getting the `dirname`. Multiple times I've made a mess of my `bash` code because I was using `dirname` of `dirname` of `dirname`... The code can get ugly like that. – raylight Apr 15 '22 at 06:15
  • Ah, then I misunderstood why you included the list of regular expression features that you understood. And I also misunderstood what the end goal was. I thought you wanted to strip off the last part of a pathname. Note that the given regular expression does not help with stripping off arbitrary components of the pathname either, just like `dirname` does not help with that. In the `zsh` shell, it would be a simple matter of using `$pathname:h`, `$pathname:h:h` etc. – Kusalananda Apr 15 '22 at 06:30
  • @Kusalananda The pattern `[\\\/]` was the part outside what I understood... I had missed the concept that `[abc]` is `a` or `b` or `c`, as explained in the accepted answer... So I didn't realize that it was just \ or / in the end. – raylight Apr 15 '22 at 06:33

1 Answer

[\\\/] contains an escaped \ and an escaped / (escaping the / is not necessary here). Like [abc] matches a or b or c, [\\\/] matches \ or /.

[^\\\/] is somewhat similar but ^ is special at the beginning of []: it negates the meaning. [^\\\/] matches any character other than \ or /.
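To see the two bracket expressions in isolation, here is a small sketch. The sample string and the variable names are made up for illustration; printf is used instead of echo because some shells' echo interprets backslashes.

```shell
#!/bin/sh
# Hypothetical sample string containing both kinds of separator.
path='C:\Users/me'

# [\\\/] matches a character that IS \ or /; mark each one with "#".
separators_marked=$(printf '%s\n' "$path" | sed 's/[\\\/]/#/g')

# [^\\\/] matches a character that is NOT \ or /; mark each one with "#".
others_marked=$(printf '%s\n' "$path" | sed 's/[^\\\/]/#/g')

printf '%s\n' "$separators_marked"   # C:#Users#me
printf '%s\n' "$others_marked"       # ##\#####/##
```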

[\\\/][^\\\/]*$ matches \ or /, then zero or more other characters till the end of the line. Your s command replaces the matched string with nothing. The whole sed command removes the last \ or the last / (whichever occurs later in the line) along with everything that follows in the line.
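A quick way to check this is to run the command on a few inputs. This is only a sketch: strip_last is a hypothetical wrapper around your sed command, and the Windows-style and mixed paths are invented examples.

```shell
#!/bin/sh
# Hypothetical helper: feeds one pathname to the sed command from the question.
strip_last() {
    printf '%s\n' "$1" | sed -E 's/[\\\/][^\\\/]*$//g'
}

unix_parent=$(strip_last '/home/user/folder/file.txt')
dos_parent=$(strip_last 'C:\Users\me\file.txt')
# Here the last separator in the line is \, so the cut happens there.
mixed_parent=$(strip_last '/home/user/a\b')

printf '%s\n' "$unix_parent"    # /home/user/folder
printf '%s\n' "$dos_parent"     # C:\Users\me
printf '%s\n' "$mixed_parent"   # /home/user/a
```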

Notes:

  • -E is not needed for this particular command to work.
  • g is not needed (you cannot find more than one end of the line in a line).
  • (already noted) Escaping / inside [] is not needed. (Escaping / outside of [] is not needed in general; it's often needed because people particularly choose / as the delimiter in s/…/…/, but it can be another character, e.g. s|…|…|.)
  • Your command seems to be "universal" in the sense that it removes the last component from Unix pathnames (components separated by /) and from DOS/Windows pathnames (components separated by \). But…
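  • \ may appear in a Unix pathname. If it does then your sed command may give you an unexpected result. A newline character is also allowed in a pathname, and sed processes its input line by line, so such a pathname would be treated as several separate lines.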
  • / is a valid pathname and its parent directory is /. Your sed command yields an empty string though.
  • If dir is a directory then /path/to/dir/ is equivalent to /path/to/dir, but your sed command will yield /path/to/dir and /path/to respectively.
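The notes above can be checked with a short sketch like the following. The variable names are invented for illustration, and the unescaped / inside [] is assumed to be accepted by your sed (GNU sed accepts it).

```shell
#!/bin/sh
path='/home/user/folder/file.txt'

# Three spellings of the same substitution (Unix separators only).
with_flags=$(printf '%s\n' "$path" | sed -E 's/[\/][^\/]*$//g')
plain=$(printf '%s\n' "$path" | sed 's/[/][^/]*$//')       # no -E, no g, no escaping
other_delim=$(printf '%s\n' "$path" | sed 's|/[^/]*$||')   # | as the delimiter

# Edge cases where sed and dirname disagree.
root_sed=$(printf '%s\n' '/' | sed 's|/[^/]*$||')
root_dirname=$(dirname '/')
trailing_sed=$(printf '%s\n' '/path/to/dir/' | sed 's|/[^/]*$||')
trailing_dirname=$(dirname '/path/to/dir/')

printf '%s\n' "$with_flags" "$plain" "$other_delim"   # /home/user/folder, three times
printf '[%s] [%s]\n' "$root_sed" "$root_dirname"      # [] [/]
printf '%s\n' "$trailing_sed" "$trailing_dirname"     # /path/to/dir then /path/to
```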
Kamil Maciorowski
  • Cool, I actually didn't know that `[abc]` would match `a` or `b` or `c`... Most of the time I use ranges like `[a-z]` and `[0-9]`... So in the end the \\ is more for Windows users, and `sed -E 's/[\/][^\/]*$//g'` would be enough to make it work on Linux... With the limitations that you mentioned though. Thanks! – raylight Apr 15 '22 at 05:19
  • @raylight Even `sed 's/[/][^/]*$//'` should do. – Kamil Maciorowski Apr 15 '22 at 05:21
  • I see... It'd be more like `sed 's/\/[^/]*\/[^/]*$//g'` in my case. That goes back two folders... It ends up being better than using `dirname` multiple times, in my opinion... – raylight Apr 15 '22 at 05:39
  • If you want to mangle Unix pathnames, consider basename(1), dirname(1), or perhaps the builtins to your shell. – vonbrand Apr 15 '22 at 19:17