7

I want to test if a relative symlink points within the subtree of a certain directory.

This example would yield false since it points outside the foo directory:
/foo>readlink bar
../fie.txt

While this example would yield true:
/foo>readlink bar
fum/fie.txt

Is there an existing utility I can leverage or will I have to code it from scratch? I'm using bash.

Gilles 'SO- stop being evil'
Fylke
  • What if `fie.txt` or `fum` is itself a symlink outside `foo`? – Stéphane Chazelas Jan 08 '14 at 12:12
  • Would you always run this from `/foo` or do you need to be able to pass arbitrary directories? I mean, is the question always with respect to `./` or not? – terdon Jan 08 '14 at 12:23
  • @StephaneChazelas Yeah, that's a problem that I choose to ignore, sort of at least. I'm thinking I'll expand the link using readlink -f and see if the prefixes match. But I will ignore crazy corner cases since they don't exist in our environment. – Fylke Jan 08 '14 at 14:18
  • @terdon No, it should accept arbitrary directories. – Fylke Jan 08 '14 at 14:21
  • Have a look at [symlinks](http://www.linuxcommand.org/man_pages/symlinks8.html), it _may_ help. – Martin Schröder Jan 15 '14 at 12:28

3 Answers

5

I don't think there's such a utility. With GNU readlink, you could do something like:

is_in() (
  needle=$(readlink -ve -- "$1" && echo .) || exit
  haystack=$(readlink -ve -- "$2" && echo .) || exit
  needle=${needle%??} haystack=${haystack%??}
  haystack=${haystack%/} needle=${needle%/}
  case $needle in
    ("$haystack" | "$haystack"/*) true;;
    (*) false;;
  esac
)

That resolves all symlinks to end up with a canonical absolute path for both needle and haystack.
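
As a rough illustration of how the function behaves on the two cases from the question, here is a sketch that rebuilds a similar layout in a scratch directory. The link names `inside` and `outside` and the `mktemp` location are made up for the demo, and GNU readlink is assumed.

```shell
# is_in as defined above, repeated so that the sketch runs on its own.
is_in() (
  needle=$(readlink -ve -- "$1" && echo .) || exit
  haystack=$(readlink -ve -- "$2" && echo .) || exit
  needle=${needle%??} haystack=${haystack%??}
  haystack=${haystack%/} needle=${needle%/}
  case $needle in
    ("$haystack" | "$haystack"/*) true;;
    (*) false;;
  esac
)

# Hypothetical layout: $tmp/fie.txt lies outside $tmp/foo,
# while $tmp/foo/fum/fie.txt lies inside it.
tmp=$(mktemp -d)
mkdir -p "$tmp/foo/fum"
touch "$tmp/fie.txt" "$tmp/foo/fum/fie.txt"
ln -s ../fie.txt "$tmp/foo/outside"    # relative link leaving foo
ln -s fum/fie.txt "$tmp/foo/inside"    # relative link staying in foo

if is_in "$tmp/foo/inside" "$tmp/foo"; then inside_result=yes; else inside_result=no; fi
if is_in "$tmp/foo/outside" "$tmp/foo"; then outside_result=yes; else outside_result=no; fi
echo "inside: $inside_result, outside: $outside_result"

rm -rf -- "$tmp"
```

The function takes the link (or any path) as its first argument and the containing directory as its second, so it is not tied to the current directory.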

Explanation

  • We get the canonical absolute path of both the needle and the haystack. We use -e instead of -f as we want to make sure the files exist. The -v option gives an error message if the files can't be accessed.

  • As always, -- should be used to mark the end of options, and the variables are quoted because we don't want to invoke the split+glob operator here.

  • Command substitution in Bourne-like shells has a misfeature: it removes all the newline characters from the end of a command's output, not just the one added by the command to end the last line. That means that for a file like /foo<LF><LF>, $(readlink -ve -- "$1") would return /foo. The common work-around is to append a non-LF character (here .) and then strip it, together with the extra LF character added by readlink, with var=${var%??} (remove the last two characters).

  • The needle is regarded as being in the haystack if it is the haystack or if it is haystack/something. However, that wouldn't work if the haystack was / (/etc for instance is not //something). / often needs to be treated specially because while / and /xx have the same number of slashes, one is a level above the other.

    One way to address it is to replace / with the empty string which is done with var=${var%/} (the only path ending with / that readlink -e may output is /, so removing a trailing / is changing / to the empty string).
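
The last two points can be tried out in isolation. The following is only a sketch: `printf '%s\n'` stands in for readlink (it prints its argument plus one LF), and the variable names `name`, `plain`, `guarded` and `root` are invented for the demo.

```shell
# A value ending in a newline, standing in for a file name such as "/foo<LF>".
name='/foo
'

# Plain command substitution strips every trailing LF, so the name is damaged.
plain=$(printf '%s\n' "$name")

# Appending a sentinel "." protects the trailing LFs; ${var%??} then removes
# the sentinel and the single LF that the command itself added.
guarded=$(printf '%s\n' "$name" && echo .)
guarded=${guarded%??}

if [ "$plain" = "$name" ]; then echo "plain: intact"; else echo "plain: trailing newline lost"; fi
if [ "$guarded" = "$name" ]; then echo "guarded: intact"; else echo "guarded: trailing newline lost"; fi

# Root handling: stripping one trailing slash turns "/" into the empty string,
# so the pattern "$root"/* becomes /* and matches every absolute path.
root=/
root=${root%/}
case /etc in
  ("$root" | "$root"/*) echo "/etc is under /";;
  (*) echo "/etc is not under /";;
esac
```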

To canonicalize the file paths, you could use a helper function.

canonicalize_path() {
  # canonicalize paths stored in supplied variables. `/` is returned as 
  # the empty string.
  for _var do
    eval '
      '"$_var"'=$(readlink -ve -- "${'"$_var"'}" && echo .) &&
      '"$_var"'=${'"$_var"'%??} &&
      '"$_var"'=${'"$_var"'%/}' || return
  done
}

is_in() (
  needle=$1 haystack=$2
  canonicalize_path needle haystack || exit
  case $needle in
    ("$haystack" | "$haystack"/*) true;;
    (*) false;;
  esac
)
Stéphane Chazelas
  • I found studying this post very instructive. May I ask you a couple of questions? Is there any significance in the fact that in `needle=${needle%??} haystack=${haystack%??}` the `needle` variable is dealt with first, whereas in the next line it is the other way around? Also, how come your `return` statements don't explicitly return a non-zero value (to indicate error)? Last one: would it make sense to factor out the entire transformation (the call to `readlink`, plus the two suffix truncations) to a separate `_canonicalize_path` helper function? – kjo Feb 22 '16 at 12:28
  • @kjo, 1) no significance 2) `return` returns by default with the status of the last command. With `|| return`, the function returns the status reported by the failing application. 3) sure, but the resulting function will likely not be a pleasant sight. I'll add an example. – Stéphane Chazelas Feb 22 '16 at 12:48
  • Thanks! I see what you mean! Not a pleasant sight at all. Shell programming must be the hardest type of programming I know of... – kjo Feb 22 '16 at 13:11
0

I solved the problem like this:

echo $abs_link_target | grep -qe "^$containing_dir"

The $abs_link_target variable contains the absolute path to the symlink target (expanded through readlink -f). I then check whether the beginning of the target path matches $containing_dir.

Fylke
  • It would say that /foobar is in /foo – Stéphane Chazelas Jan 09 '14 at 08:28
  • It would say that /abc/d is in /a.b (`$containing_dir` taken as a regexp) – Stéphane Chazelas Jan 09 '14 at 08:28
  • It would say that /abc is in `/foo/abc` (the lines of the pattern string are treated as different patterns to match) – Stéphane Chazelas Jan 09 '14 at 08:30
  • It would say that `/foo/abc` is in `/abc` (`grep` matches on each line of the input, not the whole input so generally can't be used to match file names). – Stéphane Chazelas Jan 09 '14 at 08:55
  • Depending on the `echo` implementation and/or the environment, you'll have issues with filenames containing backslashes: [`echo` should really not be used to handle arbitrary data](http://unix.stackexchange.com/a/65819/22565) – Stéphane Chazelas Jan 09 '14 at 08:56
  • @StephaneChazelas Okay, scenarios 3 and 4 are really stretching it, having a newline in a directory/filename is just silly. Scenario 1 is very valid, but should be solvable if you make sure that the `$containing_dir` ends with a `/`. Scenario 2 I don't really understand, do you actually mean /a.c, and that the . is a wildcard? – Fylke Jan 09 '14 at 09:33
  • yes sorry, should have been `/a.c`, see my answer for a more correct solution – Stéphane Chazelas Jan 09 '14 at 10:01
  • Almost forgot: you'll have problems with filenames containing blank or wildcard characters as well since you're not quoting `$abs_link_target` – Stéphane Chazelas Jan 09 '14 at 10:16
  • @StephaneChazelas Ah okay, I think I'll still go with the current solution as it is much easier to realize what's going on in it. Mostly because the people who will maintain it aren't super well versed in bash. And also because the domain where this script is used won't have wildcards in the filenames. Your answer is probably much more robust though. – Fylke Jan 09 '14 at 11:27
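
The first two objections raised in these comments can be reproduced with fixed strings, and a `case` pattern avoids both of them. This is only a sketch: the helper names `has_prefix_dir` and `grep_check` and the sample paths are invented for illustration, and both arguments are assumed to be canonical absolute paths already (for example from readlink).

```shell
# Hypothetical helper: succeeds if path $1 is directory $2 or lies below it.
# Both arguments are assumed to be canonical absolute paths.
has_prefix_dir() {
  dir=${2%/}    # "/" becomes "", so that "$dir"/* turns into /*
  case $1 in
    ("$dir" | "$dir"/*) return 0;;
    (*) return 1;;
  esac
}

# The grep approach from this answer: the directory is interpreted as a
# regular expression that is only anchored at the start.
grep_check() {
  printf '%s\n' "$1" | grep -qe "^$2"
}

grep_check /foobar /foo && echo "grep: /foobar reported inside /foo"
grep_check /abc/d /a.c && echo "grep: /abc/d reported inside /a.c"
has_prefix_dir /foobar /foo || echo "case: /foobar not inside /foo"
has_prefix_dir /abc/d /a.c || echo "case: /abc/d not inside /a.c"
has_prefix_dir /foo/bar /foo && echo "case: /foo/bar inside /foo"
```

The `case` comparison treats the directory as a literal string because it is quoted, and it requires either an exact match or a `/` right after the prefix.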
0

grep -q "^/foo/bar/" <<< "$(readlink -f "anyfile.ext")"

fr00tyl00p
  • Assuming the target of `anyfile.ext` exists and is reachable (otherwise, `readlink -f` as opposed to `readlink -e` might not give you the correct path) and that the resulting path doesn't contain newline characters (assumes zsh or bash4 or ksh93m+ or above). Note that if `anyfile.ext` points to `/foo/bar` itself, it will say it's not within. – Stéphane Chazelas Jan 10 '14 at 11:56