How to get last part of http link in Bash?

Question

I have an HTTP link:

http://www.test.com/abc/def/efg/file.jar

and I want to save the last part, file.jar, to a variable, so that the output string is "file.jar".

Condition: the link can have a different length, e.g.:

http://www.test.com/abc/def/file.jar

I tried it this way:

awk -F'/' '{print $7}'

but the problem is the length of the URL, so I need a command that works for a URL of any length.

DopeGhoti · Accepted Answer · 2016-11-23T16:07:25.327

75

Using awk for this would work, but it's kind of deer hunting with a howitzer. If you already have your URL bare, it's pretty simple to do what you want if you put it into a shell variable and use bash's built-in parameter substitution:

$ myurl='http://www.example.com/long/path/to/example/file.ext'
$ echo "${myurl##*/}"
file.ext

The way this works is by removing a prefix that greedily matches '*/', which is what the ## operator does:

${haystack##needle} # removes the longest match of the pattern 'needle'
                    # from the beginning of the variable 'haystack'
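For comparison, here is a small sketch of the four related bash expansions applied to the same example URL. The variable names on the left are made up for illustration; only `#`, `##`, `%` and `%%` are bash syntax.

```shell
#!/usr/bin/env bash
myurl='http://www.example.com/long/path/to/example/file.ext'

# '#' and '##' trim from the front; '%' and '%%' trim from the back.
# The single form removes the shortest match, the doubled form the longest.
shortest_prefix=${myurl#*/}    # removes 'http:/'
longest_prefix=${myurl##*/}    # removes everything through the last '/'
shortest_suffix=${myurl%/*}    # removes '/file.ext'
longest_suffix=${myurl%%/*}    # removes everything from the first '/' on

printf '%s\n' "$shortest_prefix" "$longest_prefix" \
              "$shortest_suffix" "$longest_suffix"
```

The second line printed is `file.ext`, which is the value the question is after.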


answered Nov 23 '16 at 15:59


Any sort of explanation to go with that? – Questionmark Nov 23 '16 at 16:01
Sure. Will that do? – DopeGhoti Nov 23 '16 at 16:07
That is great :) – Questionmark Nov 23 '16 at 16:09

If you want to strip query strings, you can first assign to an intermediate variable e.g. `file=${myurl##*/}`, then use greedy reverse-matching to back up to the `?` (don't forget to escape it!), e.g. `echo ${file%%\?*}` – Doktor J Nov 24 '16 at 17:39
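A minimal sketch of the query-stripping idea from the comment above. It applies the two steps in the opposite order (query first, then path), on the assumption that a query string may itself contain a `/`; the example URL and its parameters are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical URL whose query string contains a slash.
myurl='http://www.example.com/path/to/file.ext?next=/home&lang=en'

no_query=${myurl%%\?*}   # drop everything from the first literal '?' on
file=${no_query##*/}     # then keep what follows the last '/'

printf '%s\n' "$file"
```

With the steps the other way around, `${myurl##*/}` would cut at the slash inside the query and leave `home&lang=en`.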

score 31 · Answer 2 · answered Nov 23 '16 at 16:23


basename and dirname work well for URLs too:

> url="http://www.test.com/abc/def/efg/file.jar"
> basename "$url"; basename -s .jar "$url"; dirname "$url"
file.jar
file
http://www.test.com/abc/def/efg
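Since the question asks for the result in a variable, a short sketch using command substitution may help. It uses the `basename string suffix` form, which is the one POSIX specifies; `-s` is an extension available in the GNU and BSD implementations. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
url='http://www.test.com/abc/def/efg/file.jar'

name=$(basename "$url")        # last component of the string
stem=$(basename "$url" .jar)   # last component with the '.jar' suffix removed
parent=$(dirname "$url")       # everything before the last '/'

printf '%s\n' "$name" "$stem" "$parent"
```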


Fedor Dikarev


+1 Brilliant, it works because a URL and a path are both URIs. – Tulains Córdova Nov 24 '16 at 14:16

@TulainsCórdova a path isn't a [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier); this works because `basename` and `dirname` split strings on /, and that happens to work with URLs too, at least as long as they don't have a local portion (not with URIs in general though). – Stephen Kitt Nov 24 '16 at 14:41
In the Wikipedia article about URIs, they give the following as valid examples of URI references: `/relative/URI/with/absolute/path/to/resource.txt`, `relative/path/to/resource.txt`, `../../../resource.txt` and `resource.txt` https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples_of_URI_references – Tulains Córdova Nov 24 '16 at 14:46

@TulainsCórdova Wikipedia is not wrong, `/relative/path` can be either a file system path or a relative URI. But which of those it is depends on the context. When it's used as a file system path, it's not a URI. When it's used as a URI, it's not a file system path. Saying it's a URI just because it happens to match the syntax is like saying each of the words in this comment is a URI as well. – hvd Nov 25 '16 at 08:28

score 14 · Answer 3 · answered Nov 23 '16 at 15:58


With awk, you can use $NF to get the last field, regardless of the number of fields:

awk -F / '{print $NF}'

If you store that string in a shell variable, you can use:

a=http://www.test.com/abc/def/efg/file.jar
printf '%s\n' "${a##*/}"
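The awk command above reads the URL from standard input, so one way to capture its result in a variable is a pipe inside a command substitution. This is only a sketch; the variable name `last` is arbitrary.

```shell
#!/usr/bin/env bash
a='http://www.test.com/abc/def/efg/file.jar'

# Split the line on '/' and print the final field, whatever its index is.
last=$(printf '%s\n' "$a" | awk -F/ '{print $NF}')

printf '%s\n' "$last"
```

Note that a URL ending in `/` has an empty last field, so `$NF` would be an empty string in that case.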


cuonglm


score 6 · Answer 4 · answered Nov 23 '16 at 20:08

Most of the posted answers are not robust against URLs that contain query strings or fragments ("targets"), such as the following:

https://example.com/this/is/a/path?query#target

Python has URL parsing in its standard library; it's easier to let it do it. E.g.,

from urllib import parse
import sys
path = parse.urlparse(sys.stdin.read().strip()).path
print("/" if not path or path == "/" else path.rsplit("/", 1)[-1])

You can compact that into a single python3 -c for use in a shell script:

echo 'https://example.com/this/is/a/path/components?query#target' \
    | python3 -c 'from urllib import parse; import sys; path = parse.urlparse(sys.stdin.read().strip()).path; print("/" if not path or path == "/" else path.rsplit("/", 1)[-1])'

(You can also keep the script broken out over several lines for readability; the single quotes let you put newlines in.)

Of course, now your shell script has a dependency on Python.

(I'm a little unsure about the if that tries to handle cases where the URL's path component is the root (/); adjust/test if that matters to you.)
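If the Python dependency is unwanted, roughly the same behaviour can be approximated with parameter expansion alone. The function below is a sketch, not a full URL parser: `url_basename` is a hypothetical name, and it assumes a plain `scheme://host/path?query#fragment` shape with no percent-decoding or other edge cases handled.

```shell
#!/usr/bin/env bash
# url_basename (hypothetical helper): print the last path segment of a URL,
# or '/' when the path is empty or just the root.
url_basename() {
    local url=$1 rest path
    url=${url%%\#*}      # drop the fragment, if any
    url=${url%%\?*}      # drop the query string, if any
    rest=${url#*://}     # drop the scheme, leaving host plus path
    case $rest in
        */*) path=/${rest#*/} ;;   # everything after the host
        *)   path= ;;              # no path at all
    esac
    if [ -z "$path" ] || [ "$path" = / ]; then
        printf '/\n'
    else
        printf '%s\n' "${path##*/}"
    fi
}

url_basename 'https://example.com/this/is/a/path?query#target'
```

For the example URL this prints `path`, matching what the Python snippet extracts from the path component.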

score 2 · Answer 5 · edited Nov 30 '16 at 05:52


One method is to reverse the URL with rev, cut the first field, and then reverse the result again, e.g.:

echo 'http://www.test.com/abc/def/efg/file.jar' | rev | cut -d '/' -f 1 | rev

Output:

file.jar

Example 2:

echo 'http://www.test.com/abc/cscsc/sccsc/def/efg/file.jar' | rev | cut -d '/' -f 1 | rev

Output:

file.jar


answered Nov 30 '16 at 05:28

Nived Karimpunkara

