7

In a bash or zsh script, how might I extract the host from a URL, e.g. unix.stackexchange.com from http://unix.stackexchange.com/questions/ask, if the latter is in an environment variable?

Jeff Schaller
Toothrot

4 Answers

10

You can use parameter expansion, which is available in any POSIX-compliant shell.

$ export FOO=http://unix.stackexchange.com/questions/ask
$ tmp="${FOO#*//}" # remove everything up to and including the first //
$ echo "${tmp%%/*}" # remove everything from the first remaining / onward
unix.stackexchange.com
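The two expansions can be wrapped in a small function. The following is only a sketch: the name `url_host` is made up for illustration, and the two extra steps (dropping `user:password@` and `:port`) go beyond what the answer shows. It assumes a plain host name; bracketed IPv6 literals such as `[::1]` are not handled.

```shell
# url_host (hypothetical helper): print the host part of a URL
# using only POSIX parameter expansion.
# The body runs in a subshell ( ... ) so that "host" does not leak
# into the calling shell.
url_host() (
    host=${1#*://}     # drop the scheme and "://", if present
    host=${host%%/*}   # drop everything from the first "/" onward
    host=${host##*@}   # drop "user:password@", if present
    host=${host%%:*}   # drop ":port", if present
    printf '%s\n' "$host"
)

FOO=http://unix.stackexchange.com/questions/ask
url_host "$FOO"    # prints unix.stackexchange.com
```

Called as `url_host 'https://user:pw@example.com:8080/x'` (an invented URL), it prints `example.com`.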

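A more reliable, but uglier, method would be to use an actual URL parser. Here is an example for Python. Note that `netloc` also includes any port or credentials present in the URL; `urlparse(...).hostname` returns the bare host name instead: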

$ python3 -c 'import sys; from urllib.parse import urlparse; print(urlparse(sys.argv[1]).netloc)' "$FOO"
unix.stackexchange.com
David Foerster
jordanm
5

If the URLs all follow this pattern (scheme, two slashes, host, then a path), I have this short and ugly hack for you. Splitting on / makes the host the third field:

echo "$FOO" | cut -d / -f 3
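To see why field 3 is the host, and where the hack stops working, here is a small sketch. The helper name `third_field` and the second and third URLs are made up for illustration.

```shell
# Splitting on "/" gives: field 1 = "http:", field 2 = "" (the gap
# between the two slashes), field 3 = host.
third_field() { printf '%s\n' "$1" | cut -d / -f 3; }

third_field 'http://unix.stackexchange.com/questions/ask'  # unix.stackexchange.com
third_field 'http://example.com:8080/x'                    # example.com:8080 (the port is kept)
third_field 'unix.stackexchange.com/questions/ask'         # ask (no scheme, so field 3 is a path segment)
```

So the hack is fine for URLs that always start with `scheme://`, and gives a wrong answer silently when they do not.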
David Foerster
3

You can do it in many ways, for example:

export _URL='http://unix.stackexchange.com/questions/ask'

echo "$_URL" | sed -ne 'y|/|\n|;s/.*\n\n//;P'

expr "$_URL" : 'http://\([^/]*\)'

echo "$_URL" |  perl -lpe '($_) = m|^http://\K[^/]+|g'

perl -le 'print+(split m{/}, $ENV{_URL})[2]'

# relies on word splitting of $_URL; in zsh write $=_URL instead
(set -f; IFS=/; set -- $_URL; echo "$3";)
Rakesh Sharma
  • Nice alternatives. +1. The `s` command in the sed solution needs both closing slashes, as in `s/.*\n\n//`; a different delimiter reads even better: `echo "$_URL" | sed -ne 'y|/|\n|;s|.*\n\n||;P'` – George Vasiliou Feb 27 '17 at 23:27
2

This can also be done with regex capture groups:

$ a="http://unix.stackexchange.com/questions/ask"
$ perl -pe 's|(.*//)(.*?)(/.*)|$2|' <<<"$a"
unix.stackexchange.com
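The same capture-group idea also works without Perl or here-strings. The sketch below swaps in POSIX `sed` with a basic regular expression; the function name `host_via_sed` is made up for illustration.

```shell
# host_via_sed (hypothetical helper): capture the host with a sed group.
# \( ... \) captures the run of non-slash characters after the first "//";
# -n together with the p flag prints only lines where the substitution matched.
host_via_sed() {
    printf '%s\n' "$1" | sed -n 's|^[^/]*//\([^/]*\).*|\1|p'
}

a="http://unix.stackexchange.com/questions/ask"
host_via_sed "$a"    # prints unix.stackexchange.com
```

Unlike the Perl one-liner, this prints nothing at all when the input contains no `//`, which makes a malformed value easier to detect.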
George Vasiliou