I have a file containing a list of URLs (one per line).
The script below extracts the host (server) names from it.
The extraction itself works correctly, but host names that
appear multiple times in the input also appear multiple times
in the displayed output.
I want each name to appear only once.
I tried uniq and sort -u, but they didn't help.
Below is the code I used to extract the hosts:
function extract_parts {
    if [ -f "wget-list" ]; then
        while read a; do
            a=${a:8}
            host=$(echo -e "$a" | awk -F '/' '{print $1}' | sort -u)
            # host=$(echo -e "$a" | awk -F '/' '{print $1}' | uniq -iu)
            echo -e ${host}
        done <<< $(cat ./wget-list)
    fi
}
where wget-list contains (truncated example):
https://downloads.sourceforge.net/tcl/tcl8.6.12-html.tar.gz
https://downloads.sourceforge.net/tcl/tcl8.6.12-src.tar.gz
https://files.pythonhosted.org/packages/source/J/Jinja2/Jinja2-3.1.2.tar.gz
https://files.pythonhosted.org/packages/source/M/MarkupSafe/MarkupSafe-2.1.1.tar.gz
https://ftp.gnu.org/gnu/autoconf/autoconf-2.71.tar.xz
https://ftp.gnu.org/gnu/automake/automake-1.16.5.tar.xz
Actual output of the script
(only the hosts, without the https:// prefix and the path):
downloads.sourceforge.net
downloads.sourceforge.net
files.pythonhosted.org
files.pythonhosted.org
ftp.gnu.org
ftp.gnu.org
Desired output (the above, but with no duplicates):
downloads.sourceforge.net
files.pythonhosted.org
ftp.gnu.org
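
One possible direction, offered as a minimal sketch rather than a definitive fix: in the script above, sort -u runs inside the loop, so each invocation sees exactly one line and has nothing to deduplicate. Running it once over the complete list of hosts should remove the repeats. The function name extract_hosts and its optional file argument are hypothetical, and the sketch assumes every line has the form scheme://host/path.

```shell
#!/usr/bin/env bash
# Sketch: print each host from a URL list exactly once.
# Assumption: one URL per line, shaped like scheme://host/path.
extract_hosts() {
    # Hypothetical parameter; defaults to the file name used in the question.
    local list=${1:-wget-list}
    [ -f "$list" ] || return 1

    # With '/' as the field separator, "https://host/path" splits into
    # "https:", "", "host", "path", ... so the host is field 3.
    # Lines with fewer than 3 fields (for example blank lines) are skipped.
    # sort -u then sees all hosts at once and keeps one copy of each.
    awk -F '/' 'NF >= 3 { print $3 }' "$list" | sort -u
}
```

Called as extract_hosts wget-list on the sample file above, this should print the three hosts once each, in sorted order. Taking field 3 instead of stripping the first 8 characters also means http:// URLs would be handled, although that case does not occur in the sample.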