
I would like to know what command would:

  1. select all URLs in a file (i.e., recognize every address beginning with http or www, from start to end, and separate it from the surrounding text or other data)

  2. output them in a .txt file.

The idea is to then run wget -i on the .txt file. I need to properly extract these URLs into a .txt file, as wget struggles to identify all URLs in the raw file directly.
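One way to do both steps is a single grep invocation with an extended regex that matches anything starting with http(s):// or www. up to the next whitespace or quote. This is a minimal sketch, not from the original thread; the file names input.txt and urls.txt are placeholders, and the sample content is made up for illustration:

```shell
# Hypothetical sample input mixing URLs with other text
cat > input.txt <<'EOF'
Some text http://example.com/page more text
visit www.example.org today
EOF

# -E: extended regex, -o: print only the matched part, one per line.
# Match runs starting with http(s):// or www. until whitespace or a quote/angle bracket.
grep -Eo '(https?://|www\.)[^[:space:]"<>]+' input.txt > urls.txt

cat urls.txt
```

The resulting urls.txt can then be fed to `wget -i urls.txt`. Note that bare `www.` entries lack a scheme; depending on the wget version you may want to prepend `http://` to them first (e.g. with sed).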

ivako
  • See http://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file – Zwans Jan 06 '17 at 09:48

1 Answer


I followed the instructions in How to use grep and cut in script to obtain website URLs from an HTML file and it worked perfectly in my case, as the URLs sit inside href="..." attributes in the input file:

grep -Po '(?<=href=")[^"]*(?=")' INPUT_FILE > OUTPUT_FILE.txt
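To illustrate how that command behaves, here is a small self-contained run against a made-up HTML snippet (page.html and links.txt are hypothetical names, not from the thread). Note that -P enables PCRE and is a GNU grep extension, so this won't work with BSD/macOS grep out of the box:

```shell
# Hypothetical HTML sample containing href attributes
cat > page.html <<'EOF'
<p>Links: <a href="http://example.com/a">A</a>
and <a href="http://example.com/b">B</a></p>
EOF

# Lookbehind (?<=href=") and lookahead (?=") keep only the text
# between href=" and the closing quote
grep -Po '(?<=href=")[^"]*(?=")' page.html > links.txt

cat links.txt
```

Afterwards, `wget -i links.txt` would fetch each extracted URL (not run here).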
Jeff Schaller