
I would like to know what command would:

  1. select all URLs in a file (i.e., recognize every address beginning with http or www, from start to end, and separate it from the surrounding text or other data)

  2. output them in a .txt file.

The idea is to then run wget -i on the .txt file. I need to properly extract these URLs into a .txt file, as wget struggles to identify all URLs in the raw file directly.
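One way to do both steps is a single grep invocation with an extended regex that matches anything starting with http(s):// or www. up to the next whitespace or quote. This is a minimal sketch, not from the original thread; the file names input.txt and urls.txt are placeholders, and the sample content is made up for illustration:

```shell
# Hypothetical sample input mixing URLs with other text
cat > input.txt <<'EOF'
Some text http://example.com/page more text
visit www.example.org today
EOF

# -E: extended regex, -o: print only the matched part, one per line.
# Match runs starting with http(s):// or www. until whitespace or a quote/angle bracket.
grep -Eo '(https?://|www\.)[^[:space:]"<>]+' input.txt > urls.txt

cat urls.txt
```

The resulting urls.txt can then be fed to `wget -i urls.txt`. Note that bare `www.` entries lack a scheme; depending on the wget version you may want to prepend `http://` to them first (e.g. with sed).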

ivako
  • See http://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file – Zwans Jan 06 '17 at 09:48

1 Answer


I followed the instructions in How to use grep and cut in script to obtain website URLs from an HTML file and it worked perfectly in my case, as the URLs sit inside href="..." attributes in the input file:

grep -Po '(?<=href=")[^"]*(?=")' INPUT_FILE > OUTPUT_FILE.txt
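To illustrate how that command behaves, here is a small self-contained run against a made-up HTML snippet (page.html and links.txt are hypothetical names, not from the thread). Note that -P enables PCRE and is a GNU grep extension, so this won't work with BSD/macOS grep out of the box:

```shell
# Hypothetical HTML sample containing href attributes
cat > page.html <<'EOF'
<p>Links: <a href="http://example.com/a">A</a>
and <a href="http://example.com/b">B</a></p>
EOF

# Lookbehind (?<=href=") and lookahead (?=") keep only the text
# between href=" and the closing quote
grep -Po '(?<=href=")[^"]*(?=")' page.html > links.txt

cat links.txt
```

Afterwards, `wget -i links.txt` would fetch each extracted URL (not run here).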
Jeff Schaller