
The useful option `wget --convert-links` (or `wget -k`) makes "links in downloaded HTML or CSS point to local files." To do this, wget makes two passes:

  • Pass 1: download files.
  • Pass 2: convert links.
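For intuition, the core of Pass 2 can be sketched in Python: for each link in a saved page, compute the relative path from the page's local copy to the target's local copy. This is a simplified illustration, not wget's actual implementation. It assumes wget's default recursive layout of `host/path` (directory URLs saved as `index.html`), drops query strings and fragments, ignores options such as `--adjust-extension`, and only covers links to files that were downloaded (wget rewrites links to files it did not download as absolute URLs instead). The names `local_path` and `convert_link` are hypothetical.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def local_path(url: str) -> str:
    """Map a URL to the file a recursive wget run would typically save it as.

    Simplifying assumption: default host/path layout; query strings and
    fragments are dropped, and directory URLs are saved as index.html.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def convert_link(page_url: str, href: str) -> str:
    """Rewrite href (found in page_url) relative to the page's local copy."""
    target_url = urljoin(page_url, href)  # resolve relative hrefs first
    page_file = local_path(page_url)
    target_file = local_path(target_url)
    # Relative path from the directory holding the page to the target file.
    return posixpath.relpath(target_file, posixpath.dirname(page_file))


print(convert_link("http://example.com/a/b.html", "http://example.com/c/d.html"))
# ../c/d.html
```

A real converter would also have to parse every `href`/`src` (and CSS `url(...)`) in each saved file and write the file back, which is the part wget does for you.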

I want to run pass 1 now and pass 2 later, as two separate invocations: wget should stop after pass 1, let me do some other work, and only then continue with pass 2. In other words, I want to convert links with a separate command, whether that command is wget or something else. How can I do this?

And if wget won't do this, then is there a Perl module, Python module or the like that will?

(For reference: this answer partly answers my question. This question is similar, but its answer seems to fail. At any rate, neither gives something that actually works as far as I can tell.)

thb
  • I gather that the current version of `wget` lacks the feature I need. Therefore, I might [try to add the feature](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=847216) to the next version of `wget`. – thb Dec 07 '16 at 12:35
  • It looks like a doable job for HTTrack: https://www.httrack.com/html/fcguide.html – NVRM May 14 '17 at 16:22


It seems this question was already answered in this other question, but that answer is outdated, so here is a current solution:

The idea is to serve the local directory produced by your 'Pass 1' over HTTP, so that wget believes it is a website. This is easily done with a few lines of Python using SimpleHTTPRequestHandler; you then re-run wget against localhost with the appropriate options.

Something along these lines can work:

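```python
import http.server
import os
import socketserver

PORT = 8000  # optional, can be changed

# Directory holding the files from Pass 1: here a 'web' folder next to this script.
# Change it to your local files if needed, like '/home' or 'C:\\tmp' on Windows.
web_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'web')
os.chdir(web_dir)  # SimpleHTTPRequestHandler serves the current directory

Handler = http.server.SimpleHTTPRequestHandler
with socketserver.TCPServer(("", PORT), Handler) as httpd:
    print("serving at port", PORT)
    httpd.serve_forever()  # runs until interrupted (Ctrl+C)
```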

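Then, while the server is running, call wget on http://localhost:8000 from another terminal, with --convert-links (-k) plus whatever recursion options you need (e.g. -r).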
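If you would rather drive both steps from one Python script, a sketch along these lines serves the directory from a background thread and runs wget against it. This is an assumption-laden variant, not part of the original answer: it needs Python 3.7+ (for `ThreadingHTTPServer` and the handler's `directory` argument), binds only to 127.0.0.1 instead of all interfaces, and the wget flags chosen (`-r`, `-k`, `-nH`, `-P`) plus the helper names `build_wget_command`, `serve_in_background` and `run_pass2` are illustrative. Note that wget can only follow relative or root-relative links from localhost; absolute links to the original host are not followed, and `-k` leaves them pointing at that host.

```python
import functools
import http.server
import subprocess
import threading


def build_wget_command(url, output_dir):
    """Build the Pass 2 wget invocation (flag choice is an assumption; adjust as needed)."""
    # -r: recurse, -k: --convert-links, -nH: no "127.0.0.1:8000/" host directory,
    # -P: directory prefix for the converted copy.
    return ["wget", "-r", "-k", "-nH", "-P", output_dir, url]


def serve_in_background(directory, port=8000):
    """Serve `directory` on 127.0.0.1 from a daemon thread and return the server."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    httpd = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd


def run_pass2(site_dir, output_dir, port=8000):
    """Serve the Pass 1 files and let wget re-fetch them with link conversion."""
    httpd = serve_in_background(site_dir, port)
    try:
        url = f"http://127.0.0.1:{httpd.server_address[1]}/"
        result = subprocess.run(build_wget_command(url, output_dir))
        # wget may exit non-zero (e.g. 8) if some links return HTTP errors,
        # so the code is returned for inspection rather than raised.
        return result.returncode
    finally:
        httpd.shutdown()      # stop serve_forever() in the background thread
        httpd.server_close()  # release the listening socket
```

Usage, with hypothetical directory names: `run_pass2("web", "converted")` re-fetches the Pass 1 files from `web` and writes the link-converted copy under `converted`.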

viiv
  • I added `` and the extra line after that is required to have Python formatting, but the formatting doesn't show up. If somebody can edit/fix that'd be great :) (I guess adding the Python tag to the question would do the job, but not sure it's the best approach here) – viiv Jul 23 '18 at 12:01