
I downloaded a website using:

wget -c --mirror -p http://www.somewebsite.com

for offline viewing and I just remembered that I forgot the --convert-links option! They are all on my hard drive right now. Is there a way to do --convert-links without redownloading the whole website?

Coding District

2 Answers


A straightforward approach: serve the local directory with something like SimpleHTTPServer, then re-run wget against localhost with the appropriate options (this time including --convert-links).
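A minimal sketch of that approach, assuming the mirror was saved into a directory named `www.somewebsite.com` and that Python 3 is available (in Python 3, the old SimpleHTTPServer module became `http.server`; the port and paths here are illustrative):

```shell
# Serve the already-downloaded mirror over HTTP on localhost
cd www.somewebsite.com
python3 -m http.server 8000 &

# Re-mirror from localhost with the forgotten option; with
# --timestamping wget skips files that haven't changed, so the
# "re-download" is served from disk rather than the real site
wget --mirror --convert-links --adjust-extension --timestamping \
    http://localhost:8000/

# Stop the temporary server
kill %1
```

Note that --convert-links will rewrite links to point at localhost:8000, so absolute links back to the real site still need the /etc/hosts trick mentioned in the comments.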

Gilles 'SO- stop being evil'
alex
    Quite a good idea, why the down vote? – phunehehe Feb 14 '11 at 09:32
  • I should add that one should edit `/etc/hosts` to fake localhost as the website being mirrored. And this will only work well if the links all point to the same website (i.e. no hot-linked images). – phunehehe Feb 14 '11 at 09:34
  • What is SimpleHTTPServer? Is it the Python module? If so, can you show how to use it (so I can revoke my careless downvote). – tshepang Feb 14 '11 at 13:01
    @Tshepang Yes Python is pretty awesome with the huge load of modules :) SimpleHTTPServer is [my favorite](http://phunehehe.isgreat.org/2011/file-sharing-with-pythons-built-in-http-server/) – phunehehe Feb 14 '11 at 15:10
  • @phunehehe That's very clever. If the local files link to remote files, I don't think the answer would work without your idea. As long as you have downloaded the remote files locally, your idea will work as expected. – Daniel Kaplan Mar 24 '22 at 22:52

Also, don't forget to use the --timestamping option, or add timestamping = on to "~/.wgetrc". It ensures that when you re-mirror the website, you don't re-download everything, only changed or new files. See the Time-Stamping section of the wget manpage for more.
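For reference, the corresponding wgetrc entry is a plain key = value line (a sketch based on the Wgetrc Syntax section of the wget manual):

```
# ~/.wgetrc
# Equivalent to passing --timestamping (-N) on every invocation
timestamping = on
```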

FWIW I use this to mirror my blog:

wget --mirror --adjust-extension --convert-links --no-cookies --timestamping http://example.com --output-file=log-blog

tshepang