
Is there an easy way to keep a folder synced with a directory listing via HTTP?

Edit:

Thanks for the tip with wget! I created a shell script and added it as a cron job:

#!/bin/bash
remote_dirs=( "http://example.com/" "…") # Add your remote HTTP directories here
local_dirs=(  "$HOME/examplecom" "…")    # Tilde is not expanded inside quotes, so use $HOME

for (( i = 0 ; i < ${#local_dirs[@]} ; i++ )); do
    cd "${local_dirs[$i]}" || continue
    wget -r -l1 --no-parent -A "*.pdf" -nd -N "${remote_dirs[$i]}"
done

# Explanation:
# -r            to download recursively
# -l1           to descend only one directory level
# --no-parent   to exclude parent directories
# -A "*.pdf"    to accept only .pdf files
# -nd           to prevent wget from creating a directory hierarchy
# -N            to make wget download only new or updated files
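
To run the script as a cron job as described above, an entry along these lines could be added with crontab -e (the script path ~/bin/sync-http-dirs.sh is a placeholder; adjust it to wherever the script is saved and make the file executable):

```shell
# Run the sync script every night at 03:00; cron executes this via sh,
# so $HOME is expanded. Output is discarded (path is a placeholder).
0 3 * * * $HOME/bin/sync-http-dirs.sh >/dev/null 2>&1
```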

Edit 2: As mentioned below, one could also use --mirror (-m), which is shorthand for -r -N -l inf --no-remove-listing.

Jeff Schaller
Lenar Hoyt

2 Answers


wget is a great tool.

Use `wget -m http://somesite.com/directory`

-m
--mirror
    Turn on options suitable for mirroring.  This option turns on
    recursion and time-stamping, sets infinite recursion depth and
    keeps FTP directory listings.  It is currently equivalent to 
    -r -N -l inf --no-remove-listing.
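
As a sketch (somesite.com is a placeholder URL), -m is often combined with a couple of companion flags to keep the mirror restricted to the directory of interest:

```shell
# Mirror one directory; -np (--no-parent) prevents wget from ascending
# above the starting directory, and -A limits the download to PDF files
# (both flags are optional).
wget -m -np -A "*.pdf" http://somesite.com/directory/
```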
George M

zsync works like rsync, but it can fetch files from a plain HTTP server.
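
A minimal zsync sketch (the host name and file names are placeholders): the server side publishes a .zsync control file next to the payload, and the client uses it to transfer only the blocks that differ from its local copy:

```shell
# On the server: generate a control file for the payload,
# embedding the URL the client should download from
zsyncmake -u http://example.com/big.iso big.iso   # writes big.iso.zsync

# On the client: fetch only the changed blocks
zsync http://example.com/big.iso.zsync
```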

gogators
  • There is not much documentation for zsync on the Internet. It would be really nice if you could elaborate on your answer. Thank you. – Behrooz Jul 07 '15 at 15:11
  • 3
    Behrooz - I actually use `lftp` and its `mirror` command now instead. – gogators Jul 08 '15 at 11:28
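
For completeness, a hedged sketch of the lftp approach mentioned in the comment (the URL and directory names are placeholders); lftp's mirror command works over HTTP as well as FTP:

```shell
# Mirror a remote HTTP directory into a local one, transferring
# only files that are newer than the local copies
lftp -c "open http://example.com/; mirror --only-newer directory/ ~/examplecom"
```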