How can I download hundreds of .pdf files from http://www.ncbi.nlm.nih.gov/pmc/articles using a loop? For example, for the following document IDs:

PMC3386155
PMC3625956
PMC3477654
PMC3531051
PMC3114846
PMC3117879
PMC3130560
PMC3531173
PMC3546115
PMC3354575
PMC3771521
Rahul Patil
sami
  • Are these open access or do you have to enter your institution's credentials every time? – terdon Sep 21 '13 at 18:17
  • related: http://unix.stackexchange.com/questions/83687/using-wget-to-get-file-names-from-a-text-file/83698#83698 – slm Sep 21 '13 at 18:22
  • pubmed Q: http://unix.stackexchange.com/questions/91696/how-to-download-pdf-files-from-pubmed. Why is this one different? – slm Sep 21 '13 at 18:39
  • @sami Have you checked that script? Is there any issue? – Rahul Patil Sep 22 '13 at 08:55

1 Answer

Here is a working, tested script.

Using wget

#!/usr/bin/env bash

Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"

ID=(    PMC3386155 PMC3625956 PMC3477654 PMC3531051
        PMC3114846 PMC3117879 PMC3130560 PMC3531173
        PMC3546115 PMC3354575 PMC3771521 )

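# Quote the array expansion, the URL, and the output name so the shell
# does not split or glob them.
for f in "${ID[@]}"
do
   wget  --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" \
         -l1 --no-parent -A.pdf "${Link}${f}/pdf/" -O "${f}.pdf"
done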

The remote site rejects requests that carry the default user agent of tools such as wget and curl, so we have to set a browser-like user agent explicitly.
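With hundreds of IDs, it may be more convenient to keep them in a text file than in the script itself. The following is only a sketch under that assumption: the function names `pmc_pdf_url` and `download_from_list`, the file name `ids.txt`, and the one-second pause are illustrative choices of mine, not part of the tested script above.

```shell
#!/usr/bin/env bash

# Base URL and user agent taken from the script above.
Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"
UA="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# Print the PDF URL for one document ID (hypothetical helper).
pmc_pdf_url() {
    printf '%s%s/pdf/\n' "$Link" "$1"
}

# Download every ID listed in the file given as $1, one ID per line.
# Blank lines are skipped; a last line without a trailing newline is still read.
download_from_list() {
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue
        wget --user-agent="$UA" -O "${id}.pdf" "$(pmc_pdf_url "$id")"
        sleep 1   # assumption: a short pause between requests
    done < "$1"
}
```

Assuming the IDs are saved one per line in `ids.txt`, the download would be started with `download_from_list ids.txt`.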

Using curl

ID=( PMC3386155 PMC3625956 PMC3477654 PMC3531051 PMC3114846 PMC3117879 PMC3130560 PMC3531173 PMC3546115 PMC3354575 PMC3771521 )

Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"

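# Browser-like user agent, kept in its own variable so the quoting stays simple.
UA="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# Run curl once per ID; xargs substitutes each ID for {}.
printf "%s\n" "${ID[@]}" | xargs -I{} curl -O -J -L -A "$UA" "${Link}{}/pdf/"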

Some explanation of the curl options:

  • -O Write the output to a local file instead of standard output
  • -J Take the output file name from the remote header (curl 7.21.2 or newer)
  • -L Follow redirects; the remote site redirects to another download page
  • -A Set the user agent
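Because the site redirects and can refuse a request, a saved file is not guaranteed to be a real PDF; it could be an HTML error page. As a rough check after a bulk download, one could test whether each file starts with the `%PDF` signature. This is a sketch only, and the helper names `is_pdf` and `report_bad_downloads` are hypothetical.

```shell
#!/usr/bin/env bash

# Succeed if the file given as $1 begins with the PDF signature "%PDF".
is_pdf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "%PDF" ]
}

# Print a line for every argument that does not look like a PDF.
report_bad_downloads() {
    for f in "$@"; do
        is_pdf "$f" || printf 'not a PDF: %s\n' "$f"
    done
}
```

For example, `report_bad_downloads PMC*.pdf` run in the download directory lists the files that would need to be fetched again.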
Rahul Patil