
Environment: A Linux VM that I can access only through a command-line console.

Goal: I need to download the file https://download.nlm.nih.gov/umls/kss/2020AA/umls-2020AA-full.zip. Unlike a typical download, fetching this URL with wget redirects to a sign-in page.

What I have tried so far:

I tried the text-based browsers w3m and elinks. They worked until the sign-in page was changed recently.

What changed in the sign-in page?

Earlier, the sign-in page took a username and password. Recently, options to sign in via Google, Microsoft, etc. were introduced.

Problem

The sign-in page looks like this on my local machine:

[screenshot of the sign-in page]

But when I browse with a text-based browser in the VM, it shows only a blank page.

Is there any solution to this issue?

N.B.

AdminBee

1 Answer


UMLS files can be downloaded programmatically using an API key. The procedure is outlined here: https://documentation.uts.nlm.nih.gov/automating-downloads.html

  • Step 1: Get your API key from your UMLS profile. You can find the API key in the UTS ‘My Profile’ area after signing in.
  • Step 2: Use the script below to download the required files:
#!/bin/bash
# Usage: download.sh apikey download_url

apikey=$1
DOWNLOAD_URL=$2

CAS_LOGIN_URL=https://utslogin.nlm.nih.gov/cas/v1/api-key

if [ $# -eq 0 ]; then
    echo "Usage: download.sh apikey download_url"
    echo "  e.g. download.sh e33c59db-1234-abcd-efgh-0117ab2cd5gh2 https://download.nlm.nih.gov/umls/kss/rxnorm/RxNorm_full_current.zip"
    echo "       download.sh e33c59db-1234-abcd-efgh-0117ab2cd5gh2 https://download.nlm.nih.gov/umls/kss/rxnorm/RxNorm_weekly_current.zip"
    exit 1
fi

if [ -z "$apikey" ]; then
    echo "Please enter your API key"
    exit 1
fi

if [ -z "$DOWNLOAD_URL" ]; then
    echo "Please enter the download_url"
    exit 1
fi

# Request a ticket-granting ticket (TGT) using the API key.
TGT=$(curl -d "apikey=$apikey" -H "Content-Type: application/x-www-form-urlencoded" -X POST "$CAS_LOGIN_URL")

# The TGT is the last path component of a URL in the response.
# Split the response on "=" and scan the resulting words for it.
TGTTICKET=$(echo "$TGT" | tr "=" "\n")

# $TGTTICKET is left unquoted on purpose so that it is split into words.
for TICKET in $TGTTICKET
do
    if [[ "$TICKET" == *"TGT"* ]]; then
        SUBSTRING=$(echo "$TICKET" | cut -d'/' -f 7)
        # Drop the trailing quote character.
        TGTVALUE=$(echo "$SUBSTRING" | sed 's/.$//')
    fi
done

if [ -z "$TGTVALUE" ]; then
    echo "Could not obtain a TGT; please check your API key"
    exit 1
fi
echo "$TGTVALUE"

# Exchange the TGT for a service ticket tied to the download URL.
STTICKET=$(curl -d "service=$DOWNLOAD_URL" -H "Content-Type: application/x-www-form-urlencoded" -X POST "https://utslogin.nlm.nih.gov/cas/v1/tickets/$TGTVALUE")
echo "$STTICKET"

# Download the file; the URL is quoted so the shell does not treat "?" as a glob.
curl -c cookie.txt -b cookie.txt -L -O -J "$DOWNLOAD_URL?ticket=$STTICKET"
rm cookie.txt

Save the above script to a file named download.sh and invoke it as follows:

$ bash download.sh e33c59db-1234-abcd-efgh-0117ab2cd5gh2  https://download.nlm.nih.gov/umls/kss/2020AB/umls-2020AB-full.zip
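
The ticket-parsing step in the script can be tried offline before any credentials are involved. The following is only a minimal sketch: the function name extract_tgt and the sample response are hypothetical, and it assumes the api-key endpoint returns HTML containing a form whose action URL ends in the TGT, which is what the script's cut/sed logic implies. It uses grep -o rather than the tr/cut/sed chain, so it is an alternative way to pull out the same value, not the method from the linked documentation.

```shell
#!/bin/bash

# Extract the TGT identifier from the text returned by the api-key endpoint.
# Assumption: the ticket appears as the last path component of a quoted URL,
# e.g. action="https://utslogin.nlm.nih.gov/cas/v1/api-key/TGT-...-cas".
extract_tgt() {
    # grep -o prints only the matched text; head keeps the first match.
    printf '%s\n' "$1" | grep -o 'TGT-[^"/ ]*' | head -n 1
}

# Hypothetical sample response, shaped like the one the script parses.
sample_response='<html><body><form action="https://utslogin.nlm.nih.gov/cas/v1/api-key/TGT-12345-abcdef-cas" method="POST">Service:<input type="text" name="service" value=""></form></body></html>'

tgt=$(extract_tgt "$sample_response")
echo "$tgt"   # prints TGT-12345-abcdef-cas
```

If the function prints nothing for a real response, the response format probably differs from what is assumed here, and the raw output of the first curl call is worth inspecting.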
  • Thanks for the updated script. Would be nice if you can mention the correct URL link in the last line. You have mixed up 2020AA with 2020AB :) – Kaushik Acharya Feb 08 '21 at 12:56