11

The directory /data/files/ contains thousands of files, named like:

1test
2test
3test

[...]

60000test
60001test

I'm also sending them to an S3 bucket (AWS) using the AWS CLI. However, sometimes the bucket is unreachable, and when that happens the file is skipped.

How can I check whether a file that exists in /data/files/ is also in the S3 bucket, and if not, copy the missing file to S3?

I would prefer to do this in Bash. I'm also open to switching from the AWS CLI to another tool if needed.

Patrick B.
  • There are a bunch of command-line tools that talk to S3 such as `s3cmd` and `s4cmd` and FUSE filesystems such as s3fs and s3ql. There are also things like `rclone` which probably solve your entire problem for you. What are you currently using to talk to S3? – derobert Jan 23 '17 at 22:49
  • @derobert i'm using the `aws cli` - If you have an example to help please feel free to answer the question. – Patrick B. Jan 23 '17 at 23:12
  • I'd think `rclone copy /data/files whatever:` would do everything for you... But anyway, you should [edit] your question to clarify which software you're using to talk to AWS. And if you're open to switching. – derobert Jan 23 '17 at 23:13

5 Answers

16

Run `aws s3 ls` on the actual filename: if the file exists, the exit code will be 0 and the filename will be displayed; otherwise the exit code will be non-zero:

aws s3 ls s3://bucket/filename
if [[ $? -ne 0 ]]; then
  echo "File does not exist"
fi
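Building on that check, here is a minimal sketch that loops over the local directory from the question and copies anything missing to the bucket. The function name and the bucket name in the usage line are placeholders, not part of the AWS CLI:

```shell
# Sketch: copy every file in a local directory that is missing from the bucket.
# Assumes the AWS CLI is installed and configured.
sync_missing() {
    local bucket="$1" dir="$2" path file
    for path in "$dir"/*; do
        file=$(basename "$path")
        # A non-zero exit code from `aws s3 ls` means the object is absent.
        if ! aws s3 ls "s3://${bucket}/${file}" > /dev/null 2>&1; then
            aws s3 cp "$path" "s3://${bucket}/${file}"
        fi
    done
}

# Usage: sync_missing my-bucket /data/files
```

Keep in mind the caveat in the comment below: `aws s3 ls` also matches key prefixes, so with names like `1test` and `1test2` an absent file could be mistaken for present.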
Mr Chow
onetwopunch
  • The problem with this is that `s3 ls` will list the file and give a return code of 0 (success) even if you provide only a partial path. For example, `aws s3 ls s3://bucket/filen` will list the file `s3://bucket/filename`. – Donnie Cameron Mar 02 '19 at 02:02
3

The first answer is close, but if you use `-e` in the shebang (so the script exits on any error), the non-zero exit status from `aws s3 ls` will abort the script, which you most likely do not want. It is better to count the characters of the output instead:

count=$(aws s3 ls "s3://${S3_BUCKET_NAME}/${folder}/" | grep "${file}" | wc -c)
echo "count=${count}"
if [[ "${count}" -eq 0 ]]; then
  : # file is missing: do something
else
  : # file exists: do something
fi
Rui F Ribeiro
3

Try the following:

aws s3api head-object --bucket "${S3_BUCKET}" --key "${S3_KEY}"

It retrieves the object's metadata without downloading the object itself, and exits with a non-zero status if the object does not exist. READ (`s3:GetObject`) access is required.

Subrata Das
  • This is great, especially since it returns a JSON if you want to find a specific field, it's easy to grab the value. – Alexis Wilke Jun 09 '20 at 21:49
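For use in a script, the exit status alone is enough; here is a small sketch of a wrapper (the function name is mine, not part of the AWS CLI):

```shell
# Hypothetical helper: returns 0 if the object exists, non-zero otherwise.
s3_object_exists() {
    aws s3api head-object --bucket "$1" --key "$2" > /dev/null 2>&1
}

# Usage:
# if s3_object_exists my-bucket path/to/file; then echo "present"; fi
```

Unlike `aws s3 ls`, `head-object` matches the exact key, so the prefix problem mentioned in the comments on the first answer does not apply.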
0

I was able to do it using rclone[1], as @derobert suggested.

The command is very simple:

rclone check sourcepath remote:s3bucketname

Example:

Let's imagine you want to check if the S3 bucket (bucket name: tmp_data_test_bucket) has all the files that this directory has: /tmp/data/

Command:

rclone check /tmp/data/ remote:tmp_data_test_bucket

[1] http://rclone.org/
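If you want rclone to fix the differences rather than just report them, `rclone copy` transfers only the files that are missing from (or differ in) the destination. A sketch under that assumption, wrapped in a hypothetical helper so it can be reused:

```shell
# Hypothetical helper: copy missing files, then verify source and destination match.
sync_and_verify() {
    # `rclone copy` skips files already present and identical in the destination.
    rclone copy "$1" "$2" && rclone check "$1" "$2"
}

# Usage: sync_and_verify /tmp/data/ remote:tmp_data_test_bucket
```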

Jeff Schaller
Patrick B.
0

I created two functions as an example, because I figured I might want to know the size of a file as well as whether it exists.

This function gets the size of the file and "returns" it as an echo:

s3_file_size() {
    if command -v aws &> /dev/null; then
        echo "$(aws s3 ls "${1}" --summarize | grep "Total.*Size" | grep -o -E '[0-9]+')"
        return 0
    else
        echo "Warn-${FUNCNAME[0]}, AWS command missing."
        return 1
    fi
}

This function uses the first one to check whether the reported size is 0, which it treats as the file not being there. (Yes, it will also treat a genuinely empty file, size 0, as not there.)

s3_does_file_exist() {
    if command -v aws &> /dev/null; then
        [[ $(s3_file_size "${1}") -lt 1 ]] && return 1 || return 0
    else
        echo "Warn-${FUNCNAME[0]}, AWS command missing."
        return 1
    fi
}
Mike Q