Deleting many files from an S3 bucket

By Norbert Preining

So we found ourselves needing to delete a considerable number of files (around 500000, amounting to 1.6T) from an S3 bucket. With the list of files in hand, my first attempt was calling

aws s3 rm s3://BUCKET/FILE

for each file. That wasn't the best idea, I have to say: first of all, it makes 500000 separate requests, and then it takes a looong time. And this command does not allow passing in multiple files.
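For reference, that per-file approach boiled down to something like this (a minimal sketch, assuming the keys sit one per line in the files-to-be-deleted list used further below):

while read -r key; do
    # one request and one process spawn per object - slow
    aws s3 rm "s3://BUCKET/$key"
done < files-to-be-deleted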

Fortunately there is aws s3api delete-objects, which takes a JSON input and can delete multiple files per request:

aws s3api delete-objects --bucket BUCKET --delete '{"Objects": [ { "Key" : "FILE1" }, { "Key" : "FILE2" }, ... ]}'
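Note that a single delete-objects call accepts at most 1000 keys. Spelled out for two placeholder keys, and with the optional Quiet flag that suppresses the per-key result listing, a call looks like this:

aws s3api delete-objects --bucket BUCKET \
    --delete '{"Objects": [ {"Key": "FILE1"}, {"Key": "FILE2"} ], "Quiet": true}'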

That did help, and with a bit of magic from bash (mapfile, which can read lines from stdin in batches) and jq, in the end it was a business of some 20min or so:

cat files-to-be-deleted | while mapfile -t -n 500 ary && ((${#ary[@]})); do
    objdef=$(printf '%s\n' "${ary[@]}" | jq -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
    aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done

This reads 500 file names at a time and reformats them with jq into the proper JSON format: reduce inputs is a jq filter that iterates over the input lines and does a map/reduce step. In this case we start from an empty array and add a new Key/filename object for each line. Finally, the whole batch is sent to AWS with the above API call.
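To see what that jq filter actually builds, here is the same pipeline fed with two made-up file names:

printf '%s\n' "dir/file1.dat" "dir/file2.dat" \
    | jq -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}'

which prints

{
  "Objects": [
    {
      "Key": "dir/file1.dat"
    },
    {
      "Key": "dir/file2.dat"
    }
  ]
}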

Puuuh, 500000 files and 1.6T less, in 20min.

