Amazon doesn't have a tool for this, and their APIs seem to support transferring only one file at a time. I was looking for a solution that wouldn't tie up my computer for hours while doing the transfer piecemeal. So I thought: why not make a tarball of my files and transfer it in one fell swoop via ftp to a Linux box running as an instance on Amazon Elastic Compute Cloud (EC2)? The plan was then to run some scripts to unpack the thing and transfer the files quickly between the EC2 and S3 servers, over Amazon's internal network.
Assuming you already have a subscription to Amazon's services and know your way around EC2, here are the steps to take (you may need to change the example values, such as the AMI and instance IDs, hostnames, and bucket names, to match your setup):
1. look for a suitable AMI to base your instance on:
ec2-describe-images -o self -o amazon | grep getting-started
2. pick and run an instance
ec2-run-instances ami-3c47a355 -k gsg-keypair
2.5 wait until your instance is running; check on its status with
ec2-describe-instances
3. log in to the instance
ssh -i id_rsa-gsg-keypair root@ec2-184-73-124-245.compute-1.amazonaws.com
4. on the instance, look for and install an ftp server (vsftpd)
yum list vsftpd
yum install vsftpd.i386
5. configure the ftp server (uncomment "#anon_upload_enable=YES") in
/etc/vsftpd/vsftpd.conf
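for reference, after editing, the relevant lines in vsftpd.conf should end up looking roughly like this (depending on the distribution's defaults, anonymous_enable and write_enable may also need to be set):
anonymous_enable=YES
write_enable=YES
anon_upload_enable=YES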
6. start the ftp server
/etc/init.d/vsftpd start
7. prepare the destination for anonymous ftp (with total disregard for the security implications)
mkdir /var/ftp/pub/upload
chmod 777 /var/ftp/pub/upload
8. from your computer, ftp the tarball to the server (ftp anonymous@ec2-184-73-124-245.compute-1.amazonaws.com/pub/upload)
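assuming the tarball was packed locally beforehand with something like "tar czf files.tar.gz /path/to/your/files" (files.tar.gz is just an example name), a session with the standard command-line ftp client would look roughly like this:
ftp ec2-184-73-124-245.compute-1.amazonaws.com
(log in as "anonymous"; any password will do)
binary
cd pub/upload
put files.tar.gz
bye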
9. after the ftp transfer completes, go back to the instance and unpack the tarball
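for example, assuming the gzip-compressed tarball from the previous step is named files.tar.gz:
cd /var/ftp/pub/upload
tar xzf files.tar.gz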
10. install an awesome script from http://timkay.com/aws/
cd
curl timkay.com/aws/aws -o aws
11. configure the tool with your Amazon credentials by creating/editing a ~/.awssecret file, placing your Access Key ID on the first line and your Secret Access Key on the second line
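the file is just those two lines; using Amazon's documentation placeholder keys (not real credentials) it would look like this:
AKIAIOSFODNN7EXAMPLE
wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY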
12. save the .awssecret file and set its permissions
chmod 600 .awssecret
13. install the tool
perl aws --install
14. give it a try by making a destination bucket on S3
s3mkdir tartordesign
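as a quick sanity check before scripting the whole upload, you can also try a single file by hand and then list the bucket (test.jpg is just a hypothetical file name here):
aws put tartordesign/ test.jpg
aws ls tartordesign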
15. go where your files are
cd /var/ftp/pub/upload
16. prepare an awk script to do the upload and, optionally, set the visibility of your files to public (I needed mine to be available on the web); create a file named do.awk (e.g. with vi) with the following content:
{
    # announce the file being uploaded
    printf "putting %s\n", $1
    # upload the file into the bucket under its own name
    cmd = "aws put tartordesign/ " $1
    system(cmd)
    # make the uploaded object publicly readable
    cmd = "aws put tartordesign/" $1 "?acl --set-acl=public-read"
    system(cmd)
}
17. run the script that does the job
ls | grep .jpg | awk -f do.awk -
note: sometimes the piping construct above may fail because "ls | grep .jpg |" produces truncated names. A more robust approach is this:
ls | grep .jpg > list
awk -f do.awk list
18. go about your business until it's finished (you may choose to redirect the output of that script to a file and later check on the progress by tailing that file; if you run it with nohup, or inside a screen session, you can also end the ssh connection established at step 3, so you're not tied up to anything, and just reconnect periodically to check on the progress, as sketched below)
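a minimal sketch of running the upload detached, reusing the "list" file from step 17 (the upload.log name is just an example):
nohup awk -f do.awk list > upload.log 2>&1 &
# later, after reconnecting:
tail -f upload.log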
19. once the script is done, terminate your running EC2 instance (the one you started at step 2)
ec2-terminate-instances i-2b57bf41
That's it, folks.
2 comments:
Hi, we are facing the same problem now, but our number of files is ~2M. Can you tell how fast that upload was? Do you have any timing numbers? BTW, thanks for the very useful information!
@zelishe, unfortunately I haven't done any measurements of how fast the method is. For me it was a one-time job, and since I didn't have to run the script again I never got the chance to collect stats. In any case, the time was reasonable: I believe I left the EC2 instance running overnight and by morning it was all done.