Showing posts with label cloud computing. Show all posts

Thursday, October 3, 2013

Backing-up a bunch of small files to a remote server

I have a directory, containing lots of files, and I want an off-site, secure backup.

Even though the remote server might be a dedicated server that only I know the root password for, I still don't trust it. Because of the recent NSA revelations I no longer consider myself paranoid. Thanks guys, I can look at myself in the mirror again!

As a final restriction, I don't want to have to make any temp files locally: disk space is tight, and the files can get very big.


Here we go:

cd BASE_DIR
tar cvf - MY_FOLDER/ | gpg -c --passphrase XXX | ssh REMOTE_SERVER 'cat > ~/MYFOLDER.tar.gpg'


(the bits in capitals are the things you replace.)

Notes
  • The "v" in "tar cvf" means verbose. Once you are happy it is working you will want to use "tar cf" instead.
  • The passphrase has to be given in the commandline because stdin is being used for the data!! A better way is to put the passphrase in another file: --passphrase-file passfile.txt. However note that this is only "better" on multi-user machines; on a single-user machine there is no real difference.
  • I'm using symmetric encryption. You could encrypt with your key pair, in which case the middle bit will change to: gpg -e -r PERSON  Then you won't need to specify the passphrase.
  • In my case REMOTE_SERVER is an alias to an entry in ~/.ssh/config. If you are not using that approach, you'll need to specify username, port number, identity file, etc. By the way, I'm not sure this method will work with password login, only keypair login, because stdin is being used for the data.
  • Any previous MYFOLDER.tar.gpg gets replaced on the remote server. So, if the connection gets lost halfway during the upload then you've lost your previous backup. I suggest using a datestamp in the filename, or something like that.
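
The datestamp and passphrase-file suggestions in the notes can be combined into a small wrapper. This is only a sketch under assumptions: backup_name and backup_folder are made-up helper names, the ".part" suffix is an illustrative choice, and with GnuPG 2.x the --passphrase-file option needs --batch (newer versions may also want --pinentry-mode loopback).

```shell
#!/bin/sh
# Sketch only: backup_name and backup_folder are hypothetical helper names.

# Build a datestamped remote file name, e.g. MYFOLDER-2013-10-03.tar.gpg
backup_name() {
    # $1 = folder name, $2 = optional date (defaults to today)
    printf '%s-%s.tar.gpg\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}

# Stream tar -> gpg -> ssh without creating any local temp file.
# The upload goes to a ".part" name; the rename is a second ssh call that
# only runs if the first ssh exited cleanly, so a dropped connection leaves
# earlier backups alone. (A failure inside tar or gpg is NOT detected here.)
backup_folder() {
    # $1 = folder, $2 = ssh host or ~/.ssh/config alias, $3 = passphrase file
    name=$(backup_name "$(basename "$1")")
    tar cf - "$1" \
        | gpg --batch -c --passphrase-file "$3" \
        | ssh "$2" "cat > ~/$name.part" \
        && ssh "$2" "mv ~/$name.part ~/$name"
}

# Usage (not run here): backup_folder MY_FOLDER/ REMOTE_SERVER passfile.txt
```

Because every run gets its own file name, old backups pile up on the remote server and need pruning by hand or by a separate job.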
What about getting the data back?

cd TMP_DIR
ssh REMOTE_SERVER 'cat ~/MYFOLDER.tar.gpg' | gpg -d --passphrase XXX | tar xf -


You should now have a directory called MY_FOLDER, with all your files exactly as they were.
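
Before trusting a backup it can be worth listing its contents without extracting anything. A minimal sketch, assuming the same passphrase file as above; list_archive and check_remote_backup are made-up names, not part of any tool.

```shell
#!/bin/sh
# Sketch only: list_archive and check_remote_backup are hypothetical helpers.

# Read a tar stream on stdin and print the names of its members.
list_archive() {
    tar tf -
}

# Decrypt the remote backup and list it; nothing is written to local disk.
check_remote_backup() {
    # $1 = ssh host, $2 = remote file, $3 = passphrase file
    ssh "$1" "cat $2" | gpg --batch -d --passphrase-file "$3" | list_archive
}

# Usage (not run here):
#   check_remote_backup REMOTE_SERVER '~/MYFOLDER.tar.gpg' passfile.txt
```

This still downloads and decrypts the whole file, so it costs as much bandwidth as a real restore.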


Outstanding questions

Is it possible to use this approach in conjunction with Amazon S3, Google Drive, Rackspace cloud files, or similar storage providers? E.g. 100GB mounted as a Rackspace drive is $15/month (plus the compute instance of course, but I already have that), whereas 100GB as cloud files is $10/month, or $5/month on google drive. ($9.50/month on S3, or $1/month for glacier storage). Up to 15x cheaper: that is quite an incentive.

2013-10-08 Update: The implicit first half of that question is: is there a way to stream stdout to the remote drive (whether using scp or a specific commandline tool).
For Amazon S3 the answer is a clear "no": http://stackoverflow.com/q/11747703/841830 (the size has to be known in advance).
For Google Drive the answer is maybe. There is a way to mount google drive with FUSE: https://github.com/jcline/fuse-google-drive   It looks very complicated, describes itself as alpha, and the URL for the tutorial is a 404.
For Rackspace CloudFiles (and this should cover all OpenCloud providers), you can use curl to stream data! See "4.3.2.3. Chunked Transfer Encoding" in the cloud files developer guide. However, note that there is a 5GB limit on a file. That is a show-stopper for me. (Though by adding a custom script instead of "ssh REMOTE_SERVER 'cat > ~/MYFOLDER.tar.gpg'", I could track bytes transferred and start a new connection and file name at the 5GB point, so there is still hope. Probably only 10 lines of PHP will do it. But if I'm going to do that, I could just as easily buffer say 512MB in memory at a time, and use S3.)
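
The "start a new file name at the 5GB point" idea does not need a custom script if GNU split is available: its --filter option pipes each chunk to a command instead of writing it to disk. The sketch below uses ssh as the per-chunk command because that is what the rest of this post uses; swapping in a cloud upload command is left open. stream_chunks is a made-up helper name, and the 4G chunk size is just an example under the 5GB limit.

```shell
#!/bin/sh
# Sketch only: stream_chunks is a hypothetical helper; needs GNU split
# (the --filter option, coreutils 8.13 or later).

# Cut stdin into pieces of at most $1 bytes and pipe each piece to the
# command in $3. split sets $FILE to the piece's name ($2 plus aa, ab, ...),
# so nothing is stored locally unless the command itself stores it.
stream_chunks() {
    # $1 = chunk size (e.g. 4G), $2 = name prefix, $3 = command run per chunk
    split -b "$1" --filter="$3" - "$2"
}

# Usage (not run here), keeping every remote file under a 5GB limit:
#   tar cf - MY_FOLDER/ | gpg -c --passphrase XXX \
#     | stream_chunks 4G MYFOLDER.tar.gpg. 'ssh REMOTE_SERVER "cat > ~/$FILE"'
# Restore by concatenating the pieces in name order:
#   ssh REMOTE_SERVER 'cat ~/MYFOLDER.tar.gpg.*' | gpg -d --passphrase XXX | tar xf -
```

Each chunk opens a fresh connection, so this only works unattended with keypair login.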

NOTE: Because I've not found an ideal solution yet, I never even got to the implicit second part of the question, which is whether the need to "cat" on the remote server side will cause problems. I think not, but need to try it to be sure.


Friday, December 21, 2012

EC2: move a large file between Windows instances

Moving a file between linux machines is easy-peasy, just use scp. (You can be sure ssh/scp is on all your linux ec2 machines.) Windows? Sigh, Windows. To get ssh/scp on Windows you need to install cygwin, and that is a non-trivial step to take.

So, how to move a 60GB file from one Windows ec2 instance to another Windows ec2 instance (in a different region)? Here is what I did:
  1. Install CloudBerry Pro on the source server. (Must be the pro version: you can get a 14-day free trial; when that expires it is a $30 cost)[1]
  2. Install CloudBerry free version on the target server.
  3. Test copying over a small file, via your S3 account, to make sure it works. I'm assuming you already know how to use this type of two-pane file-copy application. (I created the bucket in the same region as the target machine: that means the upload takes longer than the download.)
  4. In CloudBerry Pro, Tools menu, then Options, then choose Compression And Encryption tab. Check "Use Compression".[2]
  5. Copy the big file. It gives no progress.
  6. When it had finished it said it was 21% done. Very confusing. And on the server it just showed as a 13GB file, not a 60GB one.[3]
  7. Download to your target server, using CloudBerry free version. (Yes, the free version works fine for downloading large files and for downloading compressed files.)
  8. Rename your downloaded file with a ".gz" extension, as that is what it actually is.[3]
  9. Install 7-zip, if you don't have a program that can deal with gzip files. It tells me the file is 2GB compressed, 13GB uncompressed. Ignore that, it is just being stupid. Decompress it, and you get a 60GB file.
Phew, hard work. If you needed to do it regularly you should install cygwin and use scp! <soap-box>Or port your applications to linux where the living is easy. Apart from the fact that running Windows machines is harder, Linux cloud instances are also significantly cheaper to run.</soap-box>

[1]: I've heard, but not confirmed, that you can uninstall it from one machine, then use the same install key on a different machine. If true, that is quite a fair license, and I encourage you to support them.

[2]: As we saw, this creates more work, so uncheck after doing your big file.

[3]: I think CloudBerry Pro should have put a .gz extension on the file, when it uploaded it, to make it clear what was going on.

Tuesday, November 20, 2012

The cloud and Wall Street

An enjoyable video on how Wall Street uses the cloud, and why generally they don't: http://www.infoq.com/presentations/Cloud-Wall-Street

If you only have one minute, here is my summary:
Yes, banks could not just save money but also make their development more nimble and perhaps even more reliable, by moving to the cloud. But because they have loads of legacy systems that integrate in complex ways, there is the element of "if it ain't broke then don't try to fix it." An even bigger reason is moving would be a major project that would distract energy at all levels of the company from their real business of making money. They'd rather make money from increasing sales than make money from reducing costs.
If you have 62 minutes, and have an interest in the intersection of IT and finance, the whole thing is worth your time.
If you have less time, and are interested in why they should be using the cloud more, that is 35:30 to 39:00.
If you want to understand why they don't, 32:00 to 35:30, and questions from 39:00-44:00. Then 48:00 to 52:00.
If you are interested in Apache Ambari, and creating your own clouds, that is 52:00 to 60:00; it is only loosely related to the main theme of the talk.
Question at 60:00 on application rot is interesting.

Note: he generally uses cloud in the sense of virtualization on heavy-duty hardware, that you own and install. (As opposed to the sense of compute units running out there somewhere, that you pay for by the hour, and that you can start and stop just when you need them.)




Thursday, September 13, 2012

Microsoft Azure cloud hosting: vapourware??

I blogged before about Microsoft being a surprise new player in the linux IaaS cloud arena. Well, yesterday, I had a burning need for a new virtual server, and what's more I needed a Windows server. So, I decided to take this chance to evaluate Azure. Went through the prices again, watched a couple of setup videos: it all looks competitive, and looks like it might be easier to manage Windows cloud instances than on Amazon EC2.

I signed up (created a LiveID). Then had to give my address and credit card details, to apply for their 90 day free trial. No problems there, though one minor gripe: Japanese postcodes are three digits, then four more digits. It refused "123", and then it refused "1230001". You have to put it in as "123-0001".

Then it tells me "Setting up your Windows Azure subscription". It takes forever, then after two very long minutes it comes back and says: "Sorry! We could not activate this feature. Please contact support." Gulp, I just gave these cowboys my credit card.


However, going to my account page shows the 90 day free trial is activated. OK, we're rolling... no, we're not, I click add subscription, end up in that same long "Setting up your Windows Azure subscription" screen. But this time it does something different after 30 seconds, and it looks like my account and trial are activated. (Incidentally I ended up with two emails, both telling me my credit card has been charged for $0.00.)

It takes quite a bit of clicking around to find the screen where you create new instances - no link from inside the Account page, as far as I can see. Anyway, once there I get told I cannot create a virtual server without signing up for the "preview program". First mention of that!! So, I sign-up for it and I get told:
    We are sorry, but we could not complete that operation.


I then click the "portal" button in top right and end up at a page that tells me I've been accepted to the "preview"?!

That then takes me to a page where I still cannot start a virtual server. I get told I need to sign up for the preview program. Umm, didn't we just do this?

Logged off, on again.

This time, at the top I see a green "preview" button. It tells me the interface I'm trying to use is a crippled new version, the old version is the one that actually works. (I'm paraphrasing.) I click that and get told to install silverlight. Silverlight?!!

FAIL.

Off to Amazon EC2, and my Windows instance was up in 20 minutes (which is still far too long, I can get a Linux instance running to the same level of usefulness in 2-3 minutes, but that is a rant for another time... at least Amazon are not wasting my time telling me they have a service that in fact they don't have.)




Monday, September 10, 2012

There are SPOFs hiding everywhere!

One of my test systems sent me a few hundred emails between 2:38 and 7:45am JST. Just a test server so it didn't go to my phone and I noticed the problem around 7:30am. No paying clients were affected.

I dived straight in and found my Rackspace UK box couldn't find api.qqtrend.com. But DNS lookup worked for me. Other DNS lookups were working on the Rackspace box. I also found it couldn't ping the DNS server (at GoDaddy). But I could from my office LAN. And I also could from a U.S. server.

Conclusion (totally wrong - see below): Rackspace UK data centre had issues.

So I logged in to Rackspace, to check for alerts, and post a support ticket. There was a mention, phrased very vaguely, saying "someone else" has DNS problems. Hhmmm. By this time I could ping the GoDaddy server, but DNS lookup still failed. Uncertain I pinged around a bit more, and by that time DNS had started working.

I.e. The underlying problem had already been fixed, it was just taking time to spread through the internet, and I could have "fixed it" with no effort by just staying in bed 15 minutes longer. Oh well.

Anyway, it turns out it was a sociopath attacking GoDaddy: http://www.bbc.co.uk/news/business-19549367


Here are his reasons:
    "i'm taking godaddy down bacause well i'd like to test how the cyber security is safe and for more reasons that i can not talk now." (sic)

OK, English may not be his first language. But even allowing for that, he is not coming across as an upstanding member of society. Not protesting, no cause, just wanting to see if he had the skills to annoy a lot of people. This is a guy (gal?) who badly needs a girlfriend/boyfriend. If you know him, introduce him to someone. Please.

I know GoDaddy had a big PR screw-up by initially supporting SOPA, but they had the courage and sense to listen to people and change their position. Still a good company in my mind.

But, the silver lining is it nicely illustrated we (QQ Trend) have a Single Point Of Failure, at the DNS and registrar level, that had been overlooked. We'd previously got server1 and server2 as the two endpoints. We have them in different continents, and different cloud providers (no secret: Amazon and Rackspace). And I thought that was solidity to boast about. I was about to add server3 in a third continent as an option for customers who really, really need 100% uptime.

But what today's problem reveals is that if all three servers are on the same domain: server1.example.com, server2.example.com, server3.example.com, then we have a potential issue with DNS, and even with the registrar.

I think we need an alternative domain, at a completely different registrar, with DNS at an independent ISP. Then at the script level we add that in as one of the failover endpoints. They'll point to same three servers.
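
A minimal sketch of what "add that in as one of the failover endpoints" could look like at the script level. Everything here is illustrative: first_reachable and http_probe are made-up names, example.net stands in for the second domain at a different registrar, and the curl probe assumes the servers answer over HTTPS.

```shell
#!/bin/sh
# Sketch only: first_reachable and http_probe are hypothetical helpers.

# Print the first endpoint for which the probe command succeeds.
# $1 = probe command (called with the endpoint as its only argument),
# remaining arguments = endpoints in order of preference.
first_reachable() {
    probe=$1
    shift
    for endpoint in "$@"; do
        if "$probe" "$endpoint" >/dev/null 2>&1; then
            printf '%s\n' "$endpoint"
            return 0
        fi
    done
    return 1   # nothing answered
}

# Example probe: succeed if the endpoint answers over HTTPS within 5 seconds.
http_probe() {
    curl --silent --fail --max-time 5 "https://$1/"
}

# Usage (not run here). The .net names point at the same three servers,
# but resolve through an independent registrar and DNS provider:
#   first_reachable http_probe \
#       server1.example.com server2.example.com server3.example.com \
#       server1.example.net server2.example.net server3.example.net
```

A DNS outage on the first domain then costs one failed probe per name on that domain rather than the whole service.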

For instance, however big Amazon or GoDaddy (or any infrastructure provider) get, however many data centres they have around the world, they are still open to attacks, politics and human error inside their organization. We're service providers building on top of their infrastructure. It is our job to accept their limitations and do something about it.



Friday, May 18, 2012

EC2 and Windows: a match made in... Hell

Henry Ford is famous for his lack of flexibility over the Model T: You can have it in any colour you want, as long as it is black.

I think Amazon have taken a leaf out of his book, when offering their Windows instances: You can have any size disk you want, as long as it is 30GB.

Don't believe me? Go on, try automating the creation of a 100GB boot disk Windows machine. Or creating one from the web interface, without any post-configuration steps in Windows itself.

(I'll save you the Google: here is the AWS engineers telling you the multiple steps needed to achieve that. The instructions are different depending on the exact version of Windows.)

In fact, try automating anything to do with Windows configuration, using EC2. You can't. It can't be, won't be, scripted. You always have to log on afterwards (going through the EC2 Console to get your almost-impossible-to-type password) to do something. Usually quite a tedious and time-consuming something.
The bottom-line: Windows is not designed for the cloud.



...and yet, some of my clients, and some of my potential clients, insist on trying to use Windows anyway. The cloud is where they feel they should be, so people want to move their legacy apps there. Whenever I ask them how they do it, or why they do it, they seem to find justification. It is the cloud, look we're scaling. We're faster. It works! Like a man let out of prison, and running free in the meadow... only he still has the manacles and chains from his time doing time. Am I carrying my metaphor too far by wishing people would stop and take the Linux axe to the manacles before rushing off to the meadow?

Are you an expert at automating Windows on EC2? Please post a comment showing off your knowledge. I'm willing to learn, and will edit this article if you convince me it can be done :-)

Tuesday, April 10, 2012

Cloud options compared: EC2, Rackspace, HP

I've been participating in HP Cloud's closed beta program, and they have now announced their billing. They use the same cloud system and API as Rackspace, making it a fair comparison. (See Rackspace cloud server pricing and Rackspace cloud file pricing.)

Executive summary: HP have priced themselves slightly lower, but don't have the two smallest server configurations. So, Rackspace is still better than HP or Amazon EC2 if you just need something minimal, as I described here.
(2012-08-15 UPDATE: Rackspace are stopping their minimal server config: no more 256MB option in their "nextgen" V2 API. Also scheduled images are not available in the V2 API (yet).)

HP offer slightly less disk space than Rackspace, for the same memory size. HP's bandwidth is slightly cheaper than Rackspace; incoming bandwidth is free on both.

HP's CDN offering is weird; based on your billing location, rather than where your servers are, or where your customers are! If you are a U.S. or European company it is cheaper. If not it is more expensive. However, for Japan, HK, Singapore it is only a fraction more expensive, so not a showstopper. Also it is tiered: if you're spending more than $200/month on CDN bandwidth it will work out cheaper still.
If your headquarters are not in North America, Europe, Latin America, Japan, Hong Kong or Singapore then HP pricing says they'd rather you took your CDN business elsewhere.

( 2012-06-12 UPDATE:  I've added a surprise Linux cloud hosting option to the below: Microsoft Azure! They are competitive at the low end: 1GHz CPU, 768MB RAM, 25GB storage, 4GB outbound bandwidth (inbound is free), is $12.51/month (about 1.5c/hr). An extra 20% discount for paying for 6 months. But also reasonable at the high-end too (except 14GB is the most memory they can offer). (price calculator) )


Comparing HP's cheapest option to the closest (based on memory size) alternative:
HP: 1GB RAM, 30GB disk, 1 virtual core: $0.04/hr ($29/month)
Rackspace: 1GB RAM, 40GB disk, 1/16th of a 4-CPU server: $0.06/hr ($43.80/month)
EC2 Small: 1.7GB RAM, 160GB disk, 1 virtual core, $0.08/hr ($58/month)
Azure Small: 1.6GHz CPU, 1.75GB RAM, 25GB disk, $0.08/hr ($60/month)
   (NB. the XSmall, with 768MB RAM, 1Ghz CPU is $12.50/month)

At the top of the CPU range:
HP: 32GB RAM, 960GB disk, 8 virtual cores: $1.28/hr  ($934/month)
Rackspace: 31GB RAM, 1200GB disk, 4-CPUs: $1.80/hr ($1,314/month)
EC2, High-CPU Extra Large: 7GB RAM, 1690GB disk, 8 virtual cores (20 compute units): $0.66/hr ($481/month)
EC2, High-Memory Quadruple Extra Large: 68GB RAM, 1690GB disk, 8 virtual cores (26 compute units): $1.80/hr ($1,314/month)
Azure, XLarge: 8 x 1.6GHz CPU, 14GB RAM, 975GB disk: $0.64/hr ($575/month)

(So, if you are CPU bound then Amazon is best, if memory bound then HP is best.)
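
The monthly figures quoted for HP, Rackspace and EC2 are consistent with multiplying the hourly rate by 730 hours (365 days x 24 hours / 12 months). That factor is inferred from the numbers above rather than stated anywhere, and the Azure monthly figures do not follow it, so they presumably came from a different source. A small sketch of the conversion; monthly_cost is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: monthly_cost is a hypothetical helper.

# Convert an hourly price to a monthly one, assuming 730 hours per month.
monthly_cost() {
    awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 730 }'
}

monthly_cost 0.04   # HP, 1GB
monthly_cost 0.06   # Rackspace, 1GB
monthly_cost 1.28   # HP, 32GB
monthly_cost 1.80   # Rackspace, 31GB
```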

Overall, the pricing seems reasonable. However Rackspace have a brand oozing stability and reliability, and Amazon are huge and have data centres all over the world, so I'm not sure HP prices are low enough to worry their competitors. The 50% discount during the beta program makes them a good buy, short-term, though!

Sunday, March 18, 2012

Changing Rackspace Cloud Instance Sizes

Following on from my post about the actual costs of using Rackspace, this article is about the "Actual downtime of changing the size of a Rackspace cloud server".

I wanted to go from a 10GB disk to a 20GB disk. I got confused by the interface and accidentally went from 10GB to 40GB. So I then also got to try downsizing from 40GB to 20GB!

Q1. Does my IP address change? Do I need a new SSH key?
A. No and No.

Q2. How long does it take?
A. The 40GB to 20GB downsize took 13 minutes in total. The 10GB to 40GB upsize was about the same or slightly quicker. If you like to be careful and take a disk image beforehand, that takes about another 10-15 minutes (for about 9GB, so allow more if your disk is bigger).

Q3. How much downtime?
A. At least 30 seconds... Prior to the move I saw activity in one program at 06:28:27; the new server was fully running at 06:29:05-ish, and the first new activity in that program was at 06:30:03.

The reason for the almost 60 second delay in that particular program was it gets started from a monitoring script (that I wrote) that runs on a 1-minute cronjob. The cron job didn't run at 06:29:00, so had to wait until 06:30:00. If I had started my program from the init.d scripts, or even from rc.local, the downtime would only have been 30-40 seconds. That is the same downtime you can expect for a webserver.
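
The three timestamps above can be turned into durations with a few lines of shell. This is just the arithmetic spelled out; to_seconds and elapsed are made-up helper names.

```shell
#!/bin/sh
# Sketch only: to_seconds and elapsed are hypothetical helpers.

# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between two times on the same day ($1 earlier, $2 later).
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 06:28:27 06:29:05   # last activity -> new server fully running
elapsed 06:29:05 06:30:03   # server running -> cron restarts the program
elapsed 06:28:27 06:30:03   # total gap seen by that program
```

That gives 38 seconds for the server itself, 58 seconds waiting on cron, and 96 seconds in total for the program.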

Q4. Does it depend on the size of the server?
A. I believe so. Bigger servers would have more files to copy.

Q5. Do in-memory caches survive? Do background processes keep running?
A. No and no. The resize implies a reboot. Any background process you'd started manually before needs restarting manually again when the new server comes up. Use /etc/rc.local or cronjobs to auto-start things.

Q6. Do I need to be around when I resize?
A. Yes to get it started. You could then go away (you are supposed to verify it when it comes up, but I think if you don't verify it then after a few hours it assumes you are happy that everything is running smoothly.)  (The point of the verify step is they keep the old image around and can quickly revert - I did not test this to see how quick the revert would be.)

Q7. How do I change server size without any downtime at all?
A. Big question. The glib answer is: If you have to ask, then it is too difficult. The slightly more helpful answer is: if a web server then run two servers, with a load balancer in front of them; if a database server, look into database clusters.

Saturday, November 12, 2011

Actual costs: rackspace cloud

A few months back I decided to put a 24/7 script on a Rackspace Cloud instance, instead of the more obvious Amazon EC2 choice. The reason at the time was my needs were low CPU but relatively high bandwidth and diskspace usage and it worked out cheaper.

Now I've had a few invoices in I am relieved to say there was no catch. My past three invoices have been $11.99, $11.99 and $12.20 (USD). This is for a minimal CPU spec (256MB, 1.6% of a quad core CPU, 10GB disk), 1.1 to 1.3 GB/month of outgoing bandwidth each month (there is no charge for incoming bandwidth), and cloud storage rising from 4 to 8GB. 90% of the monthly cost is for the machine, and the cloud storage has risen from $0.63 to $1.15. The bandwidth is not costing much at all.

In contrast on Amazon EC2, the micro instance would cost $15.65 (including $1 for 10GB of EBS storage), while a small instance would cost $62.25/month, of which $0.03 is the bandwidth usage. (The first year of that micro instance would be free if you are a new customer, but I am not.)

So, at the CPU bottom-end, Rackspace is winning on cost. The other feature of Rackspace Cloud that I love is there is an automatic daily backup of the full disk image, and that backup is stored in the cloud storage. (Storing that backup is basically all my $1/month cloud storage costs.)

What do I not like? I keep using up my 10GB disk space. But there seems no way to move to 20GB without doubling the CPU spec and doubling the monthly cost; with Amazon micro I'd just increase the EBS storage space. With an Amazon small instance I'd get 160GB and would not care.

What do I not like about Rackspace and Amazon? It is that you just get a basic linux distro. You have to spend time installing, configuring and maintaining. And the configuration is not trivial; I've kept a log of all I've had to do, and it includes things like moving ssh off of port 22, setting up an iptables firewall, installing a mail server (not a POP server, just enough so I can *send* email alerts), and writing my own low-diskspace email alert script. The latter was done just the other day after my application broke, yet again, because the machine had run out of disk space.
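
For anyone facing the same chore, here is roughly what a low-diskspace alert can look like. This is not the script mentioned above, only a sketch: disk_used_pct and disk_alert are made-up names, the 90% threshold is arbitrary, and the usage line assumes a working "mail" command.

```shell
#!/bin/sh
# Sketch only: not the original script. disk_used_pct and disk_alert are
# hypothetical helpers, and the threshold in the usage line is an example.

# Print the used percentage (an integer, no % sign) of the filesystem holding $1.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Run the alert command in $3 (message on its stdin) when usage of $1
# is at or above $2 percent.
disk_alert() {
    pct=$(disk_used_pct "$1")
    if [ "$pct" -ge "$2" ]; then
        echo "Disk usage on $1 is at ${pct}%" | sh -c "$3"
    fi
}

# Usage from a cron job (not run here):
#   disk_alert / 90 'mail -s "Low disk space" you@example.com'
```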

P.S. As I want 24/7, and have not mentioned the need to scale, what about cheaper shared hosting? Well, I couldn't find a VPS, that gives me root access and no restrictions, for under $10/month. It seems Rackspace is winning that fight too?

(2012-08-15 UPDATE: Rackspace are stopping their minimal server config: no more 256MB option in their "nextgen" V2 API. Also scheduled images are not available in the V2 API (yet). In other words the two things I pointed out that were good, in the above article, are going! I guess the Rackspace marketing department will have to work out a positive spin on "cutting out our competitive advantage so we look just like the competition now" ;-)




Friday, January 28, 2011

Using Amazon EC2 for Shodan Go Bet

I've put up a short article on how Amazon EC2 was used for the Shodan Go Bet event held at the end of December 2010:
http://dcook.org/gobet/using_amazon_ec2.html

It is not about computer go, but instead will be of interest if you have wanted to see a concrete example of how EC2 is used, just how fast the fastest choice is, exactly how much it costs, how the Amazon EC2 Windows instances work (I used remote desktop running on linux!), etc.

Wednesday, November 4, 2009

How much is that password worth?

These people have published details about how they used Amazon EC2 to crack passwords:
http://news.electricalchemy.net/2009/10/cracking-passwords-in-cloud.html

Personally, I skipped all the details and went straight to the interesting conclusions page:
http://news.electricalchemy.net/2009/10/password-cracking-in-cloud-part-5.html

It tells you how much it will cost someone (in EC2 charges) to crack your passwords, based on their lengths and the number of characters you use.

I used to think 8 characters was a good password. Seems it is worth about $3, or $45 if I've mixed in some numbers. Gulp. And all this is assuming there are no dictionary words in there. Double gulp.
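
The jump in price comes from the size of the search space, which is the alphabet size raised to the password length. A rough sketch of that arithmetic follows. It assumes, as a guess rather than something the linked article confirms, that the $3 figure is for lowercase letters only and the $45 figure for lowercase letters plus digits; keyspace is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: keyspace is a hypothetical helper.

# Number of possible passwords: alphabet size ($1) to the power of length ($2).
keyspace() {
    n=1
    i=0
    while [ "$i" -lt "$2" ]; do
        n=$(( n * $1 ))
        i=$(( i + 1 ))
    done
    echo "$n"
}

keyspace 26 8   # 8 lowercase letters
keyspace 36 8   # 8 lowercase letters or digits
```

Under that assumption, adding digits multiplies the work by about 13.5, which is the same order as the jump from $3 to $45.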