Friday, December 21, 2012

EC2: move a large file between Windows instances

Moving a file between Linux machines is easy-peasy: just use scp. (You can be sure ssh/scp is on all your Linux EC2 machines.) Windows? Sigh, Windows. To get ssh/scp on Windows you need to install cygwin, and that is a non-trivial step to take.
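For the Linux case it really is just one command; the key file, hostname and paths here are invented:

    # copy a big file straight from one Linux EC2 instance to another, over ssh
    scp -i mykey.pem /data/bigfile.dat ubuntu@ec2-203-0-113-10.compute-1.amazonaws.com:/data/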

So, how to move a 60GB file from one Windows EC2 instance to another Windows EC2 instance (in a different region)? Here is what I did:
  1. Install CloudBerry Pro on the source server. (It must be the Pro version: you can get a 14-day free trial; when that expires it costs $30.)[1]
  2. Install the CloudBerry free version on the target server.
  3. Test copying over a small file, via your S3 account, to make sure it works. I'm assuming you already know how to use this type of two-pane file-copy application. (I created the bucket in the same region as the target machine: that means the upload takes longer than the download.)
  4. In CloudBerry Pro, go to the Tools menu, then Options, then the Compression And Encryption tab, and check "Use Compression".[2]
  5. Copy the big file. It gives no progress indication.
  6. When it finished it said it was 21% done. Very confusing. And on the server it just showed as a 13GB file, not a 60GB one.[3]
  7. Download it to your target server, using the CloudBerry free version. (Yes, it works fine for downloading large files and compressed files.)
  8. Rename your downloaded file with a ".gz" extension, as that is what it actually is.[3]
  9. Install 7-zip, if you don't have a program that can deal with gzip files. It tells me the file is 2GB compressed, 13GB uncompressed. Ignore that: it is probably just being confused by the gzip format, which only records the original size modulo 4GiB, so the numbers are meaningless for a file this big. Decompress it, and you get your 60GB file back. (A command-line version of this step is sketched just below.)
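If you prefer a command prompt to the 7-zip GUI, the decompression step looks something like this (the install path is the default one, and the file name is made up):

    rem decompress the renamed download with 7-zip's command-line tool
    "C:\Program Files\7-Zip\7z.exe" x bigfile.dat.gz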
Phew, hard work. If you need to do this regularly you should install cygwin and use scp! <soap-box>Or port your applications to Linux, where the living is easy. Apart from the fact that running a Windows machine is harder, the Linux cloud instances are also significantly cheaper to run.</soap-box>
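For completeness, if you do take the cygwin route (and have cygwin's sshd running on the target instance), the whole dance above collapses to one command. The key file, user, hostname and paths here are invented, and -C compresses the data on the wire:

    # push the file straight from the source instance to the target, compressing in transit
    scp -C -i mykey.pem /cygdrive/d/bigfile.dat Administrator@ec2-203-0-113-10.compute-1.amazonaws.com:/cygdrive/d/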

[1]: I've heard, but not confirmed, that you can uninstall it from one machine, then use the same install key on a different machine. If true, that is quite a fair license, and I encourage you to support them.

[2]: As we saw, this creates more work at the other end, so uncheck it after doing your big file.

[3]: I think CloudBerry Pro should have put a .gz extension on the file, when it uploaded it, to make it clear what was going on.

Saturday, December 15, 2012

g++ linker: behaviour change

I was working late on a Friday night, setting up an Ubuntu 12.04 machine. The final step was compiling a couple of C++ programs that had worked fine on Ubuntu 10.04, 11.04 and CentOS 5. I got linker complaints regarding boost::program_options. Sounded like it wasn't installed. Strange, as I'd installed "libboost-all-dev".

Poke around... all the files seem to be there. The lib files are in /usr/lib/. Surely g++ is already looking there, but I added "-L/usr/lib/" to be sure. No help. I tried all the various ideas I found on StackOverflow, until eventually I came across http://stackoverflow.com/a/11250976/841830 where "panickal" suggested the list of libraries has to come last. Yeah, sure. But, getting desperate by this point, I gave it a try... and it worked!

Specifically, in my Makefile, I had to change this line:
    $(TARGET): $(OBJS)
        $(CXX) -s $(LDFLAGS) $(OBJS) -o $@


To look like this:
    $(TARGET): $(OBJS)
        $(CXX) -s $(OBJS) -o $@ $(LDFLAGS)

So somewhere between g++ 4.5.2 and g++ 4.6.3 it has gone from being easy-going and taking parameters in any order, to requiring certain ones to come at the end. A strange evolution. (I understand why the ordering amongst the library files matters; I just don't see why they now all have to go together at the end. My best guess is that newer Ubuntu releases pass --as-needed to the linker by default, which drops a shared library if it is listed before the object files that reference it.)
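Boiled down to a one-liner, away from the Makefile, this is the difference (file names made up; the program just has to use boost::program_options):

    # fails on Ubuntu 12.04: the library is listed before the object file that needs it
    g++ -lboost_program_options main.o -o myprog
    # works: libraries come after the objects
    g++ main.o -o myprog -lboost_program_options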

But luckily I was using a Makefile with some structure, so the fix was trivial, and my Friday night did not become a Saturday morning!