Thursday, September 13, 2012

Microsoft Azure cloud hosting: vapourware??

I blogged before about Microsoft being a surprise new player in the Linux IaaS cloud arena. Well, yesterday I had a burning need for a new virtual server, and what's more I needed a Windows server. So I decided to take this chance to evaluate Azure. I went through the prices again and watched a couple of setup videos: it all looks competitive, and it looks like it might be easier to manage Windows cloud instances than on Amazon EC2.

I signed up (created a LiveID). Then had to give my address and credit card details, to apply for their 90 day free trial. No problems there, though one minor gripe: Japanese postcodes are three digits, then four more digits. It refused "123", and then it refused "1230001". You have to put it in as "123-0001".

Then it tells me "Setting up your Windows Azure subscription". It takes forever, then after two very long minutes it comes back and says: "Sorry! We could not activate this feature. Please contact support." Gulp, I just gave these cowboys my credit card.


However, going to my account page shows the 90-day free trial is activated. OK, we're rolling... no, we're not. I click "add subscription" and end up on that same long "Setting up your Windows Azure subscription" screen. But this time it does something different after 30 seconds, and it looks like my account and trial are activated. (Incidentally, I ended up with two emails, both telling me my credit card had been charged $0.00.)

It takes quite a bit of clicking around to find the screen where you create new instances; there is no link from inside the Account page, as far as I can see. Anyway, once there I get told I cannot create a virtual server without signing up for the "preview program". First mention of that!! So I sign up for it and get told:
    We are sorry, but we could not complete that operation.


I then click the "portal" button in top right and end up at a page that tells me I've been accepted to the "preview"?!

That then takes me to a page where I still cannot start a virtual server. I get told I need to sign up for the preview program. Umm, didn't we just do this?

Logged off, on again.

This time, at the top I see a green "preview" button. It tells me the interface I'm trying to use is a crippled new version, and that the old version is the one that actually works. (I'm paraphrasing.) I click that and get told to install Silverlight. Silverlight?!!

FAIL.

Off to Amazon EC2, and my Windows instance was up in 20 minutes. (That is still far too long: I can get a Linux instance running to the same level of usefulness in 2-3 minutes, but that is a rant for another time. At least Amazon are not wasting my time telling me they have a service that in fact they don't have.)




<- vs. = in R

One of the first things to strike a programmer new to R is <- all over the place. "Ugly!" may be a first reaction. But probe a bit deeper and you'll find you can use = just as well. "So why bother with <- at all?!" is a common reaction. The most pragmatic advice I've seen from experienced R programmers is that it really just comes down to personal preference.

As I'm coming from experience in C++, PHP, JavaScript, Python, Java (the list goes on), I use = in my R code, except in one case, which I'll come to in a moment. I've used = in almost all my R-related blog posts and StackOverflow questions and answers, and no-one has ever taken offence (AFAIK), shunned me (AFAIK), or edited them to change every = to <-. So it appears to be acceptable.

Downsides


But there are three downsides I can think of, one important, one social and one bogus.

The first downside is that R also uses the single equals sign to pass named arguments in a function call. This rarely matters to me: assigning inside a parameter list is bad form in C-like languages, so I don't do it and don't get confused. Just about the only time it ever matters is when timing code. I *have* to write:

     timing = system.time( x <- do_calculation(a=1,b=2) )


If I write the following instead, x won't get assigned, because R reads x= as a named argument to system.time() rather than as an assignment:

  timing = system.time( x = do_calculation(a=1,b=2) )
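Here is a small runnable sketch of that difference, for anyone who wants to try it. do_calculation() is a hypothetical stand-in because the post never defines it. I've used y in the second call so it can't be confused with the x created by the first. It assumes system.time()'s usual signature, system.time(expr, gcFirst = TRUE), which has no parameter called y:

```r
# Hypothetical stand-in for the slow calculation being timed.
do_calculation = function(a, b) {
  Sys.sleep(0.1)
  a + b
}

# '<-' inside the call is an assignment: x is created in the calling
# environment while system.time() evaluates the expression.
timing = system.time( x <- do_calculation(a=1,b=2) )
print(x)   # 3

# '=' inside the call is read as a named argument. system.time() has no
# parameter called y, so the call fails and y is never assigned.
result = tryCatch(
  system.time( y = do_calculation(a=1,b=2) ),
  error = function(e) conditionMessage(e)
)
print(result)                          # an "unused argument" error message
print(exists("y", inherits = FALSE))   # FALSE
```

With a function that does accept `...`, the second form would not even fail. The value would just be passed along silently, which is arguably worse.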

The second downside is that the R community considers <- to be standard, so just about every package and book uses it. If you want to be part of the in-crowd, you have to use it too.

The third downside is that it is easier to use search-and-replace to convert all your "<-" to "=" than to go the other way, because a blind replace of "=" would also hit named arguments. But this one is bogus: the example above shows why you cannot convert "<-" to "=" completely automatically either.
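R's own parser already knows which kind of = is which, so the "=" to "<-" direction can be automated if you use the parser rather than a text search. Below is a minimal sketch that relies on base R's getParseData() and its token names: EQ_ASSIGN for an assignment =, and EQ_SUB for a named-argument =. eq_to_arrow() is a hypothetical helper. Because it edits by column position, it only handles a single line of plain ASCII code without tabs:

```r
# Hypothetical helper: convert assignment '=' to '<-' in ONE line of code,
# using R's own parser so that '=' used for named arguments is left alone.
# Assumes a single line of ASCII code with no tabs (edits by column number).
eq_to_arrow = function(line) {
  pd = getParseData(parse(text = line, keep.source = TRUE))
  # EQ_ASSIGN marks '=' used as assignment; EQ_SUB marks '=' in an argument list.
  cols = sort(pd$col1[pd$token == "EQ_ASSIGN"], decreasing = TRUE)
  # Work right to left so earlier column positions stay valid after each edit.
  for (col in cols) {
    line = paste0(substr(line, 1, col - 1), "<-", substr(line, col + 1, nchar(line)))
  }
  line
}

eq_to_arrow("timing = system.time( x <- do_calculation(a=1,b=2) )")
# "timing <- system.time( x <- do_calculation(a=1,b=2) )"
```

The a=1 and b=2 are untouched because the parser tags them as EQ_SUB, not EQ_ASSIGN.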

Upsides

Are there any upsides to preferring = to compensate? Yes, though they'll sound petty to people who believe using <- is the Only Way. First, it is one less keystroke. Second, in comparisons, this does not do what it appears to:

    if(x<-5)cat("x is less than minus five\n")

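R reads x<-5 as the assignment x <- 5, so the condition is always true and x is silently overwritten. So in R you must put spaces around < (at least when a negative number follows it); it is not just a style thing as it is in other languages.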
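Here is a quick way to see the trap for yourself. The starting value 2 is just an arbitrary number that is not less than -5:

```r
x = 2   # 2 is NOT less than -5

# Intended: "if x is less than minus five". R actually parses x<-5 as the
# assignment x <- 5; its value, 5, counts as TRUE.
if(x<-5)cat("x is less than minus five\n")   # prints the message anyway
print(x)                                     # 5: x was silently overwritten

x = 2
# With a space, < is a comparison and -5 is a negative number.
if(x < -5)cat("x is less than minus five\n")   # correctly prints nothing

# The parser's view of the two spellings:
identical(quote(x<-5)[[1]], as.name("<-"))    # TRUE: an assignment
identical(quote(x < -5)[[1]], as.name("<"))   # TRUE: a comparison
```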

The third upside is that when I do use <-, it communicates intent: I'm deliberately doing an assignment to a variable in the parameter list of a function call. As that is considered bad form in most languages, I like how it stands out.

I've saved the biggest upside for last: familiarity to programmers coming from any of the other C-like languages. (R is a C-like language too.)

Comparisons

You thought I'd mention == and how it can be confused with =? That potential confusion exists in all C-like languages, and we just know to look out for it. And anyway, it still exists in R:

   f(x=1)   vs.    f(x==1)

When I write one of those, did I mean the other?
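Both forms are legal R, and neither produces a warning. Here is a tiny illustration that uses a hypothetical function f():

```r
# Hypothetical function, just for illustration.
f = function(x = 0) x * 10

x = 5
f(x = 1)    # named argument: f's own x is 1, so this returns 10
f(x == 1)   # comparison: passes FALSE (5 is not 1), so this returns 0
```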

Language Design

If I were designing R from scratch, how would I do it differently? I love how I can pass named arguments in R; it is perhaps the most beautiful feature of the language. But it overloads the = sign in a way not found in other C-like languages, so I'd be tempted to change it to use ":" (it looks a bit like object notation in JavaScript, but without the curly brackets). So the above example would become:

   timing = system.time( x = do_calculation(a:1,b:2) )

Notice how I can use "x=" safely, because it now has no other meaning. Notice, too, how this helps with the f(x=1) vs. f(x==1) confusion. But that contradicts what I said above about liking the way <- stands out. So maybe I want it to look like this:

   timing = system.time( x <- do_calculation(a:1,b:2) )

Now a lint tool can warn about use of a single equals sign in a function call, because there should never be one. Hhhmmm, I'm not convinced but it is something to chew on.

Do you have any thoughts or constructive criticism? Please add a comment.

Monday, September 10, 2012

There are SPOFs hiding everywhere!

One of my test systems sent me a few hundred emails between 2:38 and 7:45am JST. It was just a test server, so the alerts didn't go to my phone, and I noticed the problem around 7:30am. No paying clients were affected.

I dived straight in and found my Rackspace UK box couldn't resolve api.qqtrend.com. But the DNS lookup worked for me, and other DNS lookups were working on the Rackspace box. I also found it couldn't ping the DNS server (at GoDaddy), though I could ping it from my office LAN, and also from a U.S. server.

Conclusion (totally wrong - see below): Rackspace UK data centre had issues.

So I logged in to Rackspace to check for alerts and post a support ticket. There was a mention, phrased very vaguely, saying "someone else" had DNS problems. Hhmmm. By this time I could ping the GoDaddy server, but DNS lookup still failed. Uncertain, I pinged around a bit more, and by then DNS had started working.

In other words, the underlying problem had already been fixed; the fix was just taking time to spread through the internet, and I could have "fixed it" with no effort by staying in bed 15 minutes longer. Oh well.

Anyway, it turns out it was a sociopath attacking GoDaddy: http://www.bbc.co.uk/news/business-19549367


Here are his reasons:
    "i'm taking godaddy down bacause well i'd like to test how the cyber security is safe and for more reasons that i can not talk now." (sic)

OK, English may not be his first language. But even allowing for that, he is not coming across as an upstanding member of society. Not protesting, no cause, just wanting to see if he had the skills to annoy a lot of people. This is a guy (gal?) who badly needs a girlfriend/boyfriend. If you know him, introduce him to someone. Please.

I know GoDaddy had a big PR screw-up by initially supporting SOPA, but they had the courage and sense to listen to people and change their position. Still a good company in my mind.

But the silver lining is that it nicely illustrated that we (QQ Trend) have a Single Point Of Failure, at the DNS and registrar level, that had been overlooked. We already had server1 and server2 as our two endpoints. They are on different continents and with different cloud providers (no secret: Amazon and Rackspace), and I thought that was solidity to boast about. I was about to add server3 on a third continent as an option for customers who really, really need 100% uptime.

But what today's problem reveals is that if all three servers are on the same domain: server1.example.com, server2.example.com, server3.example.com, then we have a potential issue with DNS, and even with the registrar.

I think we need an alternative domain, at a completely different registrar, with DNS at an independent ISP. Then at the script level we add it as one of the failover endpoints. Its hostnames will point to the same three servers.
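Here is a minimal sketch of that script-level failover, written in R. Everything in it is an assumption for illustration. first_working() is a hypothetical helper, the example.com and example.net hostnames are placeholders, and fetch stands in for whatever actually makes the request. The second half simulates an outage like today's by making every example.com lookup fail:

```r
# Hypothetical helper: try each endpoint in order, return the first that answers.
# 'fetch' should return the response, or signal an error (e.g. DNS failure).
first_working = function(endpoints, fetch) {
  for (url in endpoints) {
    result = tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) return(list(url = url, result = result))
  }
  stop("all endpoints failed")
}

# Placeholder hostnames: the .net name is assumed to live at a different
# registrar, with DNS at an independent ISP, but to point at the same servers.
endpoints = c(
  "https://server1.example.com/api",
  "https://server2.example.com/api",
  "https://server3.example.com/api",
  "https://server1.example.net/api"
)

# Simulated outage: every example.com lookup fails.
fake_fetch = function(url) {
  if (grepl("example.com", url, fixed = TRUE)) stop("could not resolve host")
  paste("OK from", url)
}

hit = first_working(endpoints, fake_fetch)
print(hit$url)   # "https://server1.example.net/api"
```

Trying endpoints strictly in order keeps the sketch simple. A real client would probably also want timeouts and some logging of which endpoint answered.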

However big Amazon or GoDaddy (or any infrastructure provider) get, and however many data centres they have around the world, they are still open to attacks, politics and human error inside their organization. We're service providers building on top of their infrastructure. It is our job to accept their limitations and do something about it.