Thursday, September 13, 2012

<- vs. = in R

One of the first things to strike a programmer new to R is <- all over the place. "Ugly!" may be a first reaction. But probe a bit deeper and you'll find you can use = just as well. "So why bother with <- at all?!" is a common reaction. The most pragmatic advice I've seen from experienced R programmers is that it really just comes down to personal preference.

As I'm coming from experience in C++, PHP, JavaScript, Python, Java (the list goes on), I use = in my R code. Except in one case, which I'll come to in a moment. I've used = in almost all my R-related blog posts and StackOverflow questions and answers and no-one has ever taken offence (AFAIK), shunned me (AFAIK), or done an edit to change them all to <-. So it appears to be acceptable.

Downsides


But there are three downsides I can think of, one important, one social and one bogus.

The first downside is R also uses the single equals sign for a named parameter assignment in a function call. This generally doesn't matter because using assignment in parameters is bad form in C-like languages, so I don't get confused. Just about the only time it ever matters to me is timing code. I *have* to write:

     timing = system.time( x <- do_calculation(a=1,b=2) )


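If I write the following, R reads x as the name of an argument to system.time rather than as an assignment, so the call fails with an "unused argument" error and x never gets assigned: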

  timing = system.time( x = do_calculation(a=1,b=2) )
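Here is a rough sketch of the difference. Note that do_calculation is a hypothetical stand-in that just adds its arguments, and the tryCatch wrapper is only there so the script keeps running after the failing call:

```r
# Hypothetical stand-in for a real calculation; not from any package.
do_calculation = function(a, b) a + b

# With <- the assignment is part of the expression being timed,
# so x exists in the calling environment afterwards.
timing = system.time( x <- do_calculation(a=1, b=2) )
print(x)

# With = R reads "y" as the name of an argument to system.time().
# system.time() has no such argument, so the call signals an error;
# tryCatch() captures the error object so the script can carry on.
err = tryCatch(
  system.time( y = do_calculation(a=1, b=2) ),
  error = function(e) e
)
print(inherits(err, "error"))          # was an error signalled?
print(exists("y", inherits = FALSE))   # was y ever created here?
```

The argument matching fails before the expression is evaluated, so in the second form do_calculation is never even called.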

The second downside is that the R community considers <- to be standard. So nearly all packages use it, and nearly all books use it. If you want to be part of the in-crowd you have to use it too.

The third downside is that it is easier to use search-and-replace to convert all your "<-" to "=" than it is to go the other way. But this one is bogus: the above example shows why you cannot convert "<-" to "=" completely automatically anyway.

Upsides

Are there any upsides to preferring = to compensate? Yes, though they'll sound petty to people who believe using <- is the Only Way. First it is one less keystroke. Second, in comparisons, this does not work:

    if(x<-5)cat("x is less than minus five\n")

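R parses x<-5 as the assignment x <- 5, not as the comparison x < -5, so x is overwritten with 5 and the message is always printed. So you must put a space between < and a following minus sign in R; it is not just a style thing as it is in other languages.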
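A small sketch of what the parser does with and without the space (the variable names spaced_fired and unspaced_fired are just for illustration):

```r
x = 10

# With a space this is a comparison: 10 < -5 is FALSE, so nothing fires.
spaced_fired = FALSE
if (x < -5) spaced_fired = TRUE

# Without the space the parser sees the assignment operator <-.
# x becomes 5, and if() treats the non-zero value 5 as TRUE.
unspaced_fired = FALSE
if (x<-5) unspaced_fired = TRUE

print(spaced_fired)
print(unspaced_fired)
print(x)

# Looking at the parsed expressions shows which function each one calls.
print(as.character(quote(x<-5)[[1]]))
print(as.character(quote(x < -5)[[1]]))
```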

The third reason is that when I do use <-, it communicates intent: I'm deliberately doing an assignment to a variable in the parameter list of a function call. As that is considered bad form in most languages, I like how it stands out.

I've saved the biggest upside for last: familiarity to programmers coming from any of the other C-like languages. (R is a C-like language too.)

Comparisons

You thought I'd mention ==, and how it can be confused with = ? That potential confusion exists in all C-like languages, and we just know to look out for it. And, anyway, it still exists in R:

   f(x=1)   vs.    f(x==1)

When I write one of those, did I mean the other?
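To make the difference concrete, here is a minimal sketch, assuming a hypothetical f that simply returns its argument and a variable x that already exists in the calling code:

```r
# Hypothetical function that just hands back whatever it was given.
f = function(x) x

x = 2

named = f(x=1)      # "x" names the parameter; the outer x is not involved
compared = f(x==1)  # passes the result of comparing the outer x with 1

print(named)
print(compared)
print(x)            # the outer x is untouched by either call
```

Both calls run without complaint, which is exactly why the typo is easy to miss.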

Language Design

If I were designing R from scratch, how would I do it differently? I love how I can assign to named parameters in R - it is perhaps the most beautiful feature of the language. But it is an overload of the = sign, and one not found in other C-like languages, so I'd be tempted to change it to use ":" (it looks a bit like object notation in JavaScript, but without the curly brackets). So the timing example from earlier would become:

   timing = system.time( x = do_calculation(a:1,b:2) )

Notice how I can use "x=" safely, because it now has no other meaning. Also notice how this helps with the f(x=1) vs. f(x==1) confusion too. But I'm contradicting my comment above, about liking the way <- stands out. So maybe I want it to look like this:

   timing = system.time( x <- do_calculation(a:1,b:2) )

Now a lint tool can warn about use of a single equals sign in a function call, because there should never be one. Hhhmmm, I'm not convinced but it is something to chew on.

Do you have any thoughts or constructive criticism? Please add a comment.

2 comments:

xian said...

Obviously the : character is already bound. Up until recently with S4 slots, you could have gotten away with @, which sorta makes sense:
xyplot(y~x, data@mydata).

I agree that overloading the = operator is really annoying, and that miswriting == as = is a surprisingly frequent cause of bugs, at least in my own experience. I guess I just don't really *get* why people make such a big deal of it, except for the fact that it is annoying, and so by religiously picking *one* way the cognitive dissonance is removed.

Do you know of any evidence of Chambers commenting on this particular quirk of the language?

darren said...

Regarding my suggestion to use colon to specify a named parameter, I was just reminded that this is how it is done in C#4, see http://weblogs.asp.net/scottgu/archive/2010/04/02/optional-parameters-and-named-arguments-in-c-4-and-a-cool-scenario-w-asp-net-mvc-2.aspx

(I'd forgotten this Most Excellent feature: previous C# projects I've done had been limited to C#3.5.)

However I don't think even C#4 has that other cool feature of R, which is that "..." acts like a parameter. So you can pass "..." straight on to another function, and then you find yourself writing stuff like:
f=function(a,b,...,verbose=F){}
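
A minimal sketch of that pattern with the body filled in. The helper g and its digits parameter are hypothetical, chosen only to show the extra arguments being passed through:

```r
# Hypothetical helper: rounds x to a given number of digits.
g = function(x, digits = 2) round(x, digits)

# Anything f() does not recognise is collected in ... and handed on to g().
# Arguments declared after ... (verbose here) have to be given by full name.
f = function(a, b, ..., verbose = FALSE) {
  if (verbose) cat("passing extra arguments on to g()\n")
  g(a + b, ...)
}

default_digits = f(1.234, 2.345)          # g() uses its own default
one_digit = f(1.234, 2.345, digits = 1)   # digits travels through ...
print(default_digits)
print(one_digit)
```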