
Thursday, September 13, 2012

<- vs. = in R

One of the first things to strike a programmer new to R is <- all over the place. "Ugly!" may be a first reaction. But probe a bit deeper and you'll find you can use = just as well for ordinary assignments. "So why bother with <- at all?!" is a common reaction. The most pragmatic advice I've seen from experienced R programmers is that it really just comes down to personal preference.

As I'm coming from C++, PHP, JavaScript, Python, Java (the list goes on), I use = in my R code. Except in one case, which I'll come to in a moment. I've used = in almost all my R-related blog posts and StackOverflow questions and answers, and no-one has ever taken offence (AFAIK), shunned me (AFAIK), or edited my code to change every = to <-. So it appears to be acceptable.

Downsides


But there are three downsides I can think of, one important, one social and one bogus.

The first downside is that R also uses the single equals sign to pass arguments by name in a function call. This generally doesn't matter to me: assignment inside an argument list is bad form in C-like languages, so I rarely write one and don't get confused. Just about the only time it ever matters is when I'm timing code. I *have* to write:

     timing = system.time( x <- do_calculation(a=1,b=2) )


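If I write the following instead, x won't get assigned. R reads x = as an argument named x, which system.time() does not have, so the call stops with an "unused argument" error: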

  timing = system.time( x = do_calculation(a=1,b=2) )
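The two calls above can be tried directly. This is a minimal sketch: do_calculation() is a hypothetical stand-in that just adds its arguments, since the post never defines it, and tryCatch() is there only so the script keeps running after the error. Top-level assignments use =, in keeping with the style argued for here.

```r
# Hypothetical stand-in for the post's do_calculation()
do_calculation = function(a, b) a + b

# "=" inside the call is matched as an argument named x. system.time()
# has no parameter called x, so R stops with an "unused argument" error
# before the expression is evaluated, and x is never created.
err = tryCatch(
  system.time( x = do_calculation(a=1,b=2) ),
  error = function(e) conditionMessage(e)
)
print(err)
print(exists("x"))  # FALSE

# "<-" inside the call is part of the timed expression, so x is assigned
timing = system.time( x <- do_calculation(a=1,b=2) )
print(x)  # 3
```

The failure is loud here only because system.time() has a fixed parameter list; a function that accepts ... would quietly swallow the misplaced x = instead.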

The second downside is that the R community considers <- to be the standard, so nearly all packages and books use it. If you want to be part of the in-crowd you have to use it too.

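The third downside is that it is easier to search-and-replace all your "<-" with "=" than to go the other way: because = also names arguments, a blind replace of "=" with "<-" would mangle every function call that uses named arguments. But this argument is bogus: the example above shows why you cannot convert "<-" to "=" completely automatically either.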
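To see why neither direction is safe as a blind search-and-replace, here is a small sketch applying base R's gsub() to the timing line from above. The line is treated purely as a string, so nothing is executed.

```r
line = "timing = system.time( x <- do_calculation(a=1,b=2) )"

# "<-" to "=": the deliberate assignment inside system.time()
# becomes a named argument, i.e. the broken version
to_eq = gsub("<-", "=", line, fixed = TRUE)
print(to_eq)
# [1] "timing = system.time( x = do_calculation(a=1,b=2) )"

# "=" to "<-": the named arguments a=1 and b=2 become assignments too
to_arrow = gsub("=", "<-", line, fixed = TRUE)
print(to_arrow)
# [1] "timing <- system.time( x <- do_calculation(a<-1,b<-2) )"
```

The second result still parses, but it now passes its values positionally and creates variables a and b as a side effect, which is why a safe conversion needs to look at the parse tree rather than the raw text.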

Upsides

Are there any upsides to preferring = to compensate? Yes, though they'll sound petty to people who believe using <- is the Only Way. First, it is one fewer keystroke. Second, in comparisons, this does not do what it appears to:

    if(x<-5)cat("x is less than minus five\n")

R parses x<-5 as the assignment x <- 5, not as the comparison x < -5. So in R you must put a space between < and a following minus sign; it is not just a style thing as it is in other languages.
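A short demonstration of the pitfall; the starting value of x is arbitrary.

```r
x = 0

# Meant as "x < -5", but R reads x<-5 as the assignment x <- 5.
# Its value, 5, is non-zero and so counts as TRUE: cat() runs.
if(x<-5) cat("x is less than minus five\n")
after_typo = x
print(after_typo)  # 5: x has been silently overwritten

x = 0
# With a space between < and -, it is a comparison again
if(x < -5) cat("x is less than minus five\n")  # prints nothing
```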

The third upside is that when I do use <-, it communicates intent: I'm deliberately doing an assignment to a variable inside the argument list of a function call. As that is considered bad form in most languages, I like how it stands out.

I've saved the biggest upside for last: familiarity to programmers coming from any of the other C-like languages. (R is a C-like language too.)

Comparisons

You thought I'd mention == and how it can be confused with =? That potential confusion exists in all C-like languages, and we just know to look out for it. And anyway, it still exists in R:

   f(x=1)   vs.    f(x==1)

When I write one of those, did I mean the other?
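Both forms are valid R and mean very different things. A minimal sketch, where f is a hypothetical function that simply returns its argument:

```r
f = function(x) x  # hypothetical f that just returns its argument
x = 2

f(x=1)   # named argument: binds 1 to f's parameter x, returns 1
f(x==1)  # comparison: passes the value of 2 == 1, returns FALSE
```

Neither line produces a warning, so a slip from one to the other is only caught by the result being wrong.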

Language Design

If I were designing R from scratch, how would I do it differently? I love how I can pass arguments by name in R; it is perhaps the most beautiful feature of the language. But it is an overload of the = sign, and one not found in other C-like languages, so I'd be tempted to change it to use ":" (it looks a bit like object notation in JavaScript, but without the curly brackets). So the timing example from earlier would become:

   timing = system.time( x = do_calculation(a:1,b:2) )

Notice how I can now use "x=" safely, because it has no other meaning. Notice too how this removes the f(x=1) vs. f(x==1) confusion. But I'm contradicting my comment above about liking the way <- stands out. So maybe I want it to look like this:

   timing = system.time( x <- do_calculation(a:1,b:2) )

Now a lint tool could warn about any single equals sign in a function call, because there should never be one. Hmm, I'm not convinced, but it is something to chew on.
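Such a check is not far-fetched even in today's R, because R's parser already tells the uses of = apart. In the data returned by base R's getParseData(), = used as an assignment is tagged EQ_ASSIGN, = naming an argument in a call is tagged EQ_SUB, and <- is tagged LEFT_ASSIGN. Below is a minimal sketch of the first step of such a lint rule; classify_ops() is a hypothetical helper name, not part of any existing lint package.

```r
# Hypothetical helper: list each assignment-like operator in a piece of
# R source, with the token type R's parser gave it, in source order.
classify_ops = function(code) {
  pd = getParseData(parse(text = code, keep.source = TRUE))
  ops = pd[pd$token %in% c("EQ_ASSIGN", "EQ_SUB", "LEFT_ASSIGN"), ]
  ops = ops[order(ops$line1, ops$col1), c("text", "token")]
  rownames(ops) = NULL
  ops
}

ops = classify_ops("timing = system.time( x <- do_calculation(a=1,b=2) )")
print(ops)

# Under the redesigned syntax, every EQ_SUB token (a single = inside
# a call) is exactly what the lint rule would flag
flagged = ops[ops$token == "EQ_SUB", ]
print(nrow(flagged))
```

Only parsing happens here, so do_calculation() does not need to exist.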

Do you have any thoughts or constructive criticism? Please add a comment.