Showing posts with label financial data. Show all posts
Showing posts with label financial data. Show all posts

Monday, February 29, 2016

Factorials and Rationals in R

At the recent NorDevCon, Richard Astbury used the example of factorial to compare some “modern” languages (the theme of his talk being that they were all invented decades ago). Among them was Scheme, which impressed me by having native support for very large numbers and rational numbers.

I felt like my go-to-language for maths-stuff, R, ought to be able to do that, too. The shortest R way to calculate factorials (using built-in functions) I can find is:

prod(1:5)

which gives 120. But it gives me Inf if I try prod(1:50).

BTW, I hoped this might work:

`*`(1:5)

but * operator only takes two arguments (which is ironic for a language where everything is a vector, i.e. an array).

It seems I need to use the gmp package to get big number and rational number support. Here is how factorial can be written:

library(gmp)
prod(as.bigz(1:50))

which outputs
“30414093201713378043612608166064768844377641568960512000000000000”

That is not bad: still fairly short and, as you can see, vectorized operations are still trivial (as.bigz(1:50) creates a vector of 50 gmp numbers).

as.bigq(2, 3) is how you create a rational (⅔). Here is a quick example of creating vectors of rational numbers:

as.bigq(1, 1:50)

which outputs:

Big Rational ('bigq') object of length 50:
[1] 1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10 1/11 1/12 1/13 1/14 1/15
[16] 1/16 1/17 1/18 1/19 1/20 1/21 1/22 1/23 1/24 1/25 1/26 1/27 1/28 1/29 1/30
[31] 1/31 1/32 1/33 1/34 1/35 1/36 1/37 1/38 1/39 1/40 1/41 1/42 1/43 1/44 1/45
[46] 1/46 1/47 1/48 1/49 1/50

Rational operations also work nicely:

as.bigq(1,1:50) + as.bigq(1,3)

giving:
4/3 5/6 2/3 ... 49/138 50/141 17/48 52/147 53/150

Of course that is not as nice as having built-in rationals, but good enough for those relatively few times when you need them.

Thursday, August 23, 2012

Small R/XTS Code Snippets And Tips

1. Splitting by week

The split function in xts can split your data into weeks:  split(data,f="weeks")
Under the covers it uses endpoints, which in turn uses a C function of the same name in the XTS package. The problem I discovered it that it considers the start of the week as Monday, 00:00:00, and the end of the week as Sunday, 23:59:59. This is a problem because the FX markets start at 9pm or 10pm on a Sunday night in the UTC/GMT timezone (at Sunday, 5pm in New York timezone).

What we want is for it to treat Sunday 00:00:00 as the start of the week. That gives us good buffer on both sides (about 24hrs after Nasdaq after-market hours finish, and about 21hrs before the FX markets open). If you look at the source of endpoints (just type endpoints in an interactive R session) you'll see it has the magic constant: 3L * 86400L. After some trial and error I found we want that to be 4L not 3L. Though it may be brittle, as you're using XTS internals, you can use this line in your code:

    ep= .Call("endpoints", .index(data)+4L*86400, 604800L, k=k, PACKAGE="xts")

(I don't suppose the XTS package can consider this a bug, as there may be people relying on the current "weeks" behaviour; but maybe they could add a "sunweeks" option to endpoints and split? What do you think?)

P.S. A minor optimization for the XTS endpoints function:
    if (on %in% c("years", "quarters", "months", "weeks", "days"))
could read:
    if (on %in% c("years", "quarters", "months", "days"))
because weeks is not using posixltindex, and it takes CPU time to generate.

2. Put an XTS object in a string

Say you want to show the first 4 rows of an xts object in a string. The first trick you need is capture.output(). This takes the print() output and puts it in a string. But it returns each row as an entry in a vector. So we'll use paste() to convert that to a single string. Here is our code:


    msg = paste("Here are the first 5 rows:",paste(capture.output(x[1:5,]),collapse="\n") )

See item 3 for an example of this.

3. Show me just the NA rows

Here is some test data:
    library(xts)
    x=xts(  data.frame(a=c(1,2,3,NA,5,6), b=c(100,99,NA,NA,NA,95)),
            as.Date("2012-10-01")-6:1)
Which looks like:

                a   b
    2012-09-25  1 100
    2012-09-26  2  99
    2012-09-27  3  NA
    2012-09-28 NA  NA
    2012-09-29  5  NA
    2012-09-30  6  95


Imagine there should be none, and you want to print them out in an error message. This is the command to just show the rows with no NAs:

    x[complete.cases(x),]

And therefore just the rows with  NAs:


    x[!complete.cases(x),]

So, to print a fatal error message that shows the NA rows:

    if(any(is.na(x))){
      stop( paste("We have NAs:",

        paste(capture.output(x[!complete.cases(x),]),collapse="\n")
        ))
      }


(see Item 2, "Put an XTS object in a string", if the second half of that line looks scarier than a rabid dog who has just bitten through his leash.)


4. Loop through as key=>value


In many languages there is a foreach(container as key=>value) type construct. I've not found something so compact for XTS objects, so I use this:
    for(ix in 1:nrow(x)){
      datestamp=index(x)[ix]
      b=x$b[ix]  #Or, b=coredata(x$b)[ix]

      #print(datestamp);print(b)
      }

I've shown two ways of setting 'b'. The first way leaves 'b' as an XTS object. In the second way 'b' is a number. If using the latter, I'm sure it is more efficient to do the coredata() call before the loop. I.e. like this:
    xcd=coredata(x$b)
    for(ix in 1:nrow(x)){
      datestamp=index(x)[ix]
      b=xcd[ix]

      #print(datestamp);print(b)
      }

 

5. Merging, but excluding values only in one xts object

Here is some test data:
  library(xts)
  Sys.setenv(TZ = "UTC")
  d=xts(1:5,Sys.Date()+(2:6))
  ix=Sys.Date()+(1:5)

So, d is our data, whereas ix lists the days we should have data for.
ix looks like:
  [1] "2013-01-10" "2013-01-11" "2013-01-12" "2013-01-13" "2013-01-14"

and d looks like:
             [,1]
  2013-01-11    1
  2013-01-12    2
  2013-01-13    3
  2013-01-14    4
  2013-01-15    5


In other words, d is missing an entry for 2013-01-10, so we need to add an NA entry for it. But also d has an entry for 2013-01-15, which it shouldn't have yet. (In a finance context it might be a bar that we are still collecting ticks for, so we don't have final values for yet; in a business context it might be a value that has not been approved by management for release yet.)

When we do merge(d,xts(,ix),all=F) we get:
             d
  2013-01-11 1
  2013-01-12 2
  2013-01-13 3
  2013-01-14 4



When we do merge(d,xts(,ix),all=T) we get:
             d
  2013-01-10 NA
  2013-01-11  1
  2013-01-12  2
  2013-01-13  3
  2013-01-14  4
  2013-01-15  5

Neither is what we want. The solution is  merge(d,xts(,ix),join='right') which gives us:
  2013-01-10 NA
  2013-01-11  1
  2013-01-12  2
  2013-01-13  3
  2013-01-14  4


6a. Combining two xts objects that have same column but different timestamps

Here is some test data:
  library(xts)
  Sys.setenv(TZ = "UTC")
  a=xts(1:5,as.Date("2013-02-01")+1:5);colnames(a)="v"
  b=xts(0:-2,as.Date("2013-02-01")-0:2);colnames(b)="v"

When we do merge(a,b) we get:
             v v.1
 2013-01-30 NA  -2
 2013-01-31 NA  -1
 2013-02-01 NA   0
 2013-02-02  1  NA
 2013-02-03  2  NA
 2013-02-04  3  NA
 2013-02-05  4  NA
 2013-02-06  5  NA


You can play with various values for the join parameter, but it won't help. The solution is  rbind(a,b):
             v
 2013-01-30 -2
 2013-01-31 -1
 2013-02-01  0
 2013-02-02  1
 2013-02-03  2
 2013-02-04  3
 2013-02-05  4
 2013-02-06  5


What if both a and b have a timestamp in common? Then you get two rows. In that case you can remove the duplicates afterwards:
  x=rbind(a,b)
  x=x[!duplicated(index(x)),]

The duplicate from b is the one that gets removed. If you wanted the one from a to be removed instead, then pass fromLast = TRUE to duplicated(), or simply do rbind(b,a) !

6b. Combining two xts objects that have same columns but different timestamps

This follows on from 6a, but when the xts object have more than one column. Here is some test data:
  library(xts)
  Sys.setenv(TZ = "UTC")
  a=xts(data.frame(x=1:5,y=5:9),as.Date("2013-02-01")+1:5)
  b=xts(data.frame(x=0:-2,y=10:12),as.Date("2013-02-01")-0:2)

Again,  rbind(a,b) is the answer: 
             x  y
 2013-01-30 -2 12
 2013-01-31 -1 11
 2013-02-01  0 10
 2013-02-02  1  5
 2013-02-03  2  6
 2013-02-04  3  7
 2013-02-05  4  8
 2013-02-06  5  9