Darren's Developer Diary: 2011

Wednesday, November 16, 2011

boost on centos (vs. on ubuntu)

You'd think porting a C++ program from one 64-bit linux to another would be trivial. But, no. A program developed with no issues on Ubuntu 10.04 was a lot of trouble to get to compile on Centos 5.6.
But get it to compile I did... read on for the secret words you have to utter...

First thing I did was:
yum erase boost boost-devel

This got rid of the very old 1.33 library. I then ran this:

  yum install boost141 boost141-devel boost141-program-options boost141-regex boost141-thread boost141-system

I then had to hack the makefile to add this to my CFLAGS:

  -I/usr/include/boost141/

That got me compiling. But not linking. I changed my LDFLAGs to look like this but still it would not link:

  -L/usr/lib64/boost141 -lboost_regex -lboost_program_options -lboost_system -lboost_thread

I played around with various -L settings, until I had a breakthrough. I noticed the above line complained with:

  /usr/bin/ld: cannot find -lboost_thread

But some different -L settings instead gave me:

/usr/bin/ld: cannot find -lboost_regex

Blink and you miss it, and indeed I had. My above line was linking with most of Boost, just not boost_thread. In other words, I was really close and hadn't realized it. A bit more poking around discovered that Ubuntu calls it "boost_thread", while Centos calls it "boost_thread-mt" (careful, the first one is an underscore, the second one is a hyphen). So, the final magic line for Centos was:

  -L/usr/lib64/boost141 -lboost_regex -lboost_program_options -lboost_system -lboost_thread-mt

NOTE: I did not do any post-installation steps after the yum install steps. I saw some people suggesting copying files from one place to another but I did not need to. (Also, -L/usr/lib64 should be sufficient, as everything is symlinked; maybe even the -L flag is not needed at all... but I'm in the "If it ain't broke." mindset now, so no more experimentation for me!)
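If you hit the same "cannot find -lboost_something" wall, a quick way to see what the distro actually calls each library is to list the files in the library directory and strip them down to the names the linker accepts. This is only a rough sketch in Python: the function name linker_names is made up for illustration, and the directory in the usage comment is the one from this post, so adjust it for your system.

```python
import re
from pathlib import Path


def linker_names(lib_dir, prefix="boost_"):
    """List the names that ld could resolve with -l from the files in lib_dir."""
    names = set()
    for path in Path(lib_dir).glob(f"lib{prefix}*"):
        # ld resolves -lNAME to libNAME.so or libNAME.a, so versioned files
        # such as libNAME.so.1.41.0 are skipped here.
        match = re.fullmatch(r"lib(.+)\.(?:so|a)", path.name)
        if match:
            names.add(match.group(1))
    return sorted(names)


# Example (directory taken from the post; adjust for your system):
#   for name in linker_names("/usr/lib64/boost141"):
#       print("-l" + name)
```

On a machine like the one described above, a name such as boost_thread-mt would show up in that list, which is the clue I needed.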

Saturday, November 12, 2011

Actual costs: rackspace cloud

A few months back I decided to put a 24/7 script on a Rackspace Cloud instance, instead of the more obvious Amazon EC2 choice. The reason at the time was my needs were low CPU but relatively high bandwidth and diskspace usage and it worked out cheaper.

Now that a few invoices are in, I am relieved to say there was no catch. My past three invoices have been $11.99, $11.99 and $12.20 (USD). This is for a minimal CPU spec (256MB, 1.6% of a quad core CPU, 10GB disk), 1.1 to 1.3 GB of outgoing bandwidth each month (there is no charge for incoming bandwidth), and cloud storage rising from 4 to 8GB. 90% of the monthly cost is for the machine, and the cloud storage has risen from $0.63 to $1.15. The bandwidth is not costing much at all.

In contrast on Amazon EC2, the micro instance would cost $15.65 (including $1 for 10GB of EBS storage), while a small instance would cost $62.25/month, of which $0.03 is the bandwidth usage. (The first year of that micro instance would be free if you are a new customer, but I am not.)

So, at the CPU bottom-end, Rackspace is winning on cost. The other feature of Rackspace Cloud that I love is there is an automatic daily backup of the full disk image, and that backup is stored in the cloud storage. (Storing that backup is basically all my $1/month cloud storage costs.)

What do I not like? I keep using up my 10GB disk space. But there seems no way to move to 20GB without doubling the CPU spec and doubling the monthly cost; with Amazon micro I'd just increase the EBS storage space. With an Amazon small instance I'd get 160GB and would not care.

What do I not like about Rackspace and Amazon? It is that you just get a basic linux distro. You have to spend time installing, configuring and maintaining. And the configuration is not trivial; I've kept a log of all I've had to do, and it includes things like moving ssh off of port 22, setting up an iptables firewall, installing a mail server (not a POP server, just enough so I can *send* email alerts), and writing my own low-diskspace email alert script. The latter was done just the other day after my application broke, yet again, because the machine had run out of disk space.
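For the curious, a low-diskspace alert does not need much. The following is not my actual script, just a minimal sketch of the idea in Python; the function names, the 10% threshold and the example.com addresses are all assumptions, and it presumes a mail server listening on localhost that can send outgoing mail.

```python
import shutil
import smtplib
from email.message import EmailMessage


def free_fraction(path="/"):
    """Fraction of the filesystem holding path that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_and_alert(path="/", threshold=0.10,
                    sender="alerts@example.com",      # hypothetical address
                    recipient="me@example.com",       # hypothetical address
                    send=None):
    """Send an email if free space on path drops below threshold.

    Returns the message that was sent, or None if there was enough space.
    """
    fraction = free_fraction(path)
    if fraction >= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Low disk space on {path}: {fraction:.0%} free"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Only {fraction:.1%} of {path} is free.")
    if send is None:
        # Assumes a local mail server, as described in the post.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
    else:
        send(msg)
    return msg
```

Run check_and_alert() from cron every few minutes and you hear about the problem before the application falls over. The send parameter is only there so the function can be tried out without a mail server.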

P.S. As I want 24/7, and have not mentioned the need to scale, what about cheaper shared hosting? Well, I couldn't find a VPS, that gives me root access and no restrictions, for under $10/month. It seems Rackspace is winning that fight too?

(2012-08-15 UPDATE: Rackspace are stopping their minimal server config: no more 256MB option in their "nextgen" V2 API. Also scheduled images are not available in the V2 API (yet). In other words the two things I pointed out that were good, in the above article, are going! I guess the Rackspace marketing department will have to work out a positive spin on "cutting out our competitive advantage so we look just like the competition now" ;-)

Sunday, November 6, 2011

PHP, Proxies, HTTPS: v2, v3 or v23?!

From a PHP http client, using HTTPS via a proxy, I started getting a "400 bad request" error from Apache. I knew it could work because it worked last week. The apache error log message was:
Hostname 127.0.0.1 provided via SNI and hostname mytest.local provided via HTTP are different

My first troubleshooting mistake was messing around with server-side settings: I found out what SNI meant, but as far as I could see I wasn't using it. Then, finally, I remembered I could use curl as a test http client, and it was working fine. I removed all my server-side changes and curl was still connecting fine. So now I knew I'd broken something client-side. I added the -v flag to curl to see the exact headers it was sending. Both clients were sending the same headers.

Finally I remembered I'd changed this line:
stream_socket_enable_crypto($fp,true,STREAM_CRYPTO_METHOD_SSLv23_CLIENT);

The PHP docs give no guidance on which option to choose, but "v23" sounded like it would work with version 2 or version 3, and maybe do all kinds of auto-negotiation behind the scenes. Which had to be a good thing. I'm sure I'd tested after changing, but I must have tested non-HTTPS or without the proxy by mistake. When I changed back to this line, everything worked again:
stream_socket_enable_crypto($fp,true,STREAM_CRYPTO_METHOD_SSLv3_CLIENT);

I hope that helps someone, as google was no use for me (all the hits about the SNI name and HTTP name difference were due to Apache being case-sensitive about the name comparison, which was not the problem here).

By the way if you want to know how to do http connections, with a proxy, using PHP, supporting both HTTP and HTTPS, I've described it here.

Friday, October 28, 2011

"Git" into this good habit

This time no story giving insight into my life as a superhero developer; instead straight to the facts. When you write a .gitignore file, precede each entry with a forward slash. So instead of writing:
*.zip
You should write:
/*.zip

It means zip files only get ignored in the same directory as the .gitignore file, and not in each subdirectory. Do this on principle, and optimize later if you find it is an extension you really want ignoring project-wide. (In other words wait for the noise to appear before using .gitignore to suppress noise.)

Especially important for websites: I may have a temporary zip file in the root of the website from a designer, but deeper in the site I may have zip files that users are supposed to download.
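To make the difference concrete, here is a deliberately simplified model of the two kinds of pattern, written in Python. It is not how git implements it: the function name ignored is made up, paths are assumed to be relative to the directory holding the .gitignore file, and it only handles patterns with no slash other than an optional leading one.

```python
from fnmatch import fnmatchcase


def ignored(pattern, path):
    """Simplified model of a single .gitignore pattern.

    path is relative to the directory containing .gitignore and uses '/'.
    Only patterns with no '/' apart from an optional leading one are handled.
    """
    if pattern.startswith("/"):
        # Anchored: only entries directly in the .gitignore directory can match.
        return "/" not in path and fnmatchcase(path, pattern[1:])
    # Unanchored: the file name may match at any depth.
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


print(ignored("*.zip", "downloads/manual.zip"))   # unanchored pattern
print(ignored("/*.zip", "downloads/manual.zip"))  # anchored pattern
print(ignored("/*.zip", "design.zip"))            # anchored, file in the root
```

In this model the designer's zip in the root is ignored by both patterns, but the user download deeper in the site is only ignored by the pattern without the slash.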

Saturday, October 22, 2011

Frapi (PHP web service API system)

I read about Frapi in PHP Architect (May 2011), and spent a couple of hours trying it out. It is quite interesting, but I don't think I will be using it. It is a full web-interface for making the API. This is what makes it cool, but also its biggest disadvantage. There is a lot of code involved, meaning there is a lot to learn if you need to change it and lots of places for bugs and security exploits to crop up.

It comes with a documentation generator, which could be really useful. This feature is still incomplete (for instance there are no links to it yet, see here, PDF generation didn't work properly), but it looks okay.

There is one specific limitation: I could not create an action with an optional parameter, at least not using the router. E.g. if my action is called "ddd", then I can call /ddd (no parameter). But I cannot call "/ddd/77". (I can give an optional parameter with /ddd?id=77). Or I can define a route as "/ddd/:id" so that I can call "/ddd/77". But in that case id is now required and I cannot use "/ddd".

Another disadvantage is no built-in support for oAuth. (I did find an oauth extension for Frapi but did not try it as the integration seems quite rough still.)

Incidentally if you were looking for a full application built on Zend Framework this may make the perfect study case. Apparently only the admin interface uses ZF, and the actual web services do not; but as far as I can tell they are closely tied and you need to keep both together even on your production servers.

Overall, because making a web service in PHP is not that hard, the advantages are slight and not enough to outweigh the disadvantages (large codebase, inflexible structure, etc.).

Thursday, October 20, 2011

R: what to do when the jitter gets too much

The subject sounds like this will be a discussion of an over-full bladder... instead, brace yourself for some serious functional programming.

So, I had this little bit of R to generate a "rough" exponential curve (as test data):

  x=1:40

  y=exp( jitter( x, factor=30 ) ^ 0.5 )

The problem was it was sometimes giving NA values - too much jitter!
I started off with this fix:

  y[is.na(y)]=exp( x^0.5 )

I.e. wherever it was an NA use the un-jittered value.

Unfortunately it was often complaining with "number of items to replace is not a multiple of replacement length". In this case this was not a warning to ignore, because it revealed I was doing something very wrong. exp( x^0.5 ) is a vector of 40 items. y[is.na(y)] is a vector of however many entries in y are NA. The dangerous thing is if there is just one NA in y, but it is the 5th element, it will be set to exp(1^0.5), instead of exp(5^0.5).

Let's cut straight to the solution:
y[is.na(y)]=exp(x[is.na(y)]^0.5)

x[is.na(y)] means the values out of x where the same-sized y-vector has a NA in that position. So, for the above example where just the 5th element in y is NA, then x[is.na(y)] will be "5".
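The same trap exists outside R. Here is a small sketch of the idea in plain Python (not a translation of the R one-liner): the broken value is planted by hand at position 5, which is an assumption for illustration, and the replacement is computed from the x value at the same position rather than from the start of the list.

```python
import math

x = list(range(1, 41))
y = [math.exp(math.sqrt(v)) for v in x]
y[4] = math.nan  # pretend the jitter broke the 5th element

# Wrong: take replacement values from the start of the un-jittered curve.
replacements = [math.exp(math.sqrt(v)) for v in x]
nan_positions = [i for i, v in enumerate(y) if math.isnan(v)]
wrong = replacements[0]               # value belonging to x = 1

# Right: pair each y with the x at the same position.
fixed = [math.exp(math.sqrt(xi)) if math.isnan(yi) else yi
         for xi, yi in zip(x, y)]

print(nan_positions, wrong, fixed[4])
```

The two printed values differ, which is exactly the silent corruption the R warning was hinting at.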

Simple when you know how. (?!)

Tuesday, October 11, 2011

Renaming files is easy!

What do you mean? Of course renaming is easy! From the GUI just click it and type the new name. From the linux commandline just type mv oldname newname.

But what about when you have 200 *.JPEG files and you need to rename them to be *.jpg? In the past I've written throwaway PHP scripts to do just this. Well, it turns out linux has a command called rename, that can use regexes. To rename those jpegs:

  rename 's/\.JPEG$/.jpg/' *.JPEG

What about a few thousand log files, and I want to remove the datestamp portion from the filename (because I'm going to keep each day's logs in their own subdirectory)? How about this:

  mkdir 20110203
  mv *.20110203.log 20110203/
  rename 's/\.20110203//' 20110203/*

(See here for how to then use a bash loop to loop through all the datestamps you have to deal with.)
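If the rename command is not available on your system (not every distro ships the regex-based version), the same thing is only a few lines of Python. This is a sketch, not a replacement for the real tool: the function name regex_rename and its dry_run option are my own invention, and like the command above it only replaces the first match in each filename.

```python
import re
from pathlib import Path


def regex_rename(directory, pattern, replacement, dry_run=False):
    """Rename files in directory by applying a regex substitution to each name.

    Returns a list of (old_name, new_name) pairs. Refuses to overwrite files.
    """
    regex = re.compile(pattern)
    renamed = []
    # sorted() reads the whole listing first, so renaming does not disturb the loop.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = regex.sub(replacement, path.name, count=1)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, new_name))
    return renamed


# Examples matching the two cases above:
#   regex_rename(".", r"\.JPEG$", ".jpg")
#   regex_rename("20110203", r"\.20110203", "")
```

Passing dry_run=True first shows what would be renamed without touching anything, which is comforting when there are a few thousand files involved.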

Saturday, September 17, 2011

Add a temporary static IP address

At home, with wired ethernet, my (Ubuntu) notebook has a few static IP addresses that I use for developing websites. Out of the house, I use wicd, so I have a dynamic IP address, and those static IPs don't exist. wicd configuration is too complex for me to understand, so I just accept this, but it caught me short the other day when I needed to have both an internet connection and to be able to work on a website running on my notebook.

I failed then, but I'm ready for next time. To temporarily add a static IP address you simply do (as root):

ifconfig eth0:3 10.10.10.10 netmask 255.255.0.0

I'm choosing "eth0:3" for the interface; it can be any unused number after the colon, and you never need to care what this is. netmask can really be anything for our purposes. The 10.10.10.10 is the IP address I've given it. Test with this:

ping 10.10.10.10

To set up a quick virtual host create a file under /etc/apache2/conf.d called 10.10.10.10.conf (any filename is fine) with these contents:

<VirtualHost 10.10.10.10:80>
    DocumentRoot "/var/www/somewhere"
    ServerName 10.10.10.10
</VirtualHost>

Tidyup

To remove just the interface that you added above, use this command:

ip addr del 10.10.10.10/32 dev eth0:3

Or, to restore the network to boot defaults (useful if you have done lots of changes) you can do:

ifdown -a
ifup -a

Either way to then remove the apache config: delete the 10.10.10.10.conf file you created and restart apache.

Monday, September 12, 2011

node.js: Good Tutorial

Chatting with a friend at the weekend, node.js came up; I'd only vaguely heard of it, but apparently it is what all the kewl kids are into. Server-side javascript, right up my street.

I took a very quick look yesterday, and it sounded interesting: especially for speed-critical websites. So, today I took a deeper look. First up you should know the official website has a documentation link that only goes to the API documentation. In real terms that means node.js is officially undocumented. Then from the Wiki I found a link to a free e-book called Mastering Node. I won't give the link as it is (currently) over-priced: poorly organized and unfinished.

I was getting a bit demoralized when a search found http://www.nodebeginner.org/. Now this is more like it. In fact this is an excellent tutorial, that goes right from raw beginner to a reasonably complex mini-application. (I read it in HTML format, but as an ebook it is 60 pages, so you can get an idea of how involved it is going to get.)

I followed along, and it was fairly easy, though I did have a lot of trouble outputting the POST-ed data. It always said "undefined" in my browser. I could do a console.log() of the variable just before and just after writing it to the browser and it was set correctly. I cleared the cache repeatedly, and tried a second browser. Annoyingly I didn't solve that problem in the end.

The main point of this blog post is to recommend the above tutorial/ebook if you're interested in getting a feel for node.js, but What Do I Think About node.js?

Hhhmmm. First and foremost, it should only be used by expert programmers. It is a bit like C++ in that it is going to be easy to shoot yourself in the foot. Asynchronous programming is hard. But if you use a synchronous programming-style you will lock up the whole server. I'm thinking in terms of the web server example application here (which is the use-case I had in mind for it). Asynchronous programming is hard. Yes, I know I already said that but I don't think you thought through what that means in the Real World. What will happen is programmers will take a short-cut: they'll use little bits of synchronous code for jobs they know are so quick that no-one will notice. (Even the above tutorial does this: fs.renameSync) The problem is any job involving any I/O device (like a hard disk file system or a TCP/IP socket) will take 10 times longer to finish than average, about 1% of the time. (I made that statistic up, but the principle is true, so stay with me...)

What does that mean? When that happens it will lock the whole web server up, and all the other requests will block. Take the extreme case of doing a sync action that results in a time-out because the resource has gone offline. If the time-out is 30 seconds, the whole web server is down for 30 seconds. All of your customers are getting time-outs. Every image comes up broken.
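You can watch this happen without node.js at all. Python's asyncio uses the same single-threaded event loop idea, so I am using it here as a stand-in (an assumption on my part that the behaviour carries over, though the principle is the same). In this sketch a ticker task tries to wake up every 10 ms, like other requests waiting to be served, while a handler does 0.3 seconds of work either synchronously or asynchronously. All the names are made up for the example.

```python
import asyncio
import time


async def ticker(gaps, stop):
    """Wake up every 10 ms and record how long each wake-up really took."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def handler(blocking):
    if blocking:
        time.sleep(0.3)           # synchronous: nothing else can run meanwhile
    else:
        await asyncio.sleep(0.3)  # asynchronous: yields to the event loop


async def longest_stall(blocking):
    """Return the longest pause the ticker suffered while the handler ran."""
    gaps = []
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(gaps, stop))
    await asyncio.sleep(0.05)     # let the ticker get going
    await handler(blocking)
    stop.set()
    await tick_task
    return max(gaps)


if __name__ == "__main__":
    print("sync handler stall: ", asyncio.run(longest_stall(True)))
    print("async handler stall:", asyncio.run(longest_stall(False)))
```

With the synchronous handler the ticker is frozen for the whole 0.3 seconds; swap that for a 30 second time-out and you have the broken-images scenario described above.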

Another nice thing about the Node Beginner tutorial is the links to deeper information... and nested in one of the comments is a link to a paper comparing threads and events: Why Events Are A Bad Idea. Well, their conclusion is in the title, but if you look at their charts the important thing to learn is that well-written event handling code and well-written threading code are basically as quick as each other. (You won't ever reach the right-side of the charts in a real website on a single server; their example is just serving a static image. So the differences are only of interest to academics.)

But, although dealing with threads is hard, asynchronous programming is even harder, IMHO.

Now, if node.js had a web server module where it maintained a thread pool and each new request got its own thread, then I could program in a synchronous style in my thread, happy in the knowledge I won't break the web server, and also happy I'll be able to meet my deadlines...

Friday, September 2, 2011

Using twitter oauth from commandline

The twitteroauth library is the most recommended PHP library for using oauth for API access to Twitter. It also supports the commandline approach (which can work completely behind a firewall, no need for a web server to host a callback page), but it is not very well documented.
(Note: when I say behind a firewall you do still need web access, because you need to login to twitter to get a PIN code; but this is much less demanding than needing to set up a web server on a public IP address.)

Anyway, after I'd worked it out, I put my sample code up on github. The three files to look at are oob1.php, oob2.php and oob3.php. Here is how you use them:

Step One:
Edit config.php, to set:

      define('OAUTH_CALLBACK', 'oob');

(If not already done, go to https://dev.twitter.com/apps, create and configure the app, and add the consumer key and secret to config.php.)

Step Two:

   php oob1.php

Step Three:
Visit the URL it tells you to, and approve the application

Step Four:

   php oob2.php 1234567

(where 1234567 is the PIN number you got at the end of step three)

Step Five:

   php oob3.php account/verify_credentials

(this command will show your account; see the oob3.php source code for other supported commands, and some shortcuts.)

Let me know if you have any problems with, or questions about, these files.
If you find bugs, or want to encourage its inclusion in the main twitteroauth library, you can comment on the github pull request.

Tuesday, August 9, 2011

Debugging Regexes

Cue: dramatic music. There I was under pressure, enemy fire going off in all directions, and my unit test had started complaining. The test regex was 552 characters long, the text being tested was almost as long, and each run of the unit test takes 30 seconds. Talk about finding a needle in a haystack. James Bond only had to choose between cutting the red or the blue wire. He had it easy.

But I lived to tell the tale. Playing the Bond Girl in this scenario was http://www.regextester.com/ (I actually used version 2 which, though alpha, worked fine).

It still wasn't smooth sailing. The above site assumes the regex is surrounded by /.../ but mine wasn't. So, first I had my unit test output the regex, then I escaped it correctly for use with /.../ then pasted it into the Regex Tester. I also pasted in the text to test. It should match; it doesn't. So I put the cursor at the end of my regex and deleted a few characters at a time. After deleting about two-thirds (!!) of it, finally the text turned red and I had a match. I could see exactly where the match stopped and realize what was missing in my regex. I fixed the regex (simultaneously in the Regex Tester and in my unit test script) and repeated. I had to delete back to almost the same point. It took half a dozen passes before the whole regex matched.

The code looks to be open source javascript. So maybe I will hack on it, to automate the above process (my better Bond Girl, if you like): I would give the regex, the target text, say I expect a match, and it will find the longest regex that matches and show me how much of the target text got matched. (Ah, it uses ajax requests to back-end PHP for the preg and ereg versions, and that code is not available; but at least I could do this for javascript regexes.)
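The delete-from-the-end process is easy enough to automate. Here is a rough sketch in Python rather than javascript, using Python's re flavour, which is an assumption (your regex may need tweaking to run there). The function name is made up. Be aware that chopping a regex at an arbitrary character can change its meaning as well as break it, so treat the result as a hint about where to look.

```python
import re


def longest_matching_prefix(pattern, text):
    """Find the longest prefix of pattern that compiles and matches text.

    Returns (prefix, matched_text), or ("", "") if no prefix matches.
    """
    for end in range(len(pattern), 0, -1):
        prefix = pattern[:end]
        try:
            match = re.search(prefix, text)
        except re.error:
            continue  # truncated mid-construct, e.g. an unclosed group
        if match:
            return prefix, match.group(0)
    return "", ""


prefix, matched = longest_matching_prefix(r"(\d+)-(\d+) apples", "12-34 pears")
print(repr(prefix), repr(matched))
```

Whatever comes straight after the returned prefix in the full regex is the first thing to be suspicious of.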

Enough with poking around inside today's Bond Girl. Down the cocktail, jacket on, back to the field...

Monday, August 1, 2011

svn, ssh, /bin/false, git, /etc, etc.

I just spent almost two hours troubleshooting an svn server. It was set up to allow access from a few users over ssh as described in this helpful blog post.

Today I realized none of the svn clients could run svn update: "svn: Network connection closed unexpectedly"

Part of the challenge was that I'd not run "svn update" in 1-2 months, so I had a big time frame for breakage. But all I could remember changing recently was commenting out some lines in my iptables firewall, so I spent quite a while staring at that. And I'd updated some packages and rebooted.

I spent a lot of time looking at ssh verbose output. To do this use

export SVN_SSH="ssh -vv"

(-v for quite verbose, -vv for more, -vvv for even more, etc.) No error messages: it connects, runs the svnserve command, passes up some environment variables, sends some data, and then simply loses the connection.

Then I wondered if the svn user had somehow lost permission for something important. It has /bin/false as its shell, so I gave it /bin/bash, used su to become root, then "su - svn". Ran the svnserve command. It is fine. Looked at the repository. It is fine. Ran svnadmin to check for locks. None.

Giving up, I started crafting an email to post somewhere asking for help. I went back to get the exact error message svn update was giving me... and it worked. Yep, the other client works now too. My first guess was running svnadmin had quietly fixed something. But then, on a hunch, I changed svn's shell back to /bin/false. Yep, that broke it. It seems the svn user has to have a valid login shell.

Okay, problem solved. But I'm sure that svn has always had /bin/false, and now I wanted to know if that is true, and if and when it changed. It is too late now, but ready for next time, I decided to put all of /etc into a git repository (no need for a central repository, so git is a far better choice than svn for this). The git commands (all run as root) are trivial:

cd /etc
git init
git add .
git commit -m "Initial files" -a

I found this page of someone who has done something similar, and he suggested sending the git status report from a daily cron, so I did that too.

It is so easy, and disk space so plentiful, that I think I will do it on all my linux machines.

UPDATE: /etc has some files that get edited a lot, so to reduce noise this is my .gitignore file so far:

/cups/printers.conf*
/cups/subscriptions.conf*
/emacs/site-start*
/mtab
/adjtime
/ld.so.cache
/*-
*~

(Use "git rm --cached cups/printers.conf*" (for instance) to stop git tracking the files if they've already been added.)

Sunday, July 31, 2011

Rcpp: embedding C++ in R scripts

Rcpp is an R package to aid in making R extensions written in C++. It also includes a side-project for embedding R code in a C++ program (including being able to use all the R packages), but most amazing of all is the integration with R's inline package. Let me show you a minimal (but complete) script:

library('inline')
src='Rcpp::List output(1);output[0]="Hello World";return output;'
fun=cxxfunction(signature(),src,plugin="Rcpp")
fun()

Yes, it is the classic Hello World script, the output looks like:

[[1]]
[1] "Hello World"

Why do I call this amazing? The contents of src is the body of a C++ function. The cxxfunction takes your C++ code, sends it to your C++ compiler, compiles it with the necessary headers, and finally embeds it in your R session as an R extension. It just works, there is no catch (*).

(I could have used a character object instead of a list. The Rcpp::List is just like an R list, meaning each element can be of a different type. If you've always wanted something like this in C++, also take a look at boost::any.)

The cxxfunction() call takes 1.9s to run on my machine, so the overhead to start-up the compiler and run the compilation process is not too bad.

Okay, not a motivating example, but the Rcpp project is not just technically amazing, it is also well documented. You can find some more interesting examples in the documentation, and if this has got your interest you will enjoy this video (1.5hrs) of a talk the Rcpp main authors did at Google.

By the way, if anyone knows of a PHP project for embedding C++ like this, please let me know!

*: You need to install the Rcpp and inline packages, and you need R version 2.12.0 or later. If you are using Ubuntu and your default version of R is too old, there are good instructions here on getting the latest R version as a deb package.

Sunday, July 17, 2011

R: extract column from xts object as a vector

A quickie for the R language... you will look at the answer and think it is hardly worth writing a blog post over. Well it turns out that everyone has thought the same... which is why it took me over half an hour of failed searching and trial and error to work it out.

I've an XTS object, x, and I've added a column which is the sum of the other columns:

  x$sum=apply(x,1,sum)

print(x$sum) shows me the sum for each row, along with the datestamp of that row. I.e. x$sum is still an XTS object. Normally this is good, but I wanted it as a simple vector, without the dates associated. Here is how you do that:

  as.vector(x$sum)

Why did I want that? Simply because summary(x$sum) uses up 7 lines, half of it being noise about the datestamps; summary(as.vector(x$sum)) is 2 lines, all signal, no noise.

Thursday, July 14, 2011

Customize less: less annoying

Open ~/.bashrc (or ~/.bash_profile) and add this line at the end:

LESS=-FRX;export LESS

Don't ask why, just trust me. ('cos actually I don't know what it means either...)

This was the magic incantation to tell "git log" to wrap commit messages (instead of just chopping off everything extra).

But it has had the delightful side-effect that man now works properly. About 6 or 7 years ago I used to be able to type "man whatever", press q, and the advice would stay on screen. Then linux (all distros as far as I could tell) changed to hiding the man page as soon as you press q. Frequently I'd have to have two terminal windows open, simply so I could keep the man page open while I type my command. Yes, you're ahead of me; the above LESS command fixes this. If I'd known it would be so easy I'd have done this years ago!!

Even better, the help system in the R language was working just like man pages (and it was even more annoying there), and that has been fixed too.

It is a miracle, I tell you, a miracle! July 13th should go down as "Saint LESS=-FRX" day; in 100 years time it will be a public holiday and children will be taught about it in schools. They might even be taught what it means...

Monday, July 4, 2011

R: foreach parallelization

I was experimenting with the foreach and doMC packages in R, which make your code multi-threaded. It actually took a bit of refactoring to use foreach(), as there seems to be no shared memory between the threads - they each took their own copy of the in-scope variables. So I had to return my results as a one-row data frame from each iteration, and combine them when the foreach loop had finished.

Here are my results; the user CPU column was giving me strange numbers, so I'm just going to list total time spent (wall clock time). I have three different loops, so here are the base timings (running the foreach loop in sequential mode, without doMC loaded) for each:

4.33         7.72         16.14

My CPU has 4 cores, but the OS sees it as 8 virtual cores. Here are my results for 1, 2, 4 and 8 threads:

  threads  loop 1       loop 2       loop 3
  1        4.36         7.90         16.20
  2        2.50 [2.18]  4.30 [3.95]   8.80 [8.10]  [8-15% slower]
  4        1.46 [1.09]  2.50 [1.98]   5.06 [4.05]  [25-35% slower]
  8        1.32 [0.55]  2.30 [0.99]   4.30 [2.02]  [110-140% slower]

All times are in seconds, and this loop represents most of the time spent in my script, so while the results are a long way from linear, they represent a useful speed-up. The numbers in square brackets show the speeds if I had got linear improvement.
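For anyone wanting to redo the arithmetic on their own timings, this is roughly how the bracketed numbers and the "slower" percentages are worked out. It is a small Python sketch using the timings from the table above; the function name overhead is made up, and because it divides by the unrounded ideal time the percentages can come out a point or two different from my rounded figures.

```python
# Wall-clock seconds for the three loops, copied from the table above.
single = [4.36, 7.90, 16.20]
measured = {
    2: [2.50, 4.30, 8.80],
    4: [1.46, 2.50, 5.06],
    8: [1.32, 2.30, 4.30],
}


def overhead(single_time, parallel_time, workers):
    """Return (ideal_time, percent_slower_than_ideal) for one measurement."""
    ideal = single_time / workers
    percent_slower = (parallel_time / ideal - 1) * 100
    return ideal, percent_slower


for workers, times in measured.items():
    for base, actual in zip(single, times):
        ideal, slower = overhead(base, actual, workers)
        print(f"{workers} workers: ideal {ideal:.2f}s, "
              f"measured {actual:.2f}s, {slower:.0f}% slower")
```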

By the way, my foreach loop had 200 to 250 iterations. The above results tell me that when each foreach loop iteration does more work we get better efficiency. This is fairly coarse parallelization, which suggests to me that there is lots of room for improvement in the doMC() code.
UPDATE: When running with top, and cores set to 4, I notice 4 new instances of R appear each time it hits the foreach loop, and then they disappear after. I.e. it appears to be using processes, not threads, and creating them on-demand! No wonder my results indicate so much overhead!

Sunday, July 3, 2011

Apache: use both PHP module and PHP-CGI

I had a need the other day to configure apache so it uses the PHP Apache module for all directories except one, where I wanted it to use the cgi version of PHP. The apache module is more efficient, but runs as part of the apache process. I wanted one URL to run in its own process. (I was troubleshooting and had a hunch this might help; luckily it did in this case!) Instructions for both Linux and Windows follow.

On Ubuntu I needed some preparation: 1. install the php5-cgi package (different from php5-cli, which is what we use when using php from the commandline); 2. enable the actions module.
On Windows I was using xampp, which follows an Everything-But-The-Kitchen-Sink philosophy, and all I needed was already there.

Here is the code I added (to my VirtualHost). First for Ubuntu:

ScriptAlias /usephpcgi /usr/bin
Action  application/x-httpd-php-cgi  /usephpcgi/php5-cgi

And then for Windows XAMPP:

ScriptAlias /usephpcgi C:/xampp/php/
Action  application/x-httpd-php-cgi  /usephpcgi/php-cgi.exe

(That is copy and paste from working configurations, so I think the trailing slash must be optional!)

Then for both Ubuntu and Windows:

AddType application/x-httpd-php-cgi .phpcgi

Now, I have to admit my dirty secret: I cheated. Instead of enabling php-cgi for all *.php in one directory, I left *.php going to the Apache PHP module, and I created the *.phpcgi extension to use the cgi binary. Initially this was simply because I managed to get it working that way; but on reflection I realized I preferred it: I can switch a script between using php module and php-cgi just by changing the extension; also I can use php-cgi anywhere in my VirtualHost. If that does not sound so useful I should explain the script in question is already hidden behind an Alias, something like this, so no public-facing URLs need to change:

AliasMatch ^/admin/(.*?)/(.*)$ /path/to/admin.phpcgi

What about my original plan to configure it for a directory, without changing the file extensions? I had trouble with this and gave up; but Ben kindly left a comment, so now I know how to do it. First, you still need the ScriptAlias and Action directives shown above. Then it is simply this:

<Directory /var/www/path/to>
    <FilesMatch "\.ph(p3?|tml)$">
        SetHandler application/x-httpd-php-cgi
    </FilesMatch>
</Directory>

As Ben explains (see comment#1 below) the reason we need the FilesMatch inside the Directory is because mod-php is setting a global FilesMatch directive; and that takes priority over our attempts to use AddType or AddHandler for a directory.

Tuesday, June 21, 2011

R: removing columns in bulk (xts, matrix, data frame, etc.)

I've an R object, z, with a number of intermediate-working columns I want to delete. In this case it is a quantmod object, but the solution I will show applies for xts and zoo objects, and also for matrices (which is what xts, etc. are under the surface). It also applies to data frames.

(By the way, I use = for assignment, not <-, in the below.) Here is the long-winded approach; setting a column to NULL removes it:

  z$close1=NULL
  z$close2=NULL
  z$close3=NULL
  z$close4=NULL
  z$close5=NULL
  z$open1=NULL
  z$open2=NULL
  z$open3=NULL
  z$open4=NULL
  z$open5=NULL

However using a string for the column name won't work. In other words, these don't work:

  z['open1']=NULL   #This one does work for data frames though.
  z[['open1']]=NULL

The solution I eventually worked out is:

  z=z[, setdiff(colnames(z),c('close1','close2','close3','close4','close5','open1','open2','open3','open4','open5')) ]

colnames(z) gets the current list of columns.
I then use setdiff() to subtract from that the list of columns I want to remove.
The z[,...] syntax means take a copy with just these columns (keeping all rows).

If you don't understand my motivation, I should first say I had more like 30 fields, not just the 10 shown in the example above. But also I like doing things programmatically, as it can avoid introducing typos. For instance, the above solution could be rewritten as:

  #Remove the openN and closeN columns
  z=z[, setdiff(colnames(z),c(paste('close',1:5,sep=''),paste('open',1:5,sep='') )) ]

Thursday, June 16, 2011

PHP PDO: so hard to debug

I wrote a simple PDO helper function to update fields in a certain database table. The fields are given as the key/value pairs in $d, and my function looked like this:

$q='UPDATE MyTable SET lastupdate=:lastupdate';
foreach($d as $key=>$value)$q.=', '.$key.'=:'.$key;
$q.=' WHERE username=:username';

$statement=$dbh->prepare($q);
foreach($d as $key=>$value)$statement->bindParam(':'.$key,$value);
$statement->bindParam(':lastupdate',date('Y-m-d H:i:s'));
$statement->bindParam(':username',$username);

It all looks reasonable, doesn't it? Create the SQL, then assign the values. But it didn't work. My $d array looked like:

array( 'status'=>'expired', 'mode'=>'' )

Instead of getting set to expired, the status field ended up blanked out. Yet lastupdate and username got set. This had me scratching my head for ages.
PDO's debug function, PDOStatement::debugDumpParams(), is next to useless: it tells you the parameters, but not the values you've assigned to them. Incredibly annoying.

Have you spotted my bug yet?

Here's the answer. Though all the examples in the PHP documentation use bindParam(), the function to assign a value is bindValue(). You should always use bindValue(), unless you actually need the advanced functionality that bindParam() gives you. What advanced functionality, you wonder? Instead of copying the value immediately, bindParam() binds the variable by reference, and only reads it when the statement is executed. You're ahead of me: in my foreach loop the $value variable changes on each iteration, so every placeholder bound in the loop ended up with the last value, the empty string from 'mode'. That is why status was blanked out.

If PDO had a decent debug function I'd have discovered that in half the time. Oh well, now I know!

Thursday, May 26, 2011

svn to git: central repository setup

The best thing about git is you can just start working; no messing around setting up the central repository, working out secure access, etc. But the learning curve for git, coming from svn, is much steeper than, say, from cvs to svn. The thing I'd not got my head around was what you do when you need that central repository. For instance, when you have a 2nd developer! I think I've got it now, so here is my guide to it.

Let's assume it is a website ("mysite"), and we have three machines involved:
devel: where most development is done
store: the central repository server
www: the web server

The website is on devel currently, under git, and we want to check it out to www.
If not a website, but just two developers, then in this example "www" is the machine of that second developer.

For simplicity we'll assume all developers have ssh access to the "store" machine. And that all our git repositories will be kept under /var/git/

1. On store:
cd /var/git/
mkdir mysite.git
cd mysite.git
git init --bare

(see update#1 below, if your git does not support --bare)

2. On devel (from the root of the website)
git remote add origin darren@store:/var/git/mysite.git
git push origin master
(the "origin master" part is just needed the first time)

3. On www:
cd /var/www/
git clone darren@store:/var/git/mysite.git my.example.com
(this creates /var/www/my.example.com and checks out the current version of the site)
(you need to configure your web server to not serve the .git directory, as well as any other files end users should not see)

4. If we make a change on devel and want to update www.
First on devel, do the commits, then:
git push
Then on www:
git pull

5. If we make a change on www, and want to update devel.
First on www, do the commits, then:
git push
Then on devel do:
git pull origin master

Not quite symmetrical is it? To fix this, so in future you just need to do "git pull" from the devel machine too, edit .git/config. You'll see these lines:

[remote "origin"]
url = darren@store:/var/git/mysite.git
fetch = +refs/heads/*:refs/remotes/origin/*

Don't touch them but, just above them, add this block:

[branch "master"]
remote = origin
merge = refs/heads/master

(unless you are doing something clever, you can use that verbatim). Now "git pull" works without having to specify "origin master" any more.
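
If you would rather not open .git/config in an editor, git config can write the same two settings. This is only a sketch: the function name track_origin_master is my own invention, and it assumes the remote is called "origin" and the branch "master", as in the rest of this guide.

```shell
# Hypothetical helper (the name is made up for this example): writes the
# [branch "master"] block into the repository at $1 (default: current dir).
# Assumes the remote is named "origin" and the branch is "master".
track_origin_master() {
    repo="${1:-.}"
    (
        cd "$repo" || exit 1
        git config branch.master.remote origin
        git config branch.master.merge refs/heads/master
    )
}
```

Call it as "track_origin_master" from the root of the website on devel. With git 1.7.0 or later, doing "git push -u origin master" in step 2 records the same two settings at push time, so neither the edit nor this helper should be needed.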

So, "git commit ...;git push" is like "svn commit ...". And "git pull" is like "svn update" (it is "git clone" that corresponds to "svn checkout").
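
If you want to rehearse steps 1 to 5 before touching real servers, here is a rough sketch that plays all three roles on one machine. The assumptions are mine: local directories under a scratch folder stand in for the store, devel and www hosts, plain paths replace the darren@store:/var/git/... URLs, and the file contents and commit identities are invented.

```shell
#!/bin/sh
# Rehearsal of steps 1 to 5 on one machine. Local directories stand in
# for the three hosts, so plain paths replace the ssh URLs.
set -e
work=$(mktemp -d)        # scratch area; delete it when you are done
cd "$work"

# Step 1, "store": the bare central repository.
# (symbolic-ref pins the branch name to master, whatever git's default is.)
mkdir -p store/mysite.git
(cd store/mysite.git && git init -q --bare &&
    git symbolic-ref HEAD refs/heads/master)

# "devel": stand-in for the existing site, already under git, one commit.
mkdir devel
(cd devel && git init -q &&
    git symbolic-ref HEAD refs/heads/master &&
    git config user.name "Dev" && git config user.email "dev@example.com" &&
    echo "hello" > index.html &&
    git add index.html && git commit -q -m "first version")

# Step 2, on devel: register the central repository and push to it.
(cd devel && git remote add origin "$work/store/mysite.git" &&
    git push -q origin master)

# Step 3, on www: clone the site.
git clone -q "$work/store/mysite.git" www

# Step 5, on www: commit a change and push; then on devel: pull it.
(cd www && git config user.name "Web" && git config user.email "web@example.com" &&
    echo "hello again" > index.html &&
    git commit -q -a -m "edit on www" && git push -q)
(cd devel && git pull -q origin master)

# devel should now have the text that was committed on www.
cat devel/index.html
```

Step 4 is the same thing in the other direction, so I left it out to keep the script short.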

UPDATE #1: Older versions of git don't have the --bare option for init. Do these steps (on "store" machine) instead:

cd /var/git/
mkdir mysite.git
cd mysite.git
git init
mv .git/* .
rmdir .git

UPDATE #2: What to do if you already have some files on www, with some differences from what is in git? E.g. I had a test and live version of a website, with no source control; the test version had a number of extra files. I scp-ed the live version to devel and created a git repository. Then I set it up and pushed it to "store". Then on my test site, I had to do these steps:

#Temporarily move current test files out of the way
mv my.example.com{,.old}
#Get the git version
git clone darren@store:/var/git/mysite.git my.example.com
#Merge in the test site
cd my.example.com
cp -a ../my.example.com.old/* .
#See what is different
git status
#For all changed files, revert to the git (live server) version
git reset --hard

By the end of all this, git status will just list (as untracked) the files that were on test but were not on live.
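
To see that end state without risking a real site, the same sequence can be tried in a scratch directory. This is a sketch only: the directory and file names are made up, a freshly created repository stands in for the clone, and "git ls-files --others --exclude-standard" is used at the end as a way to print just the untracked files.

```shell
#!/bin/sh
# Scratch demonstration of the merge above; all file names are made up.
set -e
demo=$(mktemp -d)
cd "$demo"

# Stand-in for the clone of the live site: one tracked file.
mkdir site
(cd site && git init -q &&
    git config user.name "Demo" && git config user.email "demo@example.com" &&
    echo "live" > page.html &&
    git add page.html && git commit -q -m "live version")

# Stand-in for the old test site: a modified file plus an extra one.
mkdir site.old
echo "test edit" > site.old/page.html
echo "only on test" > site.old/extra.html

# Merge the test files in, then revert tracked files to the git version.
cd site
cp -a ../site.old/* .
git reset -q --hard

# List just the untracked files, i.e. those that existed only on test.
untracked=$(git ls-files --others --exclude-standard)
echo "$untracked"
```

Note that "cp -a ../dir/* ." skips dot-files, so anything like .htaccess on the test site would need copying separately.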

Tuesday, May 24, 2011

PHP, PDO, SQLite, mysterious lock problem

Let's start with the conclusion:
If you do a prepare() or query() with PDO and sqlite, and then want to do something else with sqlite in the same PHP function, unset the first PDOStatement before trying to do that something else.

As an example, here is my code to get a unique ID ($dbh is a PDO connection to an sqlite database):

function get_next_id($dbh){
$q='SELECT next FROM MyNextId';
$obj=$dbh->query($q); //Throws on error
$d=$obj->fetch(PDO::FETCH_NUM);
$next_id=$d[0];

$q='UPDATE MyNextId SET next=next+1';
$row_count=$dbh->exec($q); //Throws on error
if($row_count==0)throw new Exception("Failed to execute ($q)\n");

return $next_id;
}

It works on Ubuntu 10.04, with sqlite 3.6.22, but fails on Centos 5.6, with sqlite 3.3.6, with this message:
exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 6 database table is locked'

I went through the whole changelog from 3.3.6 to 3.6.22, but got no clues (though I am now impressed with how active and organized the sqlite development is). But finally I tracked down this article on someone getting similar errors.

And that was it. I could have used $obj->closeCursor(), but deleting $obj is just as good:

function get_next_id($dbh){
$q='SELECT next FROM MyNextId';
$obj=$dbh->query($q); //Throws on error
$d=$obj->fetch(PDO::FETCH_NUM);
$next_id=$d[0];

unset($obj);

$q='UPDATE MyNextId SET next=next+1';
$row_count=$dbh->exec($q); //Throws on error
if($row_count==0)throw new Exception("Failed to execute ($q)\n");

return $next_id;
}

If you are doing just one PDO action per function then there is no need, because exiting the function will automatically do the unset.

(I don't know why this is a problem on sqlite 3.3.6 but not sqlite 3.6.22... in fact, I suspect it may be due to a difference in the PDO or PHP version or configuration instead. Apologies for the loose end!)

Sunday, May 22, 2011

oauth for php (and Ubuntu)

There is a "standard" PHP OAuth library, documented in the manual, which is installed via pecl. There is no package shortcut under Ubuntu; you still have to use pecl:
sudo pecl install oauth

If you get a complaint about no "phpize", install the "php5-dev" ubuntu package. And then if you get an error in "php_pcre.h" when compiling, then you need to install the "libpcre3-dev" ubuntu package.

Next, you need to enable it. Still as root:
cd /etc/php5/conf.d
echo "extension=oauth.so" >oauth.ini

(This creates a config file just for oauth; you could also simply put the extension line in php.ini.)

Finally, "php -m" should list "OAuth", and you can create an OAuth object in your php scripts.

Ubuntu package manager lists "liboauth-php"; the minimal information, and the lack of mention of pecl, should have given me the clue that it is something different.

Also different is this: http://code.google.com/p/oauth-php/

Monday, May 16, 2011

One man's feature request...

...is another man's bug.
Since Thunderbird 3, if you selected part of a mail then hit reply, only your selected part gets quoted. Incredibly annoying for those of us who think out loud with our mouse.

But it can be switched off. Edit|Preferences, Advanced tab, Config Editor. Then in the search box type:
mailnews.reply_quoting_selection
Change it from true to false (double-clicking that line is enough).

It turns out this was actually a Thunderbird feature request, dating from 2000! https://bugzilla.mozilla.org/show_bug.cgi?id=23394

Monday, February 7, 2011

ZendForm: inserting static text

I'm not a fan of Zend Form, or Zend Framework generally: too much structure, too much work to do anything slightly unusual. Trying to add static text to your form epitomizes this. Note, you have to work through the $form object, as in the view page this is all there is:

  <?=$form ?>

If you are using php 5.3, there is a quite reasonable solution:

  $form->addElement('text','whatever_id',array(
    'decorators'=>array(array('Callback',array('callback'=>
    function(){return "<h3>Whatever</h3>";}))),
    ));

Yes, that is as minimal as I can get it, and yes, you must give a 'whatever_id' that is unique on your form, but you have complete control over what is output, and nothing else gets output.

If you are using php 5.2, there are two approaches. The first way is equivalent to the above, just slightly more verbose:

  $form->addElement('text','whatever_id',array(
    'decorators'=>array(array('Callback',array('callback'=>
      create_function('','return "<h3>Whatever</h3>";')
      ))),
    ));

The second way is a different approach:

  $form->addElement('text','whatever_id',array(
     'label'=>E('<h3>Whatever</h3>'),
     'helper'=>'formNote',
     'decorators'=>array(array('Label',array('escape'=>false))),
      ));

Yes, four lines, and every line matters. Here is what gets output:
<label class="optional" for="whatever_id"></label><h3>Whatever</h3>

I cannot stop the <label> tag appearing; if you can, let me know. (It does not appear to the end user, just in the HTML source, so this is not a serious issue.)

(2011-03-02: Edited to show the create_function() approach in php 5.2)

Friday, January 28, 2011

Using Amazon EC2 for Shodan Go Bet

I've put up a short article on how Amazon EC2 was used for the Shodan Go Bet event held at the end of December 2010:
http://dcook.org/gobet/using_amazon_ec2.html

It is not about computer go, but instead will be of interest if you have wanted to see a concrete example of how EC2 is used, just how fast the fastest choice is, exactly how much it costs, how the Amazon EC2 Windows instances work (I used remote desktop running on linux!), etc.

Tuesday, January 18, 2011

Buffalo Air Station setup on existing home LAN

I found this surprisingly difficult; though it turns out the steps involved are quite easy...once you know how.

For this article I'll assume your existing LAN is 10.0.0.0/8 with 10.0.0.1 as the default gateway. I'm going to give the wireless LAN router 10.1.2.3. Replace 10.0.0.1 and 10.1.2.3 below with something suitable for your LAN (and change the 255.0.0.0 network mask accordingly).

These instructions assume factory reset status, with the "router" switch (on the case) set to "On", rather than "Off" or "Auto". The AirStation is connected to the LAN hub using the Internet socket (the blue one).

1. Connect a computer to the Air Station directly (with ethernet cable, to one of the four LAN sockets), and set your computer to use DHCP, so that it gets an IP address that can see the Air Station! (Most likely your computer will receive 192.168.11.2.)
Note: The computer you connect with does not need to have a wireless card in it; it does have to have wired ethernet though.
Tip: Don't use your main computer for this step; that way you will be able to still use your main computer for: a) googling when you hit problems; b) ping tests (see step 5 below).

2. Connect by browser to 192.168.11.1
Use root as the username, with blank password.

3. Internet/LAN | Internet
IP Manual: 10.1.2.3
255.0.0.0
Extended:
Default GW: 10.0.0.1
DNS: (whatever DNS server you use: this is what will be handed out to wireless clients that connect using DHCP)

4. Wait for it to restart. (I also enabled pings at this point.)

5. Then cycle the power; this appears to be essential.
Now you should be able to ping 10.1.2.3 from outside, and also ping from the connected computer to 10.0.0.1. And if you can do that then you can also connect to the internet! You are done, time for a nice cuppa.

6. Test from a wireless device to make sure you are required to type in the key.

7. Set a root password.