Darren's Developer Diary: 2010

Tuesday, December 7, 2010

scp with multiple targets: ssh-add

I sometimes have the need to upload the same file to two or more locations on the target server. In fact I even have a script to help me, which goes something like this:

scp file1 file2 remote01:/path/to/somewhere/
scp file1 file2 remote01:/path/to/another/

The problem is I need to type the password for remote01 twice. I had never managed to find some clever scp syntax to allow specifying two destinations, and a post on the TLUG mailing list confirmed that. But what I did learn from TLUG is that there is something called ssh-agent that can store passphrases for key pairs; this plugged a gap in my knowledge. There are three ways to log in via ssh/scp:

Give your password
Make a keypair with no passphrase
Make a keypair with a passphrase

The second way is used to allow scp from cron jobs, to automate copying files between two machines. I'd never really got the point of the third way: if you have to type a passphrase why not just use the first method, and save messing around making a key pair. (Well, yes, there is better security: a log-in then requires both something I have and something I know; you have to turn off PasswordAuthentication on your ssh server for this to have any meaning though. Thanks to Kalin for this comment.)

But it turns out there is this program called ssh-agent that remembers passphrases for you. And I found it is already running in the background on Ubuntu.

Enough chat, let's look at the solution. First, I created a one-off keypair for this script, on my machine, using something like this:

cd ~/.ssh
ssh-keygen -t dsa -C me@example.com -f key_for_remote01
chmod 600 key_for_remote01*
scp -p key_for_remote01.pub remote01:~

When generating the keypair give a reasonably secure passphrase (you will have to type it in each time, and it is only of use to people in possession of key_for_remote01, so no need for a 20-random-character monster; I believe it is perfectly fine for it to be the same as your normal ssh login password for remote01).

Then log in to remote01 and append key_for_remote01.pub to ~/.ssh/authorized_keys. If that file does not exist then you can just rename key_for_remote01.pub to authorized_keys and move it into ~/.ssh/

(By the way, there is no need to put your private half of the keypair in ~/.ssh/ but that seems as good a place as anywhere else.)

Now, I modified my script as follows:

ssh-add -t 120 ~/.ssh/key_for_remote01
scp -i ~/.ssh/key_for_remote01 file1 file2 remote01:/path/to/somewhere/
scp -i ~/.ssh/key_for_remote01 file1 file2 remote01:/path/to/another/
ssh-add -d ~/.ssh/key_for_remote01

What happens is the ssh-add line will ask you for your passphrase. The two scp lines then work automatically. Finally the ssh-add -d stops it caching your passphrase (forcing you to type your passphrase each time you run this script).

The -t 120 parameter says the passphrase will expire after two minutes. This is just in case the batch file doesn't complete and so does not get chance to run ssh-add -d.
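
As a belt-and-braces variation on the script above, here is a minimal sketch that also removes the key when an upload fails part-way, by running the work in a subshell with an EXIT trap. The function name upload_to_dirs and its argument order are made up for this illustration; file1 and file2 are the same placeholder names used above.

```shell
# Sketch only: upload_to_dirs is a hypothetical helper, not part of the original script.
# Usage: upload_to_dirs KEYFILE HOST DIR [DIR...]
upload_to_dirs() {
    key=$1
    host=$2
    shift 2
    (
        # Ask for the passphrase once; the agent forgets it after 120 seconds regardless.
        ssh-add -t 120 "$key" || exit 1
        # Runs when this subshell exits, whether the uploads succeeded or not.
        trap 'ssh-add -d "$key"' EXIT
        for dir in "$@"; do
            scp -i "$key" file1 file2 "$host:$dir" || exit 1
        done
    )
}
```

It would be called as: upload_to_dirs ~/.ssh/key_for_remote01 remote01 /path/to/somewhere/ /path/to/another/. The -t 120 timeout is still worth keeping, as the trap cannot run if the machine itself goes down.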

Note: you can use that same key pair for other machines. Basically anywhere you put the *.pub half of the key pair will let you log in. And you can log in from anywhere you have the private half of the key pair.

Note: the timeout/deletion code above is deliberate for this application, but you don't have to do it this way. By allowing ssh-agent to cache the passphrase permanently you would only be prompted for it once, and then all future ssh and scp logins would be automatic. The cached keys are cleared when you log out of GNOME (on Ubuntu, at least) or shut down your machine.

Note: If you don't want to specify -i each time you use ssh/scp then you can add an entry to your ~/.ssh/config file, like this:

Host remote01
        Hostname 10.1.2.3
        Port 22
        IdentityFile ~/.ssh/key_for_remote01

Note: I used -C with my real email address. This is put in the public key, and I wanted the administrator of remote01 to know who put the key there. Without -C it defaulted to "myusername@myhost". The administrator of remote01 knows nothing about my machine names so this seemed unreasonable and I decided to use -C. But the advantage of that default is it seems ssh-agent knows about that name and will prompt automatically the first time you try to use ssh/scp, which means there is no need to run ssh-add first. I have not worked out yet if ssh-agent can be told to know about my email address too.

Sunday, November 28, 2010

Troubleshooting php and xinetd with strace

In a previous entry I mentioned I was trying to track down a difference in a php script between two machines. I've now got closer, close enough in fact to fix the problem but not to understand it. As a troubleshooter, the big discovery for me was how useful strace can be.

First, a PHP code snippet:

while(1){
stream_set_blocking(STDIN,true);
$s=trim(fgets(STDIN));
if($GLOBALS['log_fp'])fwrite($GLOBALS['log_fp'],$s."\n");
$params=explode(" ",$s);
$command=array_shift($params);
...
}

The first two lines say listen on stdin for a command, and wait forever for it to arrive. When it does we log it and then split it up into a command and parameters.

This script is used over xinetd. Xinetd listens on a port for us, and when it gets something it feeds it to stdin; php script output is then fed back over the socket to the client.

Here is the setup:

machine A: ubuntu 8.04, 32-bit, php 5.2
machine B: ubuntu 10.04, 64-bit, php 5.3

Both machines run an identical version of xinetd: "xinetd Version 2.3.14 libwrap loadavg".

Machine A runs fine, machine B sometimes sits there. Analyzing this some more, by connecting over telnet to machine B:

I send a command
It replies straightaway
After 60 seconds or so it gives an error message
If I do nothing this error message appears again every 60 seconds.

On machine A, or if I run the php script directly on machine B, it behaves like this:

I send a command
It replies straightaway
It sits there forever waiting for input

Out of desperation I ran strace (using -p you can attach it to a running process; use ps -a | grep php to find the PID), and I was pleasantly surprised to see it was not too verbose.

Over xinetd on machine B (the problem configuration) the interesting snippet is:

...
write(1, "mydata\n", 7)                 = 7
fcntl(0, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(0, F_SETFL, O_RDWR)               = 0
poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, 60000) = 0 (Timeout)
write(3, "\n", 1)                       = 1
write(5, "\n", 1) 
...

On machine A over xinetd it instead looks like this:

...
write(1, "mydata\n", 7)                 = 7
fcntl64(0, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(0, F_SETFL, O_RDWR|O_LARGEFILE) = 0
read(0,

(Yes nothing comes after the "0," until I send some input.)

And direct access to the php script on machine B is practically the same:

...
write(1, "mydata\n", 7)                 = 7
fcntl(0, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(0, F_SETFL, O_RDWR|O_LARGEFILE)   = 0
read(0,

So, the cause is clear: fgets() blocks for only 60 seconds instead of blocking forever, but only on this machine and only over xinetd. And the bug in my script then becomes clear: I'm not prepared to handle blank input!

Here is my fix:

while(1){
stream_set_blocking(STDIN,true);
$s=trim(fgets(STDIN));
if($s=='')continue;
if($GLOBALS['log_fp'])fwrite($GLOBALS['log_fp'],$s."\n");
$params=explode(" ",$s);
$command=array_shift($params);
...
}

As an ironic postscript, the real problem was elsewhere (a mismatch in another program version and the configuration being used for it!), and I also used strace to track that mistake down. But, what was so important about fixing the above was that it stopped distracting me with an error message every 60 seconds on machine B, when the configuration error was actually on machine A!

Saturday, November 27, 2010

php 5.3, ticks, pcntl_signal, pcntl_signal_dispatch

I'm trying to track down a problem with a script that works on php 5.2 but behaves strangely on php 5.3 (there are lots of differences between the environments, and I suspect php version will actually turn out to be completely unrelated). php 5.3 introduced pcntl_signal_dispatch() which processes outstanding signals and I've been investigating if that could somehow explain the behaviour differences I see.

The confusing part is that I've seen people saying that the old way of "declare(ticks=1)" is now deprecated in 5.3, and you must use pcntl_signal_dispatch(). This seemed very silly as you'd have to litter your code with calls to pcntl_signal_dispatch(), as well as have very different code for php 5.2 and 5.3.

If you're confused, like I was, here is what you need to know:
1. declare(ticks=1) still works: no deprecated message (with E_ALL|E_STRICT error reporting);

2. My script, using ticks, behaves identically under 5.2 and 5.3 when I send it a ctrl-C or a kill signal;

3. The docs don't mention it being deprecated; I realized that every place saying so was a user-contributed comment or a blog!

There is a performance aspect with using declare(ticks=1). I believe it is minor, but I think pcntl_signal_dispatch() has been introduced so you can use it instead of ticks if you want to take fine-control over when signals get considered.

Thursday, November 25, 2010

bash tips: meld on program output

Here is the challenge: I want to run diff on two large log files, but I'm only interested in the entries at a certain time in each log file.

This used to require four commands:

grep "ABC" a.txt >tmp1.txt
grep "ABC" b.txt >tmp2.txt
diff tmp1.txt tmp2.txt
rm tmp1.txt tmp2.txt

(Imagine "ABC" is a datestamp, but it could be any other way to filter your log file.)

Thanks to the gurus on TLUG's mailing list I can now do this as a one-liner:

diff <(grep ABC a.txt) <(grep ABC b.txt)
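
If the same filter gets used a lot, the one-liner can be wrapped up in a small bash function. This is only a sketch: the name diff_filtered is made up for illustration, and it needs bash because plain sh does not support the <(...) syntax.

```shell
#!/bin/bash
# Process substitution is a bash feature; it is not available in plain sh.
# diff_filtered is a hypothetical helper name used for illustration.
# Usage: diff_filtered PATTERN FILE_A FILE_B
diff_filtered() {
    # Each <(...) runs grep and hands diff a file-like name to read its output from.
    diff <(grep -e "$1" "$2") <(grep -e "$1" "$3")
}
```

For example, diff_filtered ABC a.txt b.txt gives the same result as the one-liner, with no temporary files to clean up.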

It works perfectly for meld (a wonderful visual diff program) too! Here is another way to use it, to compare the output of a program on two different machines (here I'm comparing the php configuration):

diff <(php -i) <(ssh someserver 'php -i')

We're using the form of ssh that runs a program on the remote server. The command in the brackets can get quite complex. Here is an example where I needed to compare datestamps in two csv files, but the first field was an id number, arbitrary, and therefore different for all records.

diff <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}' dir1/abc.csv) \
     <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}' dir2/abc.csv)

The -o flag to egrep tells it to only output the matching part, not the whole line. This next version shows the complete rest of the line starting with my datestamp field; i.e. this version just excludes the csv field(s) before the datestamp field:

diff <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+' dir1/abc.csv) \
     <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+' dir2/abc.csv)
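
To see what the two patterns keep, here is a small sketch run against a single made-up csv row (the row contents are an assumption for illustration; only the datestamp layout matches the regex above). It uses grep -E, which is the same thing as egrep.

```shell
# Hypothetical sample row: id, datestamp, then other fields.
line='1017,20101125 09:30:00,BUY,100'

# Only the datestamp itself:
printf '%s\n' "$line" | grep -Eo '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}'
# prints: 20101125 09:30:00

# The datestamp plus the rest of the line:
printf '%s\n' "$line" | grep -Eo '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+'
# prints: 20101125 09:30:00,BUY,100
```

In both cases the leading id field is dropped, which is what stops it showing up as a difference on every record.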

Monday, October 18, 2010

Bash loops

Faced with having to write 160 commands by hand (to split 5000 files into subdirectories, based on the first 3 digits of their 6 digit filenames) I quickly learnt bash loops instead:

for i in `seq 102 189`;
do
  mkdir $i
  mv $i???.sgf $i/
done

I.e. it does commands like this:

mkdir 102
mv 102???.sgf 102/
mkdir 103
mv 103???.sgf 103/
...
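
A different approach, sketched here under the assumption that every file is named with exactly six characters plus .sgf, is to take the directory name from each filename instead of looping over a fixed number range. The function name split_by_prefix is made up for this example.

```shell
# split_by_prefix is a hypothetical helper; run it in the directory holding the files.
split_by_prefix() {
    for f in ??????.sgf; do
        [ -e "$f" ] || continue      # the glob matched nothing
        prefix=${f%???.sgf}          # strip the last 3 characters and .sgf: 102345.sgf -> 102
        mkdir -p "$prefix"
        mv "$f" "$prefix/"
    done
}
```

This only creates directories for prefixes that actually occur, so gaps in the numbering do not leave empty directories behind.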

UPDATE: In a comment, traxplayer kindly explained how I would make, for instance a filename like 120abc.txt; obviously writing $iabc.txt won't work. The trick is to write $i as ${i}. E.g.

mv ${i}abc.sgf $i/
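
A tiny sketch of why the braces matter. The variable iabc is defined here purely to make the mix-up visible; in the real case it would be unset, so the unbraced form would expand to nothing.

```shell
i=120
iabc=WRONG            # hypothetical variable, defined only to make the mix-up visible

echo "$iabc.txt"      # the shell reads the name as "iabc": prints WRONG.txt
echo "${i}abc.txt"    # braces end the name after "i": prints 120abc.txt
```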

I just needed to use it, and it worked! This bash script tries to find the last reference to each of XXX06..XXX20 in two logfiles.

rm lastorder.log
touch lastorder.log

for i in `seq 6 9`;
    do
        grep -h XXX0${i} order_verbose.log.old order_verbose.log | tail -1 >> lastorder.log
    done

for i in `seq 10 20`;
    do
        grep -h XXX${i} order_verbose.log.old order_verbose.log | tail -1 >> lastorder.log
    done
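
The two loops exist only because of the leading zero in XXX06..XXX09. A possible single-loop version, sketched here with made-up function names (order_tags and last_orders), lets printf do the zero-padding; the log file names are the ones used above.

```shell
# Print XXX06 .. XXX20, zero-padding the number to two digits.
order_tags() {
    for i in $(seq 6 20); do
        printf 'XXX%02d\n' "$i"
    done
}

# Same search as the two loops above, driven by the padded tags.
last_orders() {
    : > lastorder.log                      # create or empty the output file
    for tag in $(order_tags); do
        grep -h "$tag" order_verbose.log.old order_verbose.log | tail -n 1 >> lastorder.log
    done
}
```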

By the way, traxplayer also said he prefers to use $(...) instead of `...` for readability:

  for i in $(seq 102 189);

Monday, September 13, 2010

When bash history no longer works

I've been pulling my hair out: I have all the HISTORY variables set correctly, but history never survives to the next session. I thought it must be an Ubuntu 10.04 bug, as it only affected that machine, but I couldn't track down anyone reporting the same problem.

The breakthrough came when I stopped typing "history" to view the history, and decided to "cat ~/.bash_history". I got permission denied... and discovered the file was owned by root. Suddenly it all made sense.
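
A quick way to check for this, sketched as a small function (the name check_history_owner is made up; it relies on the shell's -O test, which is true when the file is owned by the current user):

```shell
# check_history_owner is a hypothetical helper; with no argument it checks ~/.bash_history.
check_history_owner() {
    file=${1:-$HOME/.bash_history}
    if [ ! -e "$file" ]; then
        echo "missing"
    elif [ -O "$file" ]; then
        echo "ok"
    else
        # Fix with something like: sudo chown "$USER" ~/.bash_history
        echo "wrong-owner"
    fi
}
```

Running ls -l ~/.bash_history shows the same information, with the owner in the third column.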

(By the way, I've a bunch of blog articles about setting up a new Ubuntu dual boot, RAIDed, encrypted machine; coming soon to an RSS feed near you...)

Sunday, August 22, 2010

Putting parentheses into complex doctrine sql queries

Consider this query:

SELECT * FROM mytable WHERE
( status IN ('A','B') OR access IN ('X','Y') )
AND created > '2010-01-01';

Doctrine offers no way to do this! The following code makes a query without the parentheses, meaning it will return all records with status of A or B, whatever their creation date.

$q=Doctrine_Query::create() ->from('mytable');
$q->whereIn('status',$statusList);
$q->orWhereIn('access',$accessList);
$q->andWhere('created>?','2010-01-01');

If your IN lists are of fixed length you could do it like this (untested):

$q=Doctrine_Query::create() ->from('mytable');
$q->where('( status IN (?,?) OR access in (?,?) )',array('A','B','X','Y'));
$q->andWhere('created>?','2010-01-01');

But a far better solution is suggested on Scott Daniel's blog.

I changed his code in two ways. First I put it straight into Doctrine/Query.php, which saves me having to change the class name in my source code. I'm comfortable doing this as I have the Doctrine source in SVN, meaning I won't lose my change when updating Doctrine. Secondly I used an explicit reference to save having to reassign the variable. I doubt there is any difference in speed, but this way looks clearer to me. So the full code I put at the end of Query.php looks like this:

/**
* Custom addition to Doctrine, to allow wrapping a set of OR clauses
* in parentheses, so that they can be combined with AND clauses.
*
* @see http://danielfamily.com/techblog/?p=37
* I modified it slightly to use an explicit reference.
*
* @return Doctrine_Query this object
*/
public function whereParenWrap()
{
$where = &$this->_dqlParts['where'];
if (count($where) > 0) {
    array_unshift($where, '(');
    array_push($where, ')');
    }
return $this;
}

To use it simply add a call to whereParenWrap() just after your OR clauses. For instance, here is how my original example is modified:

$q=Doctrine_Query::create() ->from('mytable');
$q->whereIn('status',$statusList);
$q->orWhereIn('access',$accessList);

$q->whereParenWrap();
$q->andWhere('created>?','2010-01-01');

In my actual code this required me to shuffle things around so the OR clauses were defined first, then the straightforward AND clauses came last. But that was no hardship.

However there is one flaw in this approach: if you need to have two such sets of OR clauses, which are themselves joined by an OR or AND. I thought a more generic approach might work: openParenthesis() and closeParenthesis() functions which you call exactly where you want them. But openParenthesis goes wrong (it ends up making "WHERE ((AND status IN..."; i.e. the "(" looks like an existing clause to Doctrine). I'm sure this could be made to work, but it will be more intrusive. So I'm going to be pragmatic and worry about this should I ever actually need it.

Tuesday, August 3, 2010

Doctrine many-to-many relationships with onDelete

In a previous article I explained why I now try to put an explicit onDelete on every relation. The problem comes with many-to-many relationships. If you put onDelete: CASCADE in the relationship it gets ignored and the entries survive in your refclass table. I found some pages on this but none that really answers the question fully:

http://stackoverflow.com/questions/1488171/in-symfony-doctrines-schema-yml-where-should-i-put-ondelete-cascade-for-a-many

http://groups.google.com/group/doctrine-user/browse_thread/thread/2f924ee3b8cd2689?pli=1

http://trac.doctrine-project.org/ticket/1123

The common theme is if you want an onDelete on a many-to-many relationship then you have to put it on the reference table. But not a single example anywhere. So, let's start with the "friends" nest example in the manual.

Here is what does not work (no error, it simply leaves the records behind in FriendReference):

User:
  # ...
  relations:
    # ...
    Friends:
      class: User
      local: user1
      foreign: user2
      refClass: FriendReference
      equal: true
      onDelete: CASCADE

FriendReference:
  columns:
    user1:
      type: integer
      primary: true
    user2:
      type: integer
      primary: true

This one won't even compile (well, I didn't think it would):

User:
  # ...
  relations:
    # ...
    Friends:
      class: User
      local: user1
      foreign: user2
      refClass: FriendReference
      equal: true

FriendReference:
  columns:
    user1:
      type: integer
      primary: true
    user2:
      type: integer
      primary: true
  relations:
    onDelete: CASCADE

So, how do we change it so it will work? I'll update this if and when I work it out; if you know please let me know.

Be explicit about OnDelete in doctrine schemas

In doctrine, I tried $obj->delete(); and got an SQL exception from another table. But you're supposed to have deleted that too, that's the whole point of me telling you about all the relations between the tables, screamed I.

So I went through my schema inserting "onDelete: CASCADE" in every relation. But as I did it I realized sometimes I wanted "onDelete: SET NULL", and sometimes I wanted "onDelete: RESTRICT". This is a helpful article on the onDelete choices (though for null you actually have to write "SET NULL", not just "NULL"). There is also another choice: "onDelete: SET DEFAULT" (e.g. when a user is deleted, you may want his comments to become owned by the admin user, rather than become orphaned).

So, now my advice is to explicitly put onDelete on every relation; treat it as required. I also find I need a line of documentation to explain every time I choose something other than cascade. It is good to be forced to think of these things.

All is smooth until you come to many-to-many relationships, especially self-referential (nest) relationships, such as two users being friends (symmetrical) or one user blocking another (one-way). I will cover it in a separate article.

Sunday, July 25, 2010

Wine broke, or PHP, or xinetd??

Sometimes it can be really educational to sit on the shoulder of brilliant developers and watch how they troubleshoot and debug. So, here is your chance to sit on my shoulder,... and snigger as failure reduces me to tears.
[2010-11-18 UPDATE: a mere 5 months after originally posting, I think I have the answer; see the bottom of this article.]

A program worked fine Saturday. This is on my Ubuntu 8.04 (hardy heron) machine.

Sunday I updated firefox and ghostscript only. I then rebooted, which means kernel 2.6.24-28 finally became active (I updated it July 11th, but hadn't had chance to reboot until yesterday).

Today that program doesn't run. It is one particular wine program: a go program. Another wine program (also a go program) runs fine. And here is the real killer: if I start that problem wine program from gogui (gogui.sf.net) it works. Exactly the same commandline, but it works from gogui, and doesn't work when started from my php script. That PHP script was last changed last Wednesday.

I rebooted into the previous kernel (2.6.24-27), and the problem is exactly the same.

Have you ever done those logic problems where you get a list of clues and have to work out who did what? Applied to the above we discover the only logical explanation is that... reality is warped. It can't be: wine, the go program, php, my php script or the kernel.

Here is some more evidence. The first time I run it after a reboot the program (called valhallgtp.exe) fails to start with a stack dump and backtrace. Here are the first few and last few lines of that:
=>1 0x7bc3b23c __regs_RtlRaiseException+0x4c() in ntdll (0x014ded68)
2 0x7bc76de3 in ntdll (+0x66de3) (0x014df0cc)
3 0x7bc3a936 RtlRaiseException+0x6() in ntdll (0x014df144)
4 0x00415833 in valhallgtp (+0x15833) (0x014df2ac)
5 0x004159dd in valhallgtp (+0x159dd) (0x014dfa24)
... (6..20 are all in valhallgtp)
21 0x0049b02c in valhallgtp (+0x9b02c) (0x014dff08)
22 0x7b8773a7 in kernel32 (+0x573a7) (0x014dffe8)

I also see: "err:seh:raise_exception Exception frame is not in stack limits => unable to dispatch exception."

The 2nd and subsequent times I try to start it I get:
Failed to start wine ValhallGTP.exe ...
(this is an error from my php script; i.e. proc_open() is failing.)

The CWD (current working directory) is correct, the parameters are correct; both are exactly what gogui is using. And running it from a bash shell is fine too.

The fingers are pointing at PHP. But no PHP upgrades either yesterday or in the July 11th batch. And that same PHP is still successfully starting 4 other programs, including another one that uses wine. (BTW, go to synaptic package manager, file menu, history; this is where you can see exactly what Ubuntu updated and when.)

Disk space is fine ("df"), memory is fine ("cat /proc/meminfo"). Machine load is low ("w"). Running it as root has the same problem.

When I ran from the -28 kernel I also got errors about no X DISPLAY. And then when I started the other wine program it opened but had no display. And running wine config was the same. I've not seen that again. Let's pretend it never happened and concentrate on what we can reproduce.
(UPDATE: it happened again (this time with the -27 kernel). I had the gtpmfgo.exe running fine under wine; then starting mfgo.exe it came up but with no display. Close and retry showed it was repeatable. After closing gtpmfgo.exe it worked, and then I could open gtpmfgo.exe fine. Note: not repeatable; I haven't managed to get it to happen again.)

Okay, where to next? What I don't understand is how come I cannot reproduce the problem I get when I start it the first time. That seems like a small enough challenge to be achievable: how to get it to crash consistently.

...time passes...

Aha! It seems it is crashing in the same way; it was just crashing and dying before I got chance to read the crash message. So the first time after a reboot it must take longer than a second to start up (perhaps wine initialization).

...aha, aha! Narrowed down further. To explain this I need to give more background: I have a script called frontend.php that calls backend.php over sockets. backend.php uses stdin/stdout and uses xinetd to turn it into a socket server. When I run backend.php directly it works!

Could the updates/reboot have affected xinetd somehow? ...probably not. The xinetd files are all dated Dec 2007 except my own config file which is June 30th. And, the other 4 programs are started in exactly the same way and work fine.

Starting backend.php in exactly the same way as xinetd does works fine. (I've tried it both from bash and from sh.)

...time passes...

I seem to have a solution. (I had started working on a xinetd replacement php script, but that doesn't seem needed now.) My solution was to set xinetd to listen on another port, and use that port for just the problem program. This is working. I cannot explain it however: I had already tried connecting to only that program, so xinetd wouldn't have been doing anything different to now.

UPDATE: 7 weeks later and I do another reboot and it broke in exactly the same way!!
And again the only solution I could find was to give xinetd a new port for just this particular problem.
Weird, very, very weird, ...

2010-11-18 UPDATE: It was bothering me again, but on a different machine, and the trick with setting up xinetd on a different port didn't work. But adding this code to the top of backend.php seems to have solved it:
if(trim(getenv('DISPLAY'))=='')putenv('DISPLAY=:0.0');

The problem program has a GUI you see. I switch the GUI off with a commandline option ("hidewindows") but I think it actually creates windows and just hides them. And yes, none of the other programs uses a GUI... sigh, the clues were there.
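
If the program were being started from a shell wrapper rather than from PHP, the same fallback could be sketched like this. The function name ensure_display is made up, the :0.0 value is the same guess as in the PHP line, and unlike the PHP version it does not trim whitespace.

```shell
# ensure_display is a hypothetical helper: give DISPLAY a default only when it is empty or unset.
ensure_display() {
    if [ -z "${DISPLAY:-}" ]; then
        DISPLAY=:0.0
        export DISPLAY
    fi
}
```

Calling ensure_display before starting wine leaves an existing DISPLAY value alone, so it should be harmless when run from a normal desktop session.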

Curiously the "hidewindows" option no longer works! That is probably going to annoy me, but not as annoying as not working at all!

Thursday, June 24, 2010

Zend Form: display group and custom decorator

I keep meaning to write a proper review of Zend Framework, but I am waiting to finish a big project that uses it. It is running late, and I think ZF can take some of the blame. So don't hold your breath waiting for a positive review.

Today's topic: I want to have part of my form hidden initially, and instead show a button saying: "toggle more questions".

So, we make it a display group (where ex1 and ex2 are the form elements to hide):

$form->addDisplayGroup(array('ex1','ex2'),'extras');

By default this wraps it in a fieldset, with no place I could see to slip in my CSS, javascript and the toggle link. What I need is a decorator, I said to myself.

Oh, gasp, does Zend make this difficult or what! In another part of this project custom decorators had been used for a minor layout change. 7 classes in 3 directories, almost all of it boilerplate. The real work was being done in CSS; those 7 classes were just to give a way to name the items as far as I could tell.

The problem is the Zend Framework Philosophy of making things complex. Did you realize there is no addDecorator($myclass) function! You have to keep decorators in a special directory and then tell Zend where to get them. Then addDecorator('part_of_my_class_name').

ZF's saving grace is that it is open source. So I poked around, and here is my solution. First, I'll define an alias for readability (optional):

$displayGroup=$form->getDisplayGroup('extras');

Remove all the default stuff I don't need (this step is optional too):

$displayGroup->removeDecorator('Fieldset');
$displayGroup->removeDecorator('HtmlTag');
$displayGroup->removeDecorator('DtDdWrapper');

Now insert my decorator (these 4 lines are in lieu of addDecorator($myclass)):

$decorators=$displayGroup->getDecorators();
$decorators['MyTest']=new MyTestDecorator();
$displayGroup->clearDecorators();
$displayGroup->addDecorators($decorators);

(I.e. get the current decorators, add mine, then replace the existing decorators with the new set.)

Finally we get to the meat. All you need is to define a render() function that takes a string (the existing content) and returns that string, optionally modified. Here is the minimal version that does nothing.

class MyTestDecorator extends Zend_Form_Decorator_Abstract
{
  public function render($content){ return $content; }
}

And here is the full version: it hides the elements in the group and uses jQuery to show/hide them. The CSS is inline.

class MyTestDecorator extends Zend_Form_Decorator_Abstract
{
  public function render($content)
  {
    $js="$('#extra_questions').toggle();return false;";
    return '<a href="" onclick="'.$js.'">Toggle Tags Visibility</a>'.
      '<div id="extra_questions" style="display: none;">'.$content.'</div>';
  }
}

Yep, it's that simple. Oooh, the architects of ZF must be turning red with rage ;-)

UPDATE: The best article I've found so far on Zend Form Decorators. As many of the comments say, the length of the article is also a very good argument against using Zend Form. And it still didn't answer my questions, so it really needed to be 2-3 times as long. But if you need to format a form, it is a far more useful resource than the utterly inadequate Zend Form manual.

Tuesday, June 22, 2010

fclib 0.4.20 release

I just put up another fclib release. Fclib is an ad hoc collection of php libraries, started about 10 years ago; the i18n-related functions are perhaps of most interest to people (with functions for Japanese, Chinese and Arabic language processing).

This new release has some minor bug fixes, a new utf8 function (for truncating a string), and a new file, modify_images.inc, which is a high-level interface to the gd functions for making thumbnails, cropping, resizing and basic drawing edits. It can operate on one image or a batch of images.

(The previous release was 11 months ago, described here)

Monday, June 21, 2010

C++ socket library: asio

It is when it comes to writing a simple socket client in C++ that I really feel how much PHP has spoilt me.

I previously have used SmartNetwork, part of the SmartWin library ( http://smartwin.sourceforge.net/ ) but it is fatally flawed: when the remote server dies there is no error reporting, so it happily carries on sending data to oblivion.

Therefore when an application I am currently working on needed to write to a socket I decided to try out a new library. I looked at 4 or 5 libraries, and after that first pass rejected all of them. Mainly based on their lack of clear documentation; sometimes based on their license. Faced with a choice of zero, I lowered my criteria, and chose to try boost::asio (also available in a non-boost version). Boost is a major project, already installed on my machines, portable for at least Linux/Windows, and the libraries are peer-reviewed by some very clever people. Boost libraries are also usually flexible (to the point of making them hard to use), and definitely able to give me the error reporting I need.

Here is the one line review: I've been banging my head against ASIO for 2-3 weeks, still do not have working code, but every time I consider running away I decide to stick with it.

Or in other words, it is terrible, but the alternatives are worse.

It is terrible in three ways: no high-level functions, it is hard to use properly and it is basically undocumented. Yeah, yeah, there is API documentation for all the functions, but it doesn't say when to use which function. And, yeah, yeah, there are a dozen or so tutorials. But they are all toy examples, and don't explain why the code has been written the seemingly-complex way it has.

Let me explain a bit more. Asio offers sync operations, but they are no use for any real-world code. For something you intend to use in production you will be forced to use the async functions. That means you'll need to use boost::thread, boost::bind (for the callbacks), boost::shared_ptr (as you have to wait for all callbacks to finish before you can delete an object, and it turns out your async callbacks can get called even after you've closed the socket) and understand how async programs work. All of that is hard.

I keep thinking I've got the code working, then I discover a timing problem that causes a crash only once in 15 runs, or it works on Linux but crashes on Windows, or is fine connecting to localhost but crashes when connecting to a remote server (due to different timing).

Going back to the first of my reasons for describing it as terrible, an example of the lack of high-level assistance is that there is no timeout parameter on any async operations. To do an async_connect that I want to give up after 3 seconds I have to write code for both the connect and the timer, which means two callbacks, and coordinating those two callbacks. Time-outs are part of the low-level BSD sockets but the asio code is doing something that deliberately cripples that, so trying to use them won't work either.

But I don't like to moan without being constructive. So I've been working on two things. First a high-level class called SocketFeeder (and SocketFeederSet) that you can just call write() on and not have to worry about all these callbacks. Second, a tutorial explaining how SocketFeeder has been written, and the thinking behind the design decisions.

The class is for a client's production environment but has been written on my own time, so I'll be able to release it as open source. I'll edit this blog to link to it when it is finally ready; if you are keen to see it and the tutorial then leave a comment or send me a mail. Constructive nagging usually works! (And if you want to sponsor me, or are a magazine that pays for articles, let me know; it will get released eventually, but financial incentive means I can give it priority.)

Tuesday, June 15, 2010

doctrine: delete object but leave it in database

I've a website where I make a page from a doctrine object which is taken from a database. Nothing special there. For adding the data to the database I have a form that goes to a preview page then a confirm button that actually writes it. For the preview page I share the same display logic, so I construct a doctrine object and simply don't save it. For editing, I do the same: I load the doctrine object from the database, make the changes specified on the form, and then either show the preview page or save to the database (depending on if preview or confirm was clicked).

The problem came with editing a one-to-many relation. As a concrete example let's say the user is registering multiple email addresses. I was doing this:
   $User->Emails->delete();

Then doing this for each address the user gave:
   $User->Emails[]=new Email(...);

It nicely handles when the user has removed an email address, changed one, or added an extra one. But when I realized the flaw I slapped my forehead so hard I gave myself whiplash. Do you see it? Have a moment...

Yep, if they click cancel (or leave before clicking confirm) they're expecting nothing has changed, whereas in fact all the email addresses they'd previously input have vanished.

The problem is that delete() happens immediately, rather than waiting for me to call save(). I hunted high and low for a way to stop that. My final solution was:
   if($confirmed)$User->Emails->delete();
   else $User->unlink('Emails');

I.e. unlink() appears to be the delete-that-does-not-touch-database I was searching for. (Of course, don't do something silly like go and call save() now, or the user will lose their email addresses and you'll have some orphaned records in your Emails table.)

Incidentally this did not work:
$user->clearRelated('Email');

I'd hoped it would, after reading the description in the manual. But in fact it did nothing at all.

Friday, June 11, 2010

jquery: dcookorg_annotator V0.3 released

I know, I'm behind in my jquery plugin announcements, so I'm going to do three in one.

First up is annotator, for annotating an image (or anything):
http://dcook.org/software/jquery/annotator/

You can have any number of annotations, can drag them anywhere, and can resize them. In the default mode each annotation has a text box appear underneath it for adding a comment.

New in version 0.3, and shown in the screenshot to the left, are some hooks for attaching a form to each annotation, so you can create a custom form for each one (or any other idea you have)!

MIT-license open-source, and tested in all of IE6, IE7, IE8, Safari, Firefox 3 and Firefox 3.5.

Next up is selector_aspect, which is for selecting part of an image while maintaining a fixed aspect ratio. E.g. useful for cropping an image.
http://dcook.org/software/jquery/selector_aspect/

A simple straightforward plug-in, with not many options.

Third is get_percentage_position, which is used to get the size of one div (or any DOM object) in terms of another div (or any DOM object), and also to get the relative position in the same terms. Not very glamorous, but useful in conjunction with the selector_aspect plugin, for instance. It is available here:
http://dcook.org/software/jquery/get_percentage_position/
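
To give a feel for what "size and position in terms of another object" means, here is a minimal sketch in plain JavaScript. It is not the plugin's code: the function name percentagePosition and the {left, top, width, height} rectangle shape are assumptions made just for this illustration.

```javascript
// Hypothetical helper, NOT the plugin's API: express rectangle `inner`
// as percentages of rectangle `outer`.
// Both arguments are plain objects: {left, top, width, height} in pixels.
function percentagePosition(inner, outer) {
  return {
    left:   100 * (inner.left - outer.left) / outer.width,
    top:    100 * (inner.top - outer.top) / outer.height,
    width:  100 * inner.width / outer.width,
    height: 100 * inner.height / outer.height
  };
}

// A 100x50 selection sitting 30px right and 20px down inside a 400x200 image.
var image = {left: 10, top: 10, width: 400, height: 200};
var selection = {left: 40, top: 30, width: 100, height: 50};
console.log(percentagePosition(selection, image));
```

Percentages like these stay valid if the image is later shown at a different size, which is one reason to prefer them over raw pixels when describing a crop.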

Finally, a reminder that my first jquery plugin, to run a magnifier over an image, is introduced here:
http://darrendev.blogspot.com/2010/04/jquery-plugin-image-magnifier.html

and available here:
http://dcook.org/software/jquery/magnifier/

and all my jquery plugins are being kept here:
http://dcook.org/software/jquery/

Sunday, June 6, 2010

C++: incomplete type and cannot be defined

A very confusing error just now:
error: aggregate ‘std::ofstream out’ has incomplete type and cannot be defined

from simple code:
std::ofstream out;

And the similar one:
error: variable ‘std::ofstream out’ has initialiser but incomplete type

from this code:
std::ofstream out(fname.c_str(),std::ios_base::app);

Using this didn't help:
#include <iostream>

Using this also didn't help:
#include <ofstream>

That was confusing me most, but now I see I've been getting "error: ofstream: No such file or directory" and I was missing it in the noise of other warnings and errors.

The solution was simple:
#include <fstream>

Yes, a very simple solution, but google wasn't helping. If it had been saying "ofstream is not a member of std" I'd have known I was missing a header file; a strange error message has you looking in other places. (I guess another std header file is doing a forward declaration for ofstream, which is why we get "incomplete type"; <iosfwd>, which <iostream> pulls in, does exactly that.)

Mumble, grumble, back to work.

Monday, May 31, 2010

Rotating in jquery (firefox problems)

In my entry on rotating images in jquery/javascript I mentioned I'd not had the problem that the unofficial patch was supposed to fix. Well, now I've seen it even using that patch. The problem is this: if you try to rotate an image (in Firefox, at least) that hasn't fully loaded it all goes wrong.

When rotating in Firefox/Safari you are making a copy of the image; the jquery-rotate patch is to make sure your copy has been properly initialized before doing anything with it. My problem was similar: I was trying to rotate an image that hadn't loaded. (Curiously it didn't happen on a 106KB image, but did consistently on a 69KB image; the smaller image was in landscape, while the larger image was portrait, but I have no idea if that is related.)

My rotate call is in a function called init(). I first tried calling init() from the image's onLoad(), which worked most of the time. I also tried a more complex solution that involved not calling init() until the image's onLoad() and jQuery's $(function(){...}); (what I personally like to call onDomLoaded) had both run. But still it happened on certain images, and I now feel that that level of complexity is not needed (also, onDomLoaded always seems to run before the image's onLoad triggers).

So, my solution is this:

  <img src="test.jpg" id="img"
  onLoad="window.setTimeout('init()',40)">

Using a time-out of 25ms was not enough, but 40ms seems reliable. The downside is that the image flashes on screen briefly in the original orientation before rotate can kick in. We can fix that by having it invisible initially:

  <img src="test.jpg" id="img" style="visibility:hidden"
  onLoad="window.setTimeout('init()',40)">

  <script>
  function init(){
    $('#img').rotate(90);
    $('#img').css('visibility','visible');
    }
  </script>

Monday, May 24, 2010

Rotating in jquery (and IE8 problems)

Rotating an image in jquery is harder than you might think. Harder than I had imagined it would be, at least. The first problem is that it is not part of either JQuery or JQueryUI, so you are out there in the wilderness where the 3rd party plugins live. And, trust me, it can get wild out there.

I first tried jqueryrotate. It is quite heavy, at around 10KB, but the main reason I abandoned it is that it didn't work properly for me (sorry, I cannot even remember why now).

Next I tried jquery-rotate and I soon got this working in Firefox 3. My code also worked in Safari and Firefox 3.6 first time. IE6/7/8 were the problem. Under the hood all these plugins use DXImageTransform.Microsoft.Matrix for the IE browsers (which works back to IE5 apparently), and Canvas for all other browsers (which works from at least Firefox 3). They even allow rotation of any angle, though I only needed to rotate in steps of 90 degrees.

(By the way jquery-rotate seems to be unmaintained, so I'm using this unofficial patch even though my testing didn't see the problems it fixes.)

Here is my code:

  $('#img').rotate(rotation,true);

Then:

  if(rotation==90 || rotation==270){
    $('#img').width(300);
    $('#img').height(225);
    }
  else{
    $('#img').width(300);
    $('#img').height(400);
    }

(To keep that code sample clearer I've hard-coded the sizes, to assume an image that has a 3:4 aspect ratio in its original position.) The point of this code is to scale the image down to fit in the layout. This is where the first cross-browser problem appears. When Firefox/Safari rotate the image they create a new object with the new dimensions. As far as I can tell, when IE rotates it, it creates an optical illusion: the image appears to have been rotated, but when you read its width/height you get the numbers for the original position, and the same applies when you try to set them. (As an aside I tried a number of ideas to get around this underlying issue, but all ended in failure, so I guess it is just the way IE works.)

Tinaysh... Tenacish... Tenaciousness is my middle name, even if it is hard to spell. This code does the job:

  if(rotation==90 || rotation==270){
    if(document.all && !window.opera){  //IE-specific
        $('#img').width(225);
        $('#img').height(300);
        }
    else{
        $('#img').width(300);
        $('#img').height(225);
        }
    }
  else{
    $('#img').width(300);
    $('#img').height(400);
    }
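
The hard-coded numbers above can be derived instead of typed in. What follows is only a sketch under stated assumptions: rotatedSize is a hypothetical helper name (it is not part of jquery-rotate), and the isIE argument stands in for the document.all test used above.

```javascript
// Hypothetical helper: work out what to pass to width()/height() after
// rotating an image so that it appears `boxWidth` pixels wide on screen.
// origW/origH describe the unrotated image; isIE says whether the browser
// still reports (and expects) sizes for the unrotated position.
function rotatedSize(origW, origH, rotation, boxWidth, isIE) {
  var sideways = (rotation % 180 !== 0);        // true for 90 or 270
  var visualW = boxWidth;
  var visualH = sideways
    ? Math.round(boxWidth * origW / origH)      // aspect ratio is flipped
    : Math.round(boxWidth * origH / origW);
  if (sideways && isIE) {
    return {width: visualH, height: visualW};   // IE wants them swapped
  }
  return {width: visualW, height: visualH};
}

// The 3:4 image from the example, shown 300 pixels wide.
console.log(rotatedSize(300, 400, 0, 300, false));
console.log(rotatedSize(300, 400, 90, 300, false));
console.log(rotatedSize(300, 400, 90, 300, true));
```

The result would then be applied with something like $('#img').width(s.width).height(s.height), where s is the returned object.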

OK, images rotate nicely in all browsers, on to the Next Problem.

I attach a draggable and resizable div (see the demos at http://dcook.org/software/jquery/magnifier/ to get an idea) to the image. I don't want it to leave the image so I use this:
mydiv.draggable({containment:img});

When rotation is 0 or 180 everything is fine, but rotations of 90 and 270 go wrong in IE6/IE7/IE8; it seems the different coordinate system confuses it and I can move mydiv outside the image. (Incidentally jquery's resizable containment is broken, so I had to hand-code that; my resizable containment code is not affected by this problem!)

But things get worse. In IE8 *only*, the rotated image overlaps the following page items; IE6/IE7 correctly push the page down when a landscape image gets rotated to become a portrait image. I could have lived with containment not working, but this one is a showstopper.

We can put the image in a named div, then alter the width/height of that div each time we rotate. So the code ends up as this:

  if(rotation==90 || rotation==270){
    if(document.all && !window.opera){  //IE-specific
        $('#img').width(225);
        $('#img').height(300);
        $('#img_outer').width(300);
        $('#img_outer').height(225);
        }
    else{
        $('#img').width(300);
        $('#img').height(225);
        }
    }
  else{
    $('#img').width(300);
    $('#img').height(400);
    if(document.all && !window.opera){  //IE-specific
        $('#img_outer').width(300);
        $('#img_outer').height(400);
        }
    }

You can also fix the other problem by using #img_outer as the containment div, instead of #img. That fixes all problems in IE6/IE7 (though be careful with margins, padding and borders; i.e. make sure #img_outer is exactly the same size as #img).

IE8 is still less than ideal. Using #img_outer stops it overlapping with the following content, but it puts hard white space in the area where the image would be if height/width were reversed (and the draggable div can still be dragged into that area). For a landscape image that has been rotated that means whitespace to the right of the image, messing up multi-column table layouts. For a portrait image that has been rotated that means a lump of whitespace between the bottom of the image and the start of the next page content. Using overflow:hidden did not help.

This is very novel: a bug in IE8 that isn't in any other browser, not even IE6! I've not solved this, and welcome advice. Yes, okay, I could use absolute positioning to put a "Download Firefox" icon in that white area, but that isn't quite the advice I'm looking for...

Thursday, April 29, 2010

Linux keyboard shortcuts

Have you ever gone to hit ctrl-tab (to switch tabs) or ctrl-w (to close current tab) in Firefox and suddenly all your Firefox windows (even the one playing radio on another desktop) disappear? (I'm using Ubuntu and Gnome, but I get the impression this problem affects all Linux distros and all window managers.)

It must be old age making my fingers clumsier but I didn't even know ctrl-q did that until a few weeks ago. Yet I've done it a couple of times by accident recently, and when I did it today I decided Something Has To Be Done.

The solution is joyously simple, and I also learnt another cool function while I was there. First the solution: System|Preferences|Keyboard Shortcuts. Assign ctrl-q to do something; then Firefox never gets to see it. I assigned it to the calculator app, which is nicely harmless.

I learnt this trick here, which also mentions that the same idea works in XFCE (go to the keyboard panel). I'm betting KDE has something similar.

And the cool function? Looking through the other keyboard shortcuts I saw Alt+Print takes a screenshot of just the current window! You have no idea how many times I've clicked Print, then opened up Gimp to crop the screenshot to show just the window of interest. I'm now alternating between feeling very foolish and feeling very empowered.

Thursday, April 22, 2010

jquery, click here to crash IE8

IE6/7/8 had me pulling my hair out again, but I've just found the problem!

Here is the stripped-down code that shows the problem. We have a few divs, one that is active (and can be resized and dragged). The others are inactive but can be clicked to turn them into the active item.

var currentItem=null;

function makeActive(item){
if(currentItem){
    currentItem.css({borderWidth:1,zIndex:998})
        .draggable('disable')
        .resizable('disable')
        .click(function(){makeActive($(this));})
        ;
    }
item.css({borderWidth:3,zIndex:999})
    .draggable('enable')
    .resizable('enable')
    ;
currentItem=item;
}

Clicking back and forth between two items was fine in Firefox, but IE8 would lock up (IE6/IE7 were the same). I kept stripping it down until it clicked (pun intended, sorry!). Yes, on each loop the click handler is added again. Each click handler is calling this function recursively and I think that is what is locking up IE8.

The solution is simply to change the .click line to look like this:
.one('click',function(){makeActive($(this));});

That doesn't just make IE8 happy, it also more clearly describes the click handler we want. In fact, now I understand the problem, I'm surprised Firefox was not crashing too.
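
To see how quickly the handlers pile up, here is a sketch that replays the pattern in plain JavaScript, with no jQuery and no DOM. FakeElement is a made-up stand-in: its click() keeps a handler forever and its one() drops a handler after one use, which is my understanding of the jQuery behaviour; running only the handlers that were present when the click started is an assumption of this sketch.

```javascript
// A tiny stand-in for an element's click handling (no jQuery, no DOM).
function FakeElement() { this.handlers = []; }
FakeElement.prototype.click = function (fn) {   // like .click(fn): stays attached
  this.handlers.push({fn: fn, once: false});
};
FakeElement.prototype.one = function (fn) {     // like .one('click', fn): runs once
  this.handlers.push({fn: fn, once: true});
};
FakeElement.prototype.trigger = function () {
  var snapshot = this.handlers.slice();         // handlers present when the click starts
  this.handlers = this.handlers.filter(function (h) { return !h.once; });
  for (var i = 0; i < snapshot.length; i++) snapshot[i].fn.call(this);
};

// Click back and forth between two items, attaching the handler with
// either 'click' or 'one', and report how many handlers exist at the end.
function simulate(method, clicks) {
  var a = new FakeElement(), b = new FakeElement();
  var current = null;
  function makeActive(item) {
    if (current) current[method](function () { makeActive(this); });
    current = item;
  }
  makeActive(a);                                // a starts as the active item
  makeActive(b);                                // b takes over; a gets a handler
  var target = a;
  for (var i = 0; i < clicks; i++) {
    target.trigger();
    target = (target === a) ? b : a;
  }
  return a.handlers.length + b.handlers.length;
}

console.log('with click:', simulate('click', 6));
console.log('with one:  ', simulate('one', 6));
```

With one() the total never grows beyond a single pending handler, while with click() it keeps climbing with every click.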

P.S. resizable('disable') does not work in jQuery UI 1.7; it is apparently fixed in jQuery UI 1.8 though.

Monday, April 19, 2010

jquery plugin: image magnifier

I've just released my first fully-fledged and useful jquery plugin:

http://dcook.org/software/jquery/magnifier/

It allows you to magnify an image and examine just one part of it.
Handles on the edge of the "magnifying glass" allow resizing it, which alters the degree of magnification.
The plugin is fully documented, with numerous usage examples.
It runs on all major browsers and operating systems.
Naturally it is open source (MIT).

Thursday, April 15, 2010

find, grep and tr

A practical example of three useful unix commands, and how they can work together.

I thought I had found another chance to use sed (see Right Sed Fred, I'm too sexy to search and replace), but in the end did not use it; I wanted to remove newline characters but sed works line by line so this is not possible. (There is a way to do this with sed apparently, but it looked awfully hard to understand.) So I used tr instead.

Here is my final command:
find /path/to/myfiles/ -mtime -125 -name '*txt' -print | grep -v xxx | tr '\012' ' '

The first part:
find /path/to/myfiles/ -mtime -125 -name '*txt' -print

means search all the *.txt files under the given path, that have been created or modified in the past 125 days. It outputs one filename per line.

However it included some files I didn't want. Luckily they all had the same filename component ('xxx' in my example above), so were trivial to identify. I used "grep -v" which means exclude anything that matches.

At this stage I had the list of filenames, but one per line. I wanted to use them as the parameters to a batch file, so needed them all on one line. I couldn't get sed to work, but stumbled on a way to use the "tr" tool. It takes two parameters: the character to replace, and what to replace it by. '\012' is octal for the unix linefeed character. So
tr '\012' ' '

means replace LF with a space. I slapped this on to the previous command and got just what I wanted.

(For unix beginners, the | is called the pipe character, and it means take the output of the command before it and give it as the input to the command after it. Many unix commands are designed with this kind of piping behaviour in mind.)

Thursday, April 8, 2010

Jquery cheat sheets

A cheat sheet, printed out, can be invaluable. Here are a few I've looked at:

My choice:
http://www.javascripttoolbox.com/jquery/cheatsheet/

Nice and compact, going into good details. It comes in colour, but the wonderful thing is it also comes in Excel format, so I could edit the colours.
It doesn't show the functions new in 1.3, and doesn't point out the deprecated ones ($.browser and $.boxModel); I annotated that myself using the "Too Colourful" one below.

Too new:
http://labs.impulsestudios.ca/jquery-cheat-sheet

This one is nice, fits on one page, and is already monochrome (though it uses a light grey for optional parameters, which is hard to read). It does not show the parameters like my first choice above. It is for jquery 1.4, showing what is new for 1.4; but I also wanted to know what to avoid if I wanted to write a plugin that would work back to 1.2.

Too Colourful:
http://www.artzstudio.com/files/jquery-rules/jquery_1.3_cheatsheet_v1.pdf
2 pages worth, shows what is new in jquery 1.3, and what is deprecated. Uses colour and looked terrible as a b/w print-out. Also, it lists each version of similar functions (such as event handlers), which is why it needs two pages instead of one.

Tuesday, March 23, 2010

The Dangers Of GroupBy

I had a complex join (three tables) with a group by and a where clause with three conditions. It seemed to be working, until I added more data; I eventually narrowed the problem down to just a single table. Consider this query:


SELECT w.user_id, w.some_date, w.id
FROM mytable w

id is the primary key. That query returns all the data:


user_id   some_date   id
   1      2010-02-01   1
   5      2010-02-02   2
   1      NULL         3
   1      2010-03-22   4
   1      2010-03-24   5

I'm only interested in the record with the latest date for each user, so I add a MAX on some_date and GROUP BY user_id:


SELECT w.user_id, MAX(w.some_date), w.id
FROM mytable w
GROUP BY w.user_id

That gives:


user_id   MAX( w.some_date )   id
   1       2010-03-24           1
   5       2010-02-02           2

Hang on! The dates are right, but 2010-03-24 is from id=5, not id=1. The some_date column has been max-ed in isolation.

This issue is explained in an excellent two-part article: part1 part2

In fact it is so excellent I didn't fully understand it, and am going to re-read it three times a day until I do. I really need to understand it because, as the author shows, it is very easy to get correct answers from a query on your test data, pass all your unit tests, and deliver something that goes wrong when one more record is added.

My query is simpler than the one in his example, but after many tries I cannot find a solution. The "join(...)as d" syntax does not seem to work in MySQL or I am not using it correctly. It seems what I'm trying to do is very basic, so I do not understand why I cannot find more advice on the subject. I'm open to suggestions!

(For the moment I'm going to give up, do a simpler query and have PHP process the results to get the data I actually want.)
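
That fall-back of post-processing the rows is at least easy to get right. Here is a sketch of the idea, written in JavaScript rather than PHP, using the five sample rows from above; latestPerUser is a name made up for this example, and skipping rows with a NULL date is my assumption about what is wanted.

```javascript
// Rows as the simple, ungrouped query would return them.
var rows = [
  {user_id: 1, some_date: '2010-02-01', id: 1},
  {user_id: 5, some_date: '2010-02-02', id: 2},
  {user_id: 1, some_date: null,         id: 3},
  {user_id: 1, some_date: '2010-03-22', id: 4},
  {user_id: 1, some_date: '2010-03-24', id: 5}
];

// Keep, for each user, the whole row that has the latest date, so the id
// always belongs to the same record as the date.
// YYYY-MM-DD dates compare correctly as strings; NULL dates are skipped.
function latestPerUser(rows) {
  var best = {};
  rows.forEach(function (row) {
    if (row.some_date === null) return;
    var seen = best[row.user_id];
    if (!seen || row.some_date > seen.some_date) best[row.user_id] = row;
  });
  return best;
}

console.log(latestPerUser(rows));
```

For user 1 this picks the id=5 row, which is the record that 2010-03-24 actually came from. Note that a user whose dates are all NULL would not appear in the result at all.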

Sunday, March 7, 2010

Image Protection in Zend Framework

Sometimes you want to only show certain images (or other media) to certain users (e.g. those that are logged in, or those that have paid for it). This quick tutorial will show how to write the controller for Zend Framework for serving images. This is for ZF 1.10, but as far as I can tell it is not using anything new or unusual.

This tutorial does not cover the validation part of the code.

We start with a controller that does not use views, layout and all that stuff:

class ImgController extends Zend_Controller_Action
{
    public function init()
    {
        $this->_helper->viewRenderer->setNoRender(true);
        $this->_helper->layout()->disableLayout();
    }
}

Now add an action that will serve all images:

public function imgAction(){
    $type=$this->getRequest()->getParam(1);
    $fname=$this->getRequest()->getParam(2);
    $ext=$this->getRequest()->getParam(3);

    echo "type=$type, fname=$fname, ext=$ext";
}

public function badAction(){
    $this->getResponse()->setHeader('Content-Type','image/jpeg');
}

The idea is that URLs such as http://127.0.0.1/img/folder/abc.jpg will end up at the imgAction() function, and $type will get set to "folder", $fname to "abc" and $ext to "jpg". To do that we need to set up what is called a router, which we will do next. The badAction() will handle any problems; this crude code will send back a 0 byte jpeg.

To create a router, jump to your bootstrap file and create a function something like this:

public function _initCustomRouting(){
    $frontController=Zend_Controller_Front::getInstance();
    $router=$frontController->getRouter();
    $router->addRoute(  //To catch any that are not formatted correctly
        'imgHandlingBad',
        new Zend_Controller_Router_Route('img/*',
            array('controller'=>'img', 'action'=>'bad')
        )
    );
    $router->addRoute(  //Formatted as /url/type/fname.ext
        'imgHandling',
        new Zend_Controller_Router_Route_Regex('img/(.+)/(.+)\.(.+)',
            array('controller'=>'img', 'action'=>'img')
        )
    );
}

Note the order: imgHandlingBad must come before imgHandling. I found this introduction (7 minute screencast) to using Zend_Controller_Router_Route very useful; after that it took some trial and error and study of the ZF source code.

Now you should be able to test http://127.0.0.1/img/folder/abc.jpg and see our debug comment. But without a folder (e.g. http://127.0.0.1/img/abc.jpg) it will get handled by the badAction(), as will those without an extension: http://127.0.0.1/img/folder/abcjpg

Now the final step is to feed back some images, so in imgAction() replace the echo line with this code:

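//Assumes $basePath was set earlier and ends with a slash.
//$type and $fname come straight from the URL, so reject '..' in them
//(as part of your validation) before building a path.
$path=$basePath.$type.'/';
$fullPath=$path.$fname.'.'.$ext;
if(!file_exists($fullPath))return $this->badAction();

switch($ext){
    case 'jpg':$mime='image/jpeg';break;
    case 'png':$mime='image/png';break;
    case 'gif':$mime='image/gif';break;
    default:return $this->badAction();
}
$this->getResponse()->setHeader('Content-Type',$mime);

readfile($fullPath);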

It decides the filename, decides the mime-type based on extension, and then readfile passes on the binary data.

In the above code we use the $type directly to decide the image path. You could instead map type to different directories. E.g.

switch($type){
    case 'www':$path='/var/www/main_images/';break;
    case 'www/articles':$path='/var/www/html/special/articles/';break;
    case 'family':$path='/home/user/images/family/';break;
    default:return $this->badAction();
}

Another way you might use $type is if certain users can only see certain types of images. You could do different validation checks in each case statement above.

Got any suggestions to make this code better? Let me know!

P.S. One special note about this technique. In the typical MVC Zend Framework setup, if you create public/img/ then apache will serve images from there; only if the image is missing in that location will Apache ask the Zend Framework to serve it. You might use this to your advantage to speed up delivery of certain common images. But it also opens the way to accidentally serving images that are supposed to be protected.

Tuesday, March 2, 2010

Creating A Doctrine Custom Behaviour, Part 2

In part 1 we made a simple custom behaviour and then gave it some options we could customize in our schema. The second half of the power of Doctrine's behaviours though is being able to set values. First, uncomment this line in the last example in part 1:


    $this->addListener(new DarrenTestableListener($this->_options));

You should get told class DarrenTestableListener does not exist, so create DarrenTestableListener.php in the same directory as DarrenTestable.php (though, as mentioned before, it can go anywhere in your models directory tree), and fill it with this code:


class DarrenTestableListener extends Doctrine_Record_Listener
{
    protected $_options = array();

    public function __construct(array $options)
    {
        $this->_options = $options;
    }

    /** Called when a new record is created */
    public function preInsert(Doctrine_Event $event)
    {
        $name = $event->getInvoker()->getTable()->getFieldName($this->_options['name']);
        $modified = $event->getInvoker()->getModified();
        if ( ! isset($modified[$name])) {
            $event->getInvoker()->$name = "C";
        }
    }

    /** Called when an existing record is updated  */
    public function preUpdate(Doctrine_Event $event)
    {
        $name = $event->getInvoker()->getTable()->getFieldName($this->_options['name']);
        $modified = $event->getInvoker()->getModified();
        if ( ! isset($modified[$name])) {
            $event->getInvoker()->$name .= 'U';
        }
    }

    /** Handle dql update queries */
    public function preDqlUpdate(Doctrine_Event $event)
    {
        $params = $event->getParams();
        $name = $event->getInvoker()->getTable()->getFieldName($this->_options['name']);
        $field = $params['alias'] . '.' . $name;
        $query = $event->getQuery();
        if ( ! $query->contains($field)) {
            $query->set($field, '?', 'U');
        }
    }

}

It has default options (and a constructor to set the options), then one function to handle INSERTs and two functions to handle UPDATEs. (There are also hooks for DELETE, save and validation.) All three functions follow a similar pattern:

Get the real fieldname
See if we need to do anything
If so, set the field

What we do is set the field to "C" when it is created. Then each time it is updated we append a "U". E.g. after three updates it will look like "CUUU". (TODO: I'm not so familiar with DQL, and I am not using it, so the DQL version replaces the existing data instead of appending. If you can supply the proper code for it let me know!)

Incidentally if you set a default in the options that will get precedence over your code (that is what the getModified() call is for, I assume). E.g. if I change my YAML schema to:


actAs:
    DarrenTestable:
        name: being_silly
        options:
            default: Hello!

Then after a couple of updates being_silly will contain "Hello!UU".

Creating A Doctrine Custom Behaviour, Part 1

Doctrine is a wonderful ORM library, but the manual is..., well, let's just say I think it was written by the developers, and out of obligation :-)

Behaviours are a motivating feature for using Doctrine, so here is a two-part tutorial on how to add your own minimal behaviour. The first part will show how to add a field to a table. The second part will show how to have its contents set automatically. For a more in-depth example see here or study the source code of the core behaviours.

The first thing you need to know is to put your behaviour classes in your models directory, or a sub-directory of your choice. You don't need a directory structure that matches the "XXX_YYY_" prefixes of the class name. This is assuming you are using Doctrine's auto-loading of table classes; if not you can put them anywhere you like of course. I'm putting them in models/behaviors/ (yes, using American spelling!)

Make a file called "DarrenTestable.php" with these contents:


class DarrenTestable extends Doctrine_Template{
    public function setTableDefinition(){
        $this->hasColumn('darren_testing','string');
    }
}

See, I told you it was minimal! To add it to an existing class, add this to your YAML schema:


actAs:
    DarrenTestable

Recreate the class from the schema and you should see a field called "darren_testing". On MySQL the type is "text" (i.e. this is what "string" maps to if you are using MySQL).

Even that very small example is doing something useful: if this behaviour is used in 10 classes and we need to change the name or data type (or set a default, add an index, etc.) we only need to edit it in one place.

While still staying within the scope of a minimal example, we can expand it in two directions. We can have our code automatically set the field value (see part 2 of this article), and we can allow users to customize it. Here is how we add some options, with defaults:


class DarrenTestable extends Doctrine_Template
{
    protected $_options = array(
        'name'=>'darren_testing',
        'type'=>'string',
        'options'=>array(),
        );

    public function setTableDefinition()
    {
        $this->hasColumn($this->_options['name'],
            $this->_options['type'], null, $this->_options['options']);
        //$this->addListener(new DarrenTestableListener($this->_options));
    }
}

I have added the call to the listener we'll use in part 2, but commented it out for the moment. The options array contains a sub-array, also called options, which is passed on to Doctrine_Table::setColumn() and can be used to set defaults and so on.

Usage is exactly like above if we want the defaults. If we want to change the name, and require it to be set (i.e. NOT NULL), we would do:


actAs:
    DarrenTestable:
        name: being_silly
        options:
            notnull: true

Saturday, February 27, 2010

When a unix dot means a whole different world...

Subtitle: new ways to shoot yourself in the foot.

A week ago I was changing ownership on a web site subdirectory:
chown -R darren:www *

There were some files/directories that start with a dot, such as .htaccess, so I then did:
chown -R darren:www .*

If you just screamed out "Nooooo!!!!" you are allowed to polish your Unix Guru badge and wear it with pride. Personally I just sat there wondering why it was taking so long to return and why my disk was suddenly sounding so busy.

That's right. The pattern ".*" also matches "..", which refers to the parent directory. So chown had gone up a directory, and was then using -R to recurse into every subdirectory on my web site, destroying previous ownership settings with abandon.

I realized after a few moments and killed it, but the damage was done, and I'm still discovering small parts of web sites that got broken. But a question for the gurus: how should I change all files and directories in a tree including those that start with a dot? Did I need to use some complicated find and chown combination command?

Tuesday, February 16, 2010

Google Messing Up I18n

People often seem to think Google can do no wrong; an image helped no doubt by being surrounded by companies such as Microsoft and Apple. But, even with all those computer science PhDs it snapped up, it sometimes just does not get it.

Once a month or so, Google resets back to showing me the search results in Japanese, and only searching Japanese pages. My browser is set to say I prefer English pages, but it ignores that and uses my IP address. Japanese people use a Japanese browser that will send a header saying they want to see Japanese. Unless they actually want to see English. Why is Google ignoring this?

Perhaps Google needs fewer PhDs and more people who know what this header means:
Accept-Language: en-gb,en;q=0.5

Google (!) tells me Google need to look at section 14.4 of RFC2616.
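For readers who have not met the header, here is a minimal Python sketch of how a server might read it. Each comma-separated entry is a language tag with an optional q weight (default 1), so the header above says "British English first, any English at half preference". The function names parse_accept_language and pick_language, the supported-language set, and the idea of falling back to an IP-based guess are all assumptions for illustration, not a description of what Google actually does.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q-value (default q is 1)."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their order in the header.
    ordered = sorted(prefs, key=lambda pref: -pref[1])
    return [tag for tag, q in ordered if q > 0]


def pick_language(header, supported, default):
    """Pick the first preferred language we support, else the default."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "en-gb" falls back to "en"
        if tag in supported:
            return tag
        if primary in supported:
            return primary
    return default  # e.g. a guess based on the IP address


print(parse_accept_language("en-gb,en;q=0.5"))                       # ['en-gb', 'en']
print(pick_language("en-gb,en;q=0.5", {"ja", "en"}, default="ja"))   # en
```

The design point is simply the order of the checks: the explicit preference is consulted first, and the location-based default is used only when the header gives no usable answer.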

But, the user-unfriendliness goes deeper. Here is how I get out of Japanese mode.
Click: 検索オプション (search options)
Find: 言語検索対象にする言語 (the language to search in)
and scroll down to find and select 英語 (English)

No! No, no, no! It just goes to show, even though I read Japanese I still messed up. Here is the correct way:

In the top-right of the search results click "設定 ▼" (Settings) then 検索設定 (Search settings).
In the line that says 表示言語の設定 (display language setting) scroll down to select 英語 (English).
Then find the button in the top-right that says 保存 (Save). When it pops up something that looks like an error message (but is actually a success message), press OK.

Hopefully I'm good for another month.

But the point of my rant is the only piece of English on any of those pages was "google". Not a single helpful line such as "switch this page to English". Perhaps they should use the Accept-Language header to decide what language the user would most like to see a "switch this page to XXX" link in?

Well, rant over. Google have been doing this for 10 years, so I don't expect they're going to change any time soon.

Thursday, February 4, 2010

Facebook Making Me Redundant

Back in 2007 I wrote an article on how to port your PHP applications into C++ (in the Oct 2007 issue of PHP Architect), showing the speed-up and memory-usage benefits. It's a nice idea, though I've only done it in a production project once. The cost in programmer man-hours is very rarely worth it.

Well, Facebook have gone and automated the process, calling it HipHop. Over 90% of Facebook's traffic (400 billion PHP-based page views every month) is now running on their HipHop system, i.e. on PHP pages that have been compiled into C++.

This is very interesting to me as I have recently been writing less and less in C++, more and more in PHP. Even a tree searcher for computer go I have been doing in PHP, as an experiment to see how it goes. On reflection that particular experiment is probably a failure - I need to set bits in a 32-bit or 64-bit integer and PHP puts a layer of abstraction in the way.

The main PHP script is very slow (it's been running non-stop for 5-6 weeks so far, with months left to run) but ironically all that time is being spent in calling other applications (written in C++!) to do the hard calculations. However I've another script that analyzes the data, needing to hold it all in memory at once, and that is showing signs of cracking - I'm already having to set my memory_limit to 1.5G, and that will probably hit 4G when the other script has chomped through all my data.

HipHop might be my saviour.

P.S. PHP Architect magazine also think this is going to be a very important technology.

Saturday, January 16, 2010

Zend Framework

I've been hearing a lot about Zend Framework over the past year or two; it seemed to have become the de facto standard framework (something else I've been hearing a lot about) and I felt a need to get up to speed on it.

I put up my review of Easy PHP Websites with the Zend Framework. I bought the book a few months back, as it seemed the best of the choices at the time. Quick summary of the review: Okay book, not impressed by Zend Framework, last third of the book is useful and interesting.

I'll talk about not being impressed by the DB part of Zend below, but I feel the main other aspect that bothered me was the way data is given to and then used in views. In the controller I have to write:
$this->view->something=$mymodel->get_something();

Then in the view it is used as:
Our something is <?php echo $this->something;?> today!

All that $this-ing and view-ing makes the code verbose, hiding the real thing I'm trying to say with my code. In particular the view code is ugly - ZF has not helped at all compared to writing plain PHP.

Why do I care? Try getting a designer to edit a view that looks like that. They may be comfortable with HTML but will go pale and faint at the sight of a dollar sign let alone all the surrounding junk. I can tell you from experience they will refuse to even edit a file that contains text that looks like that, let alone type it themselves.

I was also not impressed with how Zend Form does input validation. It seems to be just as much work as doing it in plain PHP. If I add a dependency I want value for money!

Back to databases. I recently wrote about being impressed with Doctrine ORM. Zend DB didn't impress me at all; I already use Pear::DB to abstract away the database, and all Zend adds is having to use PHP instead of SQL to write common queries, while still having to use SQL to write the less common ones.

A web search found I wasn't alone in this opinion, but what was interesting was I found people using Doctrine within Zend Framework, as a replacement for Zend DB.

To quote an article by Ruben Vermeersch: "While Zend_Db isn't a bad technology, it is still quite low-level and close to the underlying database. Using Doctrine, you can manipulate your data like objects, without worrying about the database too much." The article is pro-Zend and pro-Doctrine, and shows how to use the two together. Good clear article.

Here is another example of using Zend and Doctrine together. It is just example code, not a teaching article, but interesting as it also uses ajax (using jquery).

Here is an article comparing Symfony (using Doctrine) with Zend, with a pro-Symfony slant. Also check out the comments, especially the reply by Matthew Weier O'Phinney describing what is coming up in Zend (including how it may have more formal integration with Doctrine at some point, and hopefully auto-admin pages and auto-forms-from-schema).

By the way, the aforementioned Ruben Vermeersch article suggests these articles to get up to speed on Zend:
http://framework.zend.com/docs/quickstart
http://akrabat.com/zend-framework-tutorial/

I'd like to stress again that my views of Zend Framework are currently only based on studying it; I'll get back to you once I've tried it out on a full-scale real-world project.

Thursday, January 14, 2010

ASP, Error-Handling and Microsoft's Latest Catchphrase

Microsoft's Latest Catchphrase: It may be crap, but at least it is ready installed crap!

What inspired today's rant? Using ASP, of course. The question I had was how to handle SQL errors. (This article is more than just rant; I will also tell you how to do a couple of essential tasks that ASP makes hard.)

I've been using ASP for a few months, for one particular sub-project. We wanted to minimize the installed components, and as one sub-system already needed IIS, and the required scripting was fairly simple, we decided to go with that instead of installing WAMP (Windows Apache/MySQL/PHP).

I'm using the ADODB COM object for the database connectivity to Microsoft SQL Server. I've used the same COM object from PHP before, and errors get thrown as exceptions. I wrap them in try/catch blocks and everything is good. A bit of searching discovered how it works in ASP/VBScript. First, exceptions are thrown just the same. Second, there is no try/catch functionality.

Read those two key facts again, and have a good long think about them.

Some more searching discovered things are not quite that bad, and you can handle errors. The first thing you must do is precede your function with "on error resume next". That tells it to ignore errors. Then immediately after any action that might have had an error you do:

if err.Number <> 0 then
    'Handle errors here
end if

See here and here.

By the way, to turn off "resume next" mode, you give the almost mystical "On Error Goto 0". And the other thing you need to know is that resume next mode only affects the current function; if you call another function and that function doesn't explicitly "on error resume next" then any error there will terminate the script. Which I suppose is better than the confusion resulting if the flag was global.

As my next piece of evidence let me show you how to do the equivalent of PHP's "$s=file_get_contents($fname);" in ASP (due to the aforementioned difficulties in error-reporting it returns a blank string if the file does not exist):

Function file_get_contents(fname)
    fname = Server.MapPath(fname)   'Map the virtual path to a physical one
    mode = 1   'ForReading (read-only)
    Set FSO = Server.CreateObject("Scripting.FileSystemObject")
    If FSO.FileExists(fname) Then
        Set fp = FSO.OpenTextFile(fname, mode)
        file_get_contents = fp.ReadAll
        fp.Close
        Set fp = Nothing
    Else
        file_get_contents = ""   'Missing file: return a blank string
    End If
    Set FSO = Nothing
End Function

Oh, what fun! But I try to be fair, and every language has its good and bad points, so I tried to think up all the good features of ASP. After all, I have actually been writing useful, working code in ASP. The bad points flashed into my mind at every turn, but in the end I did manage to find one good point...

It is already installed, if you are already using IIS.

Which brings us back to that catchphrase that Microsoft is hurriedly trade-marking in 120 countries worldwide: It may be crap, but at least it is ready installed crap!

(By the way, speaking of ready-installed crap, did you know IE6 still has 15-20% market share? And IE6+IE7+IE8 together have 67% share? It was news to me - I thought the browser wars were over and everyone used Firefox, or at least Safari!)

Tuesday, January 5, 2010

Portable PDF reader

More and more books, especially tech books, are only available as PDF. One advantage is that I can get the PDF delivered immediately rather than in the "2-4 weeks" amazon.jp likes to tell me. But I don't like sitting at my computer to read - I want to be able to sit in an armchair, on the train, in a cafe, etc. to read (see also my moan about php Architect magazine going PDF only).

I've done some searching, and some asking, and I suspect the device I want does not yet exist, but I'm open to suggestions. I don't want the heavy bulk of a notebook. The Kindle (or similar) seems hard-wired for content being sold by Amazon (or whoever), which is no good at all. (Update: PDF supported since version 2.3, and content can be uploaded over USB cable; see the Wikipedia article.)

I'm very interested in how easy it is to read a PDF formatted for A4 on an iPhone (or iPod Touch), as that would have lots of other advantages (e.g. portable video player, as well as portable email and web browsing in the case of the iPhone).

UPDATE: list of e-book readers
Apparently the Consumer Electronics Show in Las Vegas this month will see some new e-book readers announced. So, sitting on my hands seems best at the moment.

UPDATE:
Video demo of AjiReader viewing a PDF on the iPhone (apparently the app costs $1).

Another video demo showing PDF, DOC, XLS, PPT but the comments seem to be saying you need a jail-broken phone for this?

UPDATE: (8 months later!)
The new Kindle looks good. I decided to stop Waiting For The Next Big Thing, and went ahead and pre-ordered one. It does PDFs, it does Japanese, everyone I've spoken to says e-ink is great; but, above all, at $139 it is now affordable.

I also ordered the case with built-in light; it was 1/3rd of the total cost, so I dithered a lot about that. But while I think the case is over-priced it also looks very cool.

Some people said the 6" screen is still too small to read A4 PDFs in comfort. If I find it unusable for that (my key need) I'm hoping I might be able to resell it at only a small loss. After all, ordering from Japan is a pain.

I'll put up a review in about a month. Watch This Space :-)

Monday, January 4, 2010

Doctrine ORM

I mentioned before, when talking about my Step-By-Step Regex article in php|a magazine, that the same issue (August 2009) had some other interesting articles.

One is about Adminer, an alternative to phpMyAdmin which is designed to be very compact and exists in just one file (so it is easy to upload temporarily to a production server for instance); the article is mainly about how Adminer was written, which was interesting.

But the article - and library - I most want to talk about is Doctrine ORM. The main problem this library is solving is this: when you add a field to your database you have to make the change in at least two places, your PHP script and your database. That sucks. The article describes itself as a tour around Doctrine's more advanced features, but I found I didn't need any other introduction, and it sold the features in a very clear and logical way.

The first thing I liked was you can describe your database schema in any of three ways: SQL, PHP, or the compact YAML files, and utilities are provided to convert between all three. By SQL I mean an existing database, that you could have been creating interactively from phpMyAdmin, or it could be a legacy database that has been around for years and now needs to be connected to a PHP script.

Only the YAML and PHP schema formats can specify what Doctrine calls behaviours. Adding the Timestampable behaviour adds created_at and updated_at fields to the database and keeps them up to date for you. The Sluggable behaviour is for making URLs and is demonstrated in the php|a article, but others that caught my eye are SoftDelete (records just get the deleted_at field set when you delete, rather than physically being removed), and Versionable (each time a record is changed the previous record is saved, and can be reverted to). NestedSet helps build trees, I18n is for holding translations (see below), Searchable does indexing, Geographical can find nearby places.

Incidentally I covered I18n in a previous article and Doctrine is using type 1, but without the language table to store collation. There is no consideration of different collations, and no concept of default language. It is sufficient for most purposes though.

Looking at some behaviours not in the Doctrine core distribution, Taggable looks interesting, allowing you to add blog-style tags. Locatable apparently ties in with Google Maps to get latitude and longitude automatically given an address (it could then be used, for instance, with the Geographical behaviour to automatically find all people within 10km of you based on just their address). EventLoggable logs all actions (select, update, delete, etc.) to a disk file. Blameable records (in the table) who last made any change to a record.

Doctrine's behaviours feature is powerful and portable; the existing ones are customizable (all the above seem to have sensible defaults but also can have their behaviour fine-tuned) and new ones look easy to write.

The third big feature of Doctrine that I like is the way data is retrieved on demand, and how that ties in with table relations (see listing 6 in the php|a article). And if this becomes inefficient you can optimize without having to change much code (listing 7). I'm not a big fan of using PHP to write SQL queries. Generally to do anything interesting you end up dropping down to writing SQL anyway. But I was impressed with the example in the article, and you only need to start explicitly describing joins if you have profiled and found you need to optimize.

(By the way, it goes without saying that Doctrine ORM is database-neutral, not tying you into MySQL or any other database.)

The final cool Doctrine feature is migrations. When you make a change to the schema in either your YAML file or your PHP file you run a script and it will make a migration file. This can then be run to move a database between versions (backwards or forwards). I admit I currently do this with some hand-crafted SQL scripts, which are one way only, and painful. And sometimes I'm not even that organized.
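To make the idea of a two-way migration concrete, here is a small Python sketch using SQLite. It is not Doctrine's API or file format: the MIGRATIONS list, the table names and the helper functions are all invented for illustration. Each migration is a pair of statements, one to go forwards and one to undo it, and the current schema version is kept in the database itself.

```python
import sqlite3

# Hypothetical migrations: (forward SQL, backward SQL) pairs, applied in order.
MIGRATIONS = [
    ("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)",
     "DROP TABLE person"),
    ("CREATE TABLE tag (id INTEGER PRIMARY KEY, label TEXT)",
     "DROP TABLE tag"),
]


def get_version(conn):
    """Read the schema version SQLite keeps in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def set_version(conn, version):
    # PRAGMA does not accept bound parameters, so format a checked integer.
    conn.execute(f"PRAGMA user_version = {int(version)}")


def migrate(conn, target):
    """Step the schema forwards or backwards until it reaches target."""
    if not 0 <= target <= len(MIGRATIONS):
        raise ValueError("no such schema version")
    version = get_version(conn)
    while version < target:
        conn.execute(MIGRATIONS[version][0])   # apply the forward step
        version += 1
        set_version(conn, version)
    while version > target:
        version -= 1
        conn.execute(MIGRATIONS[version][1])   # apply the matching undo step
        set_version(conn, version)
    conn.commit()
    return version


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
migrate(conn, 2)
print(table_names(conn))   # ['person', 'tag']
migrate(conn, 1)
print(table_names(conn))   # ['person']
```

The contrast with hand-crafted, one-way SQL scripts is the second element of each pair: because every step records how to undo itself, the same list can move a database in either direction.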

I've not used Doctrine ORM for a real project, but I expect I will try it out very soon.