Sunday, October 25, 2009

Regex Article in php|a

The August edition of PHP Architect magazine contained my introductory article on regexes. It has a PHP slant, but it is mainly about regexes. Most of it is introductory, but a bonus section at the end uses advanced regexes in SQL to repair a database in situ.

The print magazine arrived by air mail a couple of weeks ago and I finally got to read it. I was pleased (and relieved) that it read well. If you have read it, I'd love to get some constructive criticism, especially if you are uncomfortable with regexes.

It was also my first print article to use colour, and I think that worked well too.

One correction to the complicated "Repairing With Regexes" section at the end of the article: the text says "So \1 has to be written \\\\1 (that is four backslashes)." That seemed a strange thing to write, so I took a look at the unit test source code I had written (included with the PDF download version of php|a magazine). Four backslashes is correct in PHP, but in SQL it should be "\\1", not "\1".
(I just checked my emails from when we were proof-reading, and one of the edits had removed all those backslashes; I caught that at the time but ironically I got it wrong when putting them back in.)
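
To make the layers concrete, here is a minimal sketch of why the PHP source needs four backslashes. It is not the article's unit test code: the variable names are made up for illustration, and the last step only simulates a database whose string literals treat the backslash as an escape character (an assumption; not every database or configuration does).

```php
<?php
// Layer 1: PHP source. In a PHP string literal, \\ denotes one backslash,
// so the four typed backslashes leave two in the actual string value.
$inPhpSource = '\\\\1';

// Layer 2: the SQL text. This is what the database receives: \\1
$sqlFragment = $inPhpSource;

// Layer 3 (simulated, an assumption): an SQL string literal that also treats
// backslash as an escape collapses each pair once more, leaving \1.
$seenByRegexEngine = str_replace('\\\\', '\\', $sqlFragment);

echo $sqlFragment, "\n";        // prints \\1
echo $seenByRegexEngine, "\n";  // prints \1
```

So the regex engine only sees the back-reference \1 after both PHP and SQL have each taken their share of the backslashes.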

The other minor correction I only realized after reading Arne Blanket's column in the same magazine issue! I had written: "IP address after that at sign, such as darren@, which, while unusual, is technically valid". In fact "darren@[]" is the technically valid form. That is doubly annoying because my article's regex would reject those square brackets and I hadn't explicitly pointed that out. (No harm done, though, as this email form is highly discouraged.)

Arne's column also points out that top-level domains are no longer just two or three characters. So my '\.[a-zA-Z]{2,3}$' suggestion should really have been '\.[a-zA-Z]{2,}$'. Luckily, that suggestion was just in a list of other ideas, not part of the article's main regex.
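
A quick way to see the difference between the two patterns is to run both against a few host names. This is only a sketch: the host names are illustrative, and the pattern checks just the final label, not the whole address.

```php
<?php
// The pattern as suggested in the article, and the corrected version.
$old = '/\.[a-zA-Z]{2,3}$/';
$new = '/\.[a-zA-Z]{2,}$/';

// Illustrative host names with 3, 4 and 6 letter top-level domains.
$hosts = array('example.com', 'example.info', 'example.museum');

foreach ($hosts as $host) {
    // preg_match() returns 1 on a match and 0 on no match.
    printf("%-16s old=%d new=%d\n", $host, preg_match($old, $host), preg_match($new, $host));
}
```

Only example.com passes the old pattern; all three pass the new one.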

(By the way I enjoyed, and will blog about real soon, some other articles in this particular issue; if you are not a subscriber, and your work involves data and PHP, then this is the back issue you should get!)

Wednesday, October 14, 2009

Control Firefox from PHP?

Internet Explorer can be controlled from a COM object interface, and therefore from PHP (or any other scripting language that has COM support).
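
For reference, the simple kind of COM usage I mean looks roughly like this. It is a sketch, not tested code: it assumes Windows with PHP's COM extension enabled, and the helper name openInInternetExplorer() is made up for illustration.

```php
<?php
// Hypothetical helper: opens a URL in Internet Explorer through COM.
// Returns false when PHP's COM class is unavailable (e.g. on non-Windows systems).
function openInInternetExplorer($url)
{
    if (!class_exists('COM')) {
        return false;
    }
    $ie = new COM('InternetExplorer.Application');
    $ie->Visible = true;       // show the browser window
    $ie->Navigate($url);       // load the page
    while ($ie->Busy) {        // wait until the browser reports it is idle
        usleep(100000);
    }
    $ie->Quit();               // close the browser again
    return true;
}
```

Calling openInInternetExplorer('http://example.com/') does nothing but open the page and close the browser again, which is about as far as the usual examples go.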

But is there a way to get script control of Firefox? Ideally I'm looking for a platform-independent solution, and something I can use from PHP. Google is not helping (PHP's dominance as a server-side technology means client-side hits are swamped).

Here is my dream PHP script:

$firefox=new FirefoxInstance();
foreach($links as $id=>$info){
    if(!$found)echo "MLSN link is missing...\n";
}

Et cetera. That is, I'm talking about operating Firefox the same way a user does. I know I can grab the raw HTML and parse it, all from PHP, but that doesn't test a web site the same way clicking links in a browser does, especially for web pages with JavaScript, iframes, AJAX, etc.

(I'd heard of XPCOM but, if I've understood it correctly, it is a library for building Firefox and its extensions, not something to control Firefox? It also has no PHP bindings.)

BTW, going back to controlling IE from the COM interface, I don't suppose anyone has seen a detailed tutorial on how to use it to fill out and submit forms? I only ever see simple examples of how to set the URL, but I believe full control should be possible.
Dec 16th 2009 UPDATE: This came to the top of my to-do list so I read the MSDN docs on the Internet Explorer COM object, and now it is my understanding that I cannot manipulate and submit forms via the COM object. None of the example usage even hinted at doing this.