Tuesday, August 9, 2011

Debugging Regexes

Cue: dramatic music. There I was under pressure, enemy fire going off in all directions, and my unit test had started complaining. The test regex was 552 characters long, the text being tested was almost as long, and each run of the unit test took 30 seconds. Talk about finding a needle in a haystack. James Bond only had to choose between cutting the red or the blue wire. He had it easy.

But I lived to tell the tale. Playing the Bond Girl in this scenario was http://www.regextester.com/ (I actually used version 2 which, though alpha, worked fine).

It still wasn't smooth sailing. The above site assumes the regex is surrounded by /.../ but mine wasn't. So first I had my unit test output the regex, then I escaped it correctly for use with /.../, and then I pasted it into the Regex Tester. I also pasted in the text to test. It should have matched; it didn't. So I put the cursor at the end of my regex and deleted a few characters at a time. After deleting about two-thirds (!!) of it, the text finally turned red and I had a match. I could see exactly where the match stopped and work out what was missing from my regex. I fixed the regex (simultaneously in the Regex Tester and in my unit test script) and repeated. I had to delete back to almost the same point. It took half a dozen passes before the whole regex matched.
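The escaping step above is easy to get wrong by hand on a 552-character regex, so here is a minimal sketch of how it could be scripted. It is written in Python purely for illustration; the function name `escape_for_slash_delimiters` is made up for this example, and it assumes the only thing that needs escaping is a bare forward slash (anything already preceded by a backslash is left alone).

```python
def escape_for_slash_delimiters(pattern: str) -> str:
    """Wrap a regex in /.../, escaping any forward slash that is not already escaped."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an existing escape sequence (backslash plus next char) untouched.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "/":
            # A bare slash would end the /.../ literal early, so escape it.
            out.append("\\/")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "/" + "".join(out) + "/"


print(escape_for_slash_delimiters(r"https?://\w+/"))
```

Pasting the printed result into a tester that expects /.../ should then behave the same as the original pattern, assuming the two regex flavours agree on everything else.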

The code looks to be open source JavaScript. So maybe I will hack on it, to automate the above process (my better Bond Girl, if you like): I would give it the regex and the target text, say I expect a match, and it would find the longest leading part of the regex that matches and show me how much of the target text got matched. (Ah, it uses Ajax requests to back-end PHP for the preg and ereg versions, and that code is not available; but at least I could do this for JavaScript regexes.)
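As a rough idea of what that automation might look like, here is a minimal sketch in Python (not the JavaScript the site uses, and not anything taken from its code). The name `longest_matching_prefix` is hypothetical. It does what I did by hand: chop characters off the end of the regex until what is left both compiles and matches somewhere in the text. Chopping at an arbitrary point often leaves an invalid regex (an unclosed group, a dangling backslash), so those candidates are simply skipped.

```python
import re
from typing import Optional, Tuple


def longest_matching_prefix(pattern: str, text: str) -> Optional[Tuple[str, int, int]]:
    """Return (prefix, match_start, match_end) for the longest prefix of
    `pattern` that compiles and matches somewhere in `text`, or None."""
    for end in range(len(pattern), 0, -1):
        candidate = pattern[:end]
        try:
            compiled = re.compile(candidate)
        except re.error:
            # Truncation left an invalid regex; try a shorter prefix.
            continue
        match = compiled.search(text)
        if match:
            return candidate, match.start(), match.end()
    return None


result = longest_matching_prefix(r"(\d+)-(\d+)-x", "order 12-34 shipped")
if result:
    prefix, start, end = result
    print("longest matching prefix:", prefix)
    print("matched text:", repr("order 12-34 shipped"[start:end]))
```

Treat the answer as a hint rather than a verdict: cutting a regex short can change its meaning (half of an alternation, or a quantifier that has lost its operand), so the reported prefix only shows roughly where to start looking.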

Enough with poking around inside today's Bond Girl. Down the cocktail, jacket on, back to the field...

1 comment:

Keith said...

The RegExPal project may be of interest.