Tuesday, August 9, 2011

Debugging Regexes

Cue: dramatic music. There I was under pressure, enemy fire going off in all directions, and my unit test had started complaining. The test regex was 552 characters long, the text being tested was almost as long, and each run of the unit test takes 30 seconds. Talk about finding a needle in a haystack. James Bond only had to choose between cutting the red or the blue wire. He had it easy.

But I lived to tell the tale. Playing the Bond Girl in this scenario was http://www.regextester.com/ (I actually used version 2 which, though alpha, worked fine).

It still wasn't smooth sailing. The above site assumes the regex is surrounded by /.../ but mine wasn't. So, first I had my unit test output the regex, then I escaped it correctly for use with /.../ then pasted it into the Regex Tester. I also pasted in the text to test. It should match; it doesn't. So I put the cursor at the end of my regex and deleted a few characters at a time. After deleting about two-thirds (!!) of it, finally the text turned red and I had a match. I could see exactly where the match stopped and realize what was missing in my regex. I fixed the regex (simultaneously in RegexMatcher and in my unit test script) and repeated. I had to delete back to almost the same point. It took half a dozen passes before the whole regex matched.

The code looks to be open source javascript. So maybe I will hack on it, to automate the above process (my better Bond Girl, if you like): I would give the regex, the target text, say I expect a match, and it will find the longest regex that matches and show me how much of the target text got matched. (Ah, it uses ajax requests to back-end PHP for the preg and ereg versions, and that code is not available; but at least I could do this for javascript regexes.)

Enough with poking around inside today's Bond Girl. Down the cocktail, jacket on, back to the field...

Monday, August 1, 2011

svn, ssh, /bin/false, git, /etc, etc.

I just spent almost two hours troubleshooting an svn server. It was set up to allow ssh access from a few users over ssh as described in this helpful blog post.

Today I realized none of the svn clients could run svn update: "svn: Network connection closed unexpectedly"

Part of the challenge was that I'd not run "svn update" in 1-2 months, so I had a big time frame for breakage. But all I could remember changing recently was commenting out some lines in my iptables firewall, so I spent quite a while staring at that. And I'd updated some packages and rebooted.

I spent a lot of time looking at ssh verbose output. To do this use
export SVN_SSH="ssh -vv"
(-v for quite verbose, -vv for more, -vvv for even more, etc.) No error messages: it connects, runs the svnserve command, passes up some environmental variables, sends some data, and then simply loses the connection.

Then I wondered if the svn user had somehow lost permission for something important. It has /bin/false as its shell, so I gave it /bin/bash, used su to become root, then "su - svn". Ran the svnserve command. It is fine. Looked at the repository. It is fine. Ran svnadmin to check for locks. None.

Giving up, I started crafting an email to post somewhere asking for help. I went back to get the exact error message svn update was giving me... and it worked. Yep, the other client works now too. My first guess was running svnadmin had quietly fixed something. But then, on a hunch, I changed svn's shell back to /bin/false. Yep, that broke it. It seems the svn user has to have a valid login shell.

Okay, problem solved. But I'm sure that svn has always had /bin/false, and now I wanted to know if that is true, and if and when it changed. It is too late now, but ready for next time, I decided to put all of /etc into a git repository (no need for a central repository, so git is far better choice than svn for this). The git commands (all run as root) are trivial:
cd /etc
git init
git add .
git commit -m "Initial files" -a
I found this page of someone who has done something similar, and he suggested sending the git status report from a daily cron, so I did that too.

It is so easy, and disk space so plentiful, that I think I will do it on all my linux machines.

UPDATE: /etc has some files that get edited a lot, so to reduce noise this is my .gitignore file so far:
/cups/printers.conf*
/cups/subscriptions.conf*
/emacs/site-start*
/mtab
/adjtime
/ld.so.cache
/*-
*~

(Use "git rm --cached cups/printers.conf*" (for instance) to stop git tracking the files if they've already been added.)