Thursday, November 25, 2010

bash tips: meld on program output

Here is the challenge: I want to run diff on two large log files, but I'm only interested in the entries at a certain time in each log file.

This used to require four commands:
grep "ABC" a.txt >tmp1.txt
grep "ABC" b.txt >tmp2.txt
diff tmp1.txt tmp2.txt
rm tmp1.txt tmp2.txt
(Imagine "ABC" is a datestamp, but it could be any other way to filter your log file.)

Thanks to the gurus on TLUG's mailing list I can now do this as a one-liner, using bash's process substitution:
diff <(grep ABC a.txt) <(grep ABC b.txt)
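To try the one-liner without touching real logs, here is a minimal sketch. The temporary directory, the file contents and the variable names are all made up for the demo; only the diff/grep line mirrors the command above. It needs bash, since plain sh does not understand <( ).

```shell
#!/usr/bin/env bash
# Sketch: filter two logs on the fly and diff the results.
# The log lines below are invented sample data.
workdir=$(mktemp -d)
printf '%s\n' 'ABC start' 'XYZ noise' 'ABC done'   > "$workdir/a.txt"
printf '%s\n' 'ABC start' 'XYZ other' 'ABC failed' > "$workdir/b.txt"

# <(cmd) expands to a file name (something like /dev/fd/63) that diff
# reads as if it were an ordinary file, so no temporary files are needed.
# diff exits with status 1 when the inputs differ, hence the "|| true".
result=$(diff <(grep ABC "$workdir/a.txt") <(grep ABC "$workdir/b.txt") || true)
echo "$result"

# Clean up the sample files.
rm -r "$workdir"
```

Only the lines containing ABC reach diff, so the XYZ lines, which also differ between the two files, never show up in the output.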
The same syntax works perfectly with meld (a wonderful visual diff program) too! Here is another way to use it: comparing the output of a program on two different machines (here I'm comparing the PHP configuration):
diff <(php -i) <(ssh someserver 'php -i')
That uses the form of ssh that runs a single command on the remote server and sends back its output. The command in the brackets can get quite complex. Here is an example where I needed to compare datestamps in two CSV files, but the first field was an arbitrary id number and therefore different for every record.
diff <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}' dir1/abc.csv) \
  <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}' dir2/abc.csv)
The -o flag tells egrep to output only the matching part, not the whole line. This next version adds .+ to the end of the pattern, so it outputs everything from the datestamp to the end of the line; i.e. it just excludes the CSV field(s) before the datestamp field:
diff <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+' dir1/abc.csv) \
  <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+' dir2/abc.csv)
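
To see what the two patterns actually pull out, here is a small sketch run against a single made-up CSV record. The record layout (id, datestamp, then two more fields) and the variable names are my assumptions for illustration, not the real files. I've written grep -E, which is the newer spelling of egrep.

```shell
#!/usr/bin/env bash
# Sketch: what each pattern extracts from one invented CSV record.
line='1017,20101125 09:15:02,login,ok'
stamp='[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}'

# -o prints only the matched text; -E enables extended regex, like egrep.
only_stamp=$(printf '%s\n' "$line" | grep -oE "${stamp}")
stamp_and_rest=$(printf '%s\n' "$line" | grep -oE "${stamp}.+")

echo "$only_stamp"       # 20101125 09:15:02
echo "$stamp_and_rest"   # 20101125 09:15:02,login,ok
```

In both cases the leading id field is gone, which is what lets diff line up records whose ids differ.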
