Showing posts with label commandline. Show all posts
Showing posts with label commandline. Show all posts

Thursday, October 3, 2013

Backing-up a bunch of small files to a remote server

I have a directory, containing lots of files, and I want an off-site, secure backup.

Even though the remote server might be a dedicated server that only I know root password for, I still don't trust it. Because of the recent NSA revelations I no longer consider myself paranoid. Thanks guys, I can look at myself in the mirror again!

As a final restriction, I don't want to have to make any temp files locally: disk space is tight, and the files can get very big.


Here we go:

cd BASE_DIR
tar cvf - MY_FOLDER/ | gpg -c --passphrase XXX | ssh REMOTE_SERVER 'cat > ~/MYFOLDER.tar.gpg'


(the bits in capitals are the things you replace.)

Notes
  • The "v" in "tar cvf" means verbose. Once you are happy it is working you will want to use "tar cf" instead.
  • The passphrase has to be given in the commandline because stdin is being used for the data!! A better way is to put the passphrase in another file: --passphrase-file passfile.txt. However note that this is only "better" on multi-user machines; on a single-user machine there is no real difference.
  • I'm using symmetric encryption. You could encrypt with your key pair, in which case the middle bit will change to: gpg -e -r PERSON  Then you won't need to specify the passphrase.
  • In my case REMOTE_SERVER is an alias to an entry in ~/.ssh/config. If you are not using that approach, you'll need to specify username, port number, identity file, etc. By the way, I'm not sure this method will work with password login, only keypair login, because stdin is being used for the data.
  • Any previous MYFOLDER.tar.gpg gets replaced on the remote server. So, if the connection gets lost halfway during the upload then you've lost your previous backup. I suggest using a datestamp in the filename, or something like that.
What about to get the data back?

cd TMP_DIR
ssh REMOTE_SERVER 'cat ~/MYFOLDER.tar.gpg' | gpg -d --passphrase XXX | tar xf -


You should now have a directory called MYFOLDER, with all your files exactly as they were.


Outstanding questions

Is it possible to use this approach in conjunction with Amazon S3, Google Drive, Rackspace cloud files, or similar storage providers? E.g. 100GB mounted as a Rackspace drive is $15/month (plus the compute instance of course, but I already have that), whereas 100GB as cloud files is $10/month, or $5/month on google drive. ($9.50/month on S3, or $1/month for glacier storage). Up to 15x cheaper: that is quite an incentive.

2013-10-08 Update: The implicit first half of that question is: is there a way to stream stdout to the remote drive (whether using scp or a specific commandline tool).
For Amazon S3 the answer is a clear "no": http://stackoverflow.com/q/11747703/841830 (the size has to be known in advance).
For Google Drive the answer is maybe. There is a way to mount google drive with FUSE: https://github.com/jcline/fuse-google-drive   It looks very complicated, describes itself as alpha, and the URL for the tutorial is a 404.
For Rackspace CloudFiles (and this should cover all OpenCloud providers), you can use curl to stream data! See "4.3.2.3. Chunked Transfer Encoding" in the cloud files developer guide HOWEVER, note that there is a 5GB limit on a file. That is a show-stopper for me. (Though by adding a custom script instead of "ssh REMOTE_SERVER 'cat > ~/MYFOLDER.tar.gpg'", I could track bytes transferred and start a new connection and file name at the 5GB point, so there is still hope. Probably only 10 lines of PHP will do it. But if I'm going to do that, I could just as easily buffer say 512MB in memory at a time, and use S3)

NOTE: Because I've not found an ideal solution yet, I never even got to the implicit second part of the question, which is if the need to "cat" on the remote server side will cause problems. I think not, but need to try it to be sure.


Tuesday, October 11, 2011

Renaming files is easy!

What do you mean? Of course renaming is easy! From the GUI just click it and type the new name. From the linux commandline just type mv oldname newname.

But what about when you have 200 *.JPEG files and you need to rename them to be *.jpg? In the past I've written throwaway PHP scripts to do just this. Well, it turns out linux has a command called rename, that can use regexes. To rename those jpegs:
  rename 's/JPEG/jpg/' *.JPEG

What about a few thousand log files, and I want to remove the datestamp portion from the filename (because I'm going to keep each day's logs in their own subdirectory)? How about this:
  mkdir 20110203
  mv *.20110203.log 20110203/
  rename 's/.20110203//' 20110203/*

(See here for how to then use a bash loop to loop through all the datestamps you have to deal with.)

Saturday, September 17, 2011

Add a temporary static IP address

At home, with wired ethernet, my (Ubuntu) notebook has a few static IP addresses that I use for developing websites. Out of the house, I use wicd, so I have a dynamic IP address, and those static IPs don't exist. wicd configuration is too complex for me to understand, so I just accept this, but it caught me short the other day when I needed to have both an internet connection and to be able to work on a website running on my notebook.

I failed then, but I'm ready for next time. To temporarily add a static IP address you simply do (as root):
ifconfig eth0:3 10.10.10.10 netmask 255.255.0.0
I'm choosing "eth0:3" for the interface; it can be any unused number after the colon, and you never need to care what this is. netmask can really be anything for our purposes. The 10.10.10.10 is the IP address I've given it. Test with this:
ping 10.10.10.10
To set up a quick virtual host create a file under /etc/apache2/conf.d called 10.10.10.10.conf (any filename is fine) with these contents:
<virtualhost 10.10.10.10:80="">
    DocumentRoot "/var/www/somewhere"
    ServerName 10.10.10.10
</virtualhost> 
 
Tidyup
To remove just the interface that you added above, use this command:
ip addr del 10.10.10.10/32 dev eth0:3
Or, to restore the network to boot defaults (useful if you have done lots of changes) you can do:
ifdown -a
ifup -a
Either way to then remove the apache config: delete the 10.10.10.10.conf file you created and restart apache.

Friday, September 2, 2011

Using twitter oauth from commandline

The twitteroauth library is the most recommended PHP library for using oauth for API access to Twitter. It also supports the commandline approach (which can work completely behind a firewall, no need for a web server to host a callback page), but it is not very well documented.
(Note: when I say behind a firewall you do still need web access, because you need to login to twitter to get a PIN code; but this is much less demanding than needing to set up a web server on a public IP address.)

Anyway, after I'd worked it out, I put my sample code up on github. The three files to look at are oob1.php, oob2.php and oob3.php. Here is how you use them:

Step One:

Edit config.php, to set:
      define('OAUTH_CALLBACK', 'oob');

(If not already done, go to https://dev.twitter.com/apps, create and configure the app, and add the consumer key and secret to config.php.)

Step Two:
   php oob1.php

Step Three:
Visit the URL it tells you to, and approve the application

Step Four:
   php oob2.php 1234567
(where 1234567 is the PIN number you got at the end of step three)

Step Five:
   php oob3.php account/verify_credentials
(this command will show your account; see the oob3.php source code for other supported commands, and some shortcuts.)

Let me know if you have any problems with, or questions about, these files.
If you find bugs, or want to encourage its inclusion in the main twitteroauth library, you can comment on the github pull request.

Monday, August 1, 2011

svn, ssh, /bin/false, git, /etc, etc.

I just spent almost two hours troubleshooting an svn server. It was set up to allow ssh access from a few users over ssh as described in this helpful blog post.

Today I realized none of the svn clients could run svn update: "svn: Network connection closed unexpectedly"

Part of the challenge was that I'd not run "svn update" in 1-2 months, so I had a big time frame for breakage. But all I could remember changing recently was commenting out some lines in my iptables firewall, so I spent quite a while staring at that. And I'd updated some packages and rebooted.

I spent a lot of time looking at ssh verbose output. To do this use
export SVN_SSH="ssh -vv"
(-v for quite verbose, -vv for more, -vvv for even more, etc.) No error messages: it connects, runs the svnserve command, passes up some environmental variables, sends some data, and then simply loses the connection.

Then I wondered if the svn user had somehow lost permission for something important. It has /bin/false as its shell, so I gave it /bin/bash, used su to become root, then "su - svn". Ran the svnserve command. It is fine. Looked at the repository. It is fine. Ran svnadmin to check for locks. None.

Giving up, I started crafting an email to post somewhere asking for help. I went back to get the exact error message svn update was giving me... and it worked. Yep, the other client works now too. My first guess was running svnadmin had quietly fixed something. But then, on a hunch, I changed svn's shell back to /bin/false. Yep, that broke it. It seems the svn user has to have a valid login shell.

Okay, problem solved. But I'm sure that svn has always had /bin/false, and now I wanted to know if that is true, and if and when it changed. It is too late now, but ready for next time, I decided to put all of /etc into a git repository (no need for a central repository, so git is far better choice than svn for this). The git commands (all run as root) are trivial:
cd /etc
git init
git add .
git commit -m "Initial files" -a
I found this page of someone who has done something similar, and he suggested sending the git status report from a daily cron, so I did that too.

It is so easy, and disk space so plentiful, that I think I will do it on all my linux machines.

UPDATE: /etc has some files that get edited a lot, so to reduce noise this is my .gitignore file so far:
/cups/printers.conf*
/cups/subscriptions.conf*
/emacs/site-start*
/mtab
/adjtime
/ld.so.cache
/*-
*~

(Use "git rm --cached cups/printers.conf*" (for instance) to stop git tracking the files if they've already been added.)

Thursday, July 14, 2011

Customize less: less annoying

Open ~/.bashrc (or ~/.bash_profile) and add this line at the end:
LESS=-FRX;export LESS
Don't ask why, just trust me. ('cos actually I don't know what it means either...)

This was the magic incantation to tell "git log" to wrap commit messages (instead of just chopping off everything extra).

But it has had the delightful side-effect of man now works properly. About 6 or 7 years ago I used to be able to type "man whatever", press q, and the advice would stay on screen. Then linux (all distros as far as I could tell) changed to hiding the man page as soon as you press q. Frequently I'd have to have two terminal windows open, simply so I could keep the man page open while I type my command. Yes, you're ahead of me; the above LESS command fixes this. If I'd known it would be so easy I'd have done this years ago!!

Even better, the help system in the R language was working just like man pages (and it was even more annoying there), and that has been fixed too.

It is a miracle, I tell you, a miracle! July 13th should go down as "Saint LESS=-FRX" day; in 100 years time it will be a public holiday and children will be taught about it in schools. They might even be taught what it means...

Thursday, May 26, 2011

svn to git: central repository setup

The best thing about git is you can just start working; no messing around setting up the central repository, working out secure access, etc. But the learning curve for git, coming from svn, is much bigger than say from cvs to svn. The thing I'd not got my head around was what do you do when you need that central repository? For instance, when you have a 2nd developer! I think I've got it now, so here is my guide on it.

Let's assume it is a website ("mysite"), and we have three machines involved:
devel: where most development is done
store: the central repository server
www: the web server

The website is on devel currently, under git, and we want to check it out to www.
If not a website, but just two developers, then in this example "www" is the machine of that second developer.

For simplicity we'll assume all developers have ssh access to the "store" machine. And that all our git repositories will be kept under /var/git/

1. On store:
cd /var/git/
mkdir mysite.git
cd mysite.git
git init --bare

(see update#1 below, if your git does not support --bare)

2. On devel (from the root of the website)
git remote add origin darren@store:/var/git/mysite.git
git push origin master
(the "origin master" part is just needed the first time)

3. On www,
cd /var/www/
git clone darren@store:/var/git/mysite.git my.example.com
(this creates /var/www/my.example.com and checks out the current version of the site)
(you need to configure your web server to not serve the .git directory, as well as any other files end users should not see)

4. If we make a change on devel and want to update www.
First on devel, do the commits, then:
git push
Then on www:
git pull

5. If we make a change on www, and want to update devel.
First on www, do the commits, then:
git push
Then on devel do:
git pull origin master

Not quite symmetrical is it? To fix this, so in future you just need to do "git pull" from the devel machine too, edit .git/config. You'll see these lines:

[remote "origin"]
url = darren@store:/var/git/mysite.git
fetch = +refs/heads/*:refs/remotes/origin/*

Don't touch them but, just above them, add this block:

[branch "master"]
remote = origin
merge = refs/heads/master

(unless you are doing something clever, you can use that verbatim). Now "git pull" works without having to specify "origin master" any more.

So, "git commit ...;git push" is like "svn commit ...". And "git pull" is like "svn checkout".


UPDATE #1: Older versions of git don't have the --bare function. Do these steps (on "store" machine) instead:
cd /var/git/
mkdir mysite.git
cd mysite.git
git init
mv .git/* .
rmdir .git

UPDATE #2: What to do if you already have some files on www, with some differences from what is in git? E.g. I had a test and live version of a website, with no source control; the test version had a number of extra files. I scp-ed the live version to devel, created a git repository. Then I set it up and push-ed it to "store". Then on my test site, I had to do these steps:
#Temporarily move current test files out of the way
mv my.example.com{,.old}
#Get the git version
git clone darren@store:/var/git/mysite.git my.example.com
#Merge in the test site
cd my.example.com
cp -a ../my.example.com.old/* .
#See what is different
git status
#For all changed files, revert to the git (live server) version
git reset --hard
By the end of all this, git status will just list (as untracked) the files that were on test but were not on live.

Tuesday, December 7, 2010

scp with multiple targets: ssh-add

I sometimes have the need to upload the same file to two or more locations on the target server. In fact I even have a script to help me, which goes something like this:
scp file1 file2 remote01:/path/to/somewhere/
 scp file1 file2 remote01:/path/to/another/
The problem is I need to type the password for remote01 twice. I had never managed to find some clever scp syntax to allow specifying two destinations, and a post on the TLUG mailing list confirmed that. But what I did learn from TLUG is that there is something called ssh-agent that can store passphrases for key pairs; this plugged a gap in my knowledge. There are three ways to login in via ssh/scp:
  1. Give your password
  2. Make a keypair with no passphrase
  3. Make a keypair with a passphrase
The second way is used to allow scp from cron jobs, to automate copying files between two machines. I'd never really got the point of the third way: if you have to type a passphrase why not just use the first method, and save messing around making a key pair. (Well, yes, there is better security: a log-in then requires both something I have and something I know; you have to turn off PasswordAuthentication on your ssh server for this to have any meaning though. Thanks to Kalin for this comment.)

But it turns out there is this program called ssh-agent that remembers passphrases for you. And I found it is already running in the background on Ubuntu.

Enough chat, let's look at the solution. First, I created a one-off keypair for this script, on my machine, using something like this:
cd ~/.ssh
  ssh-keygen -t dsa -C me@example.com -f key_for_remote01
  chmod 600 key_for_remote01*
  scp -p key_for_remote01.pub remote01:~
When generating the keypair give a reasonably secure passphrase (you will have to type it in each time, and it is only of use to people in possession of key_for_remote1, so no need for a 20-random-character monster; I believe it is perfectly fine for it to be the same as your normal ssh login password for remote01).

Then log in to remote01 and append key_for_remote01.pub to ~/.ssh/authorized_keys. If that file does not exist then you can just rename key_for_remote01.pub to authorized_keys and move it into ~/.ssh/

(By the way, there is no need to put your private half of the keypair in ~/.ssh/ but that seems as good as place as anywhere else.)

Now, I modified my script as follows:
ssh-add -t 120 ~/.ssh/key_for_remote01
 scp -i ~/.ssh/key_for_remote01 file1 file2 remote01:/path/to/somewhere/
 scp -i ~/.ssh/key_for_remote01 file1 file2 remote01:/path/to/another/
 ssh-add -d ~/.ssh/key_for_remote01

What happens is the ssh-add line will ask you for your passphrase. The two scp lines then work automatically. Finally the ssh-add -d stops it caching your passphrase (forcing you to type your passphrase each time you run this script).

The -t 120 parameter says the passphrase will expire after two minutes. This is just in case the batch file doesn't complete and so does not get chance to run ssh-add -d.


Note: you can use that same key pair for other machines. Basically anywhere you put the *.pub half of the key pair will let you login. And you can login from anywhere you have the private half of the key pair.

Note: the timeout/deletion code above is deliberate for this application, but you don't have to do it this way. By allowing it to cache it permanently you would only be prompted for your passphrase once, and then all future ssh and scp logins would be automatic. They will be cleared when you log-out of gnome (on ubuntu, at least) or shutdown your machine.

Note: If you don't want to specify -i each time you use ssh/scp then you can add an entry to your ~/.ssh/config file, like this:
Host remote01
        Hostname 10.1.2.3
        Port 22
        IdentityFile ~/.ssh/key_for_remote01

Note: I used -C with my real email address. This is put in the public key, and I wanted the administrator of remote01 to know who put the key there. Without -C it defaulted to "myusername@myhost". The administrator of remote01 knows nothing about my machine names so this seemed unreasonable and I decided to use -C. But the advantage of that default is it seems ssh-agent knows about that name and will prompt automatically the first time you try to use ssh/scp, which means there is no need to run ssh-add first. I have not worked out yet if ssh-agent can be told to know about my email address too.

Sunday, November 28, 2010

Troubleshooting php and xinetd with strace

In a previous entry I mentioned I was trying to track down a difference in a php script between two machines. I've now got closer, close enough in fact to fix the problem but not to understand it. As a troubleshooter, the big discovery for me was how useful strace can be.

First, a PHP code snippet:
while(1){
stream_set_blocking(STDIN,true);
$s=trim(fgets(STDIN));
if($GLOBALS['log_fp'])fwrite($GLOBALS['log_fp'],$s."\n");
$params=explode(" ",$s);
$command=array_shift($params);
...
}

The first two lines say listen on stdin for a command, and wait forever for it to arrive. When it does we log it and then split it up into a command and parameters.

This script is used over xinetd. Xinetd listens on a port for us, and when it gets something it feeds it to stdin; php script output is then fed back over the socket to the client.

Here is the setup:
  • machine A: ubuntu 8.04, 32-bit, php 5.2
  • machine B: ubuntu 10.04, 64-bit, php 5.3

Both machines run an indentical version of xinetd: "xinetd Version 2.3.14 libwrap loadavg".

Machine A runs fine, machine B sometimes sits there. Analyzing this some more, by connecting over telnet to machine B:
  1. I send a command
  2. It replies straightaway
  3. After 60 seconds or so it gives an error message
  4. If I do nothing this error message appears again every 60 seconds.

On machine A, or if I run the php script directly on machine B, it behaves like this:
  1. I send a command
  2. It replies straightaway
  3. It sits there forever waiting for input

Out of desperation I ran strace (using -p you can attach it to a running process; use ps -a | grep php to find the PID), and I was pleasantly surprised to see it was not too verbose.

Over xinetd on machine B (the problem configuration) the interesting snippet is:
...
write(1, "mydata\n", 7)                 = 7
fcntl(0, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(0, F_SETFL, O_RDWR)               = 0
poll([{fd=0, events=POLLIN|POLLERR|POLLHUP}], 1, 60000) = 0 (Timeout)
write(3, "\n", 1)                       = 1
write(5, "\n", 1) 
...

On machine A over xinetd it instead looks like this:
...
write(1, "mydata\n", 7)                 = 7
fcntl64(0, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(0, F_SETFL, O_RDWR|O_LARGEFILE) = 0
read(0,
(Yes nothing comes after the "0," until I send some input.)

And direct access to the php script on machine B it practically the same:
...
write(1, "mydata\n", 7)                 = 7
fcntl(0, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(0, F_SETFL, O_RDWR|O_LARGEFILE)   = 0
read(0,

So, the cause is clear: fgets() blocks for only 60 seconds instead of blocking forever, but only on this machine and only over xinetd. And the bug in my script then becomes clear: I'm not prepared to handle blank input!

Here is my fix:
while(1){
stream_set_blocking(STDIN,true);
$s=trim(fgets(STDIN));
if($s=='')continue;
if($GLOBALS['log_fp'])fwrite($GLOBALS['log_fp'],$s."\n");
$params=explode(" ",$s);
$command=array_shift($params);
...
}


As an ironic postscript, the real problem was elsewhere (a mismatch in another program version and the configuration being used for it!), and I also used strace to track that mistake down. But, what was so important about fixing the above was that it stopped distracting me with an error message every 60 seconds on machine B, when the configuration error was actually on machine A!

Thursday, November 25, 2010

bash tips: meld on program output

Here is the challenge: I want to run diff on two large log files, but I'm only interested in the entries at a certain time in each log file.

This used to require four commands:
grep "ABC" a.txt >tmp1.txt
  grep "ABC" b.txt >tmp2.txt
  diff tmp1.txt tmp2.txt
  rm tmp1.txt tmp2.txt
(Imagine "ABC" is a datestamp, but it could be any other way to filter your log file.)

Thanks to the gurus on TLUG's mailing list I can now do this as a one-liner:
diff <(grep ABC a.txt) <(grep ABC b.txt)
It works perfectly for meld (a wonderful visual diff program) too! Here is another way to use it, to compare the output of a program on two different machines (here I'm comparing the php configuration):
diff <(php -i) <(ssh someserver 'php -i')
We're using the form of ssh that runs a program on the remote server. The command in the brackets can get quite complex. Here is an example where I needed to compare datestamps in two csv files, but the first field was an id number, arbitrary, and therefore different for all records.
diff <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}' dir1/abc.csv)
  <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}' dir2/abc.csv)
The -o flag to egrep tells it to only output the matching part, not the whole line. This next version shows the complete rest of the line starting with my datestamp field; i.e. this version just excludes the csv field(s) before the datestamp field:
diff <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+' dir1/abc.csv)
  <(egrep -o '[0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}.+' dir2/abc.csv)

Monday, October 18, 2010

Bash loops

Faced with having to write 160 commands by hand (to split 5000 files into subdirectories, based on the first 3 digits of their 6 digit filenames) I quickly learnt bash loops instead:

for i in `seq 102 189`;
  do
    mkdir $i
    mv $i???.sgf $i/
  done  

I.e. it does commands like this:
mkdir 102
  mv 102???.sgf 102/
  mkdir 103
  mv 103???.sgf 103/
  ...

UPDATE: In a comment, traxplayer kindly explained how I would make, for instance a filename like 120abc.txt; obviously writing $iabc.txt won't work. The trick is to write $i as ${i}. E.g.
mv ${i}abc.sgf $i/

I just needed to use it, and it worked! This bash script tries to find the last reference to each of XXX06..XXX20 in two logfiles.
rm lastorder.log
touch lastorder.log

for i in `seq 6 9`;
    do
        grep -h XXX0${i} order_verbose.log.old order_verbose.log | tail -1 >> lastorder.log
    done

for i in `seq 10 20`;
    do
        grep -h XXX${i} order_verbose.log.old order_verbose.log | tail -1 >> lastorder.log
    done

By the way, traxplayer also said he prefers to use $(...) instead of `...` for readability:
  for i in $(seq 102 189);

Monday, September 13, 2010

When bash history no longer works

I've been pulling my hair out: I have all the HISTORY variables set correctly, but history never survives to the next session. I thought it must be an Ubuntu 10.04 bug, as only that machine, but couldn't track down anyone reporting the same problem.

The breakthrough came when I stopped typing "history" to view the history, and decided to "cat ~/.bash_history". I got permission denied... and discovered the file was owned by root. Suddenly it all made sense.

(By the way, I've a bunch of blog articles about setting up a new Ubuntu dual boot, RAIDed, encrypted machine; coming soon to an RSS feed near you...)

Thursday, April 15, 2010

find, grep and tr

A practical example of three useful unix commands, and how they can work together.

I thought I had found another chance to use sed (see Right Sed Fred, I'm too sexy to search and replace), but in the end did not use it; I wanted to remove newline characters but sed works line by line so this is not possible. (There is a way to do this with sed apparently, but it looked awfully hard to understand.) So I used tr instead.

Here is my final command:
find /path/to/myfiles/ -mtime -125 -name '*txt' -print | grep -v xxx | tr '\012' ' '

The first part:
find /path/to/myfiles/ -mtime -125 -name '*txt' -print

means search all the *.txt files under the given path, that have been created or modified in the past 125 days. It outputs one filename per line.

However it include some files I didn't want. Luckily they all had the same filename component ('xxx' in my example above), so were trivial to identify. I used "grep -v" which means exclude anything that matches.

At this stage I had the list of filenames, but one per line. I wanted to use them as the parameters to a batch file, so needed them all on one line. I couldn't get sed to work, but stumbled on a way to use the "tr" tool. It takes two parameters: the character to replace, and what to replace it by. '\012' is octal for the unix linefeed character. So
tr '\012' ' '

means replace LF with a space. I slapped this on to the previous command and got just want I wanted.

(For unix beginners, the | is called the pipe character, and it means take the output of the command before it and give it as the input to the command after it. Many unix commands are designed with this kind of piping behaviour in mind.)

Saturday, February 27, 2010

When a unix dot means a whole different world...

Subtitle: new ways to shoot yourself in the foot.

A week ago I was changing ownership on a web site subdirectory:
chown -R darren:www *

There were some files/directories that start with a dot, such as .htaccess, so I then did:
chown -R darren:www .*

If you just screamed out "Nooooo!!!!" you are allowed to polish your Unix Guru badge and wear it with pride. Personally I just sat there wondering why it was taking so long to return and why my disk was suddenly sounding so busy.

That's right. ".." refers to the parent directory. It had gone up a directory, then was using -R to recurse into every subdirectory on my web site, destroying previous ownership settings with abandon.

I realized after a few moments and killed it, but the damage was done, and I'm still discovering small parts of web sites that got broken. But a question for the gurus: how should I change all files and directories in a tree including those that start with a dot? Did I need to use some complicated find and chown combination command?

Saturday, December 19, 2009

Right Sed Fred, I'm too sexy to search and replace

Last week I hand-edited 22 XML files to change one attribute in each. I had only ten minutes, and I knew that solution could be done in that time.

Today I had exactly the same task, but with less time pressure, I went hunting. Here is the solution I used:
sed -i 's/old/new/g' *.xml

-i means replace inline. I had struggled with sed years ago and thought it was a horrible monster that made emacs look user-friendly in comparison. But that is so simple. Perhaps I was hurt before by trying to do something that couldn't be described as a simple regex?

Actually I vaguely remember my need at that time was to modify all html files in a directory tree, which sed cannot do. But with find it can:
find . -iname \*.html -execdir sed -i 's/<html>/<html mytag="test">/g' {} +

That inserts an attribute in the html tag of all *.html files in current directory and its subdirectories. (Shamelessly stolen from comments on this page here, and then I did a quick test to confirm I hadn't introduce a typo.)

Cool. One step closer to unix guruness. (sed is also available in cygwin, which is where I was actually doing the edit that started this article.)

UPDATE: for an example where tr is more useful than sed see
find, grep and tr.


UPDATE: Here is an expansion of the find+sed example above. I wanted to alter three types of extensions: html, php, phtml. And I only wanted to alter those files in just certain subdirectories. The -regex parameter of find seemed to do the job:

find . -regex './\(dir1\|dir2/subdir1\|dir3\|dir4\)/.+\.\(html\|php\|phtml\)' -execdir sed -i 's/ABC/XYZ/g' {} +

That is the command exactly as you run it at the bash command prompt. Notice that, in the regex, not just (), but also | need to be prefixed with a single backslash.

Thursday, January 29, 2009

Merging PDF files (and more!)

My mobile phone company decided, 7 months ago, to save the world by not sending paper invoices any more. Unfortunately they didn't tell me (and registering to see invoices online required jumping through lots of hoops, while holding a PIN number I didn't know).
All sorted now, and I only lost one invoice (they only keep the last six online). So I downloaded six shiny new PDF files. Like my phone company I also want to save the world, so wanted to print them four to a page.

The solution is pdftk (in Ubuntu's package list; also available for other major distributions I believe). This simple command:
pdftk 2008*.pdf cat output all.pdf

It just worked. Nice. It also does lots of other cool stuff, like adding or removing encryption, remove just one page out of the middle of a PDF, rotate, extract text. No mention of i18n, so I don't know (yet) how it will cope with Japanese PDFs. Or PDFs where the text is actually an image. The projects home page is http://www.accesspdf.com/pdftk/

It appears there are also Windows and Mac versions.

Tuesday, January 13, 2009

linux shell: two directories quick switch

Sometimes I have to work in two directories from the commandline. For instance to check a log file got created, then back to the directory where the command I am testing is run from. The quick tip is simply "cd -". This takes you back to the previous directory. If you type it again it takes you back to where you were, acting as a nice toggle.

"cd" with no parameters takes you to your home directory. "cd ~/somewhere" takes you directly to a directory relative to your home directory (i.e. tilde expands to your home directory). I had a hunt around but couldn't find any other special characters to use with "cd", but if you know one please let me know.

(This tip was found in Linux Magazine, June 2008)

Sunday, May 18, 2008

sprouts on linux

I've been using sprouts for AS3 on linux as a way to get straight into developing without having to learn about the tools or libraries I need. I'm not sure I'll continue using it once I get into more serious work, as it is a bit opaque (i.e. as with just about any framework, it is easy to do things it does by default, and really difficult to do anything else) and is based on ruby which is not one of my favourite languages.

It has also not been very smooth to use and install, which is what the rest of this article is about.

I'm following the instructions on:
http://www.projectsprouts.org/getting_started.html

on ubuntu. I had ruby installed, and just installed the gem package
(i.e., sudo gem --help then worked). (I also needed to install the rake package.)

But the following line fails:

sudo gem install sprout
Bulk updating Gem source index for: http://gems.rubyforge.org
ERROR: While executing gem ... (Gem::GemNotFoundException)
Could not find sprout (> 0) in any repository

Ah, simply trying it again works. Note that it will ask about all the "required" dependencies. If they are truly "required" then no choice but to say "Y"?

$ sudo gem install sprout
Bulk updating Gem source index for: http://gems.rubyforge.org
Select which gem to install for your platform (i486-linux)
1. sprout 0.7.171 (mswin32)
2. sprout 0.7.171 (darwin)
3. sprout 0.7.171 (x86-linux)
4. sprout 0.7.171 (ruby)
5. Skip this gem
6. Cancel installation
> > 3
Install required dependency rubyzip? [Yn]
Install required dependency archive-tar-minitar? [Yn]
Install required dependency rails? [Yn]
Install required dependency rake? [Yn]
Install required dependency activesupport? [Yn]
Install required dependency activerecord? [Yn]
Install required dependency actionpack? [Yn]
Install required dependency actionmailer? [Yn]
Install required dependency activeresource? [Yn]
Install required dependency net-sftp? [Yn]
Install required dependency net-ssh? [Yn]
Install required dependency needle? [Yn]
Install required dependency open4? [Yn]



Next, this line fails:
sprout -n as3 SomeProject

Run "gem contents sprout" to see it has been installed in:
/var/lib/gems/1.8/gems/sprout-0.7.171-x86-linux/bin

It is not executable so:
cd /var/lib/gems/1.8/gems/sprout-0.7.171-x86-linux/bin/
sudo chmod +x sprout

You can temporarily add it to the path with:
export PATH=$PATH:/var/lib/gems/1.8/gems/sprout-0.7.171-x86-linux/bin

This now works:
sprout -n as3 test1


But:
cd test1
rake

fails with the line: "no such file to load -- sprout".


This seems to do the trick:
rake -I /var/lib/gems/1.8/gems/sprout-0.7.171-x86-linux/lib/


Q. Where do I set that so it is a global setting??


Another problem: when I double click a swf file it tries to open it in
movie player. I try to connect to flashplayer, but that name does not
exist in /usr/bin.
Q. Where has the standalone flash player been installed??