Monday, November 23, 2009

Escaping CSV in C++

There are two escaping rules for each field in a comma-separated value row:
1. Change each double quote to two double quotes.
2. Surround with double quotes if the field contains a comma or double quote.

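These are the rules used by Excel and most other software that deals with CSV data. (A field containing a line break also needs to be surrounded with double quotes; the code in this post does not handle that case.)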

As an example, if my fields are:
hello world
a,b,c
"CSV" is popular
""

Then it becomes:
hello world,"a,b,c","""CSV"" is popular",""""""

C++ has a justified reputation as a hard language for text manipulation. Boost has libraries to make it a little easier, but I didn't want to add Boost as a dependency for a project I was working on. Fortunately std::string's replace() function turned out to be more powerful than I had realized:

#include <ostream>
#include <string>

void output_csv(std::ostream &out,std::string s){
    if(s.find('"')!=std::string::npos){ //Escape double-quotes
        std::string::size_type pos=0;
        while(1){
            pos=s.find('"',pos);
            if(pos==std::string::npos)break;
            s.replace(pos,1,"\"\"");
            pos+=2; //Need to skip over those two quotes, to avoid an infinite loop!
        }
        out<<'"'<<s<<'"';
    }
    else if(s.find(',')!=std::string::npos){ //Need to surround with "..."
        out<<'"'<<s<<'"';
    }
    else out<<s; //No escaping needed
}


If you like compact code then the while loop can be rewritten:

void output_csv(std::ostream &out,std::string s){
    if(s.find('"')!=std::string::npos){ //Escape double-quotes
        for(std::string::size_type n=0;(n=s.find('"',n))!=std::string::npos;n+=2)s.replace(n,1,"\"\"");
        out<<'"'<<s<<'"';
    }
    else if(s.find(',')!=std::string::npos)out<<'"'<<s<<'"';
    else out<<s;
}
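
To write a whole row rather than one field at a time, something along these lines could work. This is only a sketch: the names csv_escape, output_csv_row and csv_row are made up for illustration, it assumes a C++11 compiler, and unlike the functions above it also quotes fields containing a line break (my assumption, based on common CSV practice, not one of the two rules listed earlier).

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Returns one field escaped for CSV (hypothetical helper).
// Assumption: a field containing a line break is quoted too.
std::string csv_escape(const std::string &field){
    bool needs_quotes = field.find_first_of(",\"\r\n") != std::string::npos;
    if(!needs_quotes) return field; // No escaping needed
    std::string out = "\"";
    for(char c : field){
        if(c == '"') out += '"'; // Double every embedded quote
        out += c;
    }
    out += '"';
    return out;
}

// Writes all fields as one comma-separated row, followed by a newline.
void output_csv_row(std::ostream &out, const std::vector<std::string> &fields){
    for(std::size_t i = 0; i < fields.size(); ++i){
        if(i > 0) out << ',';
        out << csv_escape(fields[i]);
    }
    out << '\n';
}

// Convenience wrapper that returns the row as a string.
std::string csv_row(const std::vector<std::string> &fields){
    std::ostringstream ss;
    output_csv_row(ss, fields);
    return ss.str();
}
```

Building the escaped string in a single pass avoids the repeated find/replace, at the cost of a copy of every field that needs quoting.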


P.S. If you need to do the same in PHP, PHP 5.1 finally introduced fputcsv for it. The comments on the fputcsv manual page show how to do it in older versions of PHP; my fclib library also contains functions for it.

Monday, November 16, 2009

samba and strange permissions

On my Linux server I run a samba share, which is used by both Linux and Windows clients on the LAN. I moved it from an old FedoraCore machine to Ubuntu a few months ago, and ever since have been getting strange permissions: text files kept becoming executable (but only for user, not group or other).

It took me this long to realize what was going on, and when I tried to track it down about a week ago I concluded it was SciTE being strange on just samba partitions. I.e. I'd edit a file with rw-r--r-- permissions, save it, and it would end up with "rwxr--r--". Every time. But not on normal partitions, and gedit didn't do it on the samba partition. I noticed today that files created by a PHP script also got those weird permissions, and the penny dropped: gedit must be explicitly setting permissions when it saves a file. SciTE wasn't the cause of the problem at all.

So, I went hunting again. I referred to "Mount samba shares with utf8 encoding using cifs" a lot, but in fact it didn't give me the answer I wanted: the instructions there gave me the same problem. (It did show me how to set my samba partition to mount from /etc/fstab, replacing my crude entry in rc.local though.)

Hunting through the troubleshooting section I found the "nounix" flag and tried that. Initially it made things worse, giving all files rwxrwxrwx permissions. Then I changed from "file_mode=0777,dir_mode=0777" to "file_mode=0644,dir_mode=0755" (which was what I had originally), and that combined with nounix works! All text files are rw-r--r-- before and after saving. Oh, the other change I added relative to the above page was including "uid=darren,gid=darren". Otherwise files were owned by "nobody:root" and I didn't have permission to edit anything (even with the suggested 0777 settings).
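
Put together, an /etc/fstab entry using those options might look roughly like the one below. The server name, share name and mount point are placeholders invented for illustration, and any authentication options are left out; only the options after "cifs" come from the description above.

```
# //myserver/share and /mnt/share are hypothetical placeholders
//myserver/share  /mnt/share  cifs  nounix,file_mode=0644,dir_mode=0755,uid=darren,gid=darren  0  0
```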

My guess is that my old FedoraCore samba server didn't have the unix extensions, Ubuntu 8 does, and somehow those unix extensions are misconfigured in Ubuntu. My samba server configuration is all defaults however... Anyway, it now works the way it has for the previous few years, so I'm happy.

UPDATE: I just realized this has also fixed another irritation - delete (that moves to the Trash folder) hasn't been working on that partition, but now it does. Trash folder vs. direct delete was only a minor factor; what was really annoying was every time I pressed delete it then popped up a dialog box requiring me to confirm it.

Wednesday, November 4, 2009

How much is that password worth?

These people have published details about how they used Amazon EC2 to crack passwords:
http://news.electricalchemy.net/2009/10/cracking-passwords-in-cloud.html

Personally, I skipped all the details and went straight to the interesting conclusions page:
http://news.electricalchemy.net/2009/10/password-cracking-in-cloud-part-5.html

It tells you how much it will cost someone (in EC2 charges) to crack your passwords, based on their length and the size of the character set you use.

I used to think 8 characters was a good password. Seems it is worth about $3, or $45 if I've mixed in some numbers. Gulp. And all this is assuming there are no dictionary words in there. Double gulp.
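
The jump from $3 to $45 is roughly what the size of the search space predicts. Here is a rough sketch of that arithmetic, assuming the $3 figure is for lowercase letters only (26 symbols), that "mixed in some numbers" means 36 symbols, and that cracking cost scales with the number of candidate passwords. Those are my assumptions, not figures from the linked article, and the function names are made up.

```cpp
#include <cstdint>

// Number of possible passwords of exactly `length` characters drawn from
// an alphabet of `alphabet_size` symbols, i.e. alphabet_size^length.
// Note: overflows 64 bits for large alphabets or long passwords.
std::uint64_t keyspace(std::uint64_t alphabet_size, unsigned length){
    std::uint64_t total = 1;
    for(unsigned i = 0; i < length; ++i) total *= alphabet_size;
    return total;
}

// How many times larger the search space becomes when the alphabet grows
// from small_alphabet to big_alphabet symbols (same password length).
double keyspace_ratio(std::uint64_t small_alphabet, std::uint64_t big_alphabet, unsigned length){
    return static_cast<double>(keyspace(big_alphabet, length)) /
           static_cast<double>(keyspace(small_alphabet, length));
}
```

keyspace_ratio(26, 36, 8) comes out at about 13.5, so under those assumptions a $3 password would scale to around $40, which is in the same ballpark as the $45 quoted.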