Monday, November 23, 2009

Escaping CSV in C++

There are two escaping rules for each field in a comma-separated value row:
1. Change each double quote to two double quotes.
2. Surround with double quotes if the field contains a comma or double quote.

These are the rules used by Excel and all other software that deals with CSV data.

As an example, if my fields are:
hello world
a,b,c
"CSV" is popular
""

Then it becomes:
hello world,"a,b,c,","""CSV"" is popular",""""""

C++ has a justified reputation as a hard language for text manipulation. Boost has libraries to make it a little easier, but I didn't want to add Boost as a dependency for a project I was working on. Fortunately std::string's replace() function turned out to be more powerful than I had realized:

void output_csv(std::ostream &out,std::string s){
if(s.find('"')!=std::string::npos){ //Escape double-quotes
std::string::size_type pos=0;
while(1){
pos=s.find('"',pos);
if(pos==std::string::npos)break;
s.replace(pos,1,"\"\"");
pos+=2; //Need to skip over those two quotes, to avoid an infinite loop!
}
out<<'"'<<s<<'"';
}
else if(s.find(',')!=std::string::npos){ //Need to surround with "..."
out<<'"'<<s<<'"';
}
else out<<s; //No escaping needed


If you like compact code then the while loop can be rewritten:

void output_csv(std::ostream &out,std::string s){
if(s.find('"')!=std::string::npos){ //Escape double-quotes
for(std::string::size_type n=0;(n=s.find('"',n))!=std::string::npos;n+=2)s.replace(n,1,"\"\"");
out<<'"'<<s<<'"';
}
else if(s.find(',')!=std::string::npos)out<<'"'<<s<<'"';
else out<<s;
}


P.S. If you need to do the same in PHP, PHP 5.1 finally introduced fputcsv for it. The comments on that page show how to do it in older versions of PHP; my fclib library also contains functions for it.

1 comment:

Anonymous said...
This comment has been removed by a blog administrator.