I wonder if a couple of things could be at play here.
The original email used csv to mean "character separated value
(aka tab delimited)", but I'm used to seeing CSV mean "comma
separated value". Some versions of CSV also allow double-quotes
to enclose values, which significantly affects parsing.
Whether separated by commas or tabs, the data values will have to
be cleansed of the delimiter (tab or comma) for the format to work,
unless the format lets values be quoted.
In much of my work, tabs make better value delimiters because they
are either rarer in values or easier to clean out of them (a tab
tends to be less significant to the meaning than a comma).
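To illustrate the quoting point, here is a minimal sketch using Python's
standard csv module. The helper name round_trip and the sample rows are
made up for illustration; the idea is only that a writer which quotes
values lets an embedded comma or tab survive without any cleansing.

```python
import csv
import io

def round_trip(rows, delimiter):
    """Write rows with the given delimiter, then parse them back."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only the values that contain special characters.
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return text, parsed

# Hypothetical sample data: one value contains a comma, another a tab.
rows = [["id", "description"], ["1", "red, large"], ["2", "tab\there"]]

comma_text, comma_rows = round_trip(rows, ",")
tab_text, tab_rows = round_trip(rows, "\t")

print(comma_rows == rows, tab_rows == rows)
```

Without quoting, the same values would have to be stripped of the
delimiter before export, which is the situation I described above.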
--- On Tue, 31 Oct 2006, Charles Blair wrote:
> On Tue, Oct 31, 2006 at 09:29:23AM -0500, Kai Naumann wrote:
> > we are planning a business rule for the character separated value
> > format (aka tabbed format), dealing with the best choice for field
> > delimiter, and with the problem of text delimiters encountered
> > inside texts.
> the typical problem with tab-delimited is encountering tabs or
> newlines inside a field value, input by people who want to "format"
> the data, say in a description field. in these cases i tell them to
> export the data including field names as the first row. my parser
> counts these, to tell it how many fields it should expect, then it
> counts the fields in every row. it reports when it encounters a row
> that has more or fewer fields than expected, returning the row number
> with how many fields it found, in which case i send the data back to
> the user with the report and tell them to fix the problem. it's simple
> enough to write these parsers in your language of choice.
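For what it's worth, the check described above might look roughly like
this in Python. This is only a sketch of the idea, not Charles's actual
parser; the function name check_field_counts and the sample lines are
my own invention, and it assumes plain tab-delimited text with no
quoting.

```python
def check_field_counts(lines, delimiter="\t"):
    """Return (expected, problems).

    expected is the number of fields in the first (header) row;
    problems is a list of (row_number, fields_found) for every row
    whose field count differs from the header's.
    """
    expected = None
    problems = []
    for row_number, line in enumerate(lines, start=1):
        found = len(line.rstrip("\r\n").split(delimiter))
        if expected is None:
            expected = found          # header row sets the expectation
        elif found != expected:
            problems.append((row_number, found))
    return expected, problems

# Hypothetical export with two damaged rows.
sample = [
    "id\tname\tdescription\n",
    "1\tpump\tsmall\n",
    "2\tvalve\tbig\tred\n",   # stray tab typed into the description
    "3\tgasket\n",            # an embedded newline cut this record short
]

expected, problems = check_field_counts(sample)
for row_number, found in problems:
    print(f"row {row_number}: expected {expected} fields, found {found}")
```

The report lines are what would go back to the user along with the
data.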
> with csv you're not going to have the problem with embedded tabs. i
> can't remember offhand how much of a problem embedded newlines
> represent. it's simple enough to experiment with, though.
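On embedded newlines, a quick experiment along these lines can be run
with Python's csv module. This only shows how that one library behaves
when values are quoted; other CSV readers and writers may differ, and
the sample value is made up.

```python
import csv
import io

# Write one record whose second value contains a newline.
buf = io.StringIO()
csv.writer(buf).writerow(["1", "line one\nline two"])
text = buf.getvalue()

# The record now spans two physical lines in the file...
physical_lines = text.splitlines()

# ...but a quote-aware reader still returns it as a single record.
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(physical_lines), len(records))
```

So a line-by-line field count would misreport such a record, while a
parser that understands the quoting would not.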
> another issue you might want to keep in mind is character encoding if
> that is relevant in your situation. people using these formats
> typically are generating data using MS Windows products, which default
> to codepage 1252 for character encoding. since i typically want to
> convert tab-delimited or csv to xml, i need to convert anything i get
> from these sources to utf-8 using a tool such as GNU recode, or tell
> them to export as utf-8 (but check what you get in these cases).
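The conversion step can also be done without recode. A minimal sketch
in Python, assuming the input really is codepage 1252 (the function
name cp1252_to_utf8 and the sample bytes are hypothetical):

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode bytes as Windows codepage 1252 and re-encode as UTF-8."""
    # Raises UnicodeDecodeError for the few byte values cp1252 leaves
    # undefined, which is a useful hint that the guess was wrong.
    text = raw.decode("cp1252")
    return text.encode("utf-8")

# Sample bytes: an accented letter and Windows "smart quotes".
raw = b"caf\xe9 \x93quoted\x94"
converted = cp1252_to_utf8(raw)

print(converted.decode("utf-8"))
```

If the source might already be UTF-8, it is worth trying
raw.decode("utf-8") first and falling back to cp1252 only when that
fails, which is one way to "check what you get".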