Perhaps we have some common ground here. The use case you describe is exactly the problem I have when writing code to parse MARC21 records, except that to a machine it is equally hard whether the text is in English, Chinese or any other language or character set.

However, the conclusion that I (and others) draw from this is that the data would be more usable if the meanings were separated out further: wherever punctuation is currently the only way to differentiate bits of data, each bit of data should have its own field/property. (UKMARC included less punctuation in the record, so library systems in the UK market were, once upon a time, designed to insert punctuation where necessary. I'm pretty sure this is true of other flavours of MARC as well?)
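
To make the contrast concrete, here is a rough Python sketch of the two situations. The sample publication data, the property names (place, publisher, date) and all the function names are made up for illustration, and the punctuation handling is a crude heuristic rather than a full implementation of ISBD rules.

```python
import re

# Hypothetical 260 (publication) field as (subfield code, value) pairs,
# with ISBD punctuation embedded in the data, as MARC21 records carry it.
field_260 = [("a", "London :"), ("b", "Penguin Books,"), ("c", "2012.")]

# Trailing marks that separate elements rather than belong to them.
# Crude: this also strips the full stop from abbreviations such as "Inc."
TRAILING_ISBD = re.compile(r"\s*[:;,/=.]\s*$")


def strip_isbd(value):
    """Remove one trailing ISBD separator from a subfield value."""
    return TRAILING_ISBD.sub("", value)


def to_properties(subfields):
    """Map each subfield to its own property; no guessing at boundaries."""
    names = {"a": "place", "b": "publisher", "c": "date"}
    return {names[code]: strip_isbd(value)
            for code, value in subfields if code in names}


def split_display_string(text):
    """Guess place/publisher/date from a single ISBD display string.

    Relies entirely on " : " and ", " appearing only as separators,
    which real data does not guarantee.
    """
    place, _, rest = text.partition(" : ")
    publisher, _, date = rest.rpartition(", ")
    return {"place": place, "publisher": publisher, "date": date.rstrip(".")}


print(to_properties(field_260))
print(split_display_string("London : Penguin Books, 2012."))
```

Both calls give the same answer for this tidy example, but only the second depends on the punctuation being present and unambiguous. If the punctuation were never stored in the data, strip_isbd would not be needed at all.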

At the very least I now have a new way of explaining the problems of machine-parsing records to those unfamiliar with writing code - thank you!


Owen Stephens
Owen Stephens Consulting
Email: [log in to unmask]
Telephone: 0121 288 6936

On 28 Jan 2013, at 22:22, J. McRee Elrod <[log in to unmask]> wrote:

>> Maybe I'm missing something, but if we capture each subelement in
>> its own property, there'd be no need to parse a long string with
>> ISBD punctuation. You'd know exactly what was in each, even if you
>> didn't have a display system.
> If it is in Chinese characters,  I don't know where the place ends and
> the publisher begins, nor where the title proper ends and subtitle
> begins.  I only got up to 300 characters and gave up.
>   __       __   J. McRee (Mac) Elrod ([log in to unmask])
>  {__  |   /     Special Libraries Cataloguing   HTTP://
>  ___} |__ \__________________________________________________________