BIBFRAME Archives


BIBFRAME@LISTSERV.LOC.GOV


BIBFRAME February 2013

Subject: Re: Punctuation
From: Jörg Prante <[log in to unmask]>
Reply-To: Bibliographic Framework Transition Initiative Forum <[log in to unmask]>
Date: Wed, 6 Feb 2013 00:45:23 +0100
Content-Type: text/plain


I have also written a general-purpose reader in Java for ISO 2709 based
streams (MARC, UNIMARC, MAB etc.). I could make it robust against all the
inconsistencies that I am aware of. For example, in Germany, Pica MARC
files contain a line feed between records, which is illegal. Or the
Unicode characters are encoded in UTF-8 in decomposed form. With this
reader I plan to explore Bibframe conversions in the future. It's open
source (Affero GPL).
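
To make the two inconsistencies above concrete, here is a minimal sketch of how a lenient reader might deal with them. It is not the code of the reader mentioned in this message; the class and method names are hypothetical. It relies only on the ISO 2709 record terminator (0x1D) and on the JDK's java.text.Normalizer. Line breaks are skipped only while no record is open, so bytes inside a record are never touched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the reader described in this message. */
final class LenientIso2709Splitter {

    /** ISO 2709 record terminator (ASCII group separator). */
    static final int RECORD_TERMINATOR = 0x1D;

    /** Splits a stream into raw records, skipping CR/LF bytes found between records. */
    static List<byte[]> splitRecords(InputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            // A line break is dropped only when no record is currently open.
            if (current.size() == 0 && (b == '\n' || b == '\r')) {
                continue;
            }
            current.write(b);
            if (b == RECORD_TERMINATOR) {
                records.add(current.toByteArray());
                current.reset();
            }
        }
        if (current.size() > 0) {
            // Keep a truncated final record so the caller can report it.
            records.add(current.toByteArray());
        }
        return records;
    }

    /** Decodes UTF-8 bytes and recomposes decomposed characters (NFC). */
    static String decodeComposed(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return Normalizer.normalize(decoded, Normalizer.Form.NFC);
    }
}
```

For large files the input should be wrapped in a BufferedInputStream, since the sketch reads one byte at a time. Whether to normalize to NFC at all is a design choice: it changes the bytes of the record, so lengths in the leader and directory no longer apply to the normalized text.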

Is there a source of freely available, representative sets of
bibliographic records from the MARC format family that could help
developers with quality tests? I only know of the few example records in
the marc4j source distribution.

About a year ago, the GND started being delivered in RDF with unescaped
IRI characters. I reported the issue, and it should have been fixed quite
a while ago. As a consequence, I wrote my own Java RDF Turtle parser that
can even handle broken IRIs. Yes, most of the RDF Turtle parsers out there
are flaky. The same holds for RDF Turtle writers.
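
As an illustration of what handling broken IRIs can mean, the following sketch percent-encodes the characters that the Turtle grammar does not allow unescaped between the angle brackets of an IRI (control characters and space, plus < > " { } | ^ ` and backslash). This is one possible repair strategy, assumed here for the example; it is not the parser mentioned above, and the class and method names are hypothetical.

```java
/** Hypothetical helper for illustration; one possible repair, not the parser from this message. */
final class IriRepair {

    /** Printable characters that Turtle's IRIREF production excludes. */
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    /** Percent-encodes characters that may not appear raw inside <...> in Turtle. */
    static String percentEncodeIllegal(String iri) {
        StringBuilder out = new StringBuilder(iri.length());
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            // Everything up to and including space (0x20) is excluded as well.
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) {
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The argument is the text between the angle brackets, not the brackets themselves. Existing percent signs are left alone, so values that are already escaped are not escaped twice. Note that the repaired IRI is a different identifier from the broken one, so the same repair has to be applied consistently to every occurrence.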

Best regards,

Jörg

On 05.02.13 18:48, Tom Emerson wrote:
> Riley, Charles writes:
>> Also, too many programmers have to understand raw "marc", because too
>> much code produces broken records, and there are too many
> [...]
>
> Indeed: I've written a general purpose library for reading Z39.2 /
> ISO-2709 encoded files and it is rife with hooks and special cases to
> deal with the records we get from data providers. Supporting all the
> possible variants is a nightmare (MARC-21, UniMARC, CMARC, CNMARC,
> KORMARC, *MARC) and the inconsistencies and invalid crap I see drives me
> to distraction.
>
> Then again, RDF isn't free from that kind of thing. I've been working
> with the latest publicly available GND authority file (several
> gigabytes of Turtle encoded RDF) from the DNB and I ended up having to
> globally filter out one particular predicate because the values commonly
> contained invalid IRIs that made Jena's TDB import barf.
>
>> The character used for a field delimiter on one system, ǂ, is the
>> alveolar click letter used in print in Khoesan languages, supported in
>> ISO 6438 and therefore, by extension, in UNIMARC.
>>
>> Other systems use ‡ as the field delimiter.
> Was this intentional? U+01C2 and U+2021 could be easily confused if the
> font is lame enough.
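
When two delimiter glyphs such as the ones quoted above look alike in a given font, printing the code points removes the ambiguity. A small sketch using only JDK methods; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration: lists code point, Unicode name and letter status. */
final class CodePointInspector {

    /** Returns one descriptive line per code point in the given text. */
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        text.codePoints().forEach(cp -> lines.add(
                String.format("U+%04X %s (letter: %b)",
                        cp, Character.getName(cp), Character.isLetter(cp))));
        return lines;
    }
}
```

Calling describe on a string containing both characters shows that U+01C2 is classified as a letter while U+2021 is punctuation, which a converter could use to tell real text from a displayed subfield delimiter.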
