The Raptor RDF Syntax Library <http://librdf.org/raptor/> and its associated
'rapper' command-line utility <http://librdf.org/raptor/rapper.html> are a
very useful set of tools for transforming between various RDF serialisations.
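
For what it's worth, here is a minimal sketch of driving rapper from Python to
turn RDF/XML into Turtle. It assumes rapper is installed and on the PATH; the
helper names and the file names are hypothetical, and only rapper's
-i (input format) and -o (output format) options are used. rapper writes the
result to standard output, so the sketch redirects that into a file.

```python
import subprocess


def rapper_command(source, input_format="rdfxml", output_format="turtle"):
    """Build the argument list for a rapper conversion (hypothetical helper)."""
    return ["rapper", "-i", input_format, "-o", output_format, source]


def convert(source, destination, input_format="rdfxml", output_format="turtle"):
    """Run rapper and write its standard output to `destination`.

    Raises subprocess.CalledProcessError if rapper exits with an error,
    for example when the input is not valid.
    """
    command = rapper_command(source, input_format, output_format)
    with open(destination, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


# Example call (file names are placeholders, not the real dump names):
# convert("dump.rdf", "dump.ttl")
```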
On 24/04/2013 15:43, "Tom Emerson" <[log in to unmask]> wrote:
>Bernhard Eversberg writes:
>> Ah, I'm sorry, I mixed up the figures! The RDF file is the larger one.
>> Is there, by the way, any software that would convert RDF into Turtle,
>> and also change the &#nnn; entity notations into UTF-8?
>The Apache Jena rdfcat utility will do this, assuming the inputs are
>valid. With the previous GND dump there were a handful of Turtle
>statements that were invalid.
>Converting the entities to UTF-8 is a bigger issue. Again, in the
>previous release, the data often contained entities that are not part of
>the W3C's list of XML entities, including:
>- &nsb; - ISO 6630 control for NON-SORTING CHARACTER(s), BEGIN ->
>- &nse; - ISO 6630 control for NON-SORTING CHARACTER(s), END -> U+009C
>- &ptacc; - U+0323 COMBINING DOT BELOW, "punct als accent"
>It was also common for &nse; to appear in the data with the trailing ';'
>missing. There were also cases where a space was missing after an
>ampersand, leading to failures when attempting to decode entities, e.g.,
>"Pietsch, Heinz Dieter &Getty-Ulan"
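
A rough sketch of how the decoding problems described above could be handled
in Python. It is only an illustration under stated assumptions, not what any
of the tools mentioned actually do. The function and table names are made up.
The code points for &nse; and &ptacc; are the ones given above; U+0098 for
&nsb; is my assumption (the usual ISO 6630 / MARC-8 mapping), since the
message does not state it. The sketch decodes numeric &#nnn; and &#xhh;
references, the three non-standard names, tolerates &nse without its ';', and
leaves any other ampersand (such as "&Getty-Ulan") untouched. Standard XML
entities like &amp; are deliberately not handled here.

```python
import re

# Non-standard entity names seen in the data, mapped to Unicode characters.
CUSTOM_ENTITIES = {
    "nsb": "\u0098",    # ASSUMPTION: code point not stated in the thread
    "nse": "\u009c",    # NON-SORTING CHARACTER(s), END
    "ptacc": "\u0323",  # COMBINING DOT BELOW
}

# Alternatives, in order: decimal reference, hex reference,
# &nsb; / &ptacc; with the ';' required, and &nse with an optional ';'.
# Note that the last alternative matches any text starting with "&nse".
_ENTITY_RE = re.compile(
    r"&(?:#(\d+);|#[xX]([0-9A-Fa-f]+);|(nsb|ptacc);|(nse);?)"
)


def decode_entities(text):
    """Replace the entity notations described above with real characters."""

    def replace(match):
        decimal, hexadecimal, named, nse = match.groups()
        try:
            if decimal is not None:
                return chr(int(decimal))
            if hexadecimal is not None:
                return chr(int(hexadecimal, 16))
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the original text.
            return match.group(0)
        if named is not None:
            return CUSTOM_ENTITIES[named]
        return CUSTOM_ENTITIES["nse"]

    return _ENTITY_RE.sub(replace, text)
```

With this, decode_entities("M&#252;ller") gives "Müller", while
"Pietsch, Heinz Dieter &Getty-Ulan" comes back unchanged instead of causing a
failure.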
>> And it would be really nice to learn why there are these differences
>> instead of a stable download format. Wonder what it will be the
>> next time...
>Yes, I'd prefer Turtle over RDF/XML, since all of my tools for processing
>this expect Turtle already. :-)
>Principal Software Engineer, Search