Tom Morris writes:
> If you allow users to enter URLs and they do it wrong, what do you do when
> it's time to generate RDF with what is supposed to be a valid IRI?
It should never get to this point: the data should be validated. Of
course that will never happen.
> Do you throw the data away, quote it somehow so one might be able to
> fix it up later or punt and generate a dump with invalid IRIs? The
> GND wouldn't be the first to choose the last option.
Throw it away: don't distribute syntactically broken data. With all due
respect to the DNB, I should not be responsible for fixing broken
data. If I have to do it, chances are someone else has to as well.
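A minimal sketch of that kind of export-time filter, in Python. The function names are hypothetical, and the check is deliberately conservative: it only requires a scheme and rejects the ASCII characters RFC 3987 does not allow in an IRI. It is not a full RFC 3987 parser, so a value that passes may still be invalid in other ways.

```python
import re

# ASCII characters RFC 3987 does not allow in an IRI:
# controls, space, DEL, and < > " { } | \ ^ `
_FORBIDDEN = re.compile(r'[\x00-\x20\x7f<>"{}|\\^`]')
# An absolute IRI must start with a scheme such as "http:" or "urn:".
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*:')


def looks_like_iri(value: str) -> bool:
    """Conservative syntactic check (an assumption, not a complete validator)."""
    return bool(_SCHEME.match(value)) and not _FORBIDDEN.search(value)


def partition_iris(values):
    """Split values into (usable, rejected) so rejects can be logged, not published."""
    good, bad = [], []
    for value in values:
        (good if looks_like_iri(value) else bad).append(value)
    return good, bad
```

Only the first list would go into the dump; the second can be kept aside for whoever owns the source records.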
In the scheme of things, of course, dealing with this is not a big
deal. It's just that pretty much every non-trivial data dump I've ever
received, whether it be in some MARC variant or RDF format, has been
broken in one way or another and required cleanup. It makes me wonder
how any of the origin systems manage to work.
As someone down in the weeds, these kinds of issues are the bane of my
existence.
Peace,
-tree