Dear Mark,

After spending a little more time looking at RFCs this morning, I still think this is a bug as a result of faulty implementation on libxml2's part.


Section 4.1 of RFC 4452, which defines INFO URIs [1], states that 'the "info" URI syntax ... is conformant with the generic URI syntax defined in RFC 3986' [2]. I think the fact that the INFO URI scheme is its own scheme is the key factor here. The scheme is, as you know, the part preceding the first colon. RFC 3986 presents the following under "generic syntax":

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]


Section 3.1 of RFC 3986 explains that "each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme ... [and] URI scheme specifications must define their own syntax so that all strings matching their scheme-specific syntax will also match the <absolute-URI> grammar, as described in Section 4.3."  Syntactically, from section 4.3, an absolute URI is defined as


absolute-URI  = scheme ":" hier-part [ "?" query ]



If I interpret all of this correctly, any URI that employs a defined scheme is ipso facto an absolute URI.



Additionally, the last paragraph of RFC 3986 section 4.2, about relative references, states that "a path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name," which would send us back to section 3.1.



Turning to the PREMIS namespace, "lc/xmlns/premis-v2" is certainly the "path" component of the URI, but "info" is the scheme, and scheme + path = an absolute URI. There is actually no authority component in the PREMIS namespace (the regex for URIs in Appendix B of RFC 3986 confirms this). (Namespaces in XML 1.1 identifies an XML namespace by an IRI reference, as specified in RFC 3987, which in turn fully subsumes and extends RFC 3986.) Also, the PREMIS namespace is independent of any other context, whereas a relative reference would depend on a base URI.
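
To make the Appendix B point concrete, here is a minimal sketch that applies the RFC 3986 Appendix B regular expression to the PREMIS namespace. It assumes PHP 7.2 or later (for PREG_UNMATCHED_AS_NULL); the names RFC3986_REGEX and splitUri() are hypothetical and used only for illustration. It shows how the generic syntax splits the string and says nothing about what libxml2 does internally.

```php
<?php
// Regular expression from RFC 3986, Appendix B, wrapped in "~" delimiters
// because the pattern itself contains "/" and "#".
const RFC3986_REGEX = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

// Hypothetical helper: split a URI reference into the five generic components.
// A component that is absent from the string is reported as null.
function splitUri(string $uri): array
{
    preg_match(RFC3986_REGEX, $uri, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'scheme'    => $m[2] ?? null,
        'authority' => $m[4] ?? null,
        'path'      => $m[5] ?? null,
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
    ];
}

$parts = splitUri('info:lc/xmlns/premis-v2');

// scheme is "info", authority is null, path is "lc/xmlns/premis-v2"
var_dump($parts['scheme'], $parts['authority'], $parts['path']);

// A reference with no scheme at all is what RFC 3986 calls a relative reference.
$relative = splitUri('lc/xmlns/premis-v2');
var_dump($relative['scheme']); // NULL
```

Under this decomposition the namespace has a scheme and a path but no authority, so "lc" is simply the first path segment.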



Yours,

Kevin




[1] http://www.ietf.org/rfc/rfc4452.txt
[2] http://www.ietf.org/rfc/rfc3986.txt


From: Fisher, Mark L [mailto:[log in to unmask]]
Sent: Thursday, March 28, 2013 7:43 AM
To: Ford, Kevin; 'PREMIS Implementors Group Forum'
Subject: RE: Premis namespace relative URI issue and code snippit

Kevin,

Thanks for the input. You are right that this is a consequence of how libxml2 processes namespaces.

That being said, if I correctly understand "Namespaces in XML 1.1 (Second Edition)" (http://www.w3.org/TR/xml-names11/) and "RFC3986: Uniform Resource Identifier (URI): Generic Syntax" (http://tools.ietf.org/html/rfc3986), then RFC3986 section 3.2 "Authority" specifies that a URI containing an authority (which an info: URI does) must start the authority section with '//' and end it with either nothing, '/', '?', or '#'. By that definition, the info: URI for PREMIS should look like:
                info://lc/xmlns/premis-v2
rather than the current:
                info:lc/xmlns/premis-v2
as the 'lc' is the authority portion of the info: URI. So the current PREMIS namespace info: URI is a relative URI under the rules of RFC 3986, and relative namespace URIs are deprecated by the W3C's "Namespaces in XML 1.1 (Second Edition)".

I'm forced to conclude that libxml2 is taking a hard (but not unreasonable) line by deprecating the current PREMIS info: namespace URI. Fortunately, it appears (from https://mail.gnome.org/archives/xml/2012-December/msg00009.html, a message by Daniel Veillard on the libxml mailing list) that the libxml developers plan to allow relative namespace URIs in the next libxml release.

Meanwhile, I'll take a look at your workaround:
                $ourValueXml = $domNode->ownerDocument->saveXML($domNode);
to see if it lets us handle the issue until libxml2 is modified to allow relative namespace URIs.

Mark Leighton Fisher
Purdue University Research Repository
[log in to unmask]<mailto:[log in to unmask]>
317-220-3687 (cell)
765-496-1921 (office)
Skype: markleightonfisher




From: Ford, Kevin [mailto:[log in to unmask]]
Sent: Wednesday, March 27, 2013 5:55 PM
To: 'PREMIS Implementors Group Forum'; Fisher, Mark L
Subject: RE: Premis namespace relative URI issue and code snippit

Dear Amy and Mark,

Thanks for the snippet.


I suspect this is a bug in libxml2, but one (dated) report I found about it suggests it won't be fixed. The one note I found mentioned that libxml2 URI/namespace logic conforms to RFC 2396, published in 1998 and made obsolete later by a few other RFCs, and that the error had to do with that fact. I've not spent the time to penetrate RFC 2396, but the question would be whether the PREMIS namespace URI conforms, syntactically, to a URI as defined by RFC 2396. I *think* it does, but that's the part I don't have the energy to fully investigate presently. RFC 2396 certainly states that "an absolute URI contains the name of the scheme being used (<scheme>) followed by a colon (":") and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme." [1] This makes me doubt that the reported error correctly identifies the problem.



In any event, the PREMIS URI is not a "relative" URI.  It's an INFO URI, but INFO URIs were not formally defined until 2006.  Nevertheless, the scheme ("info") is followed by a colon, and then a string part "whose interpretation depends on the scheme," about which see RFC 4452 [2].  This is all very informative, perhaps, but not necessarily addressing your problem.

I don't know exactly what you are trying to do, but I did discover that if you replace the problem line

$ourValueXml = $domNode->C14N(false);

with

$ourValueXml = $domNode->ownerDocument->saveXML($domNode);

you can get around the error.  It'll output the XML element.  However, it appears to drop the namespace declaration, which you would have to add back in with more coding trickery if it is needed.
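
One possible way to keep the namespace declaration, offered only as a sketch and not tested against the repository code: copy the node into a scratch document before serializing it. This assumes the standard PHP DOM extension; serializeStandalone() is a hypothetical helper name, and the shortened sample document is illustrative. Note that the result is plain serialization, not canonical XML, so it does not replace C14N() where canonical form is required.

```php
<?php
// Shortened sample document using the PREMIS info: namespace.
$xmlStr = '<premis:premis xmlns:premis="info:lc/xmlns/premis-v2">'
        . '<premis:object>george</premis:object></premis:premis>';

$doc = new DOMDocument();
$doc->loadXML($xmlStr);
$domNode = $doc->getElementsByTagNameNS('info:lc/xmlns/premis-v2', 'object')->item(0);

// Hypothetical helper: import the node into an empty document so that the
// namespace declaration it relies on travels with it, then serialize it.
function serializeStandalone(DOMNode $node): string
{
    $scratch = new DOMDocument();
    $scratch->appendChild($scratch->importNode($node, true));

    return $scratch->saveXML($scratch->documentElement);
}

// Serialized in place: the declaration lives on the ancestor element.
echo $doc->saveXML($domNode), "\n";

// Serialized from the scratch copy: the element should carry its own
// xmlns:premis declaration.
echo serializeStandalone($domNode), "\n";
```

The copy leaves the original document untouched, which may be preferable to adding declarations to nodes in place.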

Hope this moves you a little closer.

Yours,
Kevin

[1] http://www.ietf.org/rfc/rfc2396.txt
[2] http://www.ietf.org/rfc/rfc4452.txt




From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Hatfield, Amy J
Sent: Wednesday, March 27, 2013 2:26 PM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: [PIG] Kevin Ford: Premis namespace relative URI issue and code snippit

Hi Kevin,

Yes! Here are code snippets per your request. Mark Fisher is our programmer (CC'd herein).

We look forward to hearing back from you.

Best!
Amy

From: Fisher, Mark L
Sent: Wednesday, March 27, 2013 2:16 PM
To: Hatfield, Amy J
Subject: RE: Premis namespace

Hi Amy,

Please forward this message on to Kevin. I've embedded a PHP code sample that fails with a relative PREMIS namespace, and a PHP code sample that succeeds using the kludge of an absolute (if wrong) namespace for PREMIS (a namespace similar to the standard METS namespace).



====== FAILING CODE ======
<?php

$xmlStr = '<premis:premis'
        . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
        . ' xmlns:premis="info:lc/xmlns/premis-v2"'
        . ' xsi:type="premis:file"'
        . ' xsi:schemaLocation="info:lc/xmlns/premis-v2'
        . ' http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd">'
        . ' <premis:object>george</premis:object></premis:premis>';

$xml = new SimpleXMLElement($xmlStr);

$dom = dom_import_simplexml($xml);
$domList = $dom->getElementsByTagName('object');
foreach ($domList as $domNode) {
        $ourValueXml = $domNode->C14N(false);
        break;
}

echo "\nValue XML:\n$ourValueXml\n";

echo "\ndone.\n";



====== KLUDGED CODE ======
<?php

//http://www.loc.gov/standards/premis/v2/

$xmlStr = '<premis:premis'
        . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
        . ' xmlns:premis="http://www.loc.gov/standards/premis/v2/"'
        . ' xsi:type="premis:file"'
        . ' xsi:schemaLocation="http://www.loc.gov/standards/premis/v2/'
        . ' http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd">'
        . ' <premis:object>george</premis:object></premis:premis>';

$xml = new SimpleXMLElement($xmlStr);

$dom = dom_import_simplexml($xml);
$domList = $dom->getElementsByTagName('object');
foreach ($domList as $domNode) {
        $ourValueXml = $domNode->C14N(false);
        break;
}

echo "\nValue XML:\n$ourValueXml\n";

echo "\ndone.\n";


====== FAILING CODE OUTPUT (NOTE MISSING OUTPUT VALUE) ======
$ php /home/php/relative-namespace-bug.php
PHP Warning:  Xdebug MUST be loaded as a Zend extension in Unknown on line 0
PHP Warning:  DOMNode::C14N(): Relative namespace UR is invalid here : info in /home/php/relative-namespace-bug.php on line 16
PHP Stack trace:
PHP   1. {main}() /home/php/relative-namespace-bug.php:0
PHP   2. DOMNode->C14N() /home/php/relative-namespace-bug.php:16
PHP Warning:  DOMNode::C14N(): Internal error : checking for relative namespaces in /home/php/relative-namespace-bug.php on line 16
PHP Stack trace:
PHP   1. {main}() /home/php/relative-namespace-bug.php:0
PHP   2. DOMNode->C14N() /home/php/relative-namespace-bug.php:16
PHP Warning:  DOMNode::C14N(): Internal error : processing docs children list in /home/php/relative-namespace-bug.php on line 16
PHP Stack trace:
PHP   1. {main}() /home/php/relative-namespace-bug.php:0
PHP   2. DOMNode->C14N() /home/php/relative-namespace-bug.php:16

Value XML:


done.


====== KLUDGED CODE OUTPUT (NOTE THAT WE GET THE premis:object XML OUTPUT) ======
$ php /home/php/relative-namespace-kludge.php
PHP Warning:  Xdebug MUST be loaded as a Zend extension in Unknown on line 0

Value XML:
<premis:object xmlns:premis="http://www.loc.gov/standards/premis/v2/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">george</premis:object>

done.


FYI to Kevin: The extraneous "PHP Warning:  Xdebug MUST be loaded as a Zend extension in Unknown on line 0" lines are due to my use of PHP on Cygwin under Windows, which does not yet support building non-standard extensions. You can safely ignore those messages.


Mark Leighton Fisher
Purdue University Research Repository
[log in to unmask]<mailto:[log in to unmask]>
317-220-3687 (cell)
765-496-1921 (office)
Skype: markleightonfisher



From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Ford, Kevin
Sent: Tuesday, March 26, 2013 6:53 PM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: [PIG] Premis namespace

Hi Amy,

Would you be willing to share the snippet of code?  It might be helpful to take a look at it.

Cordially,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC



From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Hatfield, Amy J
Sent: Tuesday, March 26, 2013 3:14 PM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: [PIG] Premis namespace

Thank you, Rebecca! I look forward to hearing from you.

Best!
Amy

From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Guenther, Rebecca
Sent: Tuesday, March 26, 2013 11:07 AM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: [PIG] Premis namespace

We are exploring this. Our schema developer is out for a few days, but we will get an answer to you soon.

Rebecca

Rebecca Squire Guenther
Network Development & MARC Standards Office
Library of Congress
Washington, DC 20540
[log in to unmask]<mailto:[log in to unmask]>

From: PREMIS Implementors Group Forum [mailto:[log in to unmask]] On Behalf Of Hatfield, Amy J
Sent: Wednesday, March 20, 2013 4:21 PM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: [PIG] Premis namespace

Greetings:

Just signed up for the list. First post. Need a little help.

We have implemented a metadata model for our research repository, consisting of datasets, that includes METS (wrapper), DC (description), MODS (personal name/owner information), and PREMIS (preservation) metadata.

Our programmer is coding in PHP and is using the libxml2 library to handle the XML constructs for our metadata. I recently went through the process of metadata validation and well-formedness checking. We are good in that regard. However, my programmer just informed me that the libxml2 library blows up when it encounters the info:lc/xmlns/premis-v2 namespace because it is relative instead of absolute. Problem is, if I change the namespace URI, our metadata is no longer valid.

Has anyone encountered this issue? What did you do as a workaround?

Thanks for any guidance on this issue.

Best!
Amy

Amy J. Hatfield, MLS
Assistant Professor of Library Science, Metadata Specialist
Purdue University Libraries, Digital Programs
(765) 494-6333
[log in to unmask]<mailto:[log in to unmask]>
STEW, 279