Loosening the validation was required just 
to get the OAI harvest done. Strict validation 
could cause a harvest to hang or fail, and it usually 
wasn't worth it for a handful of bad records. We 
tend to throw out the bad records; if there were a 
more systematic problem, we would try to contact the data provider.
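The record-filtering approach described above can be sketched as follows. This is a minimal illustration, not our actual harvester, assuming harvested records arrive as XML strings; the sample records are made up:

```python
import xml.etree.ElementTree as ET

def filter_valid_records(raw_records):
    """Keep records that parse as well-formed XML; set the rest aside."""
    good, bad = [], []
    for rec in raw_records:
        try:
            ET.fromstring(rec)   # raises ParseError on malformed records
            good.append(rec)
        except ET.ParseError:
            bad.append(rec)      # discarded, or inspected later
    return good, bad

good, bad = filter_valid_records(
    ["<record><title>ok</title></record>",
     "<record><title>broken</record>"]   # mismatched tag
)
```

The point is simply that one bad record is caught per-record instead of aborting the whole harvest.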

Character encoding is something we should be 
striving to get right - and we should be making sure 
that our vendors know how to do it correctly. 
Whether the data resides in a digital library 
environment or not, it's not going to be very 
interoperable (such as being useful in an online 
catalog!) if there are hang-ups over technical glitches.

Sarah

At 08:24 AM 10/25/2006, Jackie Shieh wrote:

>Though, I'd be a bit careful about loosening validation
>standards, as it may come back to haunt one later...
>
>In this particular case, since I am hoping to get it
>working for our online catalog, when the character encoding is
>incorrect, indexing will be faulty.  The object
>will most likely be lost in the abyss, and users will not be able
>to find it. That defeats the purpose of providing
>it via the online catalog, doesn't it?!
>
>That said, if the data is to reside only in the digital
>library environment, perhaps character encoding is not
>as big an issue as it can be. (For me, at this time, it is...
>plus there is more to look into from my original query to the list,
>i.e. the mapping of the stylesheet for 130/240 and the parent/child
>node for subject!)
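Jackie's point about faulty indexing can be seen concretely in a couple of lines. This is a made-up illustration (the title is hypothetical): the same title stored as mojibake no longer matches a correctly encoded query, so an exact-match catalog lookup fails.

```python
title_ok = "café"                                      # stored correctly as UTF-8
title_bad = "café".encode("utf-8").decode("latin-1")   # UTF-8 bytes misread as Latin-1

print(title_bad)              # "cafÃ©" - the mojibake form
print(title_ok == title_bad)  # False: an exact-match index lookup fails
```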
>
>--Jackie
>
>On Wed, 25 Oct 2006, Sarah L. Shreeves wrote:
>
>>There's never any guarantee that metadata that 
>>has been harvested via the OAI Protocol is free 
>>from character encoding problems. In fact, 
>>in our harvesting work at Illinois, we've often 
>>encountered character encoding problems, so 
>>much so that we've had to really loosen our 
>>validation procedures when harvesting. Lagoze 
>>et al. also mention this issue in passing 
>>in their recent JCDL paper "Metadata 
>>Aggregation and 'Automated Digital Libraries': 
>>A Retrospective on the NSDL Experience".
>>
>>This is sometimes a problem when folks cut and 
>>paste from MS Word, but at times it can be the 
>>digital content management system itself that causes the problem.
>>
>>See the OAI best practices on this: 
>>http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?CharacterEncoding
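The MS Word cut-and-paste problem mentioned above can be demonstrated concretely. A small sketch, assuming the pasted text ships as Windows-1252 "smart quotes" while the record declares UTF-8 (the sample title is made up):

```python
word_text = "a \u201csmart quoted\u201d title"
raw = word_text.encode("cp1252")   # what a Word paste often arrives as

try:
    raw.decode("utf-8")            # the declared encoding
    valid = True
except UnicodeDecodeError:         # 0x93/0x94 are not valid UTF-8 here
    valid = False

print(valid)                  # False: the record lied about its encoding
print(raw.decode("cp1252"))   # decoding as cp1252 recovers the text
```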
>>
>>Sarah
>>
>>------------------------------------------------------------------------
>>Sarah L. Shreeves
>>Coordinator, Illinois Digital Environment for 
>>Access to Learning and Scholarship (IDEALS)
>>University of Illinois Library at Urbana-Champaign
>>Phone: 217-244-3877 or 217-233-4648
>>Email: [log in to unmask]
>>http://ideals.uiuc.edu/
>>At 07:43 AM 10/25/2006, Jackie Shieh wrote:
>>
>>>Yes, I am aware that Unicode has various UTF encodings.
>>>I had to deal with converting UTF-16 to UTF-8 in order
>>>to use the MARC::Record module.
>>>What puzzled me was how OAI-harvested data
>>>could have this problem when it was already declared
>>>as utf-8... which then caused my MARC::Record diacritics
>>>problem.  I suspect this is more complicated, and I
>>>must trace back to the original supplier of the data.
>>>Then go from there.
>>>Thanks very much for all your patience with this mystery!
>>>Regards,
>>>--Jackie
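The UTF-16 to UTF-8 conversion Jackie describes can be sketched in a few lines. This uses Python rather than the Perl MARC::Record toolchain, and the sample string is made up; the essential move is decode-then-re-encode:

```python
data_utf16 = "Bibliothèque".encode("utf-16")   # BOM plus UTF-16 bytes
text = data_utf16.decode("utf-16")             # back to abstract characters
data_utf8 = text.encode("utf-8")               # re-encode as UTF-8

print(data_utf8.decode("utf-8"))               # round-trips cleanly
```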
>>>On Tue, 24 Oct 2006, Erik Hetzner wrote:
>>>
>>>>At Tue, 24 Oct 2006 14:55:18 -0400,
>>>>Jackie Shieh <[log in to unmask]> wrote:
>>>>>I am fairly new on MODS and MARC21 conversion,
>>>>>so my question perhaps too elementary...
>>>>>If declaring output to ascii, don't I then miss
>>>>>the proper diacritics encoding?!  The records
>>>>>I have are primary non-English.
>>>>Character encoding is simple in concept but complex in execution. I am
>>>>not an expert, but I will do my best.
>>>>The Unicode codepoint for LATIN SMALL LETTER E WITH ACUTE (é; if you do not see
>>>>an e with an acute accent, your (or possibly my) mail reader is not
>>>>working properly) is U+00E9 (see
>>>><http://www.fileformat.info/info/unicode/char/00e9/>).
>>>>In UTF-8 this is expressed by the two bytes
>>>>0xC3 0xA9 (see above message). If your
>>>>document is encoded as UTF-8, then those two bytes will make the
>>>>character above. If you are looking at the file as latin-1 encoded,
>>>>these bytes will not look like é but instead like é. If you set
>>>>the encoding of your output file to ascii, it will "entity encode" the
>>>>character as &#233; (decimal) or &#xe9; (hex). If you are processing
>>>>this XML with a useful parser, it does not care whether you have (a) é in
>>>>UTF-8 or (b) &#233; or &#xe9; as an entity-encoded character. But if
>>>>you wish to force your XSL transform to output entity-encoded ascii
>>>>rather than UTF-8, you must set the encoding to "ascii" in your
>>>><xsl:output> element. This means that the file itself is 7-bit ascii,
>>>>but all the characters outside of those 7 bits will be encoded as pure
>>>>ascii, which is equivalent as far as XML parsers are concerned.
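Erik's walk-through can be checked concretely. A small Python sketch of the same three cases: the UTF-8 bytes for U+00E9, the latin-1 misreading, and the ASCII entity encoding (Python's `xmlcharrefreplace` error handler produces the decimal numeric character reference he describes):

```python
e_acute = "\u00e9"                    # LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = e_acute.encode("utf-8")
print(utf8_bytes)                     # b'\xc3\xa9' - the two bytes 0xC3 0xA9

print(utf8_bytes.decode("latin-1"))   # Ã© - the mojibake Erik describes

print(e_acute.encode("ascii", "xmlcharrefreplace"))   # b'&#233;'
```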
>>>>best,
>>>>--
>>>>Erik Hetzner
>>>>California Digital Library
>>>>510-987-0884