UNICODE-MARC Archives

UNICODE-MARC December 2006

Subject: Re: What to add (2)
From: Joan Aliprand <[log in to unmask]>
Reply-To: UNICODE-MARC Discussion List <[log in to unmask]>
Date: Thu, 7 Dec 2006 01:28:43 -0500
Content-Type: text/plain

On Wed, 6 Dec 2006 18:02:09 EST, James Agenbroad <[log in to unmask]> wrote:

[This is such a long post that I am replying to only the last portion here.]

>More specific comments on responses follow:
>
>1. Joan wrote, "It is pointless to try to devise a MARC-specific list of
>characters that are allowed or forbidden. Users will enter whatever is
>available to them." I disagree as above. Not all rules are always followed,
>but awareness of rules reduces the frequency and consequences of proscribed
>activities. Most of us have jaywalked at one time or another. Recently an
>11-year old in my town died doing so. At ALA in San Diego a policeman gave
>me a warning for doing so. We can reasonably expect that system developers
>will not make available characters defined as undesirable. For example, the
>current MARC Specifications say that U+001B, the escape character, is
>unlikely to occur in UCS/Unicode records--and with good reason: Unicode
>was designed to prevent the need for escape sequences.

The MARC 21 Specifications mention the control code U+001B because the Basic
Latin (ASCII) mapping table gives it as the Unicode equivalent for the ASCII
C0 control character used in MARC-8 data. Although the use of escape
sequences in the context of Unicode does not make sense, the MARC 21
Specifications do not explicitly forbid use of U+001B.

My point is that there must be solid reasons for any prohibitions. (As for
the alternative of explicitly allowed characters, the scope of Unicode is
too vast.) With respect to U+001B, the MARC 21 Specifications do the right
thing: there is no requirement to remove U+001B in the unlikely event that
it occurs in a MARC record.

>2. Joan wrote, "Both the fill character and numeric character references
>are equivalents for characters in the source record." The fill character
>is just a place marker; it's far from the equivalent of the missing
>character. A series of fill characters is far from the equivalent of a Thai
>script title. Numeric character references are intended to allow the record
>recipients to recreate the missing character(s) later when they get around
>to converting to Unicode; I would not expect these references to be
>displayed for the public, so they are not equivalents either in any useful
>sense of the word equivalent.

OK, here is a more explicit re-write:
Both the fill character and the numeric character reference show the
location of a character in the source record that could not be converted to
a MARC-8 equivalent.
  
The relevant user community, MicroLIF, preferred to either drop the
unmappable character or use the fill character. The NCR option was to meet
OCLC's need for a lossless solution.
http://www.loc.gov/marc/marbi/minutes/mw-06.html
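
To make the three options (drop, fill character, NCR) concrete, here is a minimal Python sketch of converting a Unicode string to a narrower repertoire. It is an illustration only, under stated assumptions: `is_mappable` is a hypothetical stand-in that treats ASCII as the target repertoire (a real converter would consult the MARC-8 mapping tables), the vertical bar is assumed as the fill character, and the hexadecimal `&#xXXXX;` form is assumed for the numeric character reference.

```python
FILL = "|"  # assumed fill character (vertical bar)


def is_mappable(ch: str) -> bool:
    """Hypothetical stand-in for a MARC-8 lookup: only ASCII is 'mappable'."""
    return ord(ch) < 0x80


def to_legacy(text: str, mode: str = "ncr") -> str:
    """Convert text, handling unmappable characters by the chosen mode.

    mode="drop": omit the character (lossy, position not marked)
    mode="fill": one fill character per unmappable character (lossy)
    mode="ncr":  hexadecimal numeric character reference (lossless)
    """
    out = []
    for ch in text:
        if is_mappable(ch):
            out.append(ch)
        elif mode == "drop":
            continue
        elif mode == "fill":
            out.append(FILL)
        elif mode == "ncr":
            out.append(f"&#x{ord(ch):04X};")
        else:
            raise ValueError(f"unknown mode: {mode}")
    return "".join(out)


sample = "\u0e01 a"  # THAI CHARACTER KO KAI, space, Latin a
print(to_legacy(sample, "drop"))
print(to_legacy(sample, "fill"))
print(to_legacy(sample, "ncr"))
```

Only the NCR output lets a recipient recover the original code point later; the fill character marks the position but not the identity of the character.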

>3. Joan wrote: "Yes, use of 'a' in leader position is preferable to use of
>the Byte Order Mark." I agree and would exclude BOM from MARC records.
>Then she says, "A system encountering the BOM sequence could not tell
>whether the first byte EF represented the ANSEL character candrabindu or
>was the first byte of a BOM." Isn't the ANSEL code EF only used in MARC-8
>records? Am I missing something? I would hope the BOM was not converted
>unaltered into MARC-8 records.

I was writing about examination of UTF-8 data by a MARC-8 system, but
omitted "MARC-8". A system that was UTF-8 aware would, of course, interpret
hex EF as the beginning of a UTF-8 sequence.
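
As a rough illustration of why the Leader flag is the more reliable signal, the Python sketch below reads the character coding scheme from Leader position 09 and separately checks for a UTF-8 BOM. The function names are hypothetical, and the sample Leaders are invented for the example; the sketch assumes the Leader is the first 24 bytes of the record, with 'a' in position 09 meaning UCS/Unicode and a blank meaning MARC-8.

```python
import codecs


def declared_encoding(record: bytes) -> str:
    """Return the coding scheme declared in Leader/09 (hypothetical helper)."""
    leader = record[:24]
    flag = leader[9:10]
    if flag == b"a":
        return "utf-8"
    if flag == b" ":
        return "marc-8"
    return "unknown"


def has_utf8_bom(data: bytes) -> bool:
    """True if the data begins with the three-byte UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Invented sample Leaders, 24 bytes each.
unicode_leader = b"00000nam a2200000 a 4500"
marc8_leader = b"00000nam  2200000 a 4500"

print(declared_encoding(unicode_leader))
print(declared_encoding(marc8_leader))

# The first byte of the BOM is hex EF, the same value a MARC-8 system
# would read as an ANSEL character.
print(hex(codecs.BOM_UTF8[0]))
print(has_utf8_bom(codecs.BOM_UTF8 + unicode_leader))
```

A system that trusts Leader/09 never has to guess what a leading hex EF means, which is the ambiguity described above.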

>4. Joan wrote, "Since MARBI has already approved use of certain private
>use code points in MARC 21 records, there seems no good reason to expressly
>prohibit the use of any other private use code points." That MARBI has
>approved use of a few PUA characters seems a very good reason to prohibit
>the use of the rest--all but the approved 61 characters. Do we want MARC
>systems to need to seek every corporate logo with a PUA code that might
>through error get into a MARC record? I think not.

I don't understand how a MARC system would need to seek for the meaning of a
private use code that was not one of those sanctioned by MARBI. Such a
private use code that through error got into a MARC record would just show
up as the "no glyph available" image supplied by system software (in the
case of Windows, for example, as a box).

We should bear in mind that MARC 21 is an international format. In the
implementation of MARC 21 that we are familiar with, we do not plan to use
any more private use code points. But we do not know whether implementers
using MARC 21 in other parts of the world may wish to use private use code
points to meet the needs of their own constituency. It is more flexible to
have the restriction on use controlled via MARBI approval, than to block
further use anywhere absolutely.

(There is no guarantee that a MARC 21 implementation will conform to what LC
does. Even eight-bit MARC 21 allows the use of character sets other than the
MARC-8 ones. Because of this flexibility, MARC 21 is a truly international
format.)
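
For anyone who wants to see what a policy-by-approval check might look like, here is a small Python sketch. It relies on the Unicode general category "Co" (private use) as reported by the standard `unicodedata` module. The function name and the `approved` parameter are hypothetical, and the sample approved set is invented for the example; it is not the MARBI list.

```python
import unicodedata


def unsanctioned_private_use(text: str, approved: set) -> list:
    """Return private use code points in text that are not in `approved`.

    `approved` is a caller-supplied set of integer code points; which
    code points are sanctioned is a policy decision outside this sketch.
    """
    found = []
    for ch in text:
        if unicodedata.category(ch) == "Co" and ord(ch) not in approved:
            found.append(ord(ch))
    return found


# Invented example data: U+E000 is treated as approved, U+F8FF is not.
sample_approved = {0xE000}
sample_text = "a\ue000b\uf8ff"

for cp in unsanctioned_private_use(sample_text, sample_approved):
    print(f"U+{cp:04X}")
```

Keeping the approved set as data rather than hard-coding a prohibition means another community's list can be swapped in without changing the check itself.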

>5. Joan wrote, "Noncharacters by definition may not be interchanged. They
>may be used internally by an application but cannot be exported in MARC 21
>exchange." I agree we should exclude them. See item 5 above.
>As always, comments are most welcome.
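
The noncharacters referred to in item 5 are a fixed set in the Unicode Standard: U+FDD0 through U+FDEF, plus the last two code points of every plane (those ending in FFFE or FFFF). A minimal Python sketch of a pre-export check follows; the function names are hypothetical and the check is an illustration, not part of any MARC specification.

```python
def is_noncharacter(cp: int) -> bool:
    """True if the code point is one of the Unicode noncharacters."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text: str) -> list:
    """Return the string positions of any noncharacters in text."""
    return [i for i, ch in enumerate(text) if is_noncharacter(ord(ch))]


print(find_noncharacters("ab\ufdd0c\uffff"))
```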

-- Joan Aliprand
