ARSCLIST Archives

ARSCLIST@LISTSERV.LOC.GOV


ARSCLIST  January 2008

Subject: Re: jazz discography
From: Jon Noring <[log in to unmask]>
Reply-To: Jon Noring <[log in to unmask]>
Date: Wed, 30 Jan 2008 21:49:26 -0700


Steven Barr wrote and asked:

> 1) In some cases, data can be obtained from still-extant recording ledgers.
> Note that these ledgers (except for many Victor items) generally do NOT
> provide session personnel data...often including vocalists. However,
> they DO provide date and location of the recordings they document.
> In part, this answers your question immediately above; the relevant
> Brunswick ledgers DO exist, so matrix numbers can be entered by looking
> for the sheet listing the title in question (except that if more than
> one take was recorded, there is no reliable way of knowing which
> take was issued...!). Where ledgers no longer exist (virtually all
> minor/"indie" labels of the twenties) there is no way of knowing
> (accurately, anyway) matrix numbers, dates or other session-
> related data...and "educated best guesses," presumably prefixed
> with "c." or "est." or equivalents...and/or data from common
> sources (ADBD/CED/usw.) will have to suffice by default...!

Well, my present view is that we gather session-related data from
anywhere we can find it, and allow for duplicate data sets that may
conflict with each other. The "metadata" for the source of every
"data set" will be recorded, and an estimate of its level of
"authority" could also be assigned by another body (e.g., ARSC).
Users of the discographical information can then filter out sources
they don't believe are reliable, or view several sources side by
side and weigh them against each other.
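
As a rough illustration of the "record it all, tag the source" idea,
here is a minimal Python sketch. The field names, the 1-5 authority
scale, the master identifier and every sample value are invented for
this example; none of them comes from a real ledger or an agreed spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionClaim:
    master_id: str    # identifier of the recording master (invented value)
    field_name: str   # e.g. "date" or "location"
    value: str        # the claim exactly as the source gives it
    source: str       # where the claim came from
    authority: int    # rating from a reviewing body; the 1-5 scale is assumed


def claims_for(claims, master_id, field_name,
               trusted_sources=None, min_authority=0):
    """Return every matching claim, optionally filtered, best-rated first."""
    selected = [
        c for c in claims
        if c.master_id == master_id
        and c.field_name == field_name
        and c.authority >= min_authority
        and (trusted_sources is None or c.source in trusted_sources)
    ]
    return sorted(selected, key=lambda c: c.authority, reverse=True)


# Three conflicting claims about the same master are all kept.
claims = [
    SessionClaim("M-0001", "date", "1927-09-14", "company ledger", 5),
    SessionClaim("M-0001", "date", "c. 1927-09", "published discography", 3),
    SessionClaim("M-0001", "date", "1928", "collector's note", 1),
]

all_dates = claims_for(claims, "M-0001", "date")
vetted = claims_for(claims, "M-0001", "date", min_authority=3)
```

Nothing is thrown away at entry time; the filtering happens only when
a user asks a question.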

In many ways this is thinking outside the box, since the long-held
paradigm is that someone just has to pick the information which is
"most correct" and run with that and exclude the other information.
We should chuck that paradigm and simply record it all, so long as
we properly identify the source of information, and let the end-user
decide what to use based on the authority metadata.

*****

At this point, let me note again that in my view the "discographical
database" will comprise (at least) two parts, which will be independent
of each other, other than identifier linkage at the recording master
level [note]:

1) The recording artifact information, and

2) Session information.

[Note: when the artifact does not provide its own master recording
identifier, commonly referred to as the "matrix number", we simply
assign our own unique identifier to that field, so that it can point
to the session information if any is known. UUID is one candidate
unique identifier that I would consider using, although its length
may scare some people.]
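
A minimal Python sketch of that linkage, assuming two independent
stores keyed by the same master identifier. The "matrix:" and "uuid:"
prefixes, the store layout and the sample matrix number are all
hypothetical, and a real scheme would also have to deal with matrix
numbers that repeat across labels.

```python
import uuid


def master_identifier(matrix_number=None):
    """Use the artifact's matrix number when present, else mint a UUID."""
    if matrix_number and matrix_number.strip():
        return f"matrix:{matrix_number.strip()}"
    return f"uuid:{uuid.uuid4()}"


artifacts = {}   # master id -> what the physical record shows
sessions = {}    # master id -> session data, present only if known

known = master_identifier("E-12345")   # invented matrix number
unknown = master_identifier()          # the artifact shows no matrix number

artifacts[known] = {"label_text": "Brunswick"}
artifacts[unknown] = {"label_text": "(label text as shown)"}
sessions[known] = {"date": "c. 1927"}  # invented value


def session_for(master_id):
    """Follow the link from an artifact to its session data, if any."""
    return sessions.get(master_id)
```

The two stores never refer to each other except through the
identifier, so either one can be filled in without touching the other.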

Regarding #1, we record exactly what the recording artifact tells us,
both written and physical. I'm not certain we even want to include
"normalized" data of any kind in the artifact data, but if we do, that
data would not replace what the record tells us; it would be added as
"normalized" data, essentially in parallel to the artifact data. (If
there are misspellings or mistakes in the textual information the
artifact gives, we transcribe that text exactly "as is". All we care
about is transcribing the text exactly as it is shown on the
artifact. If the label is given as "Colombia" rather than "Columbia",
and it is a Columbia, we record the label name as "Colombia".)
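
One way to picture "in parallel" is a field that carries the
transcription and, optionally, a normalized value beside it. This is
only a sketch in Python; the class, its attribute names and the
"example reviewer" authority are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscribedField:
    as_shown: str                       # exact text on the artifact
    normalized: Optional[str] = None    # optional parallel value, added later
    normalized_by: Optional[str] = None # who supplied the normalized value

    def add_normalization(self, value, authority):
        # as_shown is deliberately left untouched here.
        self.normalized = value
        self.normalized_by = authority


label = TranscribedField(as_shown="Colombia")
label.add_normalization("Columbia", authority="example reviewer")
```

The misspelling survives in `as_shown`, and anyone who distrusts the
normalization can simply ignore it.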

#2 is the real meat of discography since that's where we include
session data, such as location, date, musicians, the known mastered
recordings, etc. This is where interpolation is allowed, and we'd
probably even see attempts at normalization, authority assignment,
etc.

(I think that normalized "bios" of musicians, and a normalized
song/composition database, should be separate, again with identifier
linkages from the session data. For example, session data from ledgers
may list Benny Goodman as "Benny Goodman", "Benjamin Goodman", "Bennie
Goodman" and "Shoeless Joe Jackson". And if that's all the information
we have from the ledgers, that's what we use. So in order to tie these
different variations to the same person, some authority in the future
may connect them with a common and unique identifier (and here we even
allow multiple authorities). Once we have a common identifier, it can
be used to create a biographical sketch for the performer in a
separate XML document designed for that purpose.)
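
To make the identifier linkage concrete, here is a small Python
sketch. The authority names and the "person-001" identifier are
placeholders; the point is only that the ledger spellings stay as
written and that each authority keeps its own set of links.

```python
from collections import defaultdict

# (authority, name as written in the ledger) -> person identifier
links = {
    ("authority-A", "Benny Goodman"): "person-001",
    ("authority-A", "Benjamin Goodman"): "person-001",
    ("authority-A", "Bennie Goodman"): "person-001",
    ("authority-B", "Benny Goodman"): "person-001",
}


def variants_by_person(links, authority):
    """Group the spellings that one authority ties to each person."""
    grouped = defaultdict(set)
    for (auth, name), person_id in links.items():
        if auth == authority:
            grouped[person_id].add(name)
    return dict(grouped)


def resolve(links, authority, name_as_written):
    """Return the person id an authority assigns to a spelling, if any."""
    return links.get((authority, name_as_written))
```

Here authority-B has not (yet) linked "Bennie Goodman" to anyone, so
`resolve` returns `None` for that pair, while authority-A groups all
three spellings under one identifier.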

Anyway, just some of my thoughts...


> 2) Any discographic entity of whatever sort MUST list both the
> extant and actual information in cases where both exist (and are
> known to the compiler[s]). In some cases, what would appear to be
> an error actually is not; for example, the initial Brunswick
> recordings of "My Blue Heaven" are labelled as "Blue Heaven"...
> and play very slightly different lyrics (..."When the whippoorwills
> ARE calling...") which suggests they are actually the original
> versions of the tune...! I have always used two separate fields
> ("ARTCRED" and "ACTART") to track recordings issued under
> pseudonyms or those with credit errors. This, in turn, means
> I can query the database both for "Recordings on which Arthur
> Fields sings" and "Recordings on which the vocalist is credited
> as 'Mr. X'"...two entirely different questions! In fact, I can
> even query for "All recordings on which 'Arthur Fields' is
> credited as 'Mr. X'" should I need that specific data...!
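
The two-field approach Steve describes can be sketched with Python's
built-in sqlite3 module. Only the ARTCRED/ACTART field names and the
"Mr. X" / Arthur Fields pairing come from his message; the table name,
the titles and the second singer are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sides (title TEXT, artcred TEXT, actart TEXT)")
conn.executemany(
    "INSERT INTO sides VALUES (?, ?, ?)",
    [
        ("Example Title A", "Mr. X", "Arthur Fields"),
        ("Example Title B", "Arthur Fields", "Arthur Fields"),
        ("Example Title C", "Mr. X", "Another Singer"),
    ],
)


def titles(where, params):
    """Run one of the fixed WHERE clauses below and return sorted titles."""
    sql = f"SELECT title FROM sides WHERE {where} ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]


# Three different questions against the same two fields.
sung_by_fields = titles("actart = ?", ("Arthur Fields",))
credited_mr_x = titles("artcred = ?", ("Mr. X",))
fields_as_mr_x = titles("actart = ? AND artcred = ?",
                        ("Arthur Fields", "Mr. X"))
```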

The song/composition aspect of sound recordings can get complicated
due to derivatives/variants, as Steve noted. I'm not sure the session
information should provide "normalized" song titles at all, since
that should be done in a database built for that purpose. The
difficulty is determining the canonical or normalized title of a song
composition. Oftentimes the music and lyrics on a recording will vary
from the "canonical" version, yet nothing in the title or composer
credits indicates that they differ. The ledgers themselves, if they
exist, may get the song title wrong. If we have no ledgers which list
the song title, then the song title is pretty much whatever is given
on the artifacts that still exist -- and even here we can have
variants on the song title from label to label when the recording is
issued on several labels. Geez, it gets messy...

And of course we have the wonderful complication of medleys. No doubt
the seasoned discographers here can think up several more exceptions
we have to deal with when it comes to song titles. It is one of the
messier aspects of discography. (The next is musician info, but that's
nowhere near as messy as song compositions and lyrics.)


> 3) IMO, the "wiki-db" should provide either (A) ALL available
> discographic data relevant to a phonorecord (or side thereof)
> with actual verified data items noted as such and "best guess"
> entries likewise identified...OR (B) enough information to
> identify a given phonorecord, along with (hyper?)links to
> other relevant data thereon. It should also be possible to
> query the database on any of its fields (including related
> data tables in the database) and receive a list of all
> phonorecords (including "None" if that is the case) which
> fit the query's declared criteria. Regardless of how the
> tables are set up, the results will be the same...the only
> difference being in how many different tables the data is
> stored! Note that my first discographic catalog database
> was NOT relational, which often resulted in a large number
> of empty data fields (which, in xBase, use as much space
> as completed fields...!); however, in these days of 1TB
> (and larger?) consumer hard drives, this is no longer a
> consideration...or so I am told...?!

We have to separate the database from the application. This is why the
source form of the data must be XML. XML is portable, standardized,
plain text (UTF-8 by default), and both human and machine readable.
Certainly a specific application may import the XML and convert it to
some internal form for fast processing/access, but we must NOT get into
the mode where our discographical data is archived and transported in
some proprietary, machine-readable-only database format. BAD.
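
A minimal sketch of that split, in Python with only the standard
library: the XML text is the source of record, and the application
loads it into a throwaway in-memory SQLite table for fast queries.
The element and attribute names (discography, artifact, master, label,
title) and the master identifiers are assumptions for this example; no
ARSC DTD exists to take them from.

```python
import sqlite3
import xml.etree.ElementTree as ET

SOURCE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<discography>
  <artifact master="M-0001">
    <label>Colombia</label><title>Example Title</title>
  </artifact>
  <artifact master="M-0002">
    <label>Brunswick</label><title>Blue Heaven</title>
  </artifact>
</discography>
"""


def load(xml_bytes):
    """Import the XML source into an application-private table."""
    root = ET.fromstring(xml_bytes)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifact (master TEXT PRIMARY KEY, label TEXT, title TEXT)"
    )
    for node in root.iter("artifact"):
        conn.execute(
            "INSERT INTO artifact VALUES (?, ?, ?)",
            (node.get("master"), node.findtext("label"), node.findtext("title")),
        )
    return conn


conn = load(SOURCE_XML.encode("utf-8"))
rows = conn.execute(
    "SELECT master, label FROM artifact ORDER BY master"
).fetchall()
```

The SQLite table can be discarded and rebuilt at any time; only the
XML is archived and exchanged.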

[Now to really show my XML markup wonk side: ARSC should set a strict
policy that the master discographical information be contained in an
XML document (or documents) which is valid to the DTD controlled and
maintained by ARSC. Furthermore, "internal subsets" are not allowed or
ARSC will get very pissed. This is one step to make it much harder for
someone to "proprietize" the XML documents for proprietary advantage --
if someone just gotta have something new, they come to the ARSC committee
overseeing the DTD and nicely ask for the DTD to be expanded. There
also have to be controls on the use of other namespaces. And if it ends
up that there are requirements which cannot be completely enforced by a
DTD or Schema, then ARSC will write a script to verify the XML
documents conform to the other requirements. I speak from first-hand
experience having co-authored open standard XML-based e-book formats
since 1999 for IDPF, where we had to take seriously the possibility of
a company hijacking the spec by adding proprietary stuff for their
advantage. Likewise, ARSC has to take firm control of the whole spec
or it will get away and we'll end up again with a Tower of eBabel. And
it should be clear by now that ARSC should not agree to bless a
proprietary database format for mastering the discographical information
-- in my opinion it must be mastered as UTF-8 XML document(s) valid to
the published, open standard ARSC-maintained DTD/Schema. This assures
true internationalization and repurposeability of the discographical
information into the very distant future...]
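
The "script to verify" part could start as small as this Python
sketch, which uses the standard library's expat bindings to report
whether a document references the expected DTD and whether it carries
an internal subset. It does not validate against the DTD (the Python
standard library has no validating parser), and the file name
"arsc-discography.dtd" is a made-up placeholder.

```python
import xml.parsers.expat


def doctype_info(xml_bytes):
    """Collect the DOCTYPE name, system id and internal-subset flag."""
    info = {"name": None, "system_id": None, "internal_subset": False}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["name"] = name
        info["system_id"] = system_id
        info["internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return info


def policy_problems(xml_bytes, required_system_id="arsc-discography.dtd"):
    """Return a list of policy violations; an empty list means none found."""
    info = doctype_info(xml_bytes)
    problems = []
    if info["system_id"] != required_system_id:
        problems.append("DOCTYPE does not reference the expected DTD")
    if info["internal_subset"]:
        problems.append("internal DTD subset is not allowed")
    return problems


GOOD = b'<!DOCTYPE discography SYSTEM "arsc-discography.dtd"><discography/>'
BAD = (b'<!DOCTYPE discography SYSTEM "arsc-discography.dtd" '
       b'[<!ATTLIST discography vendor CDATA #IMPLIED>]><discography/>')
```

Calling `policy_problems(BAD)` flags the internal subset, which is
exactly the hook someone would use to bolt private attributes onto the
shared format.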

Anyway, so long as the ontology expressed in our XML DTD/Schema is
complete, applications which access the data will be able to do
whatever they want. It's simply a matter of developer time to get
all the data visualization bells and whistles users desire. If a
particular application is insufficient, that's the "fault" of the
developer, not our database. Let a hundred flowers bloom! (As Mao
said -- here I refer to applications using the XML discographical
database. Let them compete with each other.)


> 4) Are you suggesting that "songs" and "compositions" be
> kept in separate (but relationally connected) tables?
> Likewise, what are you referring to as "normalization?"
> (the word has a specific meaning in the database "industry")

To answer your first question, yes, I'm leaning this way. Part of the
reason is that, like people, song melodies/compositions/lyrics are
really standalone entities that exist apart from the Session and the
Artifact, and have their own richness best expressed in a separate
ontology.
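
Here is a rough Python sketch of what "separate but linked" might look
like, including a medley that points at more than one composition.
All identifiers and the second song are invented, and linking the
"Blue Heaven" title to "My Blue Heaven" is an assumption made purely
for illustration.

```python
# Standalone composition records, independent of any session or artifact.
compositions = {
    "comp-001": {"canonical_title": "My Blue Heaven"},
    "comp-002": {"canonical_title": "Example Song"},
}

# Each mastered recording keeps its title as found and points to zero or
# more compositions; a medley simply points to several.
recordings = [
    {"master": "M-0002", "title_as_shown": "Blue Heaven",
     "composition_ids": ["comp-001"]},
    {"master": "M-0003", "title_as_shown": "Medley",
     "composition_ids": ["comp-001", "comp-002"]},
    {"master": "M-0004", "title_as_shown": "Unidentified",
     "composition_ids": []},
]


def recordings_of(composition_id):
    """List the masters that include a given composition."""
    return [r["master"] for r in recordings
            if composition_id in r["composition_ids"]]


def canonical_titles(master):
    """Look up the canonical titles linked to one master, if any."""
    for r in recordings:
        if r["master"] == master:
            return [compositions[c]["canonical_title"]
                    for c in r["composition_ids"]]
    return []
```

A recording with no identified composition still has a place in the
data; its list of links is simply empty until someone fills it in.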

And about "normalization," you are right that I probably did not use
the term properly. Among librarian catalogers there is a term used to
describe "normalizing" or "standardizing" values, such as author names
-- but for the life of me I can't remember what that term is. I'm sure
several here will be able to provide the more accurate terminology from
the cataloging world, and I wait for my memory to be jogged. <laugh/>

Jon Noring
