MSU response to 'channels'

We keep the notions of "channels" and "tracks" separate. We have not really
addressed the issue of "tracks" in our audio extension. We define "tracks"
as individual portions of a recording (analog or digital) prior to mixing
(mastering), while "channels" refers to a signal that is already processed,
either as simple mono/stereo, or other psychoacoustic processing, such as
HRTF (head-related transfer function). The resulting digital audio file will
have a certain number and configuration of channels (mono, stereo, 5.1,
Dolby Digital (AC-3), Dolby ProLogic, etc.). For the sake of simplicity, we
have initially decided to use just one attribute "channels" and limit its
possible values with a closed set of controlled vocabulary. We agree,
however, that it may be good to provide a finer-grained description of
"channels"; for instance, a value such as "5.1 AC-3" collapses the LC
channel-track-quantity and sound-field attributes into one.
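To make the idea concrete, here is a minimal sketch of the single "channels" attribute constrained by a closed controlled vocabulary, as described above. The vocabulary terms listed are illustrative assumptions, not our finalized list:

```python
# Hypothetical closed vocabulary for the "channels" attribute.
# The specific terms are assumptions for illustration only.
CHANNELS_VOCABULARY = {
    "mono",
    "stereo",
    "5.1",
    "5.1 AC-3",
    "Dolby ProLogic",
}

def validate_channels(value: str) -> bool:
    """Accept only values drawn from the closed vocabulary."""
    return value in CHANNELS_VOCABULARY
```

Anything outside the vocabulary (say, "7.1") would simply be rejected until the vocabulary itself is extended.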

As far as multi-track recordings are concerned, we think of them as
consisting of individual digital files. This is, indeed, how they are
represented in the digital domain. With multi-track digital recordings, we
are dealing with individual files (one for each track) and a metadata file
that groups them together (synchronizes them) and contains various
processing information, such as volume, pan, effect automation, etc. We have
not accounted for that in our proposed extension. However, we agree that
there is a need to do so. The LC proposed attributes do not seem to
represent the multi-track system adequately. Perhaps we should come up with
a new set of fields for such recordings? I can see, for instance, many
situations whereby we will have to digitize a 16-track analog recording and
digitally represent the mixing information. Have you dealt with such
recordings yet?
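As a rough sketch of what such a record might look like, one could model a multi-track recording as individual track files plus one grouping record that synchronizes them and carries the processing information. The field names here (offset, volume, pan) are illustrative assumptions, not proposed attributes:

```python
# Hypothetical sketch: one digital file per track, plus a grouping
# record holding synchronization and mix information. All field names
# are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class Track:
    filename: str        # one digital file per track
    offset_ms: int = 0   # synchronization offset
    volume: float = 1.0  # mix level
    pan: float = 0.0     # -1.0 (left) .. 1.0 (right)

@dataclass
class MultiTrackSession:
    title: str
    tracks: list = field(default_factory=list)

# e.g., a digitized 16-track analog recording
session = MultiTrackSession(title="16-track transfer")
for n in range(16):
    session.tracks.append(Track(filename=f"track{n + 1:02d}.wav"))
```

Effect and automation data would presumably need further fields; the point is only that the grouping metadata lives apart from the audio files themselves.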

MSU response to 'bitrate'

We think that "variable bitrate" is a possible value of the "bitrate"
attribute, and may not require a separate field.

MSU response to "filetype" vs. "fileformat"

This is a very interesting question. Indeed, the file type can be encoded as
a MIME type, though we'd much rather keep it in the audio extension. File
type is not the same as "internet_media_type". What we understand by
"filetype" is, basically, the way in which data is stored. It is a broader
concept than
the MIME type. It is often referred to in the literature as "file format",
hence, probably, the confusion. The label is, of course, not important, and
can be changed to something less ambiguous.

Let me give an example of what we mean by "filetype". In addition to raw
audio data (individual sample values), audio file types also contain control
data. For example, a file can contain an edit decision list with timecode
and crossfade information, as well as some processing data (e.g.
equalization). Many such types use an introductory header that contains
information such as sample rate, bitdepth, number of channels, compression,
etc. Mac files, for instance, use a two-part structure with a data fork and
a resource fork; audio can be stored in either fork. The raw type that you
asked about contains only audio data - no header. It is a very popular
format among audio engineers and speech scientists. Its header-type
information must be stored in a separate metadata file.
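A small illustration of what that header carries: the snippet below writes a short WAV file and then reads the sample rate, bit depth, and channel count back out of its header. With a headerless raw file, these same values would have to live in a separate metadata record instead.

```python
# Write a one-second stereo WAV file, then read back the header
# information it carries (channels, bit depth, sample rate).
import wave

with wave.open("example.wav", "wb") as w:
    w.setnchannels(2)       # stereo
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(44100)   # CD sample rate
    w.writeframes(b"\x00\x00" * 2 * 44100)  # one second of silence

with wave.open("example.wav", "rb") as w:
    channels = w.getnchannels()
    bitdepth = w.getsampwidth() * 8
    samplerate = w.getframerate()

print(channels, bitdepth, samplerate)  # prints: 2 16 44100
```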

What we mean by "fileformat" is the possible ways in which a particular file
type can be encoded. For example, the AIFF type supports many formats of
compressed and uncompressed data; the AIFF-C file format is a version of
AIFF that allows for compressed data, and several different types of
compression can be used, including MACE and u-law. The WAV file type is
very similar, as it can comprise different formats. For instance, a
WAV file can contain data encoded as PCM, MPEG-3, or ATRAC (minidisk
compression).

Our intention was to separate type from format in a simple, generalized way,
without having to break the distinctions down into attributes such as byte
order, header size, etc. This detailed information can usually be inferred
from the basic type/format relationship. We thought we would limit the
possible type/format pairs with controlled vocabulary.

This type/format distinction also accounts for the problem with codecs that
you mentioned. The codec is, in fact, a characteristic of the file format.
Your QuickTime example is a good one: we could have "QuickTime" as the file
type and "Qualcomm PureVoice" as the file format. This is NOT the finest-grain
classification, but we believe it is sufficient.
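The controlled-vocabulary idea for type/format pairs could look something like the sketch below. The particular pairs listed are illustrative assumptions drawn from the examples above, not a finished vocabulary:

```python
# Hypothetical controlled vocabulary of file type -> allowed file
# formats, following the type/format separation described above.
# The pairs are assumptions for illustration only.
VALID_PAIRS = {
    "AIFF": {"uncompressed", "AIFF-C (MACE)", "AIFF-C (u-law)"},
    "WAV": {"PCM", "MPEG-3", "ATRAC"},
    "QuickTime": {"Qualcomm PureVoice"},
}

def valid_pair(filetype: str, fileformat: str) -> bool:
    """Check that a format is a permitted encoding of the given type."""
    return fileformat in VALID_PAIRS.get(filetype, set())
```

A pair like WAV/PCM would pass, while a mismatched pair such as WAV with a QuickTime codec would not.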

Hope this helps!
(MSU group: Bartek Plichta)