Date: Wed, 28 Dec 1994 19:56:20 -0500
Message-Id: <ab28129510021004cbd5@[199.227.1.68]>
To: uri@bunyip.com
From: hoymand@gate.net (Dirk Herr-Hoyman)
Subject: Re: Library Standards and URIs
At 6:29 PM 12/22/94, Larry Masinter wrote:
>Can you show how TEI can be used to represent the material currently
>being proposed for URC treatment, and deal with the general issue of
>representing arbitrary attributes whose identity exists in some
>external attribute registry?
>
>I think for people to consider using an SGML syntax instead of a flat
>mail header style, we'll need more worked out examples.
>
Larry asked, in a private message, for some more concrete details as to how
the TEI header could be used as the encoding for URCs.
Let me start by saying that I see 3 levels of use for URCs, in order of
increasing complexity.
1) URN to URL mapping
2) URN to URL with minimal meta-data
3) URN with full meta-data
I think we are mostly interested in levels 1+2 here, but that we'd like to
keep our options open for the freewheeling type of use that would be in
level 3. At this open level, I would see the need for something like the
TEI header, which could encode any MARC information, as well as just about
anything you might imagine. Indeed, the TEI header has been defined as a
separable scheme, something that you could use JUST for meta-data, without
encoding the resource itself in TEI. So, I think using the TEI header for
the URC, when you want to capture very rich meta-data, would work just
fine.
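
To make the three levels concrete, here is a rough sketch of how levels 1 and 2 might look as plain data structures. The values are borrowed from the sample record given further on in this message; the table layout, the key names, and the urls_for helper are all hypothetical and not part of any URC proposal.

```python
# Hypothetical in-memory models of the URC levels. Field names are
# illustrative only; values come from the sample record in this message.

URN = "urn:mysite.uri/myauth/11122233"

# Level 1: a URN maps to a bare list of URLs.
level1 = {
    URN: [
        "http://www.mysite.com/myresource",
        "ftp://ftp.mysite.com/pub/myresource.txt",
    ]
}

# Level 2: each URL carries a little meta-data, so the record is nested.
level2 = {
    URN: {
        "title": "My really good resource",
        "locations": [
            {"url": "http://www.mysite.com/myresource",
             "format": "text/html"},
            {"url": "ftp://ftp.mysite.com/pub/myresource.txt",
             "format": "text/plain"},
        ],
    }
}

# Level 3 would be an open-ended tree (for example a full TEI header),
# which a fixed set of keys like the above cannot anticipate.


def urls_for(urn):
    """Answer a level 1 question (which URLs?) from the level 2 table."""
    record = level2.get(urn)
    if record is None:
        return []
    return [loc["url"] for loc in record["locations"]]


print(urls_for(URN))
```

Even at level 2 the data stops being a flat list of attribute/value pairs, which is the reason to care about the encoding syntax early.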
It's at the more basic levels, which are what we are really after at this
point, that it's not entirely clear how well the TEI header would perform.
There isn't an element meant precisely for a URL. In a recent exchange on
TEI-L, Lou Burnard, an author of the TEI spec, admitted as much. So, with
the idea of creating an SGML syntax for use with the simple sorts of URC
meta-data, I offer this example:
<urc>
<urn>urn:mysite.uri/myauth/11122233</urn>
<title>My really good resource</title>
<author>Ima Nutt</author>
<date>December 22, 1994</date>
<locationGroup>
<list>
<item><url>http://www.mysite.com/myresource</url>
<extent>24567 bytes</>
<format>text/html</>
</item>
<item><url>ftp://ftp.mysite.com/pub/myresource.txt</url>
<extent>12543 bytes</>
<format>text/plain</>
</item>
</list>
</locationGroup>
</urc>
This is a mixture of legitimate TEI syntax and some I created for this example.
To me, and considering the existence of SGML parsers such as sgmls, this is
very simple to parse. Note that even in this simple example of 2 URLs,
there is nesting. I see nesting happening in just about any URC, so I
think we need to have a syntax that handles it from the very start.
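
As a rough illustration of how little code the parsing takes, here is a minimal sketch in Python. It makes two assumptions that are not in the example above: the SGML short end tags (</>) have been written out in full, and Python's standard XML parser stands in for a real SGML parser such as sgmls. The parse_urc function and the dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample record, with the short end tags (</>) expanded so that an
# XML parser can read it. A real SGML parser would accept the original.
URC = """
<urc>
<urn>urn:mysite.uri/myauth/11122233</urn>
<title>My really good resource</title>
<author>Ima Nutt</author>
<date>December 22, 1994</date>
<locationGroup>
<list>
<item><url>http://www.mysite.com/myresource</url>
<extent>24567 bytes</extent>
<format>text/html</format>
</item>
<item><url>ftp://ftp.mysite.com/pub/myresource.txt</url>
<extent>12543 bytes</extent>
<format>text/plain</format>
</item>
</list>
</locationGroup>
</urc>
"""


def parse_urc(text):
    """Turn one <urc> record into a dict; nested items become a list."""
    root = ET.fromstring(text.strip())
    locations = [
        # Each <item> holds <url>, <extent> and <format> children.
        {child.tag: child.text for child in item}
        for item in root.findall("./locationGroup/list/item")
    ]
    return {
        "urn": root.findtext("urn"),
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "date": root.findtext("date"),
        "locations": locations,
    }


record = parse_urc(URC)
for loc in record["locations"]:
    print(loc["url"], loc["format"], loc["extent"])
```

The nesting costs nothing extra here: the parser hands back a tree, and the two locations fall out as a list without any special-case code.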
The elements of this that are TEI syntax are:
<title>
<author>
<date>
<list>
<item>
<extent>
and <locationGroup> follows the spirit of TEI tagging.
I could have constructed a pure TEI encoding, using syntax such as
<xptr id="urn:mysite.uri/myauth/11122233"> as the construct for a URN, but
that didn't seem to fit our requirement for simplicity.
My basic thought here, and I have thrown it out to the TEI-L with no
response as yet, would be to create sufficient additional elements to
satisfy our need for a simple URC. These could be worked into a set of
enhancements to the TEI header (for which there is already a DTD), and
provided back to the TEI community. They could choose to use it or not,
but we could then be in a position to use the TEI header as a framework for
URCs.
For a more complex example of the TEI header, here is one from the TEI
Guidelines that shows the rich structure:
For example, the article cited in this example has been published twice,
once in a journal and once in a collection which appeared in a series:
<biblStruct>
<analytic>
<author>Thaller, Manfred</author>
<title level=a>A Draft Proposal for a Standard for the
Coding of Machine Readable Sources
</analytic>
<monogr>
<!-- In -->
<title level=j>Historical Social Research</title>
<imprint>
<biblScope type=vol>40
<date>October 1986
<biblScope type=pages>3-46
</imprint>
</monogr>
<monogr>
<!-- Rpt. in -->
<title level=m>Modelling Historical Data:
Towards a Standard for Encoding and Exchanging
Machine-Readable Texts</title>
<editor>Daniel I. Greenstein</editor>
<imprint>
<pubPlace>St. Katharinen</pubPlace>
<publisher>Max-Planck-Institut für Geschichte
In Kommission bei
Scripta Mercaturae Verlag</publisher>
<date>1991</date>
</imprint>
</monogr>
<series>
<title level=s>Halbgraue Reihe
zur Historischen Fachinformatik</title>
<respStmt><resp>Herausgegeben von</resp>
<name type=person>Manfred Thaller
<name type=org>Max-Planck-Institut für
Geschichte
</respStmt>
<title level=s>Serie A: Historische Quellenkunden
<biblScope>Band 11</biblScope>
</series>
</biblStruct>
--- end of example ---
This example also brings forth the issues of language and character sets.
In SGML, and in TEI in particular, consideration has been given to these
issues, and we could piggyback on that work if we so choose. This is
another area where, if we can provide an extensible framework for URCs,
so much the better.
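
To show why the character set question matters for a record like the one above, here is a small sketch in Python using the publisher name from the TEI example. The choice of the two encodings is my own assumption for illustration, and Python's html.unescape is only a stand-in for the entity handling an SGML parser would do (the name uuml comes from the standard ISO Latin 1 entity set).

```python
import html

name = "Max-Planck-Institut für Geschichte"

# The same text yields different bytes under different character sets,
# so a record has to say which one it uses.
latin1 = name.encode("iso-8859-1")
utf8 = name.encode("utf-8")
print(len(latin1), len(utf8), latin1 == utf8)

# An SGML-style entity reference keeps the transfer form in plain ASCII
# and leaves the choice of character set to the receiving system.
ascii_form = "Max-Planck-Institut f&uuml;r Geschichte"
print(ascii_form.isascii(), html.unescape(ascii_form) == name)
```

A flat header syntax would need its own answer to this; an SGML-based one gets entity references along with the parser.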
-- Dirk Herr-Hoyman <hoymand@gate.net>    | I tried to contain myself
CyberBeach Publishing                     | but
* Internet publishing services            | I got out
Lake Worth, Florida, USA                  |
Web: http://www.gate.net/cyberbeach.html Phone: +1.407.540.8309