URC proposal for Davenport Group

Terry Allen (terry@ora.com)
Sun, 22 Jan 1995 12:41:10 PST

Message-Id: <199501222041.MAA02978@rock>
From: Terry Allen <terry@ora.com>
Date: Sun, 22 Jan 1995 12:41:10 PST
To: davenport@ora.com, uri@bunyip.com, hackers@ora.com
Subject: URC proposal for Davenport Group

Proposal for Davenport Group work on URCs
Terry Allen 22 January 1995

INTRODUCTION

URCs, or Uniform Resource Characteristics, are being
discussed, inter alia, as a means of supplying bibliographic
metadata about online documents. At the Davenport Group
meeting of 17--19 January 1995, I proposed that the group
attempt to construct a trial URC resolution service for
online computer documentation (the area of interest of
the Davenport Group). The Davenport sponsors were quite
interested in the proposal; this document is an expansion
of what I said at the meeting, outlining my thoughts on
what is needed for such a service. Comments are more than
welcome. (And thanks to Ron Daniel and Roy Fielding for
helpful correspondence on UR issues; neither of them is
responsible for this proposal, which is not intended as an
RFC, at least not at this stage.)

This proposal is intended primarily to establish a format
for the metadata and secondarily to determine what other
pieces are required to make an URC resolution service work.
The aims stated are deliberately circumscribed so as to
avoid several large issues not directly related to the
metadata format, and no attempt is made to generalize
that format beyond computer documentation (which for the
purpose at hand I assume to be in SGML, though it doesn't
really matter).

As Ron Daniel has put forward a concrete proposal for URCs
http://www.acl.lanl.gov/URI/SGML/overview.html
that envisions them as sets of information, I use the term
"URC set." The format for the metadata is SGML, but I am
not advocating the use of SGML for this purpose in the
general case---it just so happens that for Davenport,
SGML apparently will work.

I believe the following pieces are needed:

Engine to generate a correct URC set from the Bookinfo in a
Docbook DTD-encoded document

DTD for the URC set

Engine to store URC sets, concatenate them, and permit authorized
revision of them (the info need not be stored in SGML; this
is a separate layer [or layers?] that could be implemented
differently by different services)

Server site(s) for public URC resolution (the issue of authorized
revision can be avoided, from the standpoint of this
project, if each publisher maintains its own site, although
I don't expect that to be the eventual general case)

Local URC resolution service (check to see whether the target
document is available on the local system)

Very simple URL(?) format to return answers to only 2 types of
queries: 1) given a URN, return all URLs; 2) given a
URC query on TITLE, return all URLs (anything more demands
that we decide upon a query language, which is a contentious
matter that is not the point of this exercise).

Some machinery for the browser to choose a URL from among
those returned

Then it has to be wrapped up and made to work so that I can write a link
in my Docbook document (Docbook has a Ulink element to hold
URLs) like this, for a URN:

... blarty foo <ulink url="the.urn.goes.here">Windows 3.1
User's Guide</ulink> blarts more blarts

or this, for a URC title query:

... blarty foo <ulink url="the.urc.for.title.goes.here">Windows 3.1
User's Guide</ulink> blarts more blarts

and when the user clicks on the hot spot, the intended document
is fetched and displayed from the local installation or from the
Internet, assuming the user is connected to it. It may be
desirable to extend <ulink> with attributes additional to
the present URL attribute.

The browser has to transmit the URN or URC to the local URC
resolution service, then if need be to the publisher's URC
resolution service site, and upon receipt of the response,
to invoke the "some machinery" to pick a URL and fetch it.
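
The flow just described (local service first, then the publisher's
site, then pick a URL) might look roughly like the following sketch
in Python. Everything named here is an assumption made for
illustration: the dictionaries stand in for real resolution
services, the file: URL is invented, and "take the first URL" is
only the simplest possible choice rule, not a recommendation.

```python
from typing import Callable, Dict, List, Optional

# A resolver takes a URN (or a URC title query) and returns candidate URLs.
Resolver = Callable[[str], List[str]]


def table_resolver(table: Dict[str, List[str]]) -> Resolver:
    """Wrap a plain dict as a resolver (hypothetical stand-in for a service)."""
    def resolve(key: str) -> List[str]:
        return list(table.get(key, []))
    return resolve


def resolve_in_order(key: str, resolvers: List[Resolver]) -> List[str]:
    """Ask each service in turn; stop at the first one that returns any URLs."""
    for resolve in resolvers:
        urls = resolve(key)
        if urls:
            return urls
    return []


def choose_url(urls: List[str]) -> Optional[str]:
    """The "some machinery": here, simply take the first URL offered."""
    return urls[0] if urls else None


# Hypothetical data: a local installation and a publisher's site.
local = table_resolver({"Very.Fine.Example": ["file:///usr/doc/very.fine"]})
publisher = table_resolver({
    "Very.Fine.Example": ["http://com.com.com/very.fine"],
    "Another.Example": ["http://edu.edu.edu/v.fine.example"],
})

# Found locally, so the publisher's site is never consulted.
print(choose_url(resolve_in_order("Very.Fine.Example", [local, publisher])))
# Not installed locally, so the answer comes from the publisher's site.
print(choose_url(resolve_in_order("Another.Example", [local, publisher])))
```

Keeping each service behind the same call signature means a browser
could add or reorder services without changing the selection step.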

The resolution service needs to parse the complete URC set, or
consult a preparsed table, to return the appropriate info.
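
As a rough illustration of the "preparsed table" option, the Python
sketch below builds two lookup tables from a URC set, one keyed by
URN and one keyed by title, which is enough to answer the two query
types listed earlier. It is only a sketch under stated assumptions:
it uses regular expressions rather than a real SGML parser, it
assumes the markup has been normalized so that every end tag is
written out in full (no </> short forms), and it takes the first
<title> in each record as the document title. The function names
are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: normalized markup with full end tags, cut down from the
# kind of URC set discussed in this proposal.
SAMPLE = """
<urc>
<urc.tei.davenport>
<teiheader><filedesc><titlestmt>
<title>X Window System User's Guide: electronic edition</title>
</titlestmt></filedesc></teiheader>
<urn>Very.Fine.Example
</urn>
<url>http://com.com.com/very.fine
</url>
<url>http://edu.edu.edu/v.fine.example
</url>
</urc.tei.davenport>
</urc>
"""


def element_texts(tag: str, chunk: str) -> List[str]:
    """Return the stripped text of every <tag>...</tag> found in chunk."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    return [text.strip() for text in re.findall(pattern, chunk, re.S | re.I)]


def build_tables(urc_set: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Build URN-to-URLs and title-to-URLs tables from a URC set."""
    by_urn: Dict[str, List[str]] = defaultdict(list)
    by_title: Dict[str, List[str]] = defaultdict(list)
    for record in element_texts("urc.tei.davenport", urc_set):
        urls = element_texts("url", record)
        for urn in element_texts("urn", record):
            by_urn[urn].extend(urls)
        titles = element_texts("title", record)
        if titles:  # assumption: the first title is the document title
            by_title[titles[0]].extend(urls)
    return dict(by_urn), dict(by_title)


by_urn, by_title = build_tables(SAMPLE)
print(by_urn.get("Very.Fine.Example", []))
print(by_title.get("X Window System User's Guide: electronic edition", []))
```

A service that stores its information in some form other than SGML
could fill the same two tables from its own store; nothing in the
lookup depends on the markup.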

FORMAT of the URC INFORMATION

As the documents in question are electronic books, it seems
appropriate to use either a TEI header or the USMARC format
to represent the bibliographic metadata, as both formats
are well worked out. I chose TEI because I think it will be
easier for Davenporters to use and I didn't want to learn
USMARC rules. Here's a sample set of information marked
up in strict accordance with the TEI P3 DTD. It may be
desirable to define a subset of this DTD for Davenport
purposes; I am still exploring the possibilities offered
by the TEI header; see recent posts to TEI-L. This
set has a bit more info than is strictly needed.

<!doctype teiheader system "tei2.dtd"[
<!ENTITY % TEI.mixed 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!entity % isonum system "iso-num.gml">
%isonum;
]>
<!-- need isonum for the ampersand in O'Reilly and Associates -->
<teiheader>
<!-- as for most elements, the attributes of teiheader are not
really needed for an elementary URC -->
<filedesc>
<titlestmt>
<title>X Window System User's Guide: electronic edition</>
<!-- TEI recommends that you distinguish the titles of print works
and electronic versions in this fashion, using one of two
set phrases, the other one being "a machine readable
transcription" -->
<author>Valerie Quercia</>
<author>Tim O'Reilly</>
</>

<editionstmt>
<edition>OSF/Motif 1.2 Edition</>
</editionstmt>

<publicationstmt>
<publisher>O'Reilly &amp; Associates, Inc.</>
<idno type=ISBN>12345678-9</>
<!-- ISBN of the electronic edition, not of the print book -->
<date>1 April 1994</>
</publicationstmt>

<seriesstmt>
<title>X Window System</>
<idno type=vol>3</>
</seriesstmt>

<sourcedesc>
<p>written as an etext
</>
</sourcedesc>

</filedesc>

<encodingdesc>
<classdecl>
<taxonomy id=LCSH>
<bibl>Library of Congress Subject Headings
</bibl>
</taxonomy>
</classdecl>
</encodingdesc>

<profiledesc>
<textclass>
<keywords scheme=LCSH>
<list>
<item>Computer software documentation</>
<item>Computer software configuration management</>
</list>
</keywords>
</textclass>
</profiledesc>
</teiheader>

THE TEIHEADER IN AN URC SET

Here's a sample document, with DTD, that wraps the above
TEI header along with URNs and URLs into one large element I
called URC. The content model of URC is arranged simply for
convenience in the present trial, and should be regarded pretty
much as a placeholder or strawman. I use <urc.etc> to
represent all the other flavors of URC sets that might
exist. The prologue includes the DTD and the pieces from
the prologue of the sample TEI header shown above; the
included entity would be the <teiheader>...</teiheader>
part of the sample above.

<!doctype urc [
<!element urc - - (urc.tei.davenport*, urc.etc*)>
<!element urc.etc - - (#pcdata) -- placeholder -- >
<!element urc.tei.davenport - - (teiheader,
((URN+, URL*) | URL+)) -- at least one URN or
at least one URL, per comments at Davenport
meeting, and thanks to Eve Maler -- >
<!element (urn|url) - - (#PCDATA)>
<!entity % isonum system "iso-num.gml">
%isonum;
<!ENTITY % TEI.general 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!entity % teidtd system "tei2.dtd">
%teidtd;
<!entity teix system "tei.exmpl.v3m">
<!-- teix is the teiheader example shorn of its doctype decl -->
]>
<urc>
<urc.tei.davenport>
&teix;
<urn>Very.Fine.Example
</urn>
<url>http://com.com.com/very.fine
</url>
<url>http://edu.edu.edu/v.fine.example
</url>
</urc.tei.davenport>
</urc>
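
The content model for urc.tei.davenport, (teiheader, ((URN+, URL*) |
URL+)), can be checked outside an SGML parser as well. The Python
sketch below is one way a storage engine might sanity-check a record
before accepting it; the function name and the idea of passing in a
flat list of child element names are assumptions made for the
example, and a validating SGML parser remains the authoritative
check.

```python
import re
from typing import List

# The content model rewritten as a regular expression over a
# space-separated list of child element names:
#   teiheader, then either one or more urn followed by any number of url,
#   or one or more url with no urn.
_MODEL = re.compile(r"teiheader(?:(?: urn)+(?: url)*|(?: url)+)")


def satisfies_model(children: List[str]) -> bool:
    """True if the child element names, in order, fit the content model."""
    sequence = " ".join(name.lower() for name in children)
    return _MODEL.fullmatch(sequence) is not None


print(satisfies_model(["teiheader", "urn", "url", "url"]))  # True
print(satisfies_model(["teiheader", "url"]))                # True
print(satisfies_model(["teiheader"]))                       # False: no URN or URL
print(satisfies_model(["teiheader", "url", "urn"]))         # False: URN after URL
```

The last case shows a consequence of the model as written: URNs, when
present, must all precede the URLs.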

OPEN ISSUES

Practically everything is open, but here's a short list.

Is the list of pieces given above complete? correctly divided
into components?

What should be the syntax of the URL attribute values for
URN and URC/title? should Docbook's Ulink be extended with
additional attributes?

Can the "local resolution service" for URN-to-URL resolution
be as simple as an SGML entity catalogue in the style set up
by SGML Open? can the local service for simple URC/title
queries be specified so that it could be implemented *as a
layer distinct from the URC set and document encoding* in
HyTime by those interested in doing so?

Is LCSH an appropriate choice for a keyword thesaurus
(beyond the scope of the project, really, but something
to be thinking about)?

If one wishes to establish URNs for sections and subsections
of a document, how should they be nested, if at all, in
the overall URC set?

Who would be interested in helping with some of the other
pieces? There's no money in this project, at least at this
stage of development, and maybe not any glory, either.

-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
A Davenport Group sponsor.  For information on the Davenport 
  Group see ftp://ftp.ora.com/pub/davenport/README.html
	or  http://www.ora.com/davenport/README.html