Message-Id: <9305161659.AA24735@interval.interval.com>
Date: Sun, 16 May 1993 09:59:40 -0800
To: Ed Krol <krol@ux1.cso.uiuc.edu>
From: Terry Winograd <winograd@interval.com>
Subject: Re: Internet Draft on URNs
>Date: Sun, 16 May 1993 09:04:38 -0500
>From: Ed Krol <krol@ux1.cso.uiuc.edu>
>To: clw@merit.edu, marca@ncsa.uiuc.edu
>Subject: Re: Internet Draft on URNs
>
>URNs need not be human-readable, but they need to be human
>recognizable as a particular resource. ...
>In the same manner I think that by making the URN a bit larger
>than required by information theory we can make it meaningful
>via inspection. For example, one could say the URN for
>the Vint Cerf article in Scientific American last year could
>be its ISSN and the ordinal of the article in the magazine.
>So 1-56592-025-2.4 or something would be its URN, not very
>appetizing. On the other hand you could have something like
>cerf.sciamer.1992.4.4. (This made up example was author,
>magazine abbreviation, year, month, ordinal). A bit bigger
>to impart some instant information, but still unique.
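Krol's made-up example packs five fields into one dotted string. The following is a minimal sketch of reading such a name back apart. It assumes, as his example does, that the fields are author.magazine.year.month.ordinal, separated by dots, and that no field contains a dot itself. StructuredName and parse_structured_name are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredName:
    """Hypothetical fields of Krol's made-up example name."""
    author: str
    magazine: str
    year: int
    month: int
    ordinal: int


def parse_structured_name(name: str) -> StructuredName:
    # Assumes exactly five dot-separated fields:
    # author.magazine.year.month.ordinal
    parts = name.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated fields, got {len(parts)}: {name!r}")
    author, magazine, year, month, ordinal = parts
    return StructuredName(author, magazine, int(year), int(month), int(ordinal))


print(parse_structured_name("cerf.sciamer.1992.4.4"))
```

Parsing raises ValueError as soon as an author or abbreviation contains a dot. This is one practical cost of packing descriptive information into the name itself.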
This argument is intuitively right, but it loses the important distinctions
that were made in separating URI, URL and URN and keeping each of them
faithful to its particular function. There are three distinct kinds of
information that are useful in citations:
1) Where can I get it? (URL)
2) How do I know if it's a particular item? (URN)
3) What is useful to know about it? (URI)
In print media there is a long tradition of using things like footnotes,
bibliographies, etc. to include all these kinds of information in a
human-readable document. Conventions vary as to whether the string
appearing in the text is something like "Sally Jones, Everything You Always
Wanted to Know About RFC822, 1985" or "Jones, 1985" or "[Jo85]" or "^4"
(that's a superscripted footnote or endnote number). In the latter cases
it is assumed that the document follows a standard way of linking the
reference to a fuller citation.
Given the much more effective means we have for linking things in
electronic media, it seems that this model can carry over and provide the
flexibility that is lacking if we rely on any one of the information
types above alone. We can have a general (but potentially long and cumbersome)
citation form which includes as special cases short forms with limited
information. In many cases, a long form will be provided along with some
kind of cross-reference to make the text containing a citation more
human-readable, or to share information across references.
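Here is one simple way the short form and its cross-reference could fit together. The text carries a short key such as "[Jo85]", and a separate table holds each full citation exactly once. This is a sketch under assumptions: the table, the bracketed key syntax and the function expand_short_forms are all hypothetical. In electronic media the marker would more likely be a link than an inline expansion.

```python
from __future__ import annotations

import re

# Hypothetical table of full citations, each stored once and shared by
# every short-form reference that points at it.
FULL_CITATIONS: dict[str, str] = {
    "Jo85": "Sally Jones, Everything You Always Wanted to Know About RFC822, 1985",
}

# Assumed short-form syntax: letters followed by digits inside brackets.
SHORT_FORM = re.compile(r"\[([A-Za-z]+\d+)\]")


def expand_short_forms(text: str, table: dict[str, str]) -> str:
    """Replace each [Key] marker with its full citation; unknown keys are left as-is."""
    def substitute(match: re.Match[str]) -> str:
        key = match.group(1)
        return f"({table[key]})" if key in table else match.group(0)
    return SHORT_FORM.sub(substitute, text)


print(expand_short_forms("As shown in [Jo85], headers matter.", FULL_CITATIONS))
```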
--------
Definition: A GENERAL CITATION is a form containing ONE OR MORE of the
following:
1. ONE OR MORE URLs. Note that it may be useful to include
multiple alternatives, such as location in a local cache and another
location in a costly but stable and secure archive.
2. ONE OR MORE URNs, with at most one from a single naming authority.
A single item may be given unique names by more than one authority,
such as ISBN and Library of Congress. This means that two distinct
URNs may in fact refer to the same item, if they are from different
authorities.
3. ONE OR MORE further descriptors (URIs) designed to convey
information about the entity (e.g., title, author, date).
Standards for URIs will differ for different
kinds of information objects. For each kind of
object there will be descriptors that are typically good at
distinguishing one item from another (e.g., book titles), and others that are not
(e.g., "The book with 455 pages published by Academic Press in
1987.") but are still potentially useful. Appropriateness will
depend both on how informative the descriptor is to a person
and on how it can be applied in further use of the object or
in deriving information about it.
--------
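The definition above can be sketched as a data type. The sketch makes two assumptions: a URN can be split into a naming authority and a name, and descriptors are (kind, value) pairs. The class and field names are hypothetical. The type enforces the two constraints the definition states: at least one element overall, and at most one URN per naming authority.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class URN:
    # Hypothetical split of a URN into the naming authority that issued it
    # and the name that authority assigned.
    authority: str
    name: str


@dataclass
class GeneralCitation:
    # 1. Where can I get it? Alternative locations, e.g. a cache and an archive.
    urls: list[str] = field(default_factory=list)
    # 2. How do I know it's a particular item? Names from naming authorities.
    urns: list[URN] = field(default_factory=list)
    # 3. What is useful to know about it? (kind, value) pairs such as
    #    ("title", ...) or ("author", ...); a kind may repeat.
    descriptors: list[tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The definition asks for ONE OR MORE items drawn from the three kinds.
        if not (self.urls or self.urns or self.descriptors):
            raise ValueError("a general citation needs at least one URL, URN or descriptor")
        # At most one URN from any single naming authority.
        authorities = [urn.authority for urn in self.urns]
        if len(authorities) != len(set(authorities)):
            raise ValueError("at most one URN is allowed per naming authority")


# Usage: two names for the same item from different (placeholder) authorities,
# plus a title descriptor and no location at all.
citation = GeneralCitation(
    urns=[URN("ISBN", "isbn-placeholder"), URN("LC", "lc-placeholder")],
    descriptors=[("title", "Example Title")],
)
print(len(citation.urns))
```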
The standard for general citations should allow a citation to include any
mixture of these three kinds of description, including redundancies (more
than one URL, URN or URI). No single kind is required (e.g., a citation
might have a URN with no location information, or just a URL).
Depending on what you want to do with the citation, you will need to go
through auxiliary indices and mappings (as is now the case with print
media).
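For example, a citation that carries only a URN cannot be fetched until something maps that URN to a location. The following sketch shows that step. It assumes a hypothetical in-memory index from (authority, name) pairs to URLs, and the function candidate_locations is illustrative. In practice the index would be a separate lookup service, which the message does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Citation:
    # Trimmed-down citation: URLs given directly, plus URNs written as
    # (authority, name) pairs. Either list may be empty.
    urls: list[str] = field(default_factory=list)
    urns: list[tuple[str, str]] = field(default_factory=list)


def candidate_locations(
    citation: Citation, urn_index: dict[tuple[str, str], list[str]]
) -> list[str]:
    """URLs to try: ones given in the citation first, then ones found by
    looking each URN up in the auxiliary index. Duplicates are dropped."""
    looked_up = [url for urn in citation.urns for url in urn_index.get(urn, [])]
    seen: set[str] = set()
    result: list[str] = []
    for url in citation.urls + looked_up:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Usage with a placeholder index entry.
index = {("ISBN", "isbn-placeholder"): ["ftp://archive.example/books/item"]}
print(candidate_locations(Citation(urns=[("ISBN", "isbn-placeholder")]), index))
```

Listing the citation's own URLs first respects the producer's hints. The index only fills in what the citation leaves out.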
Whether to be as succinct as possible (e.g., provide
only a URN) or as robust as possible (lots of URL hints, further URI
descriptors, etc., for use both by the person reading the citation and by
programs using it) will be up to the specific producer and the intended
uses. There can be "specialized standards" with respect to certain
lookup mechanisms that require one or more of the above types, insist on a
particular naming authority, etc. But it should be assumed that these will
be nowhere near "universal." They may have the same kind of universality
as current publication citation formats (e.g., APA standard or CACM
standard), which are widely followed by convention within a particular
community.
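A specialized standard of this kind can be modeled as a profile checked against an otherwise general citation. The sketch below assumes a hypothetical Profile with three kinds of requirement: a URL, a URN from a particular naming authority, and named descriptors. The names are illustrative and are not taken from any existing standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Citation:
    urls: list[str] = field(default_factory=list)
    urns: list[tuple[str, str]] = field(default_factory=list)         # (authority, name)
    descriptors: list[tuple[str, str]] = field(default_factory=list)  # (kind, value)


@dataclass(frozen=True)
class Profile:
    """A hypothetical 'specialized standard': extra requirements layered on
    top of the general citation form by one community or lookup mechanism."""
    require_url: bool = False
    required_authority: str | None = None
    required_descriptors: frozenset[str] = frozenset()

    def violations(self, c: Citation) -> list[str]:
        # Collect every unmet requirement instead of stopping at the first.
        problems: list[str] = []
        if self.require_url and not c.urls:
            problems.append("no URL")
        if self.required_authority and all(a != self.required_authority for a, _ in c.urns):
            problems.append(f"no URN from {self.required_authority}")
        present = {kind for kind, _ in c.descriptors}
        for kind in sorted(self.required_descriptors - present):
            problems.append(f"missing descriptor: {kind}")
        return problems


isbn_profile = Profile(required_authority="ISBN", required_descriptors=frozenset({"title"}))
print(isbn_profile.violations(Citation(urls=["ftp://archive.example/paper"])))
```

The check reports problems rather than rejecting the citation outright. This means the same general citation can be tested against several communities' profiles.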
--t
--------------------------------------------
Terry Winograd
1992-93 Academic Year address:
Interval Research winograd@interval.com
1801 Page Mill Road 415/354-0854
Palo Alto, CA 94304 Fax: 415/354-0872
Long-term address:
Stanford University winograd@cs.stanford.edu
Computer Science Dept. 415/723-2780
Stanford, CA 95305-2140 Fax: 415/724-7411