Uniform Resource Names - almost draft

Mitra (mitra@path.net)
Tue, 11 Oct 94 12:28:20 -0400

Date: Tue, 11 Oct 94 12:28:20 -0400
Message-Id: <aabf78de070210046df8@[192.190.111.98]>
To: uri@bunyip.com
From: mitra@path.net (Mitra)
Subject: Uniform Resource Names - almost draft

Uniform Resource Names

Here is the latest draft of my proposal for URNs. It was intended to be a
jointly authored version by Chris Weider and myself, but since Chris hasn't
been able to get to it yet, it has all kinds of places where I'm asking
him to fill in the gaps. Hopefully we'll have a complete version
together by San Jose.

The formatting got massacred in the conversion - a formatted version is
available from <http://www.path.net/mitra/urn.html>

- Mitra <mitra@path.net>

Preamble
Here goes the standard IETF I-D header

Purpose
This document defines a syntax for URNs, and operational rules for their
assignment and usage. The intent is to provide enough information for
implementors of IIIR systems to use these in their work.

Syntax
The URN consists of three parts: its header "URN", a publisher ID, and an
opaque string. The publisher ID is assigned by a distributed process
defined below; the opaque string is assigned by the owner of the publisher
ID, subject to the rules defined below.

A typical URN might look like this:

<URN:path.net:mitra1234>

Case is not significant. White space is not significant within a URN; for
this purpose white space means the characters Space, Tab, CR, LF and
hyphen. Since a URN may contain carriage returns (for example inserted by
publishers) it must always be delimited. How it is delimited is defined by
the context: for example it might be part of another data structure, while
in free text it is delimited by "<" and ">".

The BNF for URNs is:

URNinText ::= "<" URN ">"
URN ::= Urntag ":" PublisherId ":" OpaqueString
Urntag ::= ("U"|"u") ("R"|"r") ("N"|"n")
PublisherId ::= FQDN | RegisteredString
FQDN ::= xasciis
RegisteredString ::= xasciis
OpaqueString ::= xasciis
xasciis ::= xascii [ xasciis ]
xascii ::= "A".."Z" | "a".."z" | "0".."9" | "."
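
The grammar above can be sketched as a small parser. This is only a
minimal sketch in Python, not part of the proposal: the names parse_urn,
INSIGNIFICANT and XASCIIS are illustrative, and the accepted character set
is the one listed for xascii, which is itself still an open question.

```python
import re

# Characters the draft treats as insignificant inside a URN.
INSIGNIFICANT = " \t\r\n-"

# xascii as listed in the grammar: letters, digits and ".".
XASCIIS = r"[A-Za-z0-9.]+"
URN_RE = re.compile(rf"(?i:urn):({XASCIIS}):({XASCIIS})")


def parse_urn(text):
    """Return (publisher_id, opaque_string), or None if text is not a URN.

    Accepts the bare form and the free-text form delimited by "<" and ">".
    """
    # Drop the insignificant characters before matching the grammar.
    stripped = text.translate({ord(c): None for c in INSIGNIFICANT})
    if stripped.startswith("<") and stripped.endswith(">"):
        stripped = stripped[1:-1]
    match = URN_RE.fullmatch(stripped)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For example, parse_urn("<URN:path.net:mitra1234>") yields the publisher ID
"path.net" and the opaque string "mitra1234".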

Equality of URNs
The rules under Syntax imply that to compare two URNs for equality, the
following procedure, or an equivalent, should be followed:
1) Canonicalize each URN by stripping all space, tab, CR, LF and hyphen
characters, and converting all A..Z to a..z.
2) Do a string compare on the results.

Two important things to note about this:
a) This compares the URNs, not the bytes they might point to. Because a
publisher gets to decide what changes require the allocation of a new URN,
two documents may have the same URN but a different series of bytes.
b) To compare two URNs, there is no need for a net access.
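
The comparison procedure can be sketched as follows. This is a minimal
sketch in Python; the function names canonicalize and urns_equal are
illustrative, and it assumes any surrounding "<" and ">" delimiters have
already been removed by the caller.

```python
import string

# Fold A..Z to a..z and delete space, tab, CR, LF and hyphen in one pass.
_CANON = str.maketrans(
    string.ascii_uppercase, string.ascii_lowercase, " \t\r\n-"
)


def canonicalize(urn):
    """Return the canonical form of a URN as described in the procedure."""
    return urn.translate(_CANON)


def urns_equal(a, b):
    """Compare two URNs by string comparison of their canonical forms."""
    return canonicalize(a) == canonicalize(b)
```

Note that no network access takes place: the comparison is purely on the
strings, and says nothing about the bytes the URNs might point to.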

Rules for allocating publisher ids
It is crucial to the scalability of this scheme that publisher ids are
allocated in a distributed fashion. The best model we have for this is the
existing DNS, and using it gives us some other benefits. Therefore, it is
proposed that the owner of a network FQDN automatically has the right to
use that string (and descendants of it) as a publisher ID, subject to
the following constraints:
1) This FQDN has not previously been used as a publisher ID. This only
constrains those cases where a FQDN is being reused (see below), or where
it conflicts with the small number of grandfathered exceptions.
2) A resolution service is made available conforming to the rules below.
For example, in order to use the publisher ID "path.net" I would need to
ensure that there was a resolution service at "path.net.uri".
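
The mapping from publisher ID to resolver name in constraint 2 can be
sketched as below. This is a minimal sketch in Python; the function name
resolver_host and the suffix parameter are assumptions for illustration,
and the suffix is a parameter because the availability of ".uri" is still
an open question.

```python
def resolver_host(publisher_id, suffix="uri"):
    """Return the DNS name at which a resolution service is expected.

    The publisher ID is lower-cased because case is not significant.
    """
    return f"{publisher_id.strip().lower()}.{suffix}"
```

With the default suffix, resolver_host("path.net") gives "path.net.uri".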

Note that the term "network FQDN" (which may not be the correct term) is
used because it is not intended that every machine be responsible for
allocating URNs. For example, a suitable publisher ID might be
"berkeley.edu", not "violet.berkeley.edu". It is anticipated that the
owner of a publisher ID will allocate sub-ids in a manner that makes sense
for its organisation, for example "physics.berkeley.edu".

Rules for allocating URNs
URNs, or rather the opaque string portion, are allocated by the owner of
the publisher ID, in any form they wish - subject to the character set
constraints defined above.
The publisher decides what changes require allocation of a new URN - and
valid choices may range from requiring a new URN for any change in the
bytes, to assigning a single URN for all versions of a work in all
languages.

URN to URL or URC resolution
In order to facilitate the integration of the URN into IIIR architectures,
the owner of the PublisherId foo.bar must arrange for a URN resolution
service to resolve these documents, answering on a ToBeDecided port on
foo.bar.uri. This URN resolution service will support at a minimum a subset
of whois++ as defined below. While it is hoped that many resolution servers
will provide more comprehensive services, these are the minimum
requirements to enable clients to do URN resolution quickly and
efficiently.

An important note is that this doesn't define that every URN will always be
resolvable - in fact, there may not be a net-accessible version of the URN.
What it defines is that a reasonable client can take a single deterministic
action involving a single DNS access, and a single net transaction, and
know that for the vast majority of cases if the URN is resolvable it will
have been resolved.
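
The single deterministic action described above can be sketched like this.
It is a minimal sketch in Python under stated assumptions: dns_lookup and
query_resolver are hypothetical callables supplied by the caller (the port
and the whois++ subset are not yet decided), the ".uri" suffix is a
parameter, and the URN is assumed to be free of embedded white space.

```python
def resolve_urn(urn, dns_lookup, query_resolver, suffix="uri"):
    """Resolve a URN with at most one DNS access and one net transaction.

    dns_lookup(host) returns an address or None.
    query_resolver(address, urn) returns a list of URLs or URCs.
    Both are hypothetical and stand in for the undecided protocol.
    """
    body = urn.strip().lstrip("<").rstrip(">")
    parts = body.split(":")
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    host = f"{parts[1].lower()}.{suffix}"
    address = dns_lookup(host)           # the single DNS access
    if address is None:
        return []                        # no resolution service found
    return query_resolver(address, urn)  # the single net transaction
```

An empty result means only that this one attempt found nothing; as noted
above, not every URN is guaranteed to be resolvable.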

This also doesn't say that every publisher must run a URN->URL resolution
service. This is the main reason for choosing "path.net.uri" rather than
"uri.path.net", since it allows for the service to be at a separate
location, operated by some other entity.

This also doesn't require the client to always use this resolution service.
It is anticipated that gateways would exist from each of the major
protocols to the resolution service, and also that some sites will want to
run proxy servers which all local clients ask first for the URLs.

insert limited whois++ stuff here
Chris, I need your help here, to define the MINIMUM whois++ required to
submit a URN and get back either a list of URLs or URCs, no languages, no
constraints, etc. It does need redirection.

Grandfathering in of existing numbering schemes
It is intended to grandfather in existing schemes to the extent possible,
without constraining an optimal solution for the future.

ISBN
ISBNs consist of a string of digits which look opaque to the user, but
actually contain two distinct parts, the first of which is the publisher.
So assuming an ISBN of 1234567890 where 12345 is the publisher id, the URN
would be <urn:12345.isbn:67890>. To integrate this so that clients could
still resolve URNs would involve publishers running resolvers at, for
example, 12345.isbn.uri. Since most users are not going to know where the
publisher/identifier boundary is in an ISBN, it is probably going to need
a resolver at isbn.uri that just redirects queries to the correct resolver,
so that a URN of <urn:isbn:1234567890> would be correctly resolved.
(Chris, if you understand this, please replace with a better example!)
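
The two ISBN forms above can be sketched as follows. This is a minimal
sketch in Python that follows the simplified example in this section; the
function names are illustrative, and publisher_digits (where the
publisher/identifier boundary falls) is an assumption the caller must
supply, since it cannot be read off the ISBN in this sketch.

```python
def _isbn_chars(isbn):
    """Keep only the letters and digits of an ISBN (drops hyphens, spaces)."""
    return "".join(ch for ch in isbn if ch.isalnum())


def isbn_to_urn(isbn, publisher_digits):
    """Build the publisher-specific form, e.g. <urn:12345.isbn:67890>."""
    chars = _isbn_chars(isbn)
    publisher, item = chars[:publisher_digits], chars[publisher_digits:]
    if not publisher or not item:
        raise ValueError("publisher boundary leaves an empty part")
    return f"<urn:{publisher}.isbn:{item}>"


def isbn_to_generic_urn(isbn):
    """Build the form for users who do not know the boundary."""
    return f"<urn:isbn:{_isbn_chars(isbn)}>"
```

The generic form relies on a redirecting resolver at isbn.uri, as described
above, to find the publisher's own resolver.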

ISSN
Chris - I don't understand how ISSNs work; if you do, then please write
this part. I presume it's similar to ISBNs above.

Anonymous FTP archives
There is a desire to be able to retrofit the existing FTP archives so that
resource location tools such as archie could better de-duplicate things.
However, since there is no obvious publisher-id in this case, it's probably
better to do de-duplication via checksums. Alternatively, archie could
retroactively publish all of the files with something like
<urn:archie:123456790> where the opaque string would be an MD5 checksum;
however, it's hard to see what this gains.
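
The checksum-based alternative can be sketched as below. This is a minimal
sketch in Python using the standard hashlib module; the function name
checksum_urn is illustrative, and using the hexadecimal digest as the
opaque string is an assumption (it happens to fit the xascii character
set).

```python
import hashlib


def checksum_urn(data, publisher_id="archie"):
    """Return a URN whose opaque string is the MD5 hex digest of data."""
    digest = hashlib.md5(data).hexdigest()
    return f"<urn:{publisher_id}:{digest}>"
```

Two files with identical bytes get the same URN, so de-duplication reduces
to URN comparison; whether that gains anything over comparing the checksums
directly is the question raised above.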

Integration with existing IIIR schemes
This section attempts to address issues around integration into existing
technologies.

Gopher
Gopher+ could easily be extended to return URNs along with the URLs it
already returns.
Another way to integrate URNs would be for the menu to return a URN in a
similar way to the handling of gateways, i.e. it sets
Path=urn:path.net:12345 and points the Host and Port at a known gateway.
The gateway would then receive "urn:path.net:12345" as a selector string,
and would perform the URN -> URL resolution, returning the results as
either a Gopher0 or Gopher+ menu.
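
The gateway approach can be sketched as a Gopher menu line whose selector
is the URN. This is a minimal sketch in Python; the function name, the
display text and the gateway host and port in the usage note are
hypothetical, and item type "1" (a menu) is an assumption based on the
gateway returning its results as a menu.

```python
def gopher_urn_item(display, urn_selector, gateway_host, gateway_port,
                    item_type="1"):
    """Build one Gopher menu line pointing at a URN resolution gateway.

    A menu line is: type character, display string, then selector, host
    and port separated by tabs, terminated by CR LF.
    """
    return (f"{item_type}{display}\t{urn_selector}"
            f"\t{gateway_host}\t{gateway_port}\r\n")
```

For instance, gopher_urn_item("A document", "urn:path.net:12345",
"gateway.example.org", 70) produces a line that an unmodified client would
follow to the gateway, sending "urn:path.net:12345" as the selector.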

WWW
The most obvious way to handle this in WWW is to replace the anchor with
the URN, i.e. <href=urn:path.net:12345>. Most clients can redirect certain
URL schemes to certain gateways, and the URN looks (to the client) like a
valid URL. The gateway would receive an HTTP request of the form "GET
urn:path.net:12345" and would respond with either an HTML page with the
URCs, or the file itself.
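
The first step a gateway would take on such a request can be sketched as
follows. This is a minimal sketch in Python; the function name
urn_from_request_line is illustrative, and it only extracts the URN from
the request line, leaving the resolution and the response to the gateway.

```python
def urn_from_request_line(line):
    """Return the URN from a request line like "GET urn:path.net:12345".

    Returns None if the line is not a GET for a urn: target. A trailing
    protocol version, if present, is ignored.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] != "GET":
        return None
    target = parts[1]
    if not target.lower().startswith("urn:"):
        return None
    return target
```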

WAIS
It's unclear how best to integrate URNs into WAIS. The first step would be
to replace the URL returned in the QueryResponse with a URN. Since this is
just an opaque string which is always returned to the same WAIS server,
that WAIS server could then return the appropriate document. Unfortunately
it doesn't appear that there is any way within the current WAIS protocol
for the server to offer the client a choice of variants. However, a smart
(i.e. new) client could take the URN and either do its own URN->URC lookup,
or potentially pass the URN as the search term of a WAIS query to a
gateway that would return a QueryResponse consisting of the URLs for this
URN.

Problems and their solutions

LIFNs
Keith's requirement for an identifier independent of location, but
identifying only one particular combination of bytes, is met by a simple
URC consisting of the URN as defined above and a checksum.
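
Such a URC can be sketched as a pair of URN and checksum. This is a minimal
sketch in Python; the class and function names are illustrative, and the
choice of MD5 is an assumption, since this section does not name a checksum
algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecksummedURC:
    """A URN plus a checksum of one particular series of bytes."""
    urn: str
    checksum: str  # hex digest of the exact bytes


def make_urc(urn, data):
    """Build the URC for the given URN and the exact bytes it covers."""
    return ChecksummedURC(urn=urn, checksum=hashlib.md5(data).hexdigest())


def matches(urc, data):
    """Check whether data is the byte sequence this URC identifies."""
    return hashlib.md5(data).hexdigest() == urc.checksum
```

Two versions published under the same URN then yield different URCs, which
gives the location-independent, byte-exact identity asked for.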

What happens when a FQDN gets reused?
If a FQDN has never been used as the publisher ID of URNs then reuse poses
no problem. However, if the FQDN has been used as a publisher ID then there
are two possible solutions:
a) The new owner agrees to continue to resolve or arrange for resolution of
the old URNs, or to redirect queries for those URNs to the previous owner's
resolution service.
b) The new owner applies for a new publisher ID, either a totally new FQDN,
or a subid of the previous FQDN that has not been used by the old owner.

Reliance on DNS
This scheme specifies DNS as the method for obtaining the location of a
resolution service. However there are concerns that this scheme should be
able to outlast DNS. There are two areas of issue - existing URNs and new
URNs:
Firstly, the absence of DNS does not invalidate the syntax as a scheme for
uniquely identifying documents. However, at such a time a new RFC would
be required specifying an alternative means of resolution. Since DNS going
away would require rewriting almost every internet application, the work
of defining, and then recoding for, a new resolution scheme is going to be
small in comparison.
Secondly, the absence of DNS does not mean that FQDNs will go away, so these
(or any successor specifying a unique name) still remain as a valid way of
assigning publisher ids.

Open questions, and unsolved problems

Character set for xascii
What characters should be included in xascii, either because they are in
existing FQDNs, or because they are wanted for the Opaque String part?

.uri availability and port assignment
It is unclear if it is possible (politically!) to create a new ".uri"
domain. Failing this, an alternative such as "uri.int" or "uri.net" is just
as feasible technically. Since this string is not displayed in the URN,
there is no real concern as to which is chosen, nor does the choice affect
any of the other issues above.
A port will need to be assigned for this, preferably a non-privileged,
reserved port (i.e. above 1023) to avoid the need for root access to run a
resolution server. It is recommended that this NOT be the port for Whois++,
since the evolution of this service may be independent of that protocol's,
and many sites are going to want to run directory services on that port.

How it addresses the requirements document
This section still needs writing; I don't have the requirements in front of
me at the moment.

=======================================================================
Mitra mitra@path.net
Internet Consulting (415)488-0944
<http://www.path.net/mitra> fax (415)488-0988