Re: Grainularity of URN

Peter Deutsch (peterd@bunyip.com)
Fri, 21 May 1993 18:21:58 -0400

Message-Id: <9305212222.AA01417@expresso.bunyip.com>
From: Peter Deutsch <peterd@bunyip.com>
Date: Fri, 21 May 1993 18:21:58 -0400
In-Reply-To: David G. Durand's message as of May 21, 11:17
To: David G. Durand <dgd@cs.bu.edu>, uri@bunyip.com
Subject: Re: Grainularity of URN

g'day all,

I've been trying to resist diving in here, since I'm so
busy and of course each posting just seems to generate
more postings ;^) but I just want to throw in a couple of
comments here.

[ David wrote: ]

> >Dirk Herr-Hoyman wrote:
> >> But, it's not clear to me how an anchor fits into URNs?
> >
> >Well, a URN is supposed to be a name that can be turned into a URL when
> >necessary (via some kind of name lookup service), and if you believe (I
> >don't) that named anchors in URLs are a sufficient way to refer to portions of
> >a document, then I suppose the trick is just getting another URN assigned to
> >the portion of the document you are interested in.
> >
> >But it is not really that simple. Even with URLs to HTML documents, you
> >cannot refer to a portion of document unless the author has put in a named
> >anchor where you want it. And there is no way to refer to a segment or
> >range of a document, rather than a single point. Using character positions
> >is not robust across end-of-line conversions for ASCII files, much less
> >across versions of a document or conversions into different formats.
> >some stuff deleted

I think there is still a lot of disagreement over the
whole issue of defining a mechanism for referencing
portions of a resource, fragment specifiers and so on.
Because of this, if nothing else, I'd caution against
trying to work them into our current proposals (and in
particular into the proposals for URLs and URNs). There
will be time enough to work out the details of these.
They can certainly be implemented as a form of composite
"URx:FragSpec" beastie, if that's the way people want to
see them.

It is my own particular view that it is extremely
important that we enumerate the various things we are
trying to accomplish and work on the various components
separately. Among other things, this allows us to make
more rapid progress on the non-controversial components,
while still chewing, fighting and screaming over the more
controversial parts.

The list of things we want currently includes (and is
certainly not limited to) providing uniform mechanisms for:

- addressing objects,
- naming objects,
- specifying fragments of objects
- specifying clusters of objects
- specifying alternative representations of objects
- providing various types of meta-information about an object
(copyright info, author/publisher/editor info, etc.)

There are also a few ground rules we need to keep in mind.
In particular, we must strive to eliminate assumptions
about specific systems or short-term needs (eg. just
because we're currently mostly dealing with files doesn't
mean we will be dealing mostly with files in the future).

This can be hard, in particular when we have a specific
application all ready to go and are itching to start
coding. In such cases the tendency is to confuse
implementation with architecture and take unwarranted
shortcuts to get something going, but we should keep in
mind that what we're striving for here is a set of
generalizations which should work across platforms, info
systems, implementations and so on.

The problem, of course, is that until we've actually used
something for a while we'll see neither its shortcomings
nor its possibilities for further generalization. This
argues for the KISS principle, especially in the first
pass.

Returning to the particular issue of fragment specifiers,
just because an individual access method may support
fragment specifiers doesn't mean that either the general
URL or URN must support them, too. In those cases where an
access method supports such fragment specifiers it's
entirely appropriate to allow for it in the corresponding
URL encoding, but this is a far cry from saying that we
must come up with a way of specifying a generalized
mechanism for specifying fragments within all URLs at this
point. There _may_ be a generalized need for them, but I
claim that this remains to be proven and we shouldn't hold
things up while this is verified one way or the other.

I think we should agree that the URL is _only_ a mechanism
for providing general access method and identifier
information to allow us to specify locations of specific
resources. If that's _all_ it does, we're still going to
be miles ahead of where we are now. Those access methods
that support fragment specification information now may go
ahead and encode them into their URL scheme, but there
is no onus on all systems to support the concept at this
point.

Similarly for URNs. I claim that the goal of a URN is to
uniquely identify, within a particular naming space, a
particular resource object. This is a clearly definable
need and I think we should concentrate on providing a
simple, extensible and easily deployable mechanism for
doing this. The question of providing "fragments of
objects" should be punted. The question of whether we
should impose hierarchal naming structure onto all name
spaces that will be using URNs should be punted. The
question of whether we should impose checksum mechanisms
onto all URNs should be punted. And so on.

Now, this is not to say that, for example, checksums are a
bad idea. Individual naming schemes may choose to mandate
them (or allow them as an option). A new component, which
consists of a URN plus its checksum, may be defined later.
The issue to me is whether we should be overloading the
essentially simple problem of defining an object naming
mechanism with added features before we've had a chance to
actually use them in the field.

Although a lot of people have told me that URNs are "a
much harder problem than URLs", and despite the donnybrook
which I seem to have kicked off at the last IETF, I
personally believe that it should be fairly easy to come to
closure on the initial URN proposal, if we all agree to
limit the scope of what we're trying to accomplish.

If we don't get too ambitious all we need is a syntax and a
mechanism for assigning naming authorities. Each such
authority would then have responsibility for defining the
specifics of their particular assignment scheme. At that
point, we're done and should move on to the next problem,
of which there are many.
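
To make that scope concrete, here is a minimal sketch in Python. The
"urn:<authority>:<name>" syntax and the names parse_urn and
NamingAuthorityRegistry are assumptions made up for illustration; they
are not taken from any proposal. The general mechanism only knows how
to find the authority, and everything else is delegated to it.

```python
# Hypothetical sketch: the "urn:<authority>:<name>" syntax and every name
# below are illustrative assumptions, not part of any URN proposal.
from typing import Callable, Dict, Tuple


def parse_urn(urn: str) -> Tuple[str, str]:
    """Split a URN into (authority, name); the name is otherwise opaque."""
    prefix, sep, rest = urn.partition(":")
    if prefix.lower() != "urn" or not sep:
        raise ValueError(f"not a URN: {urn!r}")
    authority, sep, name = rest.partition(":")
    if not authority or not sep or not name:
        raise ValueError(f"missing authority or name: {urn!r}")
    return authority.lower(), name


class NamingAuthorityRegistry:
    """Maps each naming authority to the rules of its own assignment scheme."""

    def __init__(self) -> None:
        self._validators: Dict[str, Callable[[str], bool]] = {}

    def register(self, authority: str, validator: Callable[[str], bool]) -> None:
        self._validators[authority.lower()] = validator

    def is_valid(self, urn: str) -> bool:
        authority, name = parse_urn(urn)
        validator = self._validators.get(authority)
        # The general mechanism checks only that the authority is known;
        # the shape of the name is the authority's own business.
        return validator is not None and validator(name)


registry = NamingAuthorityRegistry()
# One made-up authority assigns flat numeric names ...
registry.register("example-press", lambda name: name.isdigit())
# ... while another chooses a hierarchical scheme for itself.
registry.register("example-lab", lambda name: "/" in name)

print(registry.is_valid("urn:example-press:12345"))         # True
print(registry.is_valid("urn:example-lab:reports/1993/7"))  # True
print(registry.is_valid("urn:example-press:abc"))           # False
```

Note that hierarchy appears here only as one authority's private
choice; nothing in the shared syntax requires it.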

There is a definite need for fragment specifiers and the
various other things people have mentioned but personally,
I think we can include most such items into the putative
URC as one of its many potential fields, or provide for
them through composite techniques, combining say a URN and
a fragment specifier, a URN and a checksum, etc.
Meanwhile, we can all start using the basic stuff now.
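
As a rough illustration of the composite technique, the sketch below
wraps an unchanged URN in a second object that carries the extra
information. The class names, the example URN and the choice of
SHA-256 as the checksum are all assumptions for the sake of the
example; no such component has been defined.

```python
# Hypothetical sketch: NamePlusFragment, NamePlusChecksum and the use of
# SHA-256 are illustrative assumptions. The URN itself stays a plain,
# opaque string and knows nothing about fragments or checksums.
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class NamePlusFragment:
    urn: str       # the basic name, unchanged
    fragment: str  # meaningful only to systems that support fragments


@dataclass(frozen=True)
class NamePlusChecksum:
    urn: str
    checksum: str  # hex digest of the named object's bytes

    def matches(self, content: bytes) -> bool:
        """Check retrieved bytes against the recorded checksum."""
        return hashlib.sha256(content).hexdigest() == self.checksum


def with_checksum(urn: str, content: bytes) -> NamePlusChecksum:
    """Build the composite from a URN and the object's current bytes."""
    return NamePlusChecksum(urn, hashlib.sha256(content).hexdigest())


ref = with_checksum("urn:example:doc-1", b"hello")
print(ref.matches(b"hello"))   # True
print(ref.matches(b"hello!"))  # False
```

A system that ignores checksums or fragments can still use the plain
`urn` field of either composite.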

> I've been trying to follow this discussion, but the pressures of other
> work (and the current volume of discussion) are making it difficult. One of
> my research interests is collaborative editing/and revision control of
> structured documents. The solutions that come out of this sort of problem
> (at least the ones that I favor) generally need large numbers of unique
> identifiers, since in general each operation performed by a user needs a
> unique ID (one can identify data by the operation that created it)

This is an interesting application. I see no reason why a
successful URN scheme can't include the ability to
reference hierarchies of resources and this would I think
allow you to have what I think you need, but we should
decide whether what you really need are hierarchal sets of
URLs or hierarchal sets of URNs (or most likely, both).

Note also that this is a separate question from whether
URNs should support a hierarchal naming scheme. What I'm
referring to is the ability for a URN to reference a
composite object, with each object in turn having its own
URN.

The point I see emerging out of this is that when
considering the assignment of URNs, the idea is to assign
a URN to the smallest atomic object that can be referenced
(eg. a file). This may be a complete document (when dealing
with files), or it may be a chapter, section or paragraph
(when dealing with hypertext objects within a system
that allows further decomposition).

Now, if you have performed an operation (such as splitting
a file, adding a link, editing a document, whatever) that
now permits you to perform an additional decomposition, it
would certainly make sense to assign additional URNs to
the corresponding components. And at this point, if your
system still allows referencing the entire object the
original URN is not invalidated, either. Rather it is now
the name for a composite object which can in turn be
further referenced by its component parts. Maybe what you
get when you dereference this is a list of URLs, one for
each piece.
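
One way this could look is sketched below. The Resolver class, its
method names and the example URNs and URLs are hypothetical, chosen
only to show an existing name surviving a later decomposition.

```python
# Hypothetical sketch: Resolver, its tables and the example names and
# URLs are illustrative assumptions, not part of any URN proposal.
from typing import Dict, List


class Resolver:
    def __init__(self) -> None:
        self._location: Dict[str, str] = {}     # atomic URN -> URL
        self._parts: Dict[str, List[str]] = {}  # composite URN -> ordered part URNs

    def bind(self, urn: str, url: str) -> None:
        """Record where an atomic object currently lives."""
        self._location[urn] = url

    def decompose(self, urn: str, part_urns: List[str]) -> None:
        """Record that an existing URN now names a composite of these parts."""
        self._parts[urn] = list(part_urns)

    def resolve(self, urn: str) -> List[str]:
        """Return one URL for an atomic object, or one per piece of a composite."""
        if urn in self._parts:
            urls: List[str] = []
            for part in self._parts[urn]:
                urls.extend(self.resolve(part))
            return urls
        return [self._location[urn]]


r = Resolver()
r.bind("urn:example:report", "ftp://host.example/report.txt")
print(r.resolve("urn:example:report"))  # a single URL

# Later the file is split in two; each piece gets its own URN and the
# original URN stays valid as the name of the composite.
r.bind("urn:example:report-ch1", "ftp://host.example/ch1.txt")
r.bind("urn:example:report-ch2", "ftp://host.example/ch2.txt")
r.decompose("urn:example:report", ["urn:example:report-ch1", "urn:example:report-ch2"])
print(r.resolve("urn:example:report"))  # two URLs, one for each piece
```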

The question then arises as to whether the URN should
itself reflect its composite status, or whether it is in
effect an apparently randomly generated string and the
meta-information about composition is to be derived
elsewhere. I would vote to move this level of complexity
out of the URN and say that any particular naming
authority may choose to support this, but it is not a
requirement of the general URN mechanism that it be
decomposable, hierarchal, or whatever.

The question of determining relations among URNs can form
the heart of a very interesting system that supports a
variety of query operations (such as Is-Part-Of, Includes,
Is-Followed-By, etc). I just don't think this has much to
do with the URN's basic role of naming the component for
further reference across the network and this structure,
while a wonderful thing, shouldn't be reflected in the URN
itself.
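
A small sketch of keeping that structure outside the names follows.
RelationStore and its methods are invented for illustration, and the
URNs are deliberately meaningless strings to stress that none of the
relations can be read off the names themselves.

```python
# Hypothetical sketch: RelationStore and its method names are illustrative
# assumptions. URNs are treated as opaque strings; all structure lives here.
from collections import defaultdict
from typing import DefaultDict, Dict, Optional, Set


class RelationStore:
    def __init__(self) -> None:
        self._includes: DefaultDict[str, Set[str]] = defaultdict(set)
        self._next: Dict[str, str] = {}

    def add_part(self, whole: str, part: str) -> None:
        self._includes[whole].add(part)

    def add_sequence(self, first: str, then: str) -> None:
        self._next[first] = then

    def includes(self, whole: str) -> Set[str]:
        """Includes: every URN recorded as a part of `whole`."""
        return set(self._includes.get(whole, ()))

    def is_part_of(self, part: str, whole: str) -> bool:
        """Is-Part-Of: was `part` recorded as a component of `whole`?"""
        return part in self._includes.get(whole, ())

    def is_followed_by(self, urn: str) -> Optional[str]:
        """Is-Followed-By: the URN recorded as coming next, if any."""
        return self._next.get(urn)


store = RelationStore()
store.add_part("urn:x:7f3a", "urn:x:02c9")
store.add_part("urn:x:7f3a", "urn:x:b411")
store.add_sequence("urn:x:02c9", "urn:x:b411")

print(store.is_part_of("urn:x:02c9", "urn:x:7f3a"))  # True
print(store.is_followed_by("urn:x:02c9"))            # urn:x:b411
```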

Again, it might be possible to perform this integrating
function in the as-yet undefined Uniform Resource
Citation (although an active system which permits a number
of query operations would be inherently more powerful and
would be the way I'd like to see things develop). Whatever
approach to punting the problem we agree upon, I strongly
believe we should try to keep the URNs as conceptually and
practically clean and simple as possible.

We can do a lot with simple identifiers and addresses. We
need them today and can build interesting tools with them
tomorrow if we agree to limit our scope in this particular
round of negotiations.

> It also seems pretty obvious that some kind of universal addressing is
> needed for portions of documents, without requiring those documents to be
> modified. This is certainly going to be needed for interactive and
> collaborative hypertexts. One easy (though not super-flexible) way to get
> this is to require version numbers, and have versions specify a particular
> encoding scheme and fixed byte sequence -- ie. each document would contain
> a specification of its encoding for newlines and the like -- then byte
> offsets can be used.

I have to be careful not to overreact to the word "addressing"
but as I said above I think it important to keep in mind
that URLs are for "addressing" and URNs are for "naming".
We may have a particular system that uses only URLs
(as in fact WWW does now), a hybrid system that supports
both URLs and URNs, or a system that internally uses URNs,
with a functional system for obtaining URLs on demand
(probably the ideal but certainly a ways out).

Now, to address your point, just because we want to allow
specifying portions of documents doesn't mean that we're
required to modify the documents themselves. It may
require nothing more than providing a server that can
identify offset or range information when supplied
and return only the specified portion. I just think that
this is a secondary need in the general scheme of things
and should be punted for now (again, it's fine for Z39.50
or WWW or whatever to support this today, and for the
encoding of these protocol references to include the info.
Let's just not complicate things by trying to find the
generalization of the concept within the next month or so).
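
For what it's worth, the server-side idea can be sketched in a few
lines. The function fetch_portion, the in-memory STORE and the use of
byte offsets are assumptions for illustration only; a real access
method would define its own way of expressing the range.

```python
# Hypothetical sketch: fetch_portion and the in-memory STORE stand in for
# a real server; byte offsets are just one possible way to name a portion.
from typing import Dict, Optional

STORE: Dict[str, bytes] = {
    "doc-1": b"Chapter 1. Names.\nChapter 2. Addresses.\n",
}


def fetch_portion(name: str, start: int = 0, end: Optional[int] = None) -> bytes:
    """Return bytes [start, end) of the stored object without modifying it."""
    data = STORE[name]
    if end is None:
        end = len(data)
    if not 0 <= start <= end <= len(data):
        raise ValueError("range outside object")
    return data[start:end]


print(fetch_portion("doc-1", 0, 17))  # b'Chapter 1. Names.'
print(fetch_portion("doc-1", 18))     # b'Chapter 2. Addresses.\n'
```

The stored document carries no anchors or markers; the range lives
entirely in the request.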

> I've still to read all the relevant documents, so I'm not commenting on
> the proposals, but trying to give a perspective on a different view of the
> problems you mention, and (perhaps) unusual applications of universal names
> (ie. naming small portions of documents). I also think that a URN must name
> an immutable object in order for internal anchors to work, and that implies
> that some hooks must be left for version control for those applications
> that will support it.

Thanks for the different perspective. It's obviously
important to strive for the general while arguing about
the specifics and I certainly don't want to sound like I'm
arguing against doing this. I do think that we've a quite
clear idea at this point of how a couple of fairly simple
things should be structured and we have fairly well-defined
proposals for them. We should concentrate on documenting
this ASAP. As we do so, and identify additional things we
need, let's see how we can build things up incrementally.

Enough. I have a whole posting on URLs swirling around in
my head, too but if I don't stop now I know nobody's going
to read this.

- peterd

-- 
------------------------------------------------------------------------------
     Peter Deutsch,                                  (514) 875-8611  (phone)
  Bunyip Information Systems Inc.                     (514) 875-8134  (fax)
    <peterd@bunyip.com>

"Charging for information is not a crime, any more than charging for
food is a crime. On the other hand, I agree that letting people
_starve_ is a crime."
------------------------------------------------------------------------------