Why URN is a subset of URL

Roy T. Fielding (fielding@avron.ICS.UCI.EDU)
Wed, 05 Oct 1994 21:17:26 -0700

To: uri@bunyip.com
Subject: Why URN is a subset of URL
In-Reply-To: Your message of "Wed, 05 Oct 1994 20:41:05 EDT."
<9410060041.AA09204@expresso.bunyip.com>
Message-Id: <9410052117.aa04944@paris.ics.uci.edu>

Because of the non-uniform syntax for URLs (as defined by
draft-ietf-uri-url-07.txt, section 2), there is no difference
whatsoever between what we choose to refer to as "URL" and what
we choose to refer to as "URN". Although we may wish to "imply"
some philosophical difference, there is no real difference in the
specification and URNs (of whatever sort) can easily be defined as URLs.

Peter Deutsch wrote:

> [ Daniel W. Connolly wrote: ]
> . . .
>> Bingo. I don't know why folks go to such trouble to distinguish URLs
>> and URNs.
>
> Perhaps you don't understand what we want them for... :-)
>
>> . . . The URL concept grew out of the WWW addressing architecture
>> which was designed to include locationally transparent addresses[1].
>> The fact that such addresses have not yet been deployed is not a
>> design decision but a reflection of the fact that it takes time to
>> deploy technology.
>
> And the URN concept grew out of the need of services such
> as ours (archie and its follow-ons) to identify multiple
> instantiations of information independent of its location.

Right.

> When I get lots of archie hits I cannot simply compare
> their URLs for equality to see if they're the same
> document (because a URL identifies a resource's location
> but not its content), nor can I be sure that I found all
> copies of a document (because someone is free to rename a
> document and its URL would change).

You are referring to a quality of the http, ftp, wais, file,
(and several other) schemes, not to the abstract concept of a URL.
If a particular locator scheme defined a one-to-one correspondence
between document content and URL, the above statement becomes false.

> What _I_ want from URNs is an identifier which will be the
> same for any copy of a resource regardless of its location
> that I can then use to distinguish among multiple
> resources. You may not need this functionality, in which
> case you don't need URNs, but for some of us there is a
> fundamental difference here and we need both URLs and URNs.

Yes, it's needed. No, there is no fundamental difference.
There is a philosophical difference, but that carries no weight.

> Put another way, The requirements for URNs are intended to
> allow us to perform an entirely different set of
> operations on the named objects. In computer science
> terms, you compare URNs and dereference URLs. Thus, I
> submit that the difference between the two classes is real.

In computer science terms, you twice-dereference URNs and
once-dereference URLs, or zero-dereference both if they are
already cached. Once again, this is a property of the scheme
and not any true difference.
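
The hop counts above can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: the two lookup tables, the cache, the function name `dereference`, and the sample identifiers stand in for a real name-resolution service and real network fetches, neither of which is specified in this thread.

```python
# Hypothetical lookup tables standing in for a name service and the network.
NAME_SERVICE = {
    "urn:12345": ["ftp://site.com/pub/fred/", "ftp://bozo.com/pub/fred/"],
}
NETWORK = {
    "ftp://site.com/pub/fred/": b"contents of fred",
    "ftp://bozo.com/pub/fred/": b"contents of fred",
}
CACHE = {}


def dereference(identifier):
    """Return (content, hops), where hops counts the lookups performed."""
    if identifier in CACHE:
        # Already cached: zero dereferences, whatever the scheme.
        return CACHE[identifier], 0
    scheme, _, _ = identifier.partition(":")
    hops = 0
    location = identifier
    if scheme == "urn":
        # First dereference: name -> one of its known locations.
        location = NAME_SERVICE[identifier][0]
        hops += 1
    # Final dereference: location -> content.
    content = NETWORK[location]
    hops += 1
    CACHE[identifier] = content
    return content, hops
```

The only branch that distinguishes the two kinds of identifier is the check on the scheme; the calling convention is the same for both.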

>> The idea that URNs are somehow fundamentally different from URLs is
>> odd, and the proposals of deploying a namespace disjoint with the WWW
>> address syntax is just plain silly. The WWW address space accommodates
>> multiple addressing schemes. When we come up with a service that
>> provides the features that folks are looking for in URNs (high
>> availability and authentication), we can start writing urn://... if we
>> like, but I bet we will write whois://... and solo://... and
>> lifn://... or md5://... for a while until one becomes the clear
>> winner and the others die out.
>
> I respectfully disagree with the above paragraph. The WWW
> address space is just that, an address space, along with
> accompanying protocol (and where appropriate, host)
> information. A URL gives you the information you need to
> access a copy of a resource.

This is true.

> It does _not_ allow me to
> perform the operation I need to perform, which is to compare
> multiple instantiations of resources for equality of
> content without examining the content itself.

Not true -- that is a property of the scheme, not of the URL syntax.

> On the other
> hand, a suitable URN _will_ allow me to perform that
> operation. Ergo, URLs and URNs are not the same thing.

Not true. Ergo, URLs and URNs are the same thing.

> (BTW, I certainly don't require URNs to have high
> availability nor authentication. I merely require that
> they identify content, not location.)

No problem,

urn:<unique-content-identifier>

is a URL with (in theory) one-to-one correspondence with the content.
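
One way to picture the "in theory" one-to-one correspondence is a registry that hands out one identifier per distinct content and never reuses it. This is only a sketch under stated assumptions: the `Registry` class, its `name_for` method, and the sequential numbering are hypothetical, and keying on the full content bytes is chosen here so that no two different contents can share an identifier.

```python
class Registry:
    """Hypothetical registry assigning one urn: identifier per distinct content."""

    def __init__(self):
        self._by_content = {}  # full content bytes -> identifier

    def name_for(self, content):
        # Keyed on the complete content, so equal identifiers imply equal
        # content and equal content always yields the same identifier.
        if content not in self._by_content:
            self._by_content[content] = "urn:%d" % (len(self._by_content) + 1)
        return self._by_content[content]


registry = Registry()
a = registry.name_for(b"the same bytes")
b = registry.name_for(b"the same bytes")
c = registry.name_for(b"different bytes")

# The identifier still has the scheme:rest shape of a URL.
scheme, _, rest = a.partition(":")
```

Comparing `a == b` answers the equality question without examining the content, and the identifier still splits into a scheme and a scheme-specific part like any other URL.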

> I do agree that if you have something like
> "whois://server/query-string" you in fact have a URL, not
> a URN, but an MD5 checksum is not a location pointer and
> cannot be used for dereferencing and access without
> further work. You still need to be told which host to
> connect to, which port to use, which protocol to use and
> so on. Given an MD5 checksum, you will still need to find
> the appropriate URL before you can go get a copy. This is
> the _other_ fundamental operation we require for a URN
> scheme, after comparison.
>
> Perhaps the difference between URLs and URNs is being lost
> on some people because the current proposals are focusing
> not on the comparison requirement, but on the companion
> requirement that URNs be easy to dereference. Remember,
> resolution is _not_ the only requirement and for many
> applications things like MD5 checksums will work fine
> (assuming we can build mapping services, which we've
> proved we can do with archie).

Ummmm, I'll differ with Dan here and say that MD5 checksums can't
be URNs either -- they don't uniquely identify content (though the
probability of a collision is quite small).

> With that as background, let's consider a couple of
> scenarios.
>
> In the archie context, we plan to serve to our users both
> a location pointer and a content identifier at the same
> time. Thus, a search for the string fred might return:
>
> URN:12345 URL:ftp://site.com/pub/fred/
> URN:45666 URL:gopher://site.com/usr/fred/
> URN:12345 URL:ftp://bozo.com/pub/fred/
> URN:59555 URL:ftp://mysite.edu/pub/fred/

Why? Why not return

(urn:12345, ftp://site.com/pub/fred/, ftp://bozo.com/pub/fred/),
(urn:45666, gopher://site.com/usr/fred/),
(urn:59555, ftp://mysite.edu/pub/fred/)
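
The grouping suggested above is a small transformation of the flat listing. A minimal sketch, assuming the hits arrive as (name, location) pairs; the function name `group_by_name` and the lowercase `urn:` spelling are illustrative choices, not part of any archie interface.

```python
def group_by_name(hits):
    """Collapse (name, location) pairs into one (name, locations...) tuple per name."""
    grouped = {}  # dicts keep insertion order, so first-seen order is preserved
    for name, location in hits:
        grouped.setdefault(name, []).append(location)
    return [(name, *locations) for name, locations in grouped.items()]


hits = [
    ("urn:12345", "ftp://site.com/pub/fred/"),
    ("urn:45666", "gopher://site.com/usr/fred/"),
    ("urn:12345", "ftp://bozo.com/pub/fred/"),
    ("urn:59555", "ftp://mysite.edu/pub/fred/"),
]
results = group_by_name(hits)
```

Each tuple in `results` starts with the name and lists every known location after it, so duplicate copies show up as one entry rather than several.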

> This allows me to see that the first and third entries are
> the same item, so I don't need to examine both (and of
> course this example illustrates how archie will allow us
> to use multiple access protocols, not just ftp. For what
> it's worth, archie now supports multiple collections and
> internal test versions support directing a single query to
> multiple collections. Coming soon to an information
> service near you... ;-)

Well, thanks very much for that -- it's an unbelievable pain right
now to do a search and not be able to restrict it by country code.
For many protocols, location information is a good thing.

> Alternatively, I might want to do a search for a
> particular URN, say number 12345, and get something like:
>
> URN:12345 URL:ftp://mysite.com/pub/fred/
> URN:12345 URL:ftp://yoursite.com/usr/zork/
> URN:12345 URL:http://bozo.com/pub/fred/
> URN:12345 URL:gopher://another.site.edu/pub/peterd/ramblings/

Or, even better:

(urn:12345, ftp://mysite.com/pub/fred/,
ftp://yoursite.com/usr/zork/,
http://bozo.com/pub/fred/,
gopher://another.site.edu/pub/peterd/ramblings/)

> This tells me that there are four copies of this
> particular document on the net, under several different
> combinations of naming or protocol. I can now choose to
> fetch the most appropriate copy, if I want it, using my
> favorite client.

Sounds great, but you don't need a separate URN syntax for this.

> Of course, we want URNs to have a few other
> characteristics, as well. We want multiple naming schemes
> to allow us to grandfather existing info collections (e.g.,
> ISBN numbers). We want them to be easy to transcribe, etc.

Those requirements were also shoehorned into the URL spec.
Thus, we have URN ==> URL.

> Still, the point is that at its heart, a URN is not
> intended to allow access but to provide location-independent naming.
> Just as ISBNs and library call numbers are different
> things and serve different purposes, I submit that URNs
> and URLs are different things and will serve different
> purposes.

That is an implementation detail.

> One final thought. I can imagine future systems which deal
> _only_ with URNs, hiding access details from the user. In
> fact, we're working on such a system ourselves. I don't
> think this means that the distinction between URNs and
> URLs will disappear, but simply that we will eventually be
> able to hide the URLs from the user (in most cases).

I think you will find this to be a mistake. Try usability tests
between a system that hides location information and one that shows
it (even if the location is only obtained AFTER a dereference).
I believe you will find that location information (even in the compact
form of a URL) is necessary to prevent user disorientation.

> Conceptually and practically there are still two different
> classes of identifier being used and of course getting to
> this ideal state will still require working with the
> installed base of URLs. There is a difference here and
> even if you don't need both, some of us most definitely
> do...

Given that I can map any URN onto the URL syntax, this is obviously
not the case. I suggest that we stop trying to fool ourselves into
thinking that there is a difference, and just start looking for proposals
that implement URL schemes with one-to-one mapping between content
and URL.

......Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>