Message-Id: <9404260506.AA08453@expresso.bunyip.com>
From: Peter Deutsch <peterd@bunyip.com>
Date: Tue, 26 Apr 1994 01:06:19 -0400
In-Reply-To: Alexander Dupuy's message as of Apr 17, 17:23
To: dupuy@smarts.com (Alexander Dupuy), uri@bunyip.com
Subject: Re: MD5 and LIFNs (was: Misc Comments)
Hi,
[ Alexander Dupuy wrote: ]
> > o LIFN's for "byte-stream" identification is very important. Shouldn't
> > it be possible to now define an "MD5" namespace authority via an
> > informational RFC which specifies how to calculate the defacto name
> > of any byte-stream?
>
> While this seems like an interesting proposal, I see two problems with it.
> The MD5 namespace is non-hierarchical, so a single namespace authority would
> have to administer the MD5 names for every published resource in the world;
> this is unlikely to scale well.
Actually, proposals for using MD5 as a URN have been
circulating for some time now (certainly since the early
threads on URSNs, etc). Their principal attraction (to me;
there may be others) is that they do not in fact need any
namespace authority support to be deployed, and the scaling
issues are not nearly as bad as they might appear.
As an example of how they might be used, I foresaw using
MD5 checksums in those cases where it is not feasible to
retrofit URNs to existing information. The canonical
example that we came up with was signing all that info on
anonymous FTP archives. Once we did this, it would be
possible for us to pick up and index this MD5 info
automatically with archie as part of the regular gather.
Users could then perform a basic URN->URL lookup with
nothing more than an archie search on a checksum. Note
that this avoids problems of duplicate filenames, and
allows us to detect renamed files trivially.
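A minimal sketch of the checksum lookup described above, in
Python. It is not archie code: the function names, and the
use of a local directory tree as a stand-in for an FTP
archive, are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Map each digest to the sorted relative paths holding that content.

    Hypothetical stand-in for the "gather" step: it walks a local tree
    instead of listing a remote anonymous FTP archive.
    """
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            index[md5_of_file(path)].append(relative)
    return dict(index)


def lookup(index, digest):
    """Return every known location for a digest (empty list if unknown)."""
    return index.get(digest, [])
```

Because the key is computed from the bytes rather than the
filename, two copies stored under different names fall under
the same entry, and two different files that happen to share
a name do not.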
Once this was proposed (way back at the San Diego IETF)
someone whose name escapes me (apologies for my failing
memory) actually went back to the terminal room that night
and hacked their "ls" to calculate and serve MD5 checksums.
You could generate them with nothing more than the
appropriate command line switch (something like: "ls -x").
If all ftp servers used this ls, the problem of deployment
would be solved and the entire task could then be
automated (for this example).
We haven't done anything with this yet, and I don't think
the code has circulated widely, but it proved how easily
it could be done. We got sidetracked with the infamous URL
debate, but it's an idea we could resurrect in no time
once we agree that it would be a good idea.
> The second problem is that while it is extremely unlikely for any two given
> files that they will share the same MD5 digest, when you increase the numbers
> of files, the chance that some pair of files will share the same MD5 digest
> increases extremely quickly. This is a variant of the "Birthday paradox"
> which is the name for the apparently paradoxical fact that given some number
> (roughly 30, I think) of people, the chance that two of them will have the
> same birthday is better than 50%. Given the moderate probability that some
> two of a few millions of files will share the same MD5 digest, it seems an
> inappropriate choice for a namespace.
The counterargument is that a poor URN is better than no
URN, and thus where it is infeasible to calculate or assign
a better URN, this would be the way to go. Given that it
would allow one of the basic operations we desire of a URN
(albeit in an imperfect implementation), it certainly seems
to satisfy my needs in one particular application. We
certainly intend to use them once the syntax is frozen (if
not sooner...).
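For a sense of scale, the birthday argument can be put into
numbers. The sketch below uses the standard approximation
p ~= 1 - exp(-n(n-1)/(2N)) and assumes that MD5 digests of
ordinary files behave like uniformly random 128-bit values.
That assumption covers accidental collisions only; it says
nothing about deliberately constructed ones. The function
names are illustrative.

```python
import math


def collision_probability(n_items, n_values):
    """Approximate chance that at least two of n_items share a value.

    Assumes each item's value is drawn uniformly and independently
    from n_values possibilities (the usual birthday-bound model).
    """
    exponent = -n_items * (n_items - 1) / (2.0 * n_values)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


def items_for_even_odds(n_values):
    """Approximate number of items at which a collision becomes 50% likely."""
    return math.sqrt(2.0 * n_values * math.log(2.0))


birthdays = collision_probability(23, 365)           # classic paradox
md5_files = collision_probability(10_000_000, 2**128)  # ten million files
md5_even_odds = items_for_even_odds(2**128)
```

Under this model the classic birthday case reaches even odds
at about 23 people, while ten million files hashed into a
128-bit space give a collision probability far below one in
a billion; even odds would need more than 10^19 files.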
> Given these problems with an "MD5" namespace, I would like to state that I too
> feel that some sort of LIFN namespace will be very important and useful.
> However, it will have to be hierarchical so that it can scale to include all
> published digital works.
I'd like to repeat a point I've made before. I strongly
believe that there will not be a single URN format, any
more than there is currently a single user-level access
protocol, nor should there be. I think that some of the
desirable characteristics of a URN are conflicting and
incompatible and some judgement will be needed to select
the most appropriate format for a given application.
As I mentioned above, I think MD5 checksums are a suitable
mechanism for assigning something that has many of the
properties of a URN for applications (such as anonymous
FTP) where other mechanisms are not feasible. I think this
will be one of a suite of techniques available to creators
of naming authorities as they set up their systems. Given
the naming authority part of the URN, selecting the correct
handler, or the appropriate proxy willing to handle the
URN->URL task, is well within our means today.
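To illustrate selecting a handler by naming authority, here
is a rough Python sketch. Since the URN syntax is not yet
frozen, the "authority:identifier" form used here is purely
hypothetical, as are the Resolver class, the "md5" authority
label and the example URL.

```python
def split_urn(urn):
    """Split a hypothetical 'authority:identifier' name into its two parts."""
    authority, separator, identifier = urn.partition(":")
    if not separator or not authority or not identifier:
        raise ValueError("expected 'authority:identifier', got %r" % urn)
    return authority.lower(), identifier


class Resolver:
    """Dispatch URN->URL lookups to whichever handler owns the authority."""

    def __init__(self):
        self._handlers = {}

    def register(self, authority, handler):
        """Attach a callable that maps an identifier to a list of URLs."""
        self._handlers[authority.lower()] = handler

    def resolve(self, urn):
        """Return the URLs known for a URN, or [] if no handler is registered."""
        authority, identifier = split_urn(urn)
        handler = self._handlers.get(authority)
        if handler is None:
            return []
        return handler(identifier)


# Usage: an MD5-style handler backed by a small in-memory table.
# The digest below is the MD5 of an empty byte-stream; the URL is made up.
known = {
    "d41d8cd98f00b204e9800998ecf8427e": ["ftp://ftp.example.org/pub/empty"],
}
resolver = Resolver()
resolver.register("md5", lambda digest: known.get(digest.lower(), []))
urls = resolver.resolve("MD5:d41d8cd98f00b204e9800998ecf8427e")
```

Other authorities would register their own handlers (or a
proxy that forwards the request) in the same table, so more
than one URN format can coexist behind a single resolve call.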
At the same time, I like the technique of locating
URN->URL servers through DNS, but don't think it will be
the only one used. We should keep flexibility at the
forefront of our minds when designing these things.
Agreed? Disagreed?
- peterd
--
-----------------------------------------------------------------------------
"What do thay got, a whole lot of sand? We got a hot crustacean band!
Each little clam here, know how to jam here! Under the Sea!"
-----------------------------------------------------------------------------