Re: URLs should contain types [AND DATES]

Peter Deutsch (peterd@bunyip.com)
Mon, 24 May 1993 02:46:24 -0400

Message-Id: <9305240646.AA02863@expresso.bunyip.com>
From: Peter Deutsch <peterd@bunyip.com>
Date: Mon, 24 May 1993 02:46:24 -0400
In-Reply-To: Terry Winograd's message as of May 22, 10:04
To: Terry Winograd <winograd@interval.com>,
Larry Masinter <masinter@parc.xerox.com>
Subject: Re: URLs should contain types [AND DATES]

Hi Terry, et al,

Well, I warned everyone in a previous posting that I had a word
or three to say about URLs at this point. Your posting
seems to have dislodged something, so here are some
off-the-top-of-my-head comments.

WARNING! It's now after 2am and I may repudiate this
tomorrow! :-)

[ Terry wrote: ]

> As long as Larry is muddying the waters by arguing for adding types to URLs
> (with which I actually differ), I will also toss in a suggestion I have
> been holding onto for a while that they should include Date/Times. . . .
. . .
> Imagine finding among the debris on your desk a scrap of paper that says
> "Joe Doe, Marriott Hotel, 445-9834 room 334". Seems like a perfectly good
> locator, but should you call that number to find Joe?
>
> PROBLEM 1 (relative context): . . .
> PROBLEM 2 (time of knowledge): . . .
> PROBLEM 3 (extent of validity): . . .
. . .
>
> I have chosen a hotel number consciously, since the kind of transience we
> associate with hotel residence is the appropriate metaphor for on-line
> objects, when we look at the decades-centuries time scale. In general the
> world will be full of "stale locators" and it is very useful to be able to
> detect them. Notice that the problem is not just one of failure to find
> something where you expect it, but also of getting the wrong item (someone
> else has checked into the hotel room!).

I really like your analogy, and would point out that this
example offers a great illustration of why you should
instead have written on that piece of paper something like:

"URN:NIC-WHOIS:PD45:::".

(or to keep with the analogy):

"Joe's Secretary, (514) 875-8611"

Which you might recognize as a URN that references a
server, keyed on a user handle, that can get Joe's record
upon demand. It should be fair to demand that this server
be able to tell me how to talk to Joe, no matter where he
is right now.

Now, whenever you want to call this person in the future,
you or your client software merely has to utter something
like "Oh magic server, that I know so well, please make
this URN into a URL" (or whatever else the protocol
interaction is going to be to dereference a URN) and then
use the resulting URL to access him at his then-current
location. Despite historical evidence to the contrary,
deploying such a system is simply not brain surgery. It
just hasn't been done yet. Still, I'd argue that it's not
that far away at this point.
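The dereferencing step described above can be sketched in a few lines. This is a minimal, hypothetical illustration: `ResolverServer`, its `update`/`resolve` methods, and the location strings are invented for the example and do not correspond to any deployed protocol. Only the URN string comes from this message. The sketch shows that the string written down never changes, while the record the server hands back does.

```python
class ResolverServer:
    """Hypothetical 'magic server' that maps stable URNs to current URLs."""

    def __init__(self) -> None:
        # URN -> the URL (location) currently registered for it.
        self._records: dict[str, str] = {}

    def update(self, urn: str, url: str) -> None:
        # The owner (Joe, or his secretary) keeps the record current.
        self._records[urn] = url

    def resolve(self, urn: str) -> str:
        # Clients ask for the *current* location each time they need it.
        try:
            return self._records[urn]
        except KeyError:
            raise LookupError(f"no record for {urn}") from None


server = ResolverServer()
urn = "URN:NIC-WHOIS:PD45:::"  # the string written on the scrap of paper

server.update(urn, "marriott-hotel/room-334")  # hypothetical location
first = server.resolve(urn)

server.update(urn, "malibu/laptop")  # Joe moves; only the record changes
second = server.resolve(urn)
```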

Ask yourself this - if it's been long enough for you to
want to even start to question the validity of the URL, why
would you _not_ simply go back and regenerate a new one by
default, assuming that a mechanism exists that allows you
to do this at a reasonable cost? That's what you do for
hostname lookups, isn't it?
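The hostname-lookup comparison can be made concrete with a cache that behaves like a DNS resolver honouring a record's time-to-live. Once an answer is old enough to doubt, the cache goes back to the source by default instead of trusting it. This is a sketch under stated assumptions: `CachingClient`, the one-hour `ttl`, and the injectable `clock` are hypothetical, and a plain dictionary stands in for the resolution service.

```python
import time
from typing import Callable


class CachingClient:
    """Hypothetical client that caches URN -> URL answers for `ttl` seconds
    and re-resolves by default once an answer is older than that, much as
    a DNS cache re-queries after a record's time-to-live runs out."""

    def __init__(self, resolve: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve      # the (hypothetical) resolution service
        self._ttl = ttl
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}
        self.lookups = 0             # how often we went back to the server

    def url_for(self, urn: str) -> str:
        now = self._clock()
        cached = self._cache.get(urn)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]         # recent enough: use the cached URL
        url = self._resolve(urn)     # stale or missing: regenerate it
        self.lookups += 1
        self._cache[urn] = (url, now)
        return url


# Usage with a fake clock so the example is deterministic.
records = {"URN:NIC-WHOIS:PD45:::": "marriott-hotel/room-334"}
fake_now = [0.0]
client = CachingClient(records.__getitem__, ttl=3600.0, clock=lambda: fake_now[0])

a = client.url_for("URN:NIC-WHOIS:PD45:::")   # first use: resolved
records["URN:NIC-WHOIS:PD45:::"] = "malibu/laptop"
fake_now[0] = 10.0
b = client.url_for("URN:NIC-WHOIS:PD45:::")   # still cached: old answer
fake_now[0] = 4000.0
c = client.url_for("URN:NIC-WHOIS:PD45:::")   # past ttl: re-resolved
```

The middle call shows the trade-off. Within the TTL the client can still hand back a stale answer. A cache accepts that window in exchange for fewer lookups.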

It seems to me that we should be concentrating our efforts
on making it so easy for you to do this on demand that it
would never occur to you to consider using URLs in the
place of the more appropriate URNs, any more than you'd
expect to know the route that AT&T used to set up your
last long distance phone call to your mother, so you could
tell your phone what to do to build that call again.

And provided that Joe and his server are on speaking terms,
you should always get valid contact info when you do it
this way, even if he's back from his business trip and has
left his wife and kids to go live on the beach in Malibu
and is only addressable via his new cellular TCP/IP link
to his laptop.

This dynamically derived information model is the one
we're using at the lower layers of the Internet
architectural model, in such applications as DNS, routing
and so on and it seems to have stood us in good stead in
those areas. Why not insist that we can make it work here,
as well?

It seems to me that there is a very real danger that if we
persist in trying to make URLs do more and more, we
will end up both breaking URLs and undercutting the effort
needed to deploy URNs to no good effect. I repeat what I
said in my last post - let's first get something going,
then make it bigger and better.

Having said all of this, it's true that stale, expired
URLs are a potential problem. The question is whether we
can expect a basic URL to include mechanisms such as
timestamps to flag the _potential_ for such problems since,
in fact, a URL may still be perfectly good even after the
timestamp's expired, or may already be invalid after the
first use. After all, Joe may suddenly have
received a piece of email from his wife and be on his way
home to work on the reconciliation. If Joe's cellular is
smart enough it can detect this movement and maybe send a
message to his secretary to announce the good news.
Certainly as Joe's friends we should not have to track his
every movement just because we might want to talk with
him at some point.

I think we should be relying more on an information systems
architecture to help us out in the problem cases, by
demanding that it provide now some functionality that
we're going to be needing eventually anyways.

In your example, I would add that your timestamps don't
seem to actually solve the problem they purport to address,
since even with them you can't actually tell if a URL is
good without trying it, and even then you can get spoofed.
As my dad is fond of saying "that and a dollar buys you a
cup of coffee".

Personally, I'd find it of more use to be able to call
back either the source or the target of a URL, to ask it
additional questions, such as "When was this created? Any
changes on this since yesterday? Can you tell me more
about this target?" and so on. I think determining the
functionality of such a system, building it and deploying
it would have greater long term value than trying to
encode lots and lots of extraneous details into the basic
identifiers themselves, detail that may or may not stay
current for any length of time and thus really cries out
for automation, anyways.

> So the basic proposal is to have two OPTIONAL date/time fields in a URL,
> one the "AS-OF" time, which gives a time at which the locator was asserted
> to be valid, and the other an "UNTIL" time, which declares a commitment by
> the management of the location that the item will not be changed or moved
> before that time. This does not mean that it will necessarily go away
> after that. A locator with a past-due UNTIL time may still be a very good
> hint for finding something.
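Terry's two optional fields can be modelled as a small record. This sketch assumes nothing about how AS-OF and UNTIL would actually be spelled inside a URL. `DatedLocator` and `status` are hypothetical names. The rule it encodes is read directly from the proposal quoted above: the location is committed before UNTIL, and after that it is only a hint.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DatedLocator:
    """Hypothetical URL plus the two OPTIONAL date/time fields.
    as_of is carried along but cannot by itself say whether the
    locator still works."""
    url: str
    as_of: Optional[datetime] = None  # AS-OF: when the locator was asserted valid
    until: Optional[datetime] = None  # UNTIL: promised not to change or move before this


def status(loc: DatedLocator, now: datetime) -> str:
    """Classify a locator under the proposal's rules (hypothetical helper)."""
    if loc.until is not None and now < loc.until:
        # Inside the UNTIL window: the location's management has committed
        # that the item will not be changed or moved.
        return "committed"
    # Past UNTIL, or no UNTIL given: possibly still a good hint, but
    # someone else may have checked into the room.
    return "hint"


loc = DatedLocator(
    "marriott-hotel/room-334",          # hypothetical location string
    as_of=datetime(1993, 5, 22),
    until=datetime(1993, 5, 29),
)
during = status(loc, datetime(1993, 5, 24))
after = status(loc, datetime(1993, 6, 1))
undated = status(DatedLocator("marriott-hotel/room-334"), datetime(1993, 5, 24))
```

Neither field says whether the locator works right now. That still requires trying it.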

If we do that, then we might also consider giving Larry
his "TYPE" field, as well. Can you explain why you take
exception to his suggestion? I can see that it would be
useful in certain circumstances, too. Also, those
checksums would be useful in bars and other noisy
environments and some people really want them. We could
add fragment specifiers, and someone once suggested
copyrights, as well. I think I see where this is going,
but I'm not sure that's really where we want to end up.

My question is whether any form of a compound reference
would still satisfy our initial set of goals for either
URLs or URNs and still remain useful over time. As I
remember it, we wanted these things to have such
characteristics as being simple, short, easily readable,
easily transcribable, universally accessible, and so on.
People now seem to be suggesting a variety of options
which support machine-machine interactions and increase
functionality but any amount of such additional complexity
seems to take us towards a system without at least some of
the initial list of characteristics we said we wanted.

Now, I know that all sorts of reference information is
ultimately going to be sent from machine to machine
without my knowledge or intervention, and adding in a
couple more fields and a few more bytes to each one is no
big deal in such cases. On the other hand, the way things
are shaping up, those bar napkins would have to be made a
whole lot bigger, or we're all going to be writing things
a whole lot smaller, to handle such beasties. More likely,
we're just never going to use all these optional fields
when humans are transcribing them, which I keep hearing we
still have to allow for.

So, here's a question - can we drop the "simple, human
readable and transcribable" class of requirements or do we
have to continue to include them in our list of desirable
characteristics? IF we do have to include them, doesn't
that suggest we forego all those optional fields?

Or maybe we should accept that simple/human-readable and
powerful/fully-functional are two conflicting sets of
requirements for URLs, which has led us to try and create
two completely different sets of thingies which are
currently carrying the same name?

Maybe what we really want is something that we can write
on a bar napkin that can _get us_ a URL, but maybe we
should say right here and now that it will never actually
be a URL that we write down (I'm assuming we're talking
about mere mortals and not one of those geeks who claims
that telnetting to the SMTP port to do a VRFY is actually a
valid way of checking on an email address). Maybe we don't
have to demand of a URL a number of things that we
thought were essential.

Put another way, if we can accept that URLs are inherently
for machines, and machines are just so much more clever
than we are, then all of a sudden we don't need to worry
about how many blanks there are, and whether that's a TAB
or a SPACE and how we're going to encode curly braces,
since humans never see them and we just let the machines
send their bytes and be done with it.

If somebody wants to write a URL reference down, why not
make them instead write the URN and the address of the
server and be done with it? Certainly, we can continue to
write the hostname and file details for anonymous FTP or
whatever, as we've been doing all these years, but I don't
see the problem here since that's what we're doing now.

So, a suggestion - maybe we should declare that
inter-service interactions are to be done at the level of
the URN (where I'm starting to think it _should_ be), and
not the URL, which is inherently transitory, system
dependent and complex?

I know this might sound like I've lost my mind, but this
approach would mean we have some incentive to get the URN
stuff working now. Besides, from what I've seen passing on
this list recently, there seems to be a lot more
controversy concerning what we should be adding to URLs
than URNs at this point. I think this is in part because
we're asking too much of what should be simple pointers.

Okay, assuming that this is all demented ravings due to
caffeine withdrawal at 1am on a Sunday morning, and we're
going to insist on trying to square the circle and make
URLs both simple and complex, readable and transcribable
yet able to encode anything, compact yet fully functional,
I guess what I'd like to see more than anything else is
that, for each proposed new feature or option,
we perform a simple cost-benefit analysis, along the
lines of "Is this new option going to be needed often
enough to be worth the added cost, or is it simply adding
too much complexity for what it buys us?" I think it fair
to ask each person who proposes something to do this for
the rest of us, so we can assess the cost of the proposed
option.

I also really think we need to go back and finish that
checklist of design goals before we can answer any
questions about options in a meaningful way. Once we have
that checklist then it should at all times be driving our
design decisions and analysis. Yet we still haven't come
to closure, or even had a significant debate about it.
This makes me nervous.

- peterd

-- 
------------------------------------------------------------------------------
     Peter Deutsch,                                  (514) 875-8611  (phone)
  Bunyip Information Systems Inc.                     (514) 875-8134  (fax)
    <peterd@bunyip.com>

"Charging for information is not a crime, any more than charging for food is a crime. On the other hand, I agree that letting people _starve_ is a crime."
------------------------------------------------------------------------------