Re: how to make progress on the URL document

Alexander Dupuy (dupuy@smarts.com)
Thu, 24 Mar 1994 12:50:06 +0500

Date: Thu, 24 Mar 1994 12:50:06 +0500
From: dupuy@smarts.com (Alexander Dupuy)
Message-Id: <9403241750.AA22255@brainy.smarts.com>
To: timbl@www0.cern.ch, uri@bunyip.com, mpm@boombox.micro.umn.edu
Subject: Re: how to make progress on the URL document

> Other advocates of requiring "?" as a separator for all URLs were not
> forthcoming, and the consensus is that URLs are access methods
> with parameter packages that are specific to the access method.

I guess I better speak up now, lest silence be construed as assent. I think
that there is some use to knowing that a particular URL represents a search
rather than a document; it gives you some idea that this is more likely to
return different information each time used than a document URL. In
particular, I can imagine some caching schemes that might find this
information useful.

Requiring ? as the separator for query URLs provides this information without
requiring detailed understanding of the exact query syntax. Even if the
client application doesn't understand the query syntax, the human using the
application may, and it could be useful for the application to allow the user
to modify the query.
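
To make this concrete, here is a minimal sketch in Python of what a client or
cache could do with nothing but the ? convention. The function names and the
caching policy are made up for illustration, and it assumes that a literal
question mark inside the data is always written escaped, as %3F:

```python
from typing import Optional, Tuple


def split_query(url: str) -> Tuple[str, Optional[str]]:
    """Split a URL at the first unescaped '?'; the query is None if absent."""
    # Assumption: a literal '?' in the data is escaped as %3F, so any '?'
    # still present in the URL is the query separator.
    base, sep, query = url.partition("?")
    return (base, query if sep else None)


def probably_cacheable(url: str) -> bool:
    """Hypothetical policy: search URLs may return different results each time."""
    _, query = split_query(url)
    return query is None


def replace_query(url: str, new_query: str) -> str:
    """Let the user edit the query without the client parsing it.

    new_query is assumed to be escaped already.
    """
    base, _ = split_query(url)
    return base + "?" + new_query
```

None of this needs to know what the query means, only where it starts.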

Since it isn't difficult to convert non-escaped ? to TAB in gopher URLs, there
is no loss of functionality, only a very small amount of additional complexity
in gopher URL code. In return, there is a significant increase in the
information available within a URL, and using that information doesn't require
understanding the specifics of the URL's protocol type.
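
As a rough sketch of how small that additional complexity is, the conversion
might look like this in Python. The function name is hypothetical, and the
sketch only handles the selector part of the URL (it ignores the host, port
and the leading gopher type character):

```python
from urllib.parse import unquote


def gopher_request_string(path: str) -> str:
    """Build the selector string for a gopher server from a URL path.

    An unescaped '?' separates the selector from the search words and
    becomes a TAB; an escaped one (%3F) is ordinary data.
    """
    # Split first, while escaped and unescaped '?' can still be told apart.
    selector, sep, search = path.partition("?")
    if not sep:
        return unquote(selector)
    # Unescape each part only after the separator has been found.
    return unquote(selector) + "\t" + unquote(search)
```

The only subtle point is the order: split on the raw ? first, unescape
afterwards, so that %3F in the data is never mistaken for the separator.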

> The point has been made again and again that you can't argue that because
> there is an installed base, we can't fix things. Moreover, existing URLs and
> software can be distinguished from the new URLs since the new ones have the
> URL wrapper around them.

I agree that the presence of an installed base does not a priori prevent us
from making any changes. However, the impact on the installed base should
still be weighed against the benefits of any change. In my view, the benefit
of changing the query separator for the gopher URL is extremely minor, and the
impact is fairly significant, especially given the prevalence of gopher URLs.
Incidentally, I don't buy the argument that new- and old-style URLs will be
easily distinguished by the presence or absence of a URL: prefix, since I
suspect that they will often be stripped and reattached in conversions to and
from internal representations.

> > 2. In a URL, if a "/" does not imply a hierarchy, it
> > should be escaped. (This just affects gopher servers
> > whose files contain "/" but aren't directory
> > delimiters, like Mac servers). If you want the
> > string to be 100% opaque, just specify that
> > "/" is escaped.
> >
>
> Once again, a URL consists of an access method and an opaque parameter
> package for use by that access method. To know how to interpret the
> opaque parameter package, the client writer needs to read the
> section of the URL document that tells them how. If the access method
> says that there is no heirarchy to be assumed, and you ignore this, then
> you do so at your own risk.

I have to agree with Tim that non-hierarchical / should be escaped. I know
that partial/relative URLs are no longer in the requirements document;
nonetheless, there are cases where relative references (not URLs) within
collections of documents are very convenient; if the hierarchy present in a
URL can be unambiguously extracted, it makes it much easier to automatically
convert these relative internal references into URL form. I suspect HTML is a
good example of this, and is the motivation for Tim's position.
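
Here is a small illustration in Python of why the escaping matters for
relative references. The helper names, the host and the file names are
invented for the example, and the resolution rule is deliberately naive (drop
the last segment of the base, append the relative name):

```python
from urllib.parse import quote


def escape_segment(name: str) -> str:
    """Escape one name so that any '/' inside it becomes %2F."""
    # safe="" tells quote() not to leave '/' unescaped.
    return quote(name, safe="")


def resolve_sibling(base_url: str, relative: str) -> str:
    """Naive relative resolution: replace the last path segment of base_url."""
    parent = base_url.rsplit("/", 1)[0]
    return parent + "/" + relative


# A Mac-style file name that contains a '/' which is not a directory delimiter.
name = "Reports/1994"

escaped_base = "gopher://host.example/docs/" + escape_segment(name)
unescaped_base = "gopher://host.example/docs/" + name

# With the '/' escaped, the sibling lands in docs/, where it belongs.
good = resolve_sibling(escaped_base, "index")

# With the '/' left alone, a bogus "Reports" directory appears.
bad = resolve_sibling(unescaped_base, "index")
```

With the escaped form the hierarchy can be extracted mechanically; with the
unescaped form the client has no way to tell the two kinds of / apart.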

I also think that there may be cases where humans may want to take a URL as a
starting point for a more conventional navigational search. In this case, an
explicit representation of the hierarchy, if any, may be useful, although of
course the specific protocol will need to be understood to support
navigation.

> We shouldn't have to transpose everything into EBCDIC and then triple DES
> encrypt it to protect against people that can't read an RFC before writing
> their client. If the spec. says that something is opaque, it should be
> treated that way.

I think this is exaggerating the case quite a bit. Someone recently posted a
fairly simple set of rules for URL character set issues which specified which
characters have special syntactic meaning in all URLs, and must be escaped,
which characters are restricted, and must be escaped (control characters,
etc.) and which characters are "ordinary" and should not be escaped in
canonical form. These rules seemed pretty clear and easy to understand.
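
To show how little code a three-way rule like that needs, here is a sketch in
Python. The character sets below are placeholders I picked for the example,
not the sets from the posting mentioned above, and the function name is
hypothetical:

```python
# Placeholder set of characters with special syntactic meaning.
# Assumption for illustration only; not the posted rules.
SPECIAL = set("/?#%:")


def is_restricted(ch: str) -> bool:
    """Placeholder test: control characters, space, DEL and non-ASCII."""
    return ord(ch) <= 0x20 or ord(ch) >= 0x7F


def canonical_escape(data: str) -> str:
    """Escape special and restricted characters; leave ordinary ones alone."""
    out = []
    for ch in data:
        if ch in SPECIAL or is_restricted(ch):
            # Emit %XX for each byte of the character.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

Whatever the exact sets turn out to be, the canonical form falls out of one
pass over the string.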

@alex