Re: [connolly@hal.com: Re: Identifying scripts by file extension?]

Keith Moore (moore@cs.utk.edu)
Tue, 15 Feb 1994 17:51:09 -0500

Message-Id: <199402152251.RAA06353@wilma.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: Larry Masinter <masinter@parc.xerox.com>
Subject: Re: [connolly@hal.com: Re: Identifying scripts by file extension?]
In-Reply-To: Your message of "Tue, 15 Feb 1994 10:45:40 PST."
<94Feb15.104551pst.2732@golden.parc.xerox.com>
Date: Tue, 15 Feb 1994 17:51:09 -0500

Larry Masinter writes:

> The question arises: which part of a URL is opaque, and which parts
> does the client get to peek into? If the clients are going to peek
> into URL's (for example to resolve <A HREF="../foo/bar.html"> into a
> global URL), then the servers can't use arbitrary strings as paths in
> URL's. We've already seen news id's and WAIS doc-id's clash with
> relative URL's.
>
> I will again assert that we should use the SGML parser to do
> whatever parsing is going to be done on the client side, and make the
> results opaque to the client, thereby allowing the server to use _any_
> string it wants to encode info.

No. URLs might be used by applications that have nothing to do
with SGML. URLs should not be dependent on SGML.

> We should also _allow_ a link to
> contain content-type information. (How else do I link to a postscript
> file on an ftp archive? By file extension? Come on!)

No. The content-type of an object has nothing to do with how you
access that object.

> The fact that relative HREFs are so widely used justifies support for
> the feature. But I think you should be able to stick a tree of HTML
> documents, gif files, postscript documents, etc. on an FTP server and
> have it work just as well as putting them on an HTTP server. Also, you
> should be able to copy those HTML documents to a local disk and use
> them there.

Relative URLs are used because we don't have a better mechanism in place. That
doesn't mean that they are worth the price we have to pay for them. The meaning of
a document's links should not be dependent on the location where it is stored.
It's too easy to accidentally change the meaning just by renaming a document.
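
To make the location dependence concrete, here is a minimal sketch in
Python. It assumes the standard library's urllib.parse.urljoin as the
resolver; the example.org URLs and the variable names are hypothetical
and are not taken from this thread. The same link text ends up naming
two different objects once the containing document is moved.

```python
from urllib.parse import urljoin

# The link exactly as it appears inside the document (from the quoted example).
link = "../foo/bar.html"

# Hypothetical locations of the same document before and after a move.
original_location = "http://example.org/docs/guide/index.html"
moved_location = "http://example.org/archive/index.html"

# Resolve the unchanged link text against each base location.
before = urljoin(original_location, link)
after = urljoin(moved_location, link)

print(before)  # http://example.org/docs/foo/bar.html
print(after)   # http://example.org/foo/bar.html
```

Nothing in the document changed, yet the link no longer refers to the
same object.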

> This means we need an interoperable way to combine a relative link with
> a global link to form a new global link. At first, it seems this
> should be done on the server side so that the link strings can stay
> opaque and the client can stay dumb.
>
> But that doesn't work:
> * if you want to use FTP, or
> * if you want to be able to move the documents around
> without changing them, or
> * if you want to serve the same files up via HTTP, gopher,
> and FTP at the same time without filtering them.

If you accept that we need relative URLs, this is true.
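
As an illustration of the client-side combination described in the
quoted text, the following sketch resolves one relative link against
the same document tree reached through different access schemes. It
assumes Python's urllib.parse.urljoin as the combining rule; the host
name, paths, and variable names are hypothetical.

```python
from urllib.parse import urljoin

# One relative link, stored unchanged in the document.
RELATIVE_LINK = "figures/plot.gif"

# The same (hypothetical) tree served over HTTP, over FTP, and from local disk.
BASES = [
    "http://example.org/papers/index.html",
    "ftp://example.org/papers/index.html",
    "file:///home/user/papers/index.html",
]

# The client combines the base and the relative link itself, so the
# document does not have to be filtered or rewritten per access method.
resolved = [urljoin(base, RELATIVE_LINK) for base in BASES]

for url in resolved:
    print(url)
```

The combining rule here relies only on the "/" hierarchy in the path,
which is exactly the kind of syntax assumption that needs to be agreed
on before clients can do this interoperably.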

> Then perhaps we should just once and for all agree that a URL includes
> a path that is a list of names, where the syntax of names is the
> intersection of the POSIX portable filename syntax and the SGML token
> syntax. (Yuk, but...) URL's can alternatively contain a "selector
> string" that is opaque and does not combine with relative locations.

Yuk, indeed. URLs are NOT necessarily file names, and we shouldn't try to treat
them like file names.

> Once you look at how messy the punctuation strategies get in general,
> SGML syntax is as good as anything. And since we're already
> implementing an SGML parser, why implement a _separate_ URL parser?

Because we're not necessarily implementing an SGML parser. URLs are
not specific to WWW.

> But it's VERY important that we standardize on which parts of a URL
> are opaque, and which are not. The current strategy is breaking down.

Agreed here.

But there are other ways to solve the problem. Properly defined URNs
would allow us to do the same thing.

If we have to have relative URLs, they need to be defined in such a
way that they don't assume any particular file name syntax, and they
don't assume that all URLs contain file names.

I personally believe that a URL should be self-contained; you shouldn't need any
content-specific information to evaluate one.

Keith