Re: Another snapshot of the URL document.

Guido.van.Rossum@cwi.nl
Mon, 04 Jul 1994 11:00:52 +0200

Message-Id: <9407040900.AA02004=guido@voorn.cwi.nl>
To: Larry Masinter <masinter@parc.xerox.com>
Subject: Re: Another snapshot of the URL document.
In-Reply-To: Your message of "Sat, 02 Jul 1994 00:19:40 MDT."
<94Jul2.001942pdt.2760@golden.parc.xerox.com>
From: Guido.van.Rossum@cwi.nl
Date: Mon, 04 Jul 1994 11:00:52 +0200

Good work!

Here are some micro-nits:

> alphanumerics and most printable ASCII characters, with the
> exception of "#" and "%".

Maybe briefly explain why '#' is universally reserved?

> To avoid confusion, is strongly recommended that all `unsafe'
> characters be encoded; that is, all characters except the
> alphanumerics, "$", "-", "_", "@", ".", "&", "+".

I can think of some contexts where even some of these are not safe: in
a password, '@' should be encoded, and Mosaic forms use '&' as the field
separator, so '&' characters occurring in field values must be encoded
while those used as separators must not. (This may occur in HTTP
search strings.)
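
To make the two cases concrete, here is a minimal Python sketch. The
helper names encode_userinfo and build_query are made up for this
illustration, and the sample values are invented; only the standard
urllib.parse functions quote and urlencode are real.

```python
from urllib.parse import quote, urlencode


def encode_userinfo(user, password):
    """Percent-encode a user name and password for use before the '@'.

    Hypothetical helper: safe="" makes quote() encode every reserved
    character, so an '@' or ':' inside the password cannot be mistaken
    for a delimiter.
    """
    return quote(user, safe="") + ":" + quote(password, safe="")


def build_query(fields):
    """Join (name, value) pairs into a form-style search string.

    urlencode() encodes '&' inside names and values but emits a
    literal '&' between fields, which is the distinction made above.
    """
    return urlencode(fields)


print(encode_userinfo("guido", "p@ss"))
print(build_query([("q", "R&D"), ("lang", "en")]))
```

The '@' in the password comes out as %40 and the '&' in the field
value as %26, while the '&' separating the two fields stays literal.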

> FTP URLs follow the syntax described in section 3.1. The port
> number, if present, gives the port of the FTP server if not the FTP
> default (23).

Surely the default FTP port is 21, not 23 (which is telnet)!
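
As a rough sketch of how a client might apply these defaults (the
table name DEFAULT_PORTS, the function effective_port and the host
ftp.example.org are assumptions for illustration, not part of the
draft):

```python
from urllib.parse import urlsplit

# Well-known default ports; 21 is FTP, 23 is telnet.
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "gopher": 70, "http": 80}


def effective_port(url):
    """Return the explicit port of a URL, or the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Raises KeyError for schemes missing from the table above.
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("ftp://ftp.example.org/pub/README"))
print(effective_port("ftp://ftp.example.org:2121/pub/README"))
```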

> 3.2.1. FTP Name and Password

The FTP protocol specifies an optional third login field, "account".
I've never seen this used. Is it obsolete, or would it make sense to
allow for an optional :acct part after the password?
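
If such a part were allowed, splitting it might look like the sketch
below. The user:password:acct syntax is only the suggestion made here,
not something in the draft, and split_ftp_login is a made-up name. The
empty-string defaults mirror those of Python's ftplib.FTP.login(user,
passwd, acct).

```python
from urllib.parse import unquote


def split_ftp_login(userinfo):
    """Split a hypothetical 'user:password:acct' string into 3 fields.

    Missing fields come back as empty strings. A literal ':' inside a
    field must be written as %3A, since ':' is the separator.
    """
    fields = userinfo.split(":", 2)
    fields += [""] * (3 - len(fields))
    user, password, acct = (unquote(f) for f in fields)
    return user, password, acct


print(split_ftp_login("guido:secret:cwi"))
print(split_ftp_login("anonymous"))
```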

> HTTP URLs take the following form:
>
> URL:http://<host>:<port>/<path>?<searchpart>
>
> where <host> and <port> are as described in 3.1. If :<port> is
> omitted, the port defaults to 80. <path> is a HTTP selector, and
> <searchpart> is a query string. <searchpart> and its preceding "?"
> is optional. No user name or password is allowed.

In practice, <path> is also optional, and if neither <path> nor
<searchpart> is present, the / may also be omitted (this gets the
top-level directory of a server just like for Gopher).
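
A small sketch of that behaviour, assuming a client that fills in "/"
when the path is missing (request_target is a name invented for this
example):

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus optional search part) to ask the server for.

    An absent path is treated as "/", so "http://host" and
    "http://host/" name the same top-level resource.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        return path + "?" + parts.query
    return path


print(request_target("http://www.cwi.nl"))
print(request_target("http://www.cwi.nl/cwi/people/Guido.van.Rossum.html"))
```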

> <gopher-path> may
> be empty, in which case the delimiting "/" is also optional.

Maybe make it explicit that in this case <gophertype> defaults to 1
(i.e. a directory).
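
For instance, a parser could apply the default like this. This is a
sketch only: gopher_type_and_selector and the host gopher.example.org
are invented, and the selector is returned without any decoding.

```python
from urllib.parse import urlsplit


def gopher_type_and_selector(url):
    """Split a Gopher URL path into (gophertype, selector).

    With an empty path, with or without the delimiting "/", the type
    defaults to "1" (a directory) and the selector is empty.
    """
    path = urlsplit(url).path
    if path.startswith("/"):
        path = path[1:]  # drop only the delimiting slash
    if not path:
        return "1", ""
    return path[0], path[1:]


print(gopher_type_and_selector("gopher://gopher.example.org"))
print(gopher_type_and_selector("gopher://gopher.example.org/0/docs/README"))
```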

Finally, if we allow NNTP references, which are explicitly local, why
not standardize the 'local-file:' URL format as well, with a similar
caveat? Proposed prose:

3.10 LOCAL FILES

The local file scheme is used to designate files in the client's
local file name space.

Local file URLs take the form

URL:local-file:<path>

They do not take a host or port. The <path> is interpreted in the
context of the user's current working directory; absolute pathnames
may be designated in a system-dependent way (e.g. on UNIX, absolute
pathnames begin with "/"). The intent is that a simple "open file"
system call (e.g. fopen("<path>", "r") in C) will open the
designated file.

Local file URLs are unique in that their meaning varies depending
on the context of the client. They are a common extension for
clients, so standardization is in order.
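
A client-side sketch of the proposed scheme, under the assumption that
the optional "URL:" prefix is stripped first (local_file_path and
open_local_file are hypothetical names, and 'local-file:' is only the
format proposed here):

```python
def local_file_path(url):
    """Extract <path> from a proposed 'local-file:<path>' URL."""
    if url.startswith("URL:"):
        url = url[len("URL:"):]
    scheme, sep, path = url.partition(":")
    if not sep or scheme != "local-file":
        raise ValueError("not a local-file URL: %r" % url)
    return path


def open_local_file(url):
    """Open the designated file with a plain open() call.

    Relative paths resolve against the current working directory,
    as the proposed text says.
    """
    return open(local_file_path(url), "r")


print(local_file_path("URL:local-file:/etc/motd"))
print(local_file_path("local-file:notes.txt"))
```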

> search = *[ uchar | ";" | ":" | "@" ]

This should use round brackets, i.e. *( ... ), here (and in the several
other places that use *[...]).

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>