Date: Mon, 4 Jul 94 14:13:13 +0100
From: Tim Berners-Lee <tbl@www0.cern.ch>
Message-Id: <9407041313.AA01795@ptpc00.cern.ch>
To: Guido.van.Rossum@cwi.nl
Subject: Re: Another snapshot of the URL document.
|From: Guido.van.Rossum@cwi.nl
|> To avoid confusion, is strongly recommended that all `unsafe'
|> characters be encoded; that is, all characters except the
|> alphanumerics, "$", "-", "_", "@", ".", "&", "+".
|
|I can think of some contexts where even some of these are not safe: in
|a passwd, '@' should be encoded, and Mosaic forms use '&' as the field
|separator, requiring encoding of '&' characters occurring in field
|values but not of those used as separators. (This may occur in HTTP
|search strings.)
Do not confuse unsafe and reserved characters!
Reserved characters have *different* meanings when encoded.
Unsafe characters have *the same* meaning when encoded.
As any gateway, proxy, etc., is allowed to encode or decode
any unsafe characters within a context whose safety is
understood (e.g. HTTP), you cannot say that %26 and & in a
form-generated URL mean different things.
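
To make the distinction concrete, here is a minimal Python sketch. The
function gateway_decode is hypothetical: it stands in for any gateway
that decodes every escape it does not consider reserved, and the query
string is invented for the example. If "&" is classed as merely unsafe,
such a gateway may decode %26 and the field boundary is lost; only if
"&" is reserved in that context does the escape survive.

```python
import re

def gateway_decode(url_part, reserved):
    """Decode every %XX escape whose character is not in `reserved`.

    Hypothetical stand-in for a gateway or proxy; not taken from any
    real implementation.
    """
    def repl(match):
        ch = chr(int(match.group(1), 16))
        # Keep the escape only when the character is reserved.
        return match.group(0) if ch in reserved else ch
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, url_part)

# Invented form data: one field whose value contains an encoded '&'.
query = "name=R%26D&dept=7"

# '&' treated as unsafe only: the gateway is free to decode it.
as_unsafe = gateway_decode(query, reserved="")
# '&' and '=' treated as reserved: the escape must be left alone.
as_reserved = gateway_decode(query, reserved="&=")

print(as_unsafe.split("&"))    # three pieces: the field value was split
print(as_reserved.split("&"))  # two fields, as the form intended
```

So a form format that relies on "&" as a separator needs "&" to be
reserved there, not just left out of the unsafe list.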
|> 3.2.1. FTP Name and Password
|
|The ftp protocol specifies an optional third key, "account". I've
|never seen this used. Is it obsolete, or would it make sense to allow
|for an optional :acct part after the password?
|
|> HTTP URLs take the following form:
|>
|> URL:http://<host>:<port>/<path>?<searchpart>
|>
|> where <host> and <port> are as described in 3.1. If :<port> is
|> omitted, the port defaults to 80. <path> is a HTTP selector, and
|> <searchpart> is a query string. <searchpart> and its preceding "?"
|> is optional. No user name or password is allowed.
|
|In practice, <path> is also optional, and if neither <path> nor
|<searchpart> is present, the / may also be omitted (this gets the
|top-level directory of a server just like for Gopher).
You can't talk about a top level directory of an HTTP server,
as it does not have to have any directory structure at all.
There is a convention that
http://<host>:<port>/
is the WWW URI of a welcome page and suitable default entry point to
the server. If the last / is omitted, the parsing software
puts it back on, as the /<path> sent to an HTTP server cannot be void.
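
A sketch of that "put the / back on" step, using Python's standard
urllib.parse. The function name ensure_path and the host name are
made up for illustration; this is not the libwww code.

```python
from urllib.parse import urlsplit, urlunsplit

def ensure_path(url):
    """Give an http URL with an empty path the path "/".

    Illustrative helper only: the path sent to an HTTP server
    cannot be empty, so the parser supplies the slash.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)

# Example host name is an assumption, not from the draft.
print(ensure_path("http://www.example.org"))
print(ensure_path("http://www.example.org:80"))
print(ensure_path("http://www.example.org/welcome.html"))
```

The first two calls gain a trailing "/", and the third is returned
unchanged because it already has a path.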
|Finally, if we allow NNTP references, which are explicitly local, why
|not standardize the 'local-file:' URL format as well, with a similar
|caveat?
The WWW norm for this is 'file:'. Why use local-file?
There was a time when file: in the libwww code would, in the case
of local access failure, resort to using FTP. This is historical.
It caused at least one browser to implement "local-file:", but
nowadays "file:" is used for this and I would like
to outlaw local-file now before there is any further confusion.
|3.10 LOCAL FILES
|
| The local file scheme is used to designate files in the client's
| local file name space.
|
| Local file URLs take the form
|
| URL:local-file:<path>
|
| They do not take a host or port.
Well, in WWW they do take a host (no port), which is the name of the system
on which the filename is valid. This is done so that when a link is made
from a widely readable document to a local document, a user on another
system attempting to follow the link gets an intelligent error message
(Sorry, this document is only available locally on myhost.foo.com) rather
than total confusion.
The dummy hostname "localhost" was reserved for those rare and questionable
cases in which one wants to make a link to "The /etc/passwd file on whatever
(unix) system you are on".
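
As a rough sketch of that behaviour in Python: the function
resolve_file_url and its this_host parameter are hypothetical, and
rejecting a URL with no host at all is an assumption of the sketch,
not something the draft specifies.

```python
from urllib.parse import urlsplit

def resolve_file_url(url, this_host):
    """Return the local path for a file: URL, or explain why not.

    Hypothetical helper: `this_host` is the name of the system the
    client is running on, passed in by the caller.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL")
    host = parts.hostname  # lower-cased by urlsplit, None if absent
    if host is None:
        # Assumption: a host is required in this sketch.
        raise ValueError("file: URL has no host")
    if host in ("localhost", this_host.lower()):
        return parts.path
    raise LookupError(
        "Sorry, this document is only available locally on " + host
    )

print(resolve_file_url("file://localhost/etc/passwd", "myhost.foo.com"))
try:
    resolve_file_url("file://myhost.foo.com/doc/notes.html", "other.bar.org")
except LookupError as err:
    print(err)
```

The path /doc/notes.html and the second host name are invented for
the example; myhost.foo.com is the name used in the message above.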
| The <path> is interpreted in the
| context of the user's current working directory; absolute pathnames
| may be designated in a system-dependent way (e.g. on UNIX, absolute
| pathnames begin with "/"). The intent is that a simple "open file"
| system call (e.g. fopen("<path>", "r") in C) will open the
| designated file.
This is not actually how WWW works. In WWW, filenames are converted into
file: URIs with the / being used for hierarchical boundaries.
So for example the VMS file
DISK$USER:[MY.NOTES]NOTE123456.TXT
becomes
/disk$user/my/notes/note123456.txt
in a way which is quite reversible and included in the libwww code.
This means that relative URLs can be handled within a VMS
filesystem, and for example that a hypertext database can be
used directly on a VMS filestore, and served by an HTTP
server directly. (The cern_httpd daemon does this, for example,
in its port for VMS.) Similarly, you can create files on
a Mac (say) and read them on another machine without having to
convert them.
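
The VMS mapping above can be sketched in Python as follows. This is
not the libwww code: the function names are made up, and the sketch
assumes only the simple DEVICE:[DIR.DIR]FILE form shown in the
example, relying on VMS file names being case-insensitive so that
lower-casing can be undone.

```python
import re

# device ":" "[" dir.dir... "]" filename  (simple form only)
_VMS_NAME = re.compile(r"^([^:\[\]]+):\[([^\[\]]+)\]([^\[\]:]+)$")

def vms_to_path(vms_name):
    """Convert DEVICE:[DIR.DIR]FILE into a /-separated path."""
    match = _VMS_NAME.match(vms_name)
    if match is None:
        raise ValueError("unsupported VMS file name: " + vms_name)
    device, dirs, filename = match.groups()
    segments = [device] + dirs.split(".") + [filename]
    return "/" + "/".join(segments).lower()

def path_to_vms(path):
    """Reverse of vms_to_path for paths with device, dirs and file."""
    segments = path.strip("/").split("/")
    if len(segments) < 3:
        raise ValueError("need device, directory and file: " + path)
    device, *dirs, filename = segments
    return f"{device}:[{'.'.join(dirs)}]{filename}".upper()

path = vms_to_path("DISK$USER:[MY.NOTES]NOTE123456.TXT")
print(path)               # /disk$user/my/notes/note123456.txt
print(path_to_vms(path))  # DISK$USER:[MY.NOTES]NOTE123456.TXT
```

Because the result uses / for every hierarchical boundary, ordinary
relative URL resolution works on it without knowing it came from VMS.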
|--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
|URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>
Tim Berners-Lee