To: uri@bunyip.com
In-Reply-To: connolly@hal.com's message of Thu, 30 Jun 1994 21:04:41 -0700 <94Jun30.210445pdt.2762@golden.parc.xerox.com>
Subject: Re: current status of URL document...
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <94Jun30.222333pdt.2760@golden.parc.xerox.com>
Date: Thu, 30 Jun 1994 22:23:33 PDT
I am increasingly convinced that reserving /?#; in ALL schemes would
be a GOOD IDEA, with the meaning that / separates hierarchical
components, ? separates the hierarchical components from a `search'
element, # identifies a `fragment', and ; is a separator for
`additional attributes', in the order
    scheme:
    //user:pass@host/
    path1/path2/.../pathN/
    name?search#fragment
    ;attribute=value
    ;attribute=value
with the rules that the following characters must be encoded within
the following fields:
    user        :@     unsafe
    pass        :@     unsafe
    path        ?#;    unsafe
    name        ?#;    unsafe
    search      ?#;    unsafe
    fragment    ?#;    unsafe
    attribute   ?#;=   unsafe
For example, this would require that ? and # be encoded within Gopher and FTP URLs.
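
The table above can be read as a per-field encoding rule. Here is a minimal Python sketch of that reading, under stated assumptions: this message does not enumerate the `unsafe' set, so UNSAFE below is only a stand-in (space, double quote, angle brackets, percent, plus control and non-ASCII bytes), and the name encode_field and the choice of UTF-8 are hypothetical, picked for illustration.

```python
# Characters that must be encoded in each field, copied from the table above.
RESERVED = {
    "user": ":@",
    "pass": ":@",
    "path": "?#;",
    "name": "?#;",
    "search": "?#;",
    "fragment": "?#;",
    "attribute": "?#;=",
}

# ASSUMPTION: stand-in for the "unsafe" column, which the message does not spell out.
UNSAFE = ' "<>%'


def encode_field(field, text):
    """Percent-encode the characters that may not appear raw in `field`."""
    must_encode = RESERVED[field] + UNSAFE
    out = []
    for byte in text.encode("utf-8"):  # ASSUMPTION: UTF-8 for non-ASCII input
        ch = chr(byte)
        if byte <= 0x20 or byte >= 0x7F or ch in must_encode:
            out.append("%%%02X" % byte)  # e.g. "?" becomes "%3F"
        else:
            out.append(ch)
    return "".join(out)


print(encode_field("name", "what?.html#top"))  # what%3F.html%23top
print(encode_field("user", "a:b@c"))           # a%3Ab%40c
```

Because the same small set applies to every scheme, a generator needs only the field name to decide what to encode, not scheme-specific rules.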
Since many characters must be encoded anyway, having simpler and more
consistent rules for which characters must be encoded when will make
URL-interpreting and generating code more reliable. In fact, some
schemes may be robust enough to ignore the `special' interpretation,
but frankly, I wonder why the FTP scheme does *NOT* require # to be
encoded, so that I can make references to fragments within .html
files.
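
To illustrate why uniform reservation helps, here is a rough Python sketch of a scheme-independent splitter that follows the field order proposed above. It is not taken from any existing implementation; split_url and the example host names are hypothetical, and fields are returned still encoded. Since ?#; may not appear raw inside any field, splitting on the first occurrence of each is unambiguous.

```python
def split_url(url):
    """Split a URL into the fields of the proposed generic syntax."""
    scheme, _, rest = url.partition(":")
    user = password = host = None
    if rest.startswith("//"):
        # //user:pass@host/ comes first, when present.
        authority, _, rest = rest[2:].partition("/")
        userinfo, at, host = authority.rpartition("@")
        if at:
            user, colon, pw = userinfo.partition(":")
            password = pw if colon else None
    # Attributes come last, each introduced by ";".
    rest, *attrs = rest.split(";")
    rest, hash_mark, fragment = rest.partition("#")
    rest, question, search = rest.partition("?")
    # Everything up to the last "/" is the hierarchical path.
    path, _, name = rest.rpartition("/")
    return {
        "scheme": scheme,
        "user": user,
        "password": password,
        "host": host,
        "path": path,
        "name": name,
        "search": search if question else None,
        "fragment": fragment if hash_mark else None,
        "attributes": dict(a.partition("=")[::2] for a in attrs),
    }


# An FTP URL that refers to a fragment within an .html file.
parts = split_url("ftp://host.example/pub/docs/guide.html#intro")
print(parts["name"], parts["fragment"])  # guide.html intro
```

If # were allowed raw in FTP file names, the last step could not tell a fragment from part of the name, which is the point made above.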
This means that you might have to encode some additional characters in
your `mailto' URL that you wouldn't otherwise, but there would be few
other additional constraints.
Query on `telnet' URLs: should the syntax be telnet://user@password:host/ ?