Message-Id: <9403231546.AA01152@ulua.hal.com>
To: timbl@www0.cern.ch
Subject: Re: FTP syntax
In-Reply-To: Your message of "Tue, 22 Mar 1994 17:51:49 +0100."
<9403221651.AA13731@ptpc00.cern.ch>
Date: Wed, 23 Mar 1994 09:46:00 -0600
From: "Daniel W. Connolly" <connolly@hal.com>
In message <9403221651.AA13731@ptpc00.cern.ch>, Tim Berners-Lee writes:
>
>
>Summarizing, the consensus is now that the FTP syntax should be
>of the form
>
> ftppath <fpath> [ <separator> <mode> ]
>
> <fpath> <xpalphas>
>
> <separator> (we have been talking about a colon)
>
> <mode> dir | bin | text | tenex
>
>where if the mode is omitted the client has to do its best to figure it out.
A question and a suggestion: Is there a consensus on whether
<fpath> is opaque or can be parsed into pathname components?
i.e. must a client interpret
ftp://host/dir1/dir2/file
as
RETR /dir1/dir2/file
or can it do something like
CWD /dir1
CWD dir2
RETR file
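To make the two readings concrete, here is a rough Python sketch that turns the path part of an ftp URL into FTP commands either way. The name ftp_commands is made up for illustration. It splits on "/" before decoding %XX escapes, on the assumption that an escaped %2F belongs inside a single name. The absolute first CWD simply follows the example above.

```python
from urllib.parse import unquote


def ftp_commands(path, opaque):
    """Map the path of ftp://host/<path> to a list of FTP commands.

    opaque=True:  hand the whole decoded path to the server in one RETR.
    opaque=False: CWD into each directory, then RETR the last segment.
    """
    if opaque:
        return ["RETR " + unquote(path)]
    # Split first, decode second, so "%2F" stays inside one segment.
    segments = [unquote(s) for s in path.lstrip("/").split("/")]
    *dirs, name = segments
    if not dirs:
        return ["RETR /" + name]
    # The first CWD is absolute, as in the example; later ones are relative.
    cmds = ["CWD /" + dirs[0]] + ["CWD " + d for d in dirs[1:]]
    cmds.append("RETR " + name)
    return cmds
```

With "/dir1/dir2/file", the opaque reading gives a single "RETR /dir1/dir2/file". The component-wise reading gives the CWD, CWD, RETR sequence shown above.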
Suggestion for mode information:
ftp://host/dir1/dir2/file;mode=bin
This is consistent with my recently suggested UNAMBIGUOUS grammar,
available in
<http://www.hal.com/~connolly/dist/url_test-19940316.tar.Z>
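Here is a minimal sketch of how a client might pull the mode off such a URL. It assumes the parameter always comes last and that a literal ";" inside a file name would be escaped as %3B. split_mode and MODES are hypothetical names; the mode values are the ones from Tim's summary.

```python
MODES = {"dir", "bin", "text", "tenex"}  # the modes from Tim's summary


def split_mode(url):
    """Split 'ftp://host/path;mode=bin' into ('ftp://host/path', 'bin').

    Returns (url, None) when there is no ';mode=' parameter, in which
    case the client has to do its best to figure out the mode itself.
    """
    base, sep, param = url.rpartition(";")
    if not sep or not param.startswith("mode="):
        return url, None
    mode = param[len("mode="):]
    if mode not in MODES:
        raise ValueError("unknown ftp mode: " + mode)
    return base, mode
```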
>...Now the rules for
>escaping characters in URLs are such that reserved characters
>like / have different meanings when escaped, but other characters
>not explicitly reserved MAY be escaped if the medium in question
>doesn't like them.
>...
> But it does define a canonical form for standard
>interchange.
> ...
> In other
>words, if you come across %20 in a URL or you come across " "
>you must treat them the same. But if you come across %2F and "/"
>you must treat them differently.
Seconded! Agreed! The test suite I wrote does just this.
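One way to express this rule in code is to decode every %XX escape except those that stand for reserved characters, and then compare. This Python sketch uses the reserved set Tim quotes below. It also treats "%" itself as reserved, which is an added assumption so that %25 cannot turn into a new escape. The names canonical and same_url are illustrative and do not come from the test suite.

```python
import re

# Reserved set as quoted in the message ("vline" is "|"), plus "%" (assumed).
RESERVED = set("{}/#?|[]\\^~<>%")


def canonical(url):
    """Decode escapes of non-reserved chars; keep reserved ones escaped."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        # Reserved: keep the escape, normalized to upper-case hex.
        # Anything else: %XX and the bare character mean the same thing.
        return "%" + m.group(1).upper() if ch in RESERVED else ch
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, url)


def same_url(a, b):
    return canonical(a) == canonical(b)
```

Under this sketch, "a%20b" and "a b" compare equal, while "a%2Fb" and "a/b" do not.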
>Now it is no good if some gateway unescapes the separator, so the separator
>must be from, or added to, the set of reserved characters. Currently
>this set is
>
> { | } | / | # | ? | vline | [ | ] | \ | ^ | ~ | < | >
>
>but does NOT include the "extra" characters (from Amsterdam)
>
> ! | * | " | ' | ( | ) | : | ; | , | space
>
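The following small sketch shows why the gateway problem matters. It assumes a colon separator, as discussed, and splits the path on the last literal ":" before decoding anything. parse_ftppath is a hypothetical name. A file that is really named "notes:text" has to travel as notes%3Atext. If a gateway decodes that escape before the client parses it, the same string suddenly means the file "notes" in mode "text".

```python
from urllib.parse import unquote

SEP = ":"  # the separator under discussion


def parse_ftppath(ftppath):
    """Split on the last literal separator, then decode each part."""
    fpath, sep, mode = ftppath.rpartition(SEP)
    if not sep:
        # No literal separator: the whole thing is the path, mode unknown.
        return unquote(ftppath), None
    return unquote(fpath), mode
```

Applied to "pub/notes%3Atext", this yields the path "pub/notes:text" with no mode. Applied to unquote("pub/notes%3Atext"), which is what a gateway would pass along after decoding the escape, it yields the path "pub/notes" in mode "text".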
For the test suite, I took the set of "data" or "unreserved" characters
from the isAllowed[] table in HTParse.c. It says the only chars that
mean the same when escaped and unescaped are:
*-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
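For illustration, an escaping routine built on that data set might look like the following Python sketch. It is not the HTParse.c code; it only reuses the same character list. escape_data is a made-up name. Encoding to UTF-8 before escaping is an assumption, since the message does not say how non-ASCII text is handled.

```python
UNRESERVED = frozenset(
    "*-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
)


def escape_data(text):
    """Escape every byte that is not in the unreserved set as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Data characters pass through; everything else becomes %XX.
        out.append(ch if ch in UNRESERVED else "%{:02X}".format(byte))
    return "".join(out)
```

For example, "a b/c:d" comes out as "a%20b%2Fc%3Ad".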
I'm willing to add !, ", ', (, ) and , to that set at this point (though I'd
rather reserve (, ) and ,), but allowing : unescaped is fraught with
peril, if you ask me. Consider:
<A HREF="x-ab/c:def">
Is that a relative URI pointing at the file "x-ab/c:def" or an
absolute URI in scheme x-ab/c? The current spec's grammar fails
to disambiguate.
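Here is one possible disambiguation, sketched in Python. It assumes a rule that a scheme name must start with a letter and contain only letters, digits, "+", "-" and ".". Under that rule, the "/" in "x-ab/c" rules it out as a scheme, so the reference is relative. naive_scheme shows the ambiguous reading for contrast. Both function names are hypothetical.

```python
import re

# Assumed rule: a letter followed by letters, digits, "+", "-" or ".".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def naive_scheme(ref):
    """Ambiguous reading: whatever precedes the first ':' is the scheme."""
    head, sep, _ = ref.partition(":")
    return head if sep else None


def strict_scheme(ref):
    """Return the scheme, or None if ref must be a relative URI."""
    head, sep, _ = ref.partition(":")
    if sep and SCHEME.fullmatch(head):
        return head
    return None
```

The rule only settles this particular case. A relative name with no "/" before the colon, such as "c:def", would still be read as scheme "c", so such names would need to be written as c%3Adef.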
> What I suggest is that just as now, for other schemes we are
> going to need some separator characters, and we grab them now
> while we have the chance. For example, suppose we take
>
>
> ! and *
>
> and say that they are reserved, and may not be used unescaped
> except when having special meanings (to be defined). This would allow
>
>
> ftp://info.cern.ch/pub/www/doc/draft-www-bernerslee-uri-00.text!text
Again, I'd rather see:
ftp://info.cern.ch/pub/www/doc/draft-www-bernerslee-uri-00.text;mode=text
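For comparison, Tim's "!" form could be generated as follows, assuming "!" is reserved so that it never appears unescaped in a file name. build_ftp_url is a hypothetical helper. It relies on Python's urllib.parse.quote, which (in Python 3.7 and later) leaves "/" alone but escapes "!" and "*".

```python
from urllib.parse import quote


def build_ftp_url(host, path, mode=None):
    """Build ftp://host/path[!mode] with '!' reserved as the separator."""
    # quote() keeps "/" and escapes "!" and "*" (e.g. "!" -> "%21"), so a
    # literal "!" in the result can only be the mode separator.
    url = "ftp://" + host + quote(path, safe="/")
    return url + "!" + mode if mode else url
```

Either notation keeps the mode out of the file name. The ";name=value" form also leaves room for more than one parameter, whereas "!" carries a single value.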
>
> The change to the safe character set would not affect very many URLs
> at this stage.
When I was exploring lots of possibilities in the test suite, I found
several cases that motivated leaving * as a data character -- it might
have special meaning within a scheme, but not across all schemes. I vote
to keep it in the unreserved set.
> I wouldn't be in favor of doing this to ":" as I am sure it turns up
> in current URLs a lot more than "!".
Hmmm... that's bad news. What do folks use ":" unescaped for these days?
Just filename characters?
Dan