Toward an unambiguous grammar [Was: FTP syntax ]

Daniel W. Connolly (connolly@hal.com)
Fri, 25 Mar 1994 11:19:40 -0600

Message-Id: <9403251719.AA06039@ulua.hal.com>
To: timbl@www0.cern.ch
Subject: Toward an unambiguous grammar [Was: FTP syntax ]
In-Reply-To: Your message of "Fri, 25 Mar 1994 13:51:10 +0100."
<9403251251.AA15177@ptpc00.cern.ch>
Date: Fri, 25 Mar 1994 11:19:40 -0600
From: "Daniel W. Connolly" <connolly@hal.com>

In message <9403251251.AA15177@ptpc00.cern.ch>, Tim Berners-Lee writes:
>
>> ftp://host/dir1/dir2/file;type=a
>
> I would even propose it.

Seconded.
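A minimal sketch of how the path part of the proposed ftp://host/dir1/dir2/file;type=a form might be parsed. The function name parse_ftp_path and the rule it follows (the first literal ";" in the last path segment starts the parameters, while "%3B" is an ordinary ";" inside a name) are assumptions for illustration, not a settled spec. The last call shows what goes wrong if something decodes a file named foo.com;1 before parsing.

```python
from urllib.parse import unquote


def parse_ftp_path(path):
    """Split an FTP URL path into (decoded segments, parameter string).

    Hypothetical rule: the first literal ";" in the last segment starts
    the parameters; "%3B" is an ordinary ";" inside a name.
    """
    segments = path.lstrip("/").split("/")
    name, sep, params = segments[-1].partition(";")
    segments[-1] = name
    # Undo escapes only after the structure has been split out.
    return [unquote(s) for s in segments], (params if sep else None)


print(parse_ftp_path("/dir1/dir2/file;type=a"))
# (['dir1', 'dir2', 'file'], 'type=a')
print(parse_ftp_path("/dir/foo.com%3B1;type=a"))
# (['dir', 'foo.com;1'], 'type=a')
print(parse_ftp_path(unquote("/dir/foo.com%3B1;type=a")))
# (['dir', 'foo.com'], '1;type=a')  <- the ";" in the name was lost
```

Splitting on the literal ";" before undoing escapes is what keeps "%3B" distinct from ";" here; decode first and the two become indistinguishable.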

>> When I was exploring lots of possibilities in the test suite, I found
>> several cases that motivated leaving * as a data character -- it might
>> have special meaning within a scheme, but not across all schemes. I vote
>> to keep it in the unreserved set.
>
> The problem is that if it has a special non-opaque meaning
> for *any* scheme, we have to reserve it now, or URL
> encoders will lose the difference between * and %2A.

How so? If * has meaning only within one scheme, then there is no need
to differentiate between * and %2A. Ah... unless there is a need to
represent the character * as well as the token * within that scheme.
But that can be solved with another escaping mechanism if necessary.
Consider user@host.com, for example: @ is never legal within a
hostname or username, so there's no need to differentiate between @
and %40. The user and hostname can be split apart _after_ escapes
have been undone.
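A small sketch of the user@host argument, assuming (as the text does) that "@" can never appear inside a user name or host name. Both function names are hypothetical; the only library call is Python's urllib.parse.unquote. Under that assumption, splitting before or after undoing escapes gives the same answer.

```python
from urllib.parse import unquote


def split_then_unescape(userhost):
    # Split on the first literal "@", then undo %XX escapes in each part.
    user, _, host = userhost.partition("@")
    return unquote(user), unquote(host)


def unescape_then_split(userhost):
    # Undo escapes first, then split. This is only safe because "@" is
    # assumed never to be legal inside the user name or host name.
    user, _, host = unquote(userhost).partition("@")
    return user, host


print(split_then_unescape("j%6Fe@host.com"))  # ('joe', 'host.com')
print(unescape_then_split("j%6Fe@host.com"))  # ('joe', 'host.com')
```

If "@" were allowed in user names, the two orders would disagree: for a%40b@host the first gives ('a@b', 'host') and the second ('a', 'b@host').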

> For your scheme above, we have the same problem with ;
> in that if I specify a file whose name is really "foo.com;1"
> as "foo.com%3B1" then any URI gateway/encoder is free to
> decode it to foo.com;1 which screws up the FTP URL syntax.

True. It is critical to decide which characters have special meaning
in the URL syntax. But I disagree with the idea that we need to decide
which characters have special meaning in individual schemes.
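To make Tim's * versus %2A concern concrete, here is a minimal sketch of a hypothetical gateway that "normalizes" URLs by decoding every %XX escape except those for reserved characters. The reserved sets below are illustrative assumptions, not a proposal; whether "*" belongs in the set is exactly the point in dispute.

```python
import re

# Illustrative reserved sets only; whether "*" belongs here is the point in dispute.
RESERVED_WITHOUT_STAR = frozenset(";/?:@=&#")
RESERVED_WITH_STAR = RESERVED_WITHOUT_STAR | {"*"}

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize(url, reserved):
    """Decode %XX escapes the way a hypothetical gateway might.

    Escapes are kept when they encode a reserved character, "%" itself,
    or anything outside printable ASCII; all others are decoded.
    """
    def repl(match):
        code = int(match.group(1), 16)
        ch = chr(code)
        if ch in reserved or ch == "%" or not (0x20 < code < 0x7F):
            return match.group(0)
        return ch

    return _ESCAPE.sub(repl, url)


print(normalize("/pub/a%2Ab", RESERVED_WITHOUT_STAR))  # /pub/a*b
print(normalize("/pub/a%2Ab", RESERVED_WITH_STAR))     # /pub/a%2Ab
```

With "*" unreserved, a%2Ab and a*b collapse to the same string, so a scheme that gave "*" a special meaning could no longer tell them apart; with "*" reserved, the escape survives.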

Dan