Date: Wed, 13 Oct 93 12:04:57 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-Id: <9310131104.AA02085@www3.cern.ch>
To: Mitra <mitra@path.net>
Subject: Re: (LONG) Detailed notes on URL6.TXT
>Date: Sun, 10 Oct 1993 19:39:20 -0700 (PDT)
>From: Mitra <mitra@path.net>
>
>These are notes on the 14 July 1993 draft of the URL spec,
>otherwise known as:
>
>ftp://ds.internic.net/internet-drafts/draft-ietf-uri-01.txt
[..]
>Tim - do you want to take these items and combine them with any input
>to create a changes proposal for Houston, or do you want me to do this?
I have taken some points and incorporated changes directly into
the spec online -- read the hypertext version
<http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html>
or when I have finished I will generate the txt and ps for a
new Internet Draft.
> Because a concern of some people is compatibility with
> older applications (especially WWW)
Hey, are we part of the aging establishment now?! :-)
> I've added comments where I can see some implication for
>WWW - since that's not the app I'm most familiar with, I've probably made
>some mistakes there.
>
>- Mitra
>
>==========================================================
>
>1) Uniqueness Page 5:
>
>Garbled text should read "It is suggested that each object have a unique
>"official" name"
Yup.
>2) Choices - pg 7:
>
>Delete para "The use of white space ... between applications"
>white space was reinstated in Amsterdam,
I have modified the text to a warning, and put in
a paragraph explaining the discussion for posterity.
>WWW (or other apps) can still choose to escape white space, but it should be
>recognized where input by a user. Whether white space needs escaping
>inside a URL inside an HTML document is a choice for the HTML standards
>people.
Agreed.
>3) Fragment Id - pg 7
>
>Delete entire section
>
>I thought fragment IDs had been removed from URLs, and that # was
>restored as a valid character inside a URL. Apps other than WWW use #
>for other purposes, e.g. it's a valid character in a filename.
>
>WWW - I think this causes WWW problems.
Had they been removed? I thought that they were there as
a hook for future work.
WWW certainly uses them and to change that would
cause problems. I'd like to keep them in.
Many systems need or use some form of fragment ID.
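For illustration only, a minimal sketch of both sides of this point: a client splits a trailing fragment ID off before retrieval, and a literal "#" that is really part of a filename must then be escaped (as %23) so it is not mistaken for a fragment marker. It uses Python's standard urllib.parse, which follows later RFCs rather than this draft; the function names and example URL are hypothetical.

```python
from urllib.parse import quote, urldefrag


def split_fragment(url: str) -> tuple[str, str]:
    """Return (address, fragment ID); the fragment ID is '' when there is none."""
    result = urldefrag(url)  # the fragment is never sent to the server
    return result.url, result.fragment


def escape_filename(name: str) -> str:
    """Percent-encode a filename so a literal '#' (or '/') cannot be misread."""
    return quote(name, safe="")
```

With this, "notes#1.txt" on a server becomes "notes%231.txt" in the URL, leaving "#" free to mean "fragment follows".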
>4) Path - pg 9
>
>Replace "must" with "should"
>
>This is a function of the application: in most apps "/" has hierarchical
>meaning, in others it's part of the valid character set with no syntactical
>meaning.
No, I think that the "/" was mandated. If someone
uses "/" for something else, then it should be %2F'd.
Tough, but useful. Hierarchy is SO common that
it is very useful to be able to see a relationship.
For example, I might want to cache xxtp://friedwhoop/gh/*.
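A rough sketch, under the assumption that "/" always carries hierarchy and a slash inside a single path component is sent as %2F. Because escaped slashes never create new levels, a simple prefix rule such as "cache everything under gh/" stays reliable. The helper names are hypothetical and not part of the draft.

```python
from urllib.parse import quote


def build_path(segments: list[str]) -> str:
    """Join hierarchy levels with '/', escaping any '/' inside a segment as %2F."""
    return "/".join(quote(seg, safe="") for seg in segments)


def under_prefix(path: str, prefix: str) -> bool:
    """True if path is the prefix itself or lies somewhere below it in the hierarchy."""
    prefix = prefix.rstrip("/") + "/"
    return path == prefix[:-1] or path.startswith(prefix)
```

For example, build_path(["gh", "a/b"]) gives "gh/a%2Fb", which is under "gh" but not under "gh/a".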
>5) Partial form pg 10.
>
>Replace "/xxx/.." with "xxx/../" and "/." with "./"
Agreed.
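A minimal sketch of the simplification described by the corrected wording: each "./" is dropped and each "xxx/../" pair is cancelled. The function name is hypothetical. Edge cases such as a leading ".." or empty segments are handled in one plausible way, not necessarily the way the draft intends.

```python
def simplify_partial(path: str) -> str:
    """Drop "./" segments and cancel each "xxx/../" pair in a '/'-separated path."""
    segments = path.split("/")
    out: list[str] = []
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        if seg == ".":
            # "./" vanishes; a final "." still names the directory, so keep a trailing "/"
            if i == last:
                out.append("")
            continue
        if seg == ".." and out and out[-1] not in ("", ".."):
            # "xxx/../" cancels; an empty segment or another ".." is left alone
            out.pop()
            if i == last:
                out.append("")
            continue
        out.append(seg)
    return "/".join(out)
```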
>6) Encoding prohibited characters pg 11
>
>Delete last paragraph "The same considerations...URL"
>
>See notes above on fragments.
See objection above.
>7) FTP pg 12
>
>Last para - append to sentence ending "deduced from the data format" the
>phrase "or from information carried with the URL in a URC"
Agreed, though I removed the reference to URC which is
not yet in the spec. In the "Terms" section I have defined
URLs and URIs and added a forward reference to URNs.
But I will leave it there for this paper!
>8) FTP pg 12
>
>Last sentence replace "it" with "this is"
>
>Purely grammatical.
Ta.
>9) News - pg 12
>
>This whole section needs rewriting. It describes a URN - i.e. the
>message-id. This is location and time independent. We do need something
>here: a URL that can be used with NNTP, which I believe requires
>article numbers, i.e. a news URL should look something like:
>
> nntp:path.net/comp/infosystems/gopher/3456
No no no.
It is true that the news: URL is location independent.
It is however the basic access address for the NNTP
protocol (except broken versions).
We can't consider news: as URNs because the NNTP
protocol doesn't scale for persistent objects.
It relies on the fact that a lot expires
before you get the next 30MB. See my message of a few
days ago, and Larry Masinter's. Basically, we need
news: references; we call them URLs because they are
not URNs, and we are *NOT* going to invent another URX!
The URL form you suggest is not appropriate, in that
it can only be used by people served by a particular
news host. The 3456 article number differs on other
hosts. To use NNTP hosts globally in this way is
to abuse the NNTP architecture. If I refer to a
news article or group, the person I give the reference
to should go to her _local_ NNTP server for the article.
If you want a central server, use HTTP or FTP not NNTP.
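A minimal sketch of why a news: URL needs no host. The client turns it into an NNTP command (ARTICLE or GROUP, from RFC 977) and sends that to whatever local server the reader is configured to use. It assumes, as in the draft's newsaddress syntax, that an article reference contains "@" and a group name does not. The function name is hypothetical and no network code is shown.

```python
def news_to_nntp(url: str) -> str:
    """Map a news: URL to the NNTP command a client would send to its local server."""
    if not url.startswith("news:"):
        raise ValueError("not a news: URL")
    ref = url[len("news:"):]
    if "@" in ref:
        # article reference: a message-id, which is the same on every server
        return f"ARTICLE <{ref}>"
    # otherwise a newsgroup name
    return f"GROUP {ref}"
```

Nothing here names a server, so the same reference works for every reader. An article number like 3456 would only be meaningful on one host.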
>10) Wais - pg 12
>
>A client does not need to know the length to retrieve an object; the bytes
>to be retrieved may be (but are not necessarily) encoded in the docid. The type is
>carried separately, and is required for retrieval since a docid can refer
>to a number of separate objects with different types.
You are right in that since the paper was written, wais
source has been fixed to work without a length specified.
I guess we have to keep the field there for
back-compatibility.
I am not aware (perhaps ignorance) of a way of deriving
the list of types available from the wais docid. The set
of types is returned only with the search result, and
so must be regarded as part of the URL.
Maybe (Simon?) z39.50'' will be clean in both these regards,
and we will be able to use the docid neat. In that case, I
would introduce a z39.50: URL.
>11) Wais - pg 13
>Change "not of course not need" to "not of course need"
Thanks.
>12) Prospero - pg 13
>
>Change "feilds" to "fields"
Thanks.
>13) Prospero - pg 14
>
>I don't think the stuff about %00 and attributes goes here; it belongs in
>the URC.
I put in what Cliff wanted almost verbatim.
It is up to Cliff I think.
>14) Prospero - pg 14
>
>The comment about External Prospero links applies equally well to Gopher,
>and should be deleted here, or added to the Gopher entry.
Added. Actually there was a sentence in there but the
phrasing was different so now they look the same too.
>15) Gopher - pg 14
>
>This entry only works for Gopher0, not Gopher+. A gopher URL must
>distinguish between G+ and G0 because clients will break if they ask for
>G+ and get G0.
Really? You mean we need "gopher+:"
Should this be the subject of further study?
>16) Gopher - pg 14
>
>In Gopher+ a type is required for retrieval.
>The type character is not required for retrieval in G0. It may be present
>in the path but need not be. It belongs in the URC.
>Note WWW incorrectly included the type in their URL, which probably gives
>the historical reason for this definition.
Explain to me how you can retrieve a Gopher0 object
with knowledge of the selector string but not of the
type.
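The question turns on one detail of Gopher0 (RFC 1436): the client sends only the selector string, but the item type decides how the reply is read. A text file or menu ends at a line holding a single "." (with leading dots doubled by the server), while a binary file runs until the server closes the connection. A minimal sketch of that dispatch follows. The function name, the small subset of types handled, and the Latin-1 decoding are assumptions.

```python
from typing import Union

TEXT_TYPES = {"0", "1"}   # text file, directory menu
BINARY_TYPES = {"9"}      # binary file


def read_gopher_reply(gtype: str, raw: bytes) -> Union[list[str], bytes]:
    """Interpret what a Gopher0 server returned for a selector, given the item type."""
    if gtype in TEXT_TYPES:
        lines: list[str] = []
        # assumption: Latin-1 so that any byte decodes; Gopher0 fixes no charset
        for line in raw.decode("latin-1").split("\r\n"):
            if line == ".":            # a lone period ends a text reply
                break
            if line.startswith(".."):  # undo the server's dot-doubling
                line = line[1:]
            lines.append(line)
        return lines
    if gtype in BINARY_TYPES:
        return raw                     # binary: everything up to connection close
    raise ValueError(f"type {gtype!r} not handled in this sketch")
```

Given the same bytes, type "0" stops at the "." line while type "9" keeps everything. So a client holding only the selector cannot tell where the object ends.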
>17) BNF - pg 15
>
>The following changes follow from the points above; let's leave the details
>until we agree on which of the changes above belong in the spec.
In the following "No" means "See discussion above,
I have not actually changed the spec here."
>delete entries for "fragmentaddress" and "fragmentid" (see 3)
No.
>change "fileaddress" to "ftpaddress"
Yes
>delete "newsaddress" see 9
No
>waisdoc doesn't need "digits/" (see 10)
No.
>prosperolink probably needs changing (see 13)
Ask Cliff
>telnetaddress (or rather "user") needs password, we deprecate it
>(see "Internet protocol parts" on pg 9), but it is allowed.
Agreed. Same applies to FTP.
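For illustration, the user:password@host form can be taken apart with Python's standard urlsplit. That function follows later RFCs rather than this draft, and the host names below are made up. Putting a password in a URL exposes it to anyone who sees the reference, which is why the form is deprecated even though it is allowed.

```python
from urllib.parse import urlsplit


def login_parts(url: str):
    """Return (user, password, host, port) from the //user:password@host:port part."""
    p = urlsplit(url)
    return p.username, p.password, p.hostname, p.port
```

For "ftp://anonymous@ftp.example/pub" the password and port come back as None.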
>gopheraddress shouldn't have "/gtype" (see 16)
>gtype can be deleted (see 16)
No.
>extra should, I believe, include "#" (see 3)
>
>variant and punctuation should be deleted; they aren't referred to anywhere,
>and variant in particular is a term used in URI parlance for something else.
I have now called it "national" -- the national variant characters
and the punctuation characters are the ones excluded.
I leave them there as a note (now explained in the text)
that they are excluded.
Did we in the end exclude the "national" (variant)
characters? They were considered dangerous, but did
we not in fact allow them?
>A note should be added that the definition of path allows it to end in
>a / - I think this is intentional, but it's non-obvious.
I think the only impact is on resolution of the partial
form -- I have put a note there.
>18) References.
>
>Alberti... Change 00 to 0 (see 16)
> All occurrences of %20 should be changed to a space (see 2)
Done. I hope people can still use them :-)
>Berners-Lee...
> Delete the "." after ch; a hostname can't end in . according to the BNF
Hmmm... or change the BNF? A DNS name with a trailing dot
is not common practice but is valid, and means that the
last domain is a top level one.
I'll leave the BNF as there is enough problem with two URLs
looking different and being the same.
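A small sketch of the "looking different and being the same" problem. If trailing dots were allowed, "info.cern.ch." and "INFO.cern.ch" would name the same host, so any comparison of URLs would need a normalization step like the one below. DNS names are case-insensitive. The helper names are hypothetical.

```python
def normalize_host(host: str) -> str:
    """Lower-case a DNS name and drop one trailing root dot, if present."""
    host = host.lower()
    return host[:-1] if host.endswith(".") else host


def same_host(a: str, b: str) -> bool:
    """True if two host names differ only in case or a trailing dot."""
    return normalize_host(a) == normalize_host(b)
```

Keeping the trailing dot out of the BNF avoids needing this step for that case.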
>
================================================================
These new versions are in printable RFC form on
<ftp://info.cern.ch/pub/www/doc/url7.ps> and
<ftp://info.cern.ch/pub/www/doc/url7.txt>
The pagination of the new versions is changed
in that they are generated from HTML rather than Word.
If you had problems with Word's postscript then you might
find the new version from dvips better.
I would like to get this solid for RFC release without
any action in Houston as I won't be there. (This mailing list
is the real WG!)
Tim