Re: (LONG) Detailed notes on URL6.TXT

From: mitra@path.net (Mitra)
Date: Wed, 13 Oct 1993 11:54:44 PDT
In-Reply-To: Tim Berners-Lee <timbl@www3.cern.ch>
To: timbl@nxoc01.cern.ch, mitra@path.net
Subject: Re: (LONG) Detailed notes on URL6.TXT
Message-Id: <9310131154.aa06645@pandora.sf.ca.us>

On Oct 13, 4:04am, Tim Berners-Lee wrote:
} Mitra wrote:
} I have taken some points and incorporated the changes directly into
} the spec online -- read the hypertext version

Tim - Can you generate a context diff ("diff -c") between the latest version
and url6.txt so we can all go over it BEFORE Houston? At this stage in
the process, I think that we could possibly approve a final version
at Houston, but only if we get to go over the changes online before then.

} >3) Fragment Id - pg 7
} >
} >Delete entire section
} >
} >I thought Fragment IDs had been removed from URLs, and that # was
} >restored as a valid character inside a URL. Other apps than WWW use #
} >for other purposes, e.g. it's a valid character in a filename.
} >
} >WWW - I think this causes WWW problems.
} Had they been removed? I thought that they were there as
} a hook for future work.
}
} WWW certainly uses them and to change that would
} cause problems. I'd like to keep them in.
} Many systems need or use some form of fragment ID.
}
My understanding was that Fragment IDs and Types got moved over to the
URC. Since many apps need types and fragment IDs to do retrieval, this
makes the need for a way to wrap a URL with a Fragment and Type urgent.

}
} >4) Path - pg 9
} >Replace "must" with "should"
} >This is a function of the application: in most apps "/" has
} >hierarchical meaning, in others it's part of the valid character set
} >with no syntactical meaning.
}
} No, I think that the "/" was mandated. If someone
} uses "/" for something else, then it should be %2F'd.
} Tough, but useful. Hierarchy is SO common that
} it is very useful to be able to see a relationship.
} For example, I might want to cache xxtp://friedwhoop/gh/*.

Agreed it is useful, but it is not a requirement that non-hierarchical
schemes add any structure inside their strings - these are opaque
strings, except where their structure is specified by the scheme. For
example the docid portion of a Wais URL is going to contain slashes, and
these have no hierarchical meaning because they are part of something
that isn't separated from the rest by slashes. In your example above,
you'd be nuts to cache xxtp://friedwhoop/gh/* if you didn't know (by the
definition of the xxtp: scheme) that //friedwhoop/gh was a sensible
level of a hierarchy to cache.
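
To make the two positions concrete, here is a rough Python sketch of the
"%2F" approach: an opaque string is escaped so that any "/" left in the
path really is a hierarchy separator. The docid value and the "db/" prefix
are made up for illustration; they are not taken from the spec or from any
real Wais server.

```python
from urllib.parse import quote, unquote


def encode_opaque(value):
    """Percent-encode an opaque string so it contains no literal '/'."""
    # safe="" makes quote() escape "/" as %2F; by default "/" is left alone.
    return quote(value, safe="")


def split_path(path):
    """Split on literal '/' first, then decode each segment."""
    return [unquote(segment) for segment in path.split("/")]


docid = "dir/file.txt"  # hypothetical docid with a non-hierarchical slash
path = "db/" + encode_opaque(docid)
print(path)              # db/dir%2Ffile.txt
print(split_path(path))  # ['db', 'dir/file.txt']
```

A reader that splits before decoding sees two levels ("db" and the opaque
docid), which is the property a cache would rely on. Whether every scheme
should be forced to encode this way is the open question.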

}
} >6) Encoding prohibited characters pg 11
} >Delete last paragraph "The same considerations...URL"
} >See notes above on fragments.
} See objection above.
See comment on objection (item 3)
}
} >9) News - pg 12
} >This whole section needs rewriting. It describes a URN - i.e. the
} >messageid. This is location and time independent. We do need something
} >here: it's a URL that can be used with NNTP, which I believe requires
} >article numbers, i.e. a news URL should look something like:
} > nntp:path.net/comp/infosystems/gopher/3456
} No no no.
} It is true that the news: URL is location independent.
} It is however the basic access address for the NNTP
} protocol (except broken versions).

Hmm - we use the standard unix nntp distribution (not INN) and I do not
think it can return you an article in response to a message-id. It
doesn't have a table that can do this. WWW won't support access this way.
On the other hand, I can retrieve articles by number from gopher (via
the go4gw gateways).

} We can't consider news: as URNs because the NNTP
} protocol doesn't scale for persistent objects.
} It relies on the fact that a lot expires
} before you get the next 30MB. See my message of a few
} days ago, and Larry Masinter's. Basically, we need
} news: references, we call them URLs because they are
} not URNs and we are *NOT* going to invent another URX!

news:xxx@yy.zz references are URNs - each is a persistent, location-independent
NAME for an object; they do not specify where that object can be found,
nor even if it still exists anywhere.

} The URL form you suggest is not appropriate, in that
} it can only be used by people served by a particular
} news host. The 3456 article number differs on other
} hosts. To use NNTP hosts globally in this way is
} to abuse the NNTP architecture. If I refer to a
} news article or group, the person I give the reference
} to should go to her _local_ NNTP server for the article.
} If you want a central server, use HTTP or FTP not NNTP.
}
Agreed - the URL is a dumb way to refer someone to a news article, but
that doesn't mean it isn't a URL. The news URL is the extreme case of the
problems we have discussed with URLs, i.e. the object can be deleted,
moved, or be inaccessible from the location you are in. If you want to
point someone at a news article, then give them a URN; we could define
urn:news/1234@xxx.yyy as one of our URN schemes.
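
For comparison, here is a rough Python sketch of what each kind of
reference would turn into on the wire. ARTICLE and GROUP are the NNTP
commands from RFC 977; the "nntp:host/group/number" layout is only the form
proposed in item 9, and mapping its slashes to dots in the group name is my
assumption. Whether a given server honours the message-id form is exactly
the point in dispute above.

```python
def commands_for_message_id(message_id):
    """NNTP command to ask whichever server you use for a message-id."""
    return ["ARTICLE <" + message_id + ">"]


def commands_for_numbered(url):
    """Parse the hypothetical 'nntp:host/group/parts/number' form.

    Returns (host, commands). The article number only means something
    on that one host.
    """
    scheme, rest = url.split(":", 1)
    if scheme != "nntp":
        raise ValueError("not an nntp: reference")
    parts = rest.split("/")
    host, group_parts, number = parts[0], parts[1:-1], parts[-1]
    if not group_parts or not number.isdigit():
        raise ValueError("expected host/group/number")
    group = ".".join(group_parts)  # assumption: slashes stand for dots
    return host, ["GROUP " + group, "ARTICLE " + number]


print(commands_for_message_id("1234@xxx.yyy"))
print(commands_for_numbered("nntp:path.net/comp/infosystems/gopher/3456"))
```

The first form names no host at all, while the second is tied to one
server's numbering.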
}
} >10) Wais - pg 12
} >A client does not need to know the length to retrieve an object; the bytes
} >to be retrieved may be (but are not necessarily) encoded in docid. The type is
} >carried separately, and is required for retrieval since a docid can refer
} >to a number of separate objects with different types.
} You are right in that since the paper was written, wais
} source has been fixed to work without a length specified.
} I guess we have to keep the field there for
} back-compatibility.

Back compatibility with what?

} I am not aware (perhaps ignorance) of a way of deriving
} the list of types available from the wais docid. The set
} of types is returned only with the search result, and
} so must be regarded as part of the URL.

That's what I said - we need the type, not the length.
}
} Maybe (Simon?) z39.50'' will be clean in both these regards,
} and we will be able to use the docid neat. In that case, I
} would introduce a z39.50: URL.
}
Agreed - the current wais docid contains the concept of both a URN and a
URL, but due to limitations in all the current implementations
(wais-8-b5, freewais and wais-inc) it must be kept as an opaque string
to be passed back to the wais application.

} >13) Prospero - pg 14
} >
} >I don't think the stuff about %00 and attributes goes here; it belongs
} >in the URC
}
} I put in what Cliff wanted almost verbatim.
} It is up to Cliff I think.

Agreed - Cliff are you there ????
}
} >15) Gopher - pg 14
} >This entry only works for Gopher0 not Gopher+. A gopher URL must
} >distinguish between G+ and G0 because clients will break if they ask for
} >G+ and get G0.
} Really? You mean we need "gopher+:"
} Should this be the subject of further study?
}
Yes - definitely. Note that WWW can't speak G+; it can access G+
servers if and only if all the features are available through Gopher's
backward compatibility.
}
} >16) Gopher - pg 14
} >In Gopher+ a type is required for retrieval.
} >The type character is not required for retrieval in G0. It may be present
} >in the path but need not be. It belongs in the URC.
} >Note WWW incorrectly included the type in their URL which probably gives
} >the historical reason for this definition.
}
} Explain to me how you can retrieve a Gopher0 object
} with knowledge of the selector string but not of the
} type.

The selector string is what is needed for retrieval. As with most URLs,
the client won't know what to do with the object without the type, but we
took types out of the URL and put them in the URC.
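
As a rough illustration in Python: a Gopher0 client sends only the selector
followed by CRLF, so the type character never goes to the server. The
"/<gtype><selector>" path layout below is the WWW-style form mentioned in
item 16; the sample path and the latin-1 encoding are my assumptions.

```python
def gopher0_request(selector):
    """Bytes a Gopher0 client sends: the selector followed by CRLF."""
    return selector.encode("latin-1") + b"\r\n"  # encoding is an assumption


def split_www_gopher_path(path):
    """Split a WWW-style '/<gtype><selector>' path into (gtype, selector)."""
    body = path[1:] if path.startswith("/") else path
    return body[:1], body[1:]


gtype, selector = split_www_gopher_path("/0/docs/readme")  # hypothetical path
print(gtype)                      # 0
print(gopher0_request(selector))  # b'/docs/readme\r\n'
```

The type is used only on the client side, to decide what to do with the
bytes that come back.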

} >17) BNF - pg 15
} >
} >The following changes follow from the points above; let's leave the
} >details until we agree on which of the changes above belong in the spec.
}
} In the following "No" means "See discussion above,
} I have not actually changed the spec here."
}
} >delete entries for "fragmentaddress" and "fragmentid" (see 3)
} No.
}
} >delete "newsaddress" see 9
} No
}
} >waisdoc doesn't need "digits/" (see 10)
} No.
}
} >prosperolink probably needs changing (see 13)
} Ask Cliff
}
} >gopheraddress shouldn't have "/gtype" (see 16)
} >gtype can be deleted (see 16)
} No.
}
} >extra should, I believe, include "#" (see 3)
}
} >
} >variant and punctuation should be deleted; they aren't referred to anywhere,
} >and variant in particular is a term used in URI parlance for
} >something else.
}
} I now call it "national" -- the national variant characters
} and the punctuation characters are the ones excluded.
} I leave them there as a note (now explained in the text)
} that they are excluded.
}
} Did we in the end exclude the "national" (variant)
} characters? They were considered dangerous, but did
} we not in fact allow them?
}
The conclusion was to remove national characters. There was much
discussion, and everyone saw the merits of both sides of this
discussion.

} >Berners-Lee...
} > Delete the "." after ch; a hostname can't end in "." according
} > to the BNF
}
} Hmmm... or change the BNF? A DNS name with a trailing dot
} is not common practice but is valid; it means that the
} last domain is a top level one.
}
} I'll leave the BNF as there is enough problem with two URLs
} looking different and being the same.

There was another message about this. I don't think the choice is
important, but we do need to make the choice: "Can DNS names end in '.'?"
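
Whichever way the BNF goes, a client comparing two URLs can sidestep the
"looking different and being the same" problem with something like the
Python sketch below. Dropping the trailing dot and lower-casing are just
one possible convention, not something the spec currently says.

```python
def normalize_host(host):
    """Lower-case a hostname and drop one trailing dot, if present."""
    host = host.lower()
    return host[:-1] if host.endswith(".") else host


def same_host(a, b):
    """Compare two hostnames after normalization."""
    return normalize_host(a) == normalize_host(b)


print(same_host("info.cern.ch.", "INFO.cern.ch"))  # True
```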
}
} These new versions are in printable RFC form on
} <ftp://info.cern.ch/pub/www/doc/url7.ps> and
} <ftp://info.cern.ch/pub/www/doc/url7.txt>
}
Great

} The pagination of the new versions is changed
} in that they are generated from HTML rather than Word.

Tim - that is going to be a pain, since it means we can't generate a
diff. Could you generate a newly paginated form of url6.txt, and diff
against that?

} I would like to get this solid for RFC release without
} any action in Houston as I won't be there. (This mailing list
} is the real WG!)

Tim - I think this message still contains all the open points. I've
marked off all the changes that you've OK'd, which, if there are no
further objections on the list, should be considered as made. I'd like
to have an RFC ready for final approval in Houston, i.e. finalize details
here, but with a final chance at Houston. I know that many people with
an active interest in this don't have time to follow all the detailed
discussion.

- Mitra