Re: response to your 'last call' comments

Olle Jarnefors (ojarnef@admin.kth.se)
Mon, 17 Oct 94 13:51:13 +0100

Date: Mon, 17 Oct 94 13:51:13 +0100
Message-Id: <9410171251.AA04936@mercutio.admin.kth.se>
From: Olle Jarnefors <ojarnef@admin.kth.se>, Peter Svanberg <psv@nada.kth.se>
To: Larry Masinter <masinter@parc.xerox.com>
In-Reply-To: <94Oct10.011146pdt.2760@golden.parc.xerox.com>
Subject: Re: response to your 'last call' comments

We are generally happy with your changes to the draft. Only a
few remaining points.

(References to the preliminary draft of October 9, 1994. Quotes
from it start with ": ", quotes from your response to us start
with "> ". Lines starting with "/ " are suggested replacement
text, those starting with "+ " are suggested text additions.)

: 1. Introduction
:
: This document describes the syntax for a compact string
: representation for a resource available via the Internet. These

Add "and semantics" after "syntax".

: 2. Definitions
:
: Just as there are many different methods of access to resources,
: there are several _schemes_ for describing the location of such
: resources.
:
: The generic syntax for URLs provides a framework for new schemes to
: be established using protocols other than those defined in this
: document.
:
: URLs are used to `locate' resources, by providing an abstract
: identification of the resource location. Having located a
: resource, a system may perform a variety of operations on the
: resource, as might be characterized by such words as `access',
: `update', `replace', `find attributes'. In general, only the
: `access' method needs to be specified for any URL scheme.
:
: 2.1. URL SYNTAX
:
: A full BNF description of the URL syntax is given in Section 5.

The text in section 2 does not consist of definitions of terms in
the strict sense. The text between the headings "2. Definitions"
and "2.1. URL SYNTAX" is introductory and only touches on the
syntax of URLs, which is the subject of the rest of section 2. We
therefore suggest

-- that the heading "2. Definitions" is removed

-- that a new heading, "2. Common URL syntax", is inserted
immediately before the present heading "2.1. URL SYNTAX"

-- that the latter heading is changed to "2.1. The main parts of URLs"

Section 2.2 contains material that extends beyond what's implied
by its present heading. We suggest

-- that the heading is changed to "2.2. Character encoding issues"

> $$$ Secton 2.2 was one of the most difficult to deal with in the URL
> $$$ document. I've taken many of your suggestions, but not all of them.
> $$$ Please review the rewrite and insure that it satisfies your desire for
> $$$ accuracy and completeness.

The new text is a big improvement. We would like to suggest a
few further clarifications to the text, since this is a tricky
matter, especially for readers of the specification not familiar
with computing milieus with several concurrently occurring coded
character sets (which are common outside the part of the world
where the English language is predominant).

: In most URL schemes, different parts of a URL are used to represent
: sequences of octets used in Internet protocols.

Change this to:

/ In most URL schemes, the sequences of _characters_ in
/ different parts of a URL are used to represent
/ sequences of _octets_ used in Internet protocols. For example, in the

Next sentence:

: For example, in the
: ftp scheme, the host name, directory name and file names are
: represented by parts of the URL.

Make this even clearer by stating what kind of objects these FTP
names are:

/ For example, in the
/ ftp scheme, the host name, directory name and file names are
/ such sequences of octets,
/ represented by parts of the URL.

Last sentence of the second paragraph of section 2.2:

: Within those parts, chararacters
: are generally used to represent the corresponding octet within the
: US-ASCII [20] coded character set.

We think that a wording somewhat easier to grasp would be:

/ Within those parts, a character is, with some exceptions,
/ used to represent the octet by which it is itself
/ represented within the US-ASCII [20] coded character set.

To make these principles more concrete, we also suggest adding
the following illustrative example at the end of section 2.2.

+ An example may clarify the different representations
+ involved in the interplay between URL and underlying access
+ protocol:
+
+ A Macintosh file with a name consisting of the five letters
+ r<o-circumflex>les
+ will, according to the _Macintosh_ coded character set, be
+ represented by the five octets (in hexadecimal notation)
+ 72 99 6C 65 73
+ and these will also be used in the FTP control transactions
+ when accessing the file.
+
+ The part of the ftp URL corresponding to this file name may be
+ r%99les
+ where the first and the last three characters have the octets
+ given above as their coded representations in _US-ASCII_,
+ and the second octet is encoded by "%99".
+
+ If this URL is included in a text file on e.g. a computer
+ system using an _EBCDIC_ coded character set, the
+ characters of this part of it will be represented by the
+ octets
+ 99 6C F9 F9 93 85 A2
+ in that file.
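The octet values in this example can be checked mechanically. The sketch below is only an illustration under stated assumptions: Python's "mac_roman" codec stands in for the Macintosh coded character set, "cp037" stands in for EBCDIC (one of several EBCDIC variants), and the helper url_part is our own, hypothetical, deliberately conservative encoder that is sufficient for this example only.

```python
# Reproduces the octet sequences in the suggested "r<o-circumflex>les" example.
# Assumption: "mac_roman" models the Macintosh coded character set and
# "cp037" models EBCDIC (only one of several EBCDIC variants).

def url_part(octets: bytes) -> str:
    """Hypothetical helper: keep US-ASCII alphanumerics as themselves and
    %-encode every other octet. Enough for this example, not a full encoder."""
    return "".join(
        chr(b) if b < 0x80 and chr(b).isalnum() else "%{:02X}".format(b)
        for b in octets
    )

file_name = "r\u00f4les"                     # r, o-circumflex, l, e, s

mac_octets = file_name.encode("mac_roman")   # octets on the Mac and in FTP
part = url_part(mac_octets)                  # the corresponding ftp URL part
ebcdic_octets = part.encode("cp037")         # that URL part stored as EBCDIC text

print(mac_octets.hex(" ").upper())     # 72 99 6C 65 73
print(part)                            # r%99les
print(ebcdic_octets.hex(" ").upper())  # 99 6C F9 F9 93 85 A2
```

Note that the same seven characters of the URL part are represented by different octets on an EBCDIC system, while the octets they designate (72 99 6C 65 73) stay the same.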

The third paragraph of section 2.2:

: In addition, octets may be _encoded_ by a character triplet
: consisting of the character "%" followed by two hexadecimal digits
: (from "0123456789ABCDEF"), which forming the hexadecimal value of
: the octet.

Add here:

+ Lowercase letters "abcdef" may be used, although this is
+ not recommended.
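A minimal sketch of the "%" triplet encoding may make the intended rules concrete. The function names are hypothetical. Encoding always emits the recommended uppercase digits, while decoding also accepts lowercase "abcdef". Every other character is assumed to stand for its own US-ASCII octet.

```python
# Sketch of encoding octets as "%" followed by two hexadecimal digits.
# Names are hypothetical; encoding emits uppercase digits, while decoding
# also accepts lowercase "abcdef", as the suggested addition permits.

HEX_UPPER = "0123456789ABCDEF"
HEX_ANY = "0123456789ABCDEFabcdef"

def encode_octet(octet: int) -> str:
    """Return the character triplet that encodes one octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet: %r" % (octet,))
    return "%" + HEX_UPPER[octet >> 4] + HEX_UPPER[octet & 0x0F]

def decode_part(text: str) -> bytes:
    """Return the octets represented by a URL part: each %XX triplet gives
    its octet, and every other character gives its own US-ASCII octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            digits = text[i + 1:i + 3]
            if len(digits) != 2 or any(d not in HEX_ANY for d in digits):
                raise ValueError("malformed encoding at position %d" % i)
            out.append(int(digits, 16))
            i += 3
        else:
            if ord(ch) > 0x7F:
                raise ValueError("non-US-ASCII character at position %d" % i)
            out.append(ord(ch))
            i += 1
    return bytes(out)

print(encode_octet(0x99))                        # %99
print(decode_part("r%99les"))                    # b'r\x99les'
print(decode_part("%7e") == decode_part("%7E"))  # True
```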

In the part

: No corresponding graphic US-ASCII:
:
: URLs are written only with the graphic printable characters of the
: US-ASCII coded character set. All octets that correspond to
: non-printable characters or space must be encoded.

a more precise wording may be used:

/ No corresponding graphic US-ASCII:
/
/ URLs are written only with the graphic printable characters of the
/ US-ASCII coded character set. Therefore, the octets 80-FF
/ hexadecimal, which are not used in US-ASCII, and the octets
/ 00-1F and 7F hexadecimal, which represent control characters
/ in US-ASCII, must be encoded.
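The octet ranges named in the proposed wording amount to a simple test. The sketch below uses a hypothetical function name. It deliberately leaves out 20 hexadecimal (SPACE), because we propose treating that character as unsafe rather than placing it in this category.

```python
def lacks_graphic_us_ascii(octet: int) -> bool:
    """True for octets that, per the proposed wording, must always be encoded:
    00-1F and 7F (US-ASCII control characters) and 80-FF (unused in US-ASCII).
    20 (SPACE) is left out because we propose treating it as unsafe."""
    return octet <= 0x1F or octet == 0x7F or octet >= 0x80

must_encode = [b for b in range(0x100) if lacks_graphic_us_ascii(b)]
print(len(must_encode))  # 161
```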

We suggest that the SPACE character instead be treated as an
"unsafe" character, which is the next category:

: Characters can be unsafe for a number of reasons.

Add here:

+ The space character is unsafe because significant spaces
+ may disappear and insignificant spaces may be introduced
+ when URLs are transcribed or typeset or subjected to the
+ treatment of word-processing programs.

Last suggestion for section 2.2:

: In all URLs, irrespective of scheme, only alphanumerics, reserved
: characters used for their reserved purposes, "$", "-", "_", ".",
: "!", "*", "'", "(", ")", "," and "+" may be used unencoded.

This sentence will be clearer if written the other way around:

/ Only alphanumerics, reserved
/ characters used for their reserved purposes, "$", "-", "_", ".",
/ "!", "*", "'", "(", ")", "," and "+" may be used unencoded in all
/ URLs, irrespective of scheme.
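The characters listed above can be turned into a conservative encoder for octets that are meant as data. This is a sketch with a hypothetical name. It assumes that no reserved character is being used for its reserved purpose, so reserved characters are always encoded. It is not a scheme-aware encoder.

```python
import string

# Characters that may be used unencoded in all URLs (reserved characters
# used for their reserved purposes are not covered by this sketch).
ALWAYS_UNENCODED = set(string.ascii_letters + string.digits + "$-_.!*'(),+")

def encode_data(octets: bytes) -> str:
    """Encode octets meant as data for use in a URL part: an octet stays as
    its US-ASCII character only if that character is in ALWAYS_UNENCODED;
    every other octet (SPACE, reserved, unsafe, 80-FF, ...) becomes %XX."""
    return "".join(
        chr(b) if b < 0x80 and chr(b) in ALWAYS_UNENCODED else "%{:02X}".format(b)
        for b in octets
    )

print(encode_data(b"a b/c"))    # a%20b%2Fc
print(encode_data(b"50%"))      # 50%25
print(encode_data(b"x+y(1)"))   # x+y(1)
```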

> Suggestion: Change the end of the last sentence of the above
> quote to:
> + and ">"; it takes the form <unique>@<full_domain_name>,
> + even if some small deviations from the RFC 1036 syntax
> + rules occur in practice.
>
> See also comment # 36.
>
> $$$ I don't believe this is reasonable, actually. While variation may
> $$$ occur in practice, I don't think the URL standard can actually be
> $$$ flexible on this point; the uniqueness of the identifier relies on
> $$$ the 'unique' part being relative to a unique FQDN. If two hosts
> $$$ generate A0001@uucp, you'll have duplication.

Accepted.

This makes the new BNF definition of "article"

: article = 1*articlechar "@" 1*articlechar
: articlechar = uchar | ";" | "/" | "?" | ":" | "&" | "="

unnecessary. It can be restored to:

/ article = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host
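The restored rule can be checked with a regular expression. This is only a sketch: "uchar" and "host" are not spelled out in this message, so the sub-rules in the code are our assumption, modeled on the draft's BNF in section 5. In that model, uchar covers alphanumerics, the characters "$-_.+!*'(),", and %XX escapes, and host is either a dotted host name or a four-part host number. The pattern names are hypothetical. The grammar by itself cannot check that the domain is fully qualified, so that requirement still rests on the prose.

```python
import re

# One uchar (assumed): an unreserved character or a %XX escape.
UCHAR = r"(?:[A-Za-z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})"
ARTICLECHAR = r"(?:" + UCHAR + r"|[;/?:&=])"

# host (assumed): hostname = *( domainlabel "." ) toplabel, or a host number.
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = r"(?:" + DOMAINLABEL + r"\.)*" + TOPLABEL
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOST = r"(?:" + HOSTNAME + r"|" + HOSTNUMBER + r")"

# article = 1*( articlechar ) "@" host
ARTICLE = re.compile(ARTICLECHAR + r"+@" + HOST)

print(bool(ARTICLE.fullmatch("9410171251.AA04936@mercutio.admin.kth.se")))  # True
print(bool(ARTICLE.fullmatch("a b@example.com")))                         # False
```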

> Suggestion: Change the example to:
>
> + Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
> + type=d> but you can probably pick it up from <URL:ftp://ds.in-
> + ternic.net/rfc>. Note the warning in <URL:http://ds.internic.
> + net/instructions/overview.html#WARNING>.
>
> $$$ your revised example is more interesting, but is being labelled as
> $$$ 'incorrect'; I'd rather not introduce it.

OK.

--
Olle Jarnefors, Royal Institute of Technology, Stockholm <ojarnef@admin.kth.se>

--
Peter Svanberg, Royal Institute of Technology, Stockholm <psv@nada.kth.se>