Message-Id: <199601311115.MAA11079@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Wed, 31 Jan 1996 12:15:39 +0100
In-Reply-To: Larry Masinter <masinter@parc.xerox.com>
To: Larry Masinter <masinter@parc.xerox.com>, borka@e5.ijs.si
Subject: Re: html, http, urls and internationalisation
Larry Masinter writes:
> > What Keld said is sound and could be worked further. The major
> > restriction is the DNS part and this should be kept as it is
> > (character < 127). The same applies to the syntax characters.
>
> No, "what Keld said" isn't "sound" it is just "sounds nice".
Glad you like the sound effects, Larry!
> Keld said, for example,
>
> > 1. URLs themselves.
>
> > These are at an abstract character level, as Larry and Franc,ois
> > correctly points out, you cannot see what is the charset
> > when you look at a business card or an URL in the newspaper.
>
> > I propose that any character here be allowed, except for the
> > URL syntax characters, (things like < / : ) - in the non-DNS
> > part of the URL. Remember these are abstract characters, and
> > there is no binding to for example ISO 10646 in the sense
> > of a character repertoire, or to any encoding (charset).
>
> However, this nice-sounding proposal contained no solution to the
> following questions:
>
> 1)how do these abstract characters subsequently get turned
> into octets that are employed in real protocols in general
> and http and ftp in particular?
> (The current URL specification gives an algorithm.)
From glyphs on paper to a computer system, e.g. a browser:
by having the human recognise (aka "read") the characters and enter
them, as is normally done.
From an HTML doc into an HTTP request: the HTML doc has a
charset, and the HTTP request URL is represented in a charset.
So the HTML string with the URL is converted into the HTTP
charset, and then the URL is sent with the high-bit octets encoded
according to the URL specifications (in %xx notation). I found no way
of specifying a charset in the current RFCs on URLs.
I did specify the transformations and encodings in earlier mail.
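
The conversion described above can be sketched roughly as follows. This
is only an illustration, not a spec: the function name url_to_wire and
the default charset are assumptions made for the example, and it escapes
only octets with the high bit set, as described, leaving spaces and
other unsafe ASCII characters alone.

```python
def url_to_wire(url_text: str, charset: str = "iso-8859-1") -> str:
    """Turn abstract URL characters into the %xx form sent in a request.

    Hypothetical helper: first map the characters to octets in the
    request charset, then escape every octet with the high bit set.
    """
    octets = url_text.encode(charset)  # abstract characters -> octets
    parts = []
    for octet in octets:
        if octet < 0x80:
            parts.append(chr(octet))        # plain ASCII octet, sent as-is
        else:
            parts.append("%%%02X" % octet)  # high-bit octet -> %xx
    return "".join(parts)


if __name__ == "__main__":
    # The same abstract characters give different octets per charset,
    # which is why the receiver has to know which charset was used.
    print(url_to_wire("/bl\u00e5b\u00e6r.html", "iso-8859-1"))
    print(url_to_wire("/bl\u00e5b\u00e6r.html", "utf-8"))
```

The two calls produce different %xx sequences for the same characters,
which is the practical consequence of having no way to state the charset
in the URL itself.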
>
> 2)how does one translate a URL that uses a large character
> repertoire so that it might be written in a context with
> a small repertoire? E.g., a URL with chinese characters
> in an ASCII email message.
> (The current URL specification manages this by limiting
> the repertoire.)
That was also described in the previous mailing. About HTML I said:
> >Here it should be possible to write a HTML document in a given
> >charset, and then reference the (abstract) characters in the URL, just
> >like it is possible to write characters in the rest of the HTML document.
> >That is, the normal characters of the document charset can be used,
> >like full iso-8859-1 in normal HTML docs, and full Unicode in
> >Unicode docs. Also the way of generating out-of-band characters
> >should be allowed in HTML URL strings, like &a-ring and &#xxxx;
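
As a small sketch of what that quoted passage implies for a reader of the
HTML: the URL string is decoded from the document charset, and any
character references in it are resolved, giving the same abstract
characters either way. The function name href_to_abstract is made up for
this example; html.unescape is the Python standard library routine for
resolving character references.

```python
import html


def href_to_abstract(raw_href: bytes, doc_charset: str) -> str:
    """Recover the abstract characters of a URL found in an HTML document.

    Hypothetical helper: decode the raw octets using the document
    charset, then resolve named and numeric character references.
    """
    text = raw_href.decode(doc_charset)  # octets -> document characters
    return html.unescape(text)           # &aring; / &#229; -> characters


if __name__ == "__main__":
    # Written directly in an ISO 8859-1 document:
    direct = href_to_abstract(b"/bl\xe5b\xe6r.html", "iso-8859-1")
    # Written with character references in a plain ASCII document:
    escaped = href_to_abstract(b"/bl&aring;b&#230;r.html", "ascii")
    print(direct == escaped)
```

Both spellings end up as the same string of abstract characters, so a URL
with a large repertoire can still be written down in a document with a
small one.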
> I don't think these problems are unsolvable, but I think in the course
> of making a "sound" proposal you'll find that it starts "sounding"
> less and less like something that you'd want to implement.
I think most of the concerns have been addressed in what I wrote,
but there may be finer details in it that need to be sharpened,
and it needs to be cast in concrete specs.
I think most of the specs are already there and ready to be employed
in an implementation.
> So, I'll ask again, PLEASE stop cross-posting this discussion to three
> separate mailing lists.
OK, taken ad notam.
Keld