Re: Wrappers for URLs

Keith Moore (moore@cs.utk.edu)
Fri, 07 May 1993 21:45:02 -0400

Message-Id: <9305080145.AA05811@wilma.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: timbl@nxoc01.cern.ch
Subject: Re: Wrappers for URLs
In-Reply-To: Your message of "Thu, 06 May 1993 18:50:24 BST."
<9305061750.AA18269@www3.cern.ch>
Date: Fri, 07 May 1993 21:45:02 -0400

To: jak@violet.berkeley.edu (John A. Kunze)
Subject: Re: Wrappers for URLs
Date: Thu, 6 May 93 18:50:24 +0100

> How about just suggesting that in plain text when other conventions
> are not applicable, the URL be wrapped in say <>'s especially if
> it has to break lines. (Otherwise it can be wrapped in blanks!)

This seems like a reasonable compromise.

How about something like:

A URL may be represented in either of two formats. Exchange format is the
format recommended for use in communications protocols between programs that
use URLs. Print format allows URLs to be represented in media that have
limitations on line length. Print format is recommended for representing
URLs in printed form, and also in ordinary text files.

1. A URL in "exchange format" is written entirely in printable, non-space
ASCII characters, that is, octets in the range 21 to 7E hex, inclusive.

(a) Any octet may be represented as '%' followed by two upper case hex
digits.

(b) Octets in the "safe" set { 21-24 hex, 26-3B hex, 3D hex, and 3F-7E
hex } may be encoded as the corresponding ASCII characters.

(c) Octets outside the "safe" set MUST be represented in hex according to
rule (a).
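
As a rough illustration of rules (a)-(c), here is a minimal Python sketch
that encodes raw octets into exchange format. The names SAFE and
encode_exchange are hypothetical and not part of the proposal. The sketch
assumes the input is raw, not-yet-encoded octets, so a literal '%' (25 hex)
comes out as "%25".

```python
# Illustrative sketch only; SAFE and encode_exchange are hypothetical names.
# "Safe" octets per rule (b): 21-24, 26-3B, 3D, and 3F-7E hex.
# This leaves out '%' (25), '<' (3C), and '>' (3E).
SAFE = (
    set(range(0x21, 0x25))
    | set(range(0x26, 0x3C))
    | {0x3D}
    | set(range(0x3F, 0x7F))
)


def encode_exchange(octets: bytes) -> str:
    """Encode raw octets as an exchange-format string."""
    out = []
    for b in octets:
        if b in SAFE:
            # Rule (b): a safe octet may appear as its ASCII character.
            out.append(chr(b))
        else:
            # Rules (a) and (c): '%' followed by two uppercase hex digits.
            out.append("%%%02X" % b)
    return "".join(out)
```

For example, encode_exchange(b"a b") gives "a%20b", since the space (20 hex)
is outside the safe set.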

2. In order to compare two URLs to see if they indicate the same location,
both exchange format URLs must first be "decoded" by replacing all %XX
patterns with the corresponding octet.
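
The comparison step might be sketched as follows. The names
decode_exchange and same_location are hypothetical. Accepting lower case hex
digits when decoding is an assumption made here for leniency; the text above
only specifies upper case.

```python
import re

# Illustrative sketch only; all names here are hypothetical.
# Matches '%' followed by two hex digits. Lower case digits are accepted
# as an assumption; the proposal itself only mentions upper case.
_PERCENT_XX = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_exchange(url: str) -> bytes:
    """Replace every %XX pattern with the octet it stands for."""
    # re.sub does not rescan its own replacements, so "%2541" decodes to
    # the octets of "%41" and is not decoded a second time.
    decoded = _PERCENT_XX.sub(lambda m: chr(int(m.group(1), 16)), url)
    # Every character is now in the range 00-FF, so latin-1 maps each
    # one to exactly one octet.
    return decoded.encode("latin-1")


def same_location(url_a: str, url_b: str) -> bool:
    """Compare two exchange-format URLs octet by octet after decoding."""
    return decode_exchange(url_a) == decode_exchange(url_b)
```

With this sketch, "http://host/a%7Eb" and "http://host/a~b" compare equal,
because 7E hex is the octet for '~'.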

3. An exchange format URL may be converted to "print format" by enclosing the
URL with '<' and '>'. White space characters and line-breaks may appear
in a print format URL, but these are entirely non-significant. To convert
a print format URL to exchange format, remove the enclosing '<' and '>'
characters and delete any internal white-space and line-breaks. Programs
that accept URLs as input from humans should accept URLs in either
format and convert them internally to exchange format or to an internal
format as necessary.

A URL in exchange format never begins with '<'.
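
The conversions between the two formats could look roughly like this in
Python. The names to_print and to_exchange and the width parameter are
hypothetical. The sketch relies on the rule that an exchange format URL
never begins with '<' to tell the two formats apart.

```python
# Illustrative sketch only; to_print, to_exchange, and width are
# hypothetical names, not part of the proposal.

def to_print(url: str, width: int = 60) -> str:
    """Wrap an exchange-format URL in '<' and '>', breaking long lines."""
    wrapped = "<" + url + ">"
    # Line-breaks inside a print-format URL are non-significant, so it
    # can be split at any column.
    pieces = [wrapped[i:i + width] for i in range(0, len(wrapped), width)]
    return "\n".join(pieces)


def to_exchange(text: str) -> str:
    """Accept a URL in either format and return it in exchange format."""
    s = text.strip()
    if not s.startswith("<"):
        # An exchange-format URL never begins with '<'.
        return s
    if not s.endswith(">"):
        raise ValueError("print-format URL is missing its closing '>'")
    # Drop the enclosing '<' and '>', then delete all internal
    # white-space and line-breaks.
    return "".join(s[1:-1].split())
```

For example, to_exchange("<http://host/\n  path/file>") returns
"http://host/path/file", and a URL already in exchange format passes through
unchanged.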

Keith