Date: Wed, 23 Mar 94 18:58:05 +0100
From: Tim Berners-Lee <timbl@ptpc00.cern.ch>
Message-Id: <9403231758.AA14263@ptpc00.cern.ch>
To: "Mark P. McCahill" <mpm@boombox.micro.umn.edu>
Subject: Re: how to make progress on the URL document
Mitra and Mark, you ask for diffs. You're not going to like them,
because the formatting messes them up quite a lot, but for what it's
worth, here they are.
Tim
diff url-spec.txt /pub/www/doc/draft-uri-url-02.txt
2,3c2,3
< draft-ietf-uri-url-03.{ps,txt} URI working Group
< Expires 21 September 1994 21 March 1994
---
> draft-ietf-uri-url-02.{ps,txt} CERN
> Expires 1 July 1994 1 Jan 1994
8,9c8,9
< A Syntax for the Expression of
< Access Information of Objects on the Network
---
> A Unifying Syntax for the Expression of
> Names and Addresses of Objects on the Network
12,23c12
< ABOUT THIS DOCUMENT
<
< This document specifies a Uniform Resource Locator (URL), the
< syntax and semantics of formalized information for location and
< access of resources on the Internet.
<
< This document was written by the URI working group of the Internet
< Engineering Task Force. Comments may be addressed to the editor,
< Tim Berners-Lee <timbl@info.cern.ch>, or to the URI-WG
< <uri@bunyip.com>. Discussions of the group are archived at
<
< <http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>
---
> Status of this memo
25,41d13
< This document is bound by the Requirements Specification in
< preparation.
<
< The work is derived from concepts introduced by the World-Wide Web
< global information initiative, whose use of such objects dates
< from 1990 and is described in "Universal Resource Identifiers for
< the World-Wide Web", RFCXXX.
<
< This document is available in hypertext form, with links to
< background information, as:
<
< <http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html>
<
< .
<
< STATUS OF THIS MEMO
<
53c25,29
< Distribution of this document is unlimited.
---
> Distribution of this document is unlimited. Please send comments
> to the author at timbl@info.cern.ch or to the discussion list
> ietf-url@merit.edu.
>
> Abstract
54a31,53
> Many protocols and systems for document search and retrieval are
> currently in use, and many more protocols or refinements of
> existing protocols are to be expected in a field whose expansion is
> explosive.
>
> These systems are aiming to achieve global search and readership of
> documents across differing computing platforms, and despite a
> plethora of protocols and data formats. As protocols evolve,
> gateways can allow global access to remain possible. As data
> formats evolve, format conversion programs can preserve global
> access. There is one area, however, in which it is impractical to
> make conversions, and that is in the names and addresses used to
> identify objects. This is because names and addresses of objects
> are passed on in so many ways, from the backs of envelopes to
> hypertext objects, and may have a long life.
>
> A common feature of almost all the data models of past and proposed
> systems is something which can be mapped onto a concept of "object"
> and some kind of name, address, or identifier for that object. One
> can therefore define a set of name spaces in which these objects
> can be said to exist.
>
> Practical systems need to access and mix objects which are part of
56a56
>
58a59,467
> different existing and proposed systems.
>
> This paper discusses the requirements on a universal syntax which
> can be used to encapsulate a name in any registered name space.
> This will allow names in different spaces to be treated in a common
> way, even though names in different spaces have differing
> characteristics, as do the objects to which they refer.
>
> The universal syntax allows reference to objects available using
> existing protocols, and may be extended with technology. This paper
> makes a recommendation for a generic syntax, and for specific forms
> for "Uniform Resource Locators" (URLs) of objects accessible using
> existing Internet protocols.
>
> The syntax has been in widespread use by World-Wide Web software
> since 1990.
>
> Terms
>
> The objects on the network which are to be named and addressed
> typically include objects which can be retrieved, and objects which
> can be searched. There is a great variety of other objects which
> may support other operations. We imply nothing about the contents
> of objects in this document. Whereas human-readable documents are
> currently the center of interest of the field, we envisage all
> aspects discussed in this paper applying to generalized objects
> when systems to handle them become available. The "object" is the
> unit of reference and need not correspond to any unit of storage.
> We refer to objects which can be searched as "indexes". We
> emphasize that this is the abstract view of the client, and these
> objects need not correspond to physical files on computers. We
> refer to the person who does the retrieval or searching as the
> user.
>
> Within this document, we use the term "name" very generally for a
> string of characters describing an object, whatever its
> combination of properties mentioned below. (The term usually has a
> narrower meaning, but we needed some term for the universal set.)
> This uniform syntax applied to a generic name is known as a Uniform
> Resource Identifier (URI). The term "address" is reserved for a
> string which specifies a more or less physical location. The term
> "locator" refers to a URL as here defined. URIs which have a
> greater persistence than URLs are referred to as URNs.
>
> Characteristics
>
> This section describes characteristics of various naming schemes,
> requirements which some existing schemes meet, and requirements
> for the URL scheme itself, as an introduction to and background
> for the Recommendations section.
>
> USES OF NAMES AND ADDRESSES
>
>
>
>
> Berners-Lee 2
>
> A name allows a user, with the help of a "client" program, to
> retrieve or operate on objects via a "server" program. A name may
> be passed for example:
>
> In communication of any form between two people, to refer to a
> document, or part of a document;
>
> As part of the description of a link associated with a hypertext
> document;
>
> As part of the result of searching an index.
>
> Some typical requirements on a name which are met to a varying
> degree by various schemes are for example that the name is
>
> Persistent A given name will remain valid as long as it
> is needed;
>
> Extensible A given naming syntax will remain valid
> through the introduction of new protocols and
> directory technologies;
>
> Resolvable A name will contain enough information to
> allow the document or index to which it
> refers to be accessed, perhaps via resolution
> into an intermediate, more physical, name.
>
> Unique Each object can only have one such name.
> The fact that two such names are different
> implies that the objects to which they refer
> are different (in some way).
>
> Unambiguous The fact that two names are identical
> implies that the objects named are the same
> (in some way).
>
> The syntax discussed is the syntax of one name, be it a lasting
> name or a physical address. When a directory server or hypertext
> link contains a set of alternative names, then that is beyond the
> scope of this syntax. Similarly, a syntax for describing a
> compound object is outside the scope of this syntax. The specific
> locator name spaces (defined under the umbrella of the general
> syntax) each meet the requirements above to a greater or lesser
> extent.
>
> CURRENT PRACTICE
>
> Current protocols use many different standards for names. For some
> protocols, such as ISO-10163 Search and Retrieve protocol [16], the
> names returned in a search are only valid during the session. For
> others, such as FTP[9], they are lasting names which may be used
> for object retrieval at a later time. Typically, however, they are
> not long-lasting names which are independent of the location of the
> object. Such names may be provided using directory servers such as
> x.500. They will refer to the registration, however formal or
> informal, of an object with a particular organisation or person.
> Both hypertext and manual references rely on long-lasting names.
> Current names are basically location specifiers (addresses). These
> may be known as Uniform Resource Locators (URLs). They give the
> necessary parts of an address for a reader to access an information
> provider using the given protocol, and ask for the object required.
> Examples of names used by various protocols include
>
> File Transfer Protocol (Postel 1985):
>
> Host name or IP-address
>
> [TCP port]
>
> [user name, password]
>
> Filename
>
> W.A.I.S. (Kahle 1990)
>
> Host name or IP-address
>
> [TCP port]
>
> local document id
>
> Gopher (Alberti 1991)
>
> Host name or IP-address
>
> [TCP port]
>
> database name
>
> selector string
>
> HTTP (Berners-Lee 1991)
>
> Host name or IP-address
>
> [TCP port]
>
> local object id
>
> NNTP (Kantor 1986)
>
> NNTP group
>
> Group name
>
> NNTP article
>
> Host name
>
> unique message identifier
>
> Prospero links (Neuman 1992)
>
> Host name or IP address
>
> [UDP port]
>
> Host specific object name
>
> [version]
>
> [identifier]*
>
> x.500 distinguished name
>
> Country
>
> Organisation
>
> Organisational unit
>
> Person
>
> Local object identifier
>
> Other systems with their own naming schemes include BITNET
> "LISTSERV" application, FTAM file retrieval, SQLnetTM remote
> database search, proprietary distributed file systems, etc.
> Conventional syntax for writing these addresses involves various
> forms of punctuation to separate these parts. This sometimes, but
> not always, allows the naming scheme to be deduced from the
> punctuation. For example, a name of the form
> xxx.yyy.zz.edu:/pub.aa.bb.cc often implies anonymous FTP access.
> However, there is no well-defined algorithm for parsing an
> arbitrary name, as there is no common syntax.
>
> EXPANDABILITY
>
> There will necessarily be a phase during which lasting names will
> become more common, as the deployment of directory services
> increases to the point where every user has direct or indirect
> access to one. Even then, however, one can envisage more than one
> competing directory system, and cases in which physical names are
> still required. A directory service takes a lasting name and
> reduces it to a physical address (or set of addresses) which,
> though less useful for lasting reference, is the only way to
> actually retrieve the object. An addressing syntax is required
> which will be able to encompass existing physical address spaces,
> and be extendible to any future protocols. This requires that it
> contain an identifier for the protocol in use. The format of the
> rest of the address will necessarily depend to a certain extent on
> the protocol.
>
> RELEVANCE
>
> The life of a name is limited by any information contained within
> it which may become prematurely invalid. It is therefore necessary
> to limit the contents of a name to the information required for the
> operations above. Other extraneous information about the object
> (its size, data format, authorisation details, etc.) may in general
> change with time and should not be part of the name. One might
> expect such information to be part of the "header" of an object, and
> for protocols to allow the header information to be retrieved
> independently of the objects themselves. Any physical address may
> be subject to change with time: hence we encourage the move to
> lasting names and directory services.
>
> UNIQUENESS
>
> Clearly one requires unambiguous names in the sense that one name
> should refer to only one logical object. This is the case with all
> the addressing schemes in use, whether they are directory systems
> or physical addresses. (The internet addresses all rely on the
> domain name (Mockapetris 1987) of the host to achieve this).
> However, given that names can be translated, many apparently
> different names may lead to the same object. Any object may
> therefore be referred to by many names. One needs to be able to
> know whether two objects, retrieved through different paths, are
> in fact the same object. It is suggested that each object have a
> unique "official" name. This name could be stored in the object in
> some representations, or stored in a database accessible to the
> server, for example. Any references within that object should be
> parsed in the context of the official name. In the presence of a
> directory service, the official name will normally be the
> registered name of the object. However, a name in any scheme will
> do, so long as it is completely specified. On systems which do not
> allow the name to be stored (such as anonymous FTP archive sites),
> a possible ambiguity will always exist as to whether two similarly
> named objects are in fact the same. Note that Internet newsgroup
> names are unique world-wide, and news articles carry a unique
> message id. In most other cases, however, there is no guarantee
> that dereferencing a URL will work, or that if it does the object
> it refers to will in fact be the object intended. URLs such as FTP
> addresses are transient in that files may be moved and even
> replaced by different files of the same name. This disorganisation
> may be limited by good server management, but a naming scheme which
> is also independent of the internet host name is obviously preferable.
>
> READABILITY BY PEOPLE
>
> This requirement has been put forward by several people (Clifford
> Lynch, Douglas Engelbart among others), and disputed by others.
> The author's view is that it will be a while before technology and
> standardisation have reached the point at which names and addresses
> will be hidden from human beings. As long as they must be written
> on the backs of envelopes and "cut and pasted" between workstation
> windows, there is a strong need for names to be
>
> Short
>
> Composed of printable (preferably non-white) characters
>
> To a certain extent, understandable by a human being.
>
> STRUCTURE OF NAMES AND ADDRESSES
>
> A physical address is required in order for:
>
> The user's program to contact the server;
>
> The server to perform the operation (e.g. search an index,
> retrieve an object, or look up the name) and return a result;
>
> The user's program to locate an individual position or element
> within a returned object.
>
> This suggests that a name be structured, such that the parts
> necessary for these three operations be separate and only used by
> those system elements which need those parts. This corresponds to
> the basic principle of information hiding. In fact, four parts
> are necessary, including the indicator of the naming scheme to be
> used:
>
> The naming scheme: a registered identifier for the protocol.
>
> The name of a suitable server. The format of this part must be
> well defined. It will depend on the lower-layer protocols in
> use. Systems which use widely distributed information, such as
> x.500 and NNTP, do not need this part as each client generally
> contacts his nearest server (or a particular server).
>
> Information to be passed to the server. This may be private to
> the server, as all names may be generated and used by the same
> server. This part of the name should be opaque to the client.
>
> Information to be used by the application once the object has
> been retrieved. This part is private to the application (or,
> more strictly, the data format) and so cannot be defined here.
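The four parts can be illustrated with a minimal Python sketch. It assumes the punctuation used elsewhere in these drafts ("scheme:" first, "//" before a server name, "#" before the part used after retrieval); `split_url` and `UrlParts` are hypothetical names, and the server part is returned without being looked inside, since it is opaque to the client.

```python
from typing import NamedTuple, Optional


class UrlParts(NamedTuple):
    scheme: str              # registered identifier for the protocol
    server: Optional[str]    # host[:port]; absent for e.g. news
    server_part: str         # opaque to the client, passed to the server
    fragment: Optional[str]  # used by the application after retrieval


def split_url(url: str) -> UrlParts:
    """Split an absolute URL into the four parts listed above (sketch).

    Assumes "scheme:" first, "//server" for schemes that name a server,
    and "#" before the part used once the object has been retrieved.
    """
    scheme, colon, rest = url.partition(":")
    if not colon or not scheme:
        raise ValueError("no scheme in " + repr(url))
    rest, hash_mark, fragment = rest.partition("#")
    server = None
    if rest.startswith("//"):
        end = rest.find("/", 2)
        if end == -1:
            end = len(rest)
        server, rest = rest[2:end], rest[end:]
    return UrlParts(scheme, server, rest, fragment if hash_mark else None)
```

For a name such as news:comp.infosystems.www the server field comes back empty, in line with the remark that NNTP clients contact their nearest server.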
>
> Both lasting names and physical addresses often share a
> hierarchical structure. This often follows from the organisation of
> the system. From the naming point of view, it has the advantage
> that a reference in one object to another object need not include
> that part of the structure which is common to both names.
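A rough sketch of how a client might complete such a partial reference, using a hypothetical `resolve` helper: whatever the reference shares with the absolute name of the object containing it (scheme, server, leading directories) is copied from that name. Folding "." and ".." segments follows the unix convention and is an assumption here, not something stated in the text.

```python
def resolve(base: str, ref: str) -> str:
    """Complete a partial reference `ref` from the absolute name `base`.

    Hypothetical helper: the parts the two names share (scheme, server,
    leading directories) are taken from `base`.
    """
    scheme, _, base_rest = base.partition(":")
    if ":" in ref.split("/", 1)[0]:
        return ref                          # ref carries its own scheme
    if ref.startswith("//"):
        return scheme + ":" + ref           # share only the scheme
    server, base_path = "", base_rest
    if base_rest.startswith("//"):          # separate "//server" from path
        end = base_rest.find("/", 2)
        if end == -1:
            server, base_path = base_rest, "/"
        else:
            server, base_path = base_rest[:end], base_rest[end:]
    if ref.startswith("/"):
        return scheme + ":" + server + ref  # share scheme and server
    # Share the directories: drop the last base segment, append ref,
    # then fold "." and ".." (unix-style; an assumption of this sketch).
    out: list[str] = []
    for seg in base_path.split("/")[:-1] + ref.split("/"):
        if seg == "..":
            if len(out) > 1:
                out.pop()
        elif seg != ".":
            out.append(seg)
    return scheme + ":" + server + "/".join(out)
```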
>
> CHOICES FOR A UNIVERSAL SYNTAX
>
> The requirements above leave little room for choice save for the
> order and punctuation of the elements of an address. It is only
> reasonable for the order of writing of the parts to be consistently
> from left to right (or right to left) with increasing specificity.
> Punctuation schemes fall into two categories (Huitema 1991): tagged
> schemes in which fields are given names, and untagged schemes which use
> special characters and field order. The latter tend to be more
> compact schemes.
>
>
> protocol: aftp host: xxx.yyy.edu path: /pub/doc/README
>
> PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;
>
> PR:aftp/xx.yy.edu/pub/doc/README
>
> /aftp/xx.yy.edu/pub/doc/README
>
> Fig 1. Some alternative tagged and untagged representations
>
> The choice of special symbols for punctuation tends to be a matter
> of taste. It is easier to read addresses whose symbols correspond
> to those of one's favourite operating system. A variety of symbols
> is needed so that when a name is abbreviated it is possible to tell
> which parts have been omitted.
>
> The recommendation below uses special characters in order to
> achieve a compact name, and uses where possible punctuation symbols
> established in the internet or unix community.
>
> The choice of escape character for introducing representations of
> non-allowed characters also tends to be a matter of taste. An ANSI
> standard exists in the C language, using the back-slash character
> "\". The use of this character on unix command lines, however, can
> be a problem as it is interpreted by many shell programs, and would
> have itself to be escaped.
>
> There is a conflict between the need to be able to represent many
> characters including spaces within a URL directly, and the need to
> be able to use a URL in environments which have limited character
> sets or in which certain characters are prone to corruption. This
> conflict has been resolved by use of a hexadecimal escaping method
> which may be applied to any characters forbidden in a given
> context. When URLs are moved between contexts, the set of
> characters escaped may be enlarged or reduced unambiguously.
>
> The use of multiple white space characters is discouraged in URLs
> to be printed or sent by electronic mail. This is because of the
> frequent introduction of extraneous white space when lines are
> wrapped by systems such as mail, or sheer necessity of narrow
> column width, and because of the inter-conversion of various forms
> of white space which occurs during character code conversion and
> the transfer of text between applications.
>
72c481
< URL SYNTAX
---
> FULL FORM
82,90c491,492
< PrePrefix
<
< To be a Uniform Resource Locator as currently defined by the URI
< working group, the whole string must start with a constant prefix
< "URL:". Note that to save space in this document, URLs have been
< quoted throughout without this preprefix.
<
< Scheme
<
---
> SCHEME
>
97,99c499,501
< Those schemes which refer to internet protocols mostly have a
< common syntax for the rest of the object name. This starts with a
< double slash "//" to indicate its presence, and continues until the
---
> Those schemes which refer to internet protocols have a common
> syntax for the rest of the object name. This starts with a double
> slash "//" to indicate its presence, and continues until the
156c557
< the syntax shall not be used unencoded in a URL.
---
> the syntax shall not be used in a URL.
162,167c563,566
< awkward in a given environment. Because a % sign always indicates
< an encoded character, a URL may be made safer simply by encoding
< any characters considered unsafe, while leaving already encoded
< characters still encoded. Similarly, in cases where a larger set
< of characters is acceptable, % signs can be selectively and
< reversibly expanded.
---
> awkward in a given environment. As a % sign always indicates an
> encoded character, a URL may be made safer simply by encoding any
> characters considered unsafe, while leaving already encoded
> characters still encoded.
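The escaping rule can be sketched in Python as two hypothetical helpers. `encode_unsafe` escapes a chosen set of characters and never touches "%", so applying it again changes nothing; `expand` decodes only those escapes whose characters a given context accepts, which is the selective, reversible expansion the newer draft mentions. The 8-bit limit is an assumption of the sketch.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def encode_unsafe(url: str, unsafe: str) -> str:
    """Encode every character of `unsafe` found in `url` as %XX.

    "%" itself is never re-encoded, so existing escapes survive and the
    function can be applied repeatedly without changing the result.
    """
    out = []
    for ch in url:
        if ch in unsafe and ch != "%":
            if ord(ch) > 0xFF:
                raise ValueError("only 8-bit characters are handled here")
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def expand(url: str, allowed: str) -> str:
    """Decode only those %XX escapes whose character is in `allowed`."""
    out, i = [], 0
    while i < len(url):
        esc = url[i:i + 3]
        if (len(esc) == 3 and esc[0] == "%"
                and esc[1] in HEX_DIGITS and esc[2] in HEX_DIGITS):
            ch = chr(int(esc[1:], 16))
            # A decoded "%" would look like the start of a new escape,
            # so it always stays encoded.
            out.append(ch if ch in allowed and ch != "%" else esc)
            i += 3
        else:
            out.append(url[i])
            i += 1
    return "".join(out)
```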
176c570
< hexadecimal or base 64 would be more appropriate.)
---
> hex or base 64 would be more appropriate.)
177a572,574
> The same considerations apply to mapping local fragment identifiers
> onto the fragmentid part of a URL.
>
182c583
< protocols follow. The schemes covered are
---
> protocols follow.
184,208c585,593
< http Hypertext Transfer Protocol
<
< ftp File Transfer protocol
<
< gopher The Gopher protocol
<
< mailto Electronic mail address
<
< mid Message identifiers for electronic mail
<
< cid Content identifiers for MIME body part
<
< news Usenet news
<
< nntp Usenet news for local NNTP access only
<
< prospero Access using the prospero protocols
<
< telnet, rlogin and tn3270
< Reference to interactive sessions
<
< wais Wide Area Information Servers
<
< The schemes for x.500, network management database and whois++ have
< not been specified and may be the subject of further study.
---
> HTTP
>
> The HTTP protocol specifies that the path is handled transparently
> by those who handle URLs, except for the servers which de-reference
> them. The path is passed by the client to the server with any
> request, but is not otherwise understood by the client. The
> fragmentid part is not sent with the request. The search part, if
> present, is sent. Spaces in URLs should be escaped for transmission
> in HTTP.
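A minimal sketch of what an HTTP client takes from an http URL. It assumes the form http://host[:port]/path[?search][#fragment] and a default port of 80, and the name `http_target` is hypothetical. The fragment is dropped, the search part is kept, and, following the newer draft's wording, both spaces and control characters are escaped before transmission.

```python
def http_target(url: str) -> tuple[str, int, str]:
    """Return (host, port, path sent in the request) for an http URL.

    Sketch only: assumes http://host[:port]/path[?search][#fragment]
    and port 80 when none is given.
    """
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("not an http URL: " + repr(url))
    rest = url[len(prefix):].split("#", 1)[0]  # the fragment is not sent
    slash = rest.find("/")
    if slash == -1:
        hostport, path = rest, "/"
    else:
        hostport, path = rest[:slash], rest[slash:]
    host, _, port = hostport.partition(":")
    # Path and search part are sent as they stand, except that spaces
    # and control characters are escaped.
    escaped = "".join(
        "%{:02X}".format(ord(c))
        if c == " " or ord(c) < 0x20 or ord(c) == 0x7F else c
        for c in path
    )
    return host, (int(port) if port else 80), escaped
```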
210,214d594
< The url: prefix is reserved for use in encoding a Uniform Resource
< Name when that has been developed by the IETF working group.
<
< New schemes may be registered at a later time.
<
218,223c598,603
< file system of the given host. The FTP protocol is used, as defined
< in RFC959 or any successor. The port number, if present, gives the
< port of the FTP server if not the FTP default. (A client may in
< practice use local file access to retrieve objects which are
< available through more efficient means such as local file open or
< NFS mounting, where this is available and equivalent).
---
> file system of the given host. The FTP protocol is used. The port
> number if given gives the port of the FTP server if not the FTP
> default. (A client may in practice use local file access to
> retrieve objects which are available through more efficient means
> such as local file open or NFS mounting, where this is available
> and equivalent).
225,232c605
< User name and password
<
< The syntax allows for the inclusion of a user name and even a
---
> The syntax allows for the inclusion of a user name and even a
236,237c609
< is "anonymous" and the password the user's Internet-style mail
< address.
---
> is "anonymous" and the password the user's mail address.
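How a client might choose its login from an FTP URL, as a sketch. The user[:password]@host[:port] layout and the name `ftp_login` are assumptions, since this excerpt does not show the exact syntax; the anonymous default with the user's mail address as password is the one stated above.

```python
def ftp_login(server: str, mail_address: str) -> tuple[str, str]:
    """Choose USER and PASS arguments from the server part of an FTP URL.

    Assumes the layout user[:password]@host[:port], which this excerpt
    does not spell out. With no user name, the stated defaults apply:
    "anonymous", with the user's mail address as the password. An empty
    password for a named user without one is a further assumption.
    """
    userinfo, at, _host = server.rpartition("@")
    if not at:
        return "anonymous", mail_address
    user, colon, password = userinfo.partition(":")
    return user, (password if colon else "")
```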
239,242c611,620
< Where possible, this mail address should correspond to a usable
< mail address for the user, and preferably give a DNS host name
< which resolves to the IP address of the client. Note that servers
< currently vary in their treatment of the anonymous password.
---
> The adoption of a unix-style syntax involves the conversion into
> non-unix local forms by either the client or server. Some non-unix
> servers do this, but clients wishing to access sites which do not
> have unix-style naming will need certain algorithms to enable
> other file systems to be identified and treated. Client software
> may also have to be flexible in terms of the sequence of FTP
> commands used with different varieties of server. In view of a
> tendency for file systems to look increasingly similar, it was felt
> that the URL convention should not be weighed down by extra
> mechanisms for identifying these cases.
244,296d621
< Path
<
< The FTP protocol allows for a sequence of CWD commands (change
< working directory) prior to a RETR (retrieve) which actually
< accesses a file. The arguments of any CWD commands are successive
< segment parts of the URL, and the filename argument to the RETR
< command is the final segment of the URL path.
<
< Note
<
< In the case in which the file system of the server is known or
< guessed by the client, the path may possibly be converted into a
< filename. This may (in some cases) allow the file to be retrieved
< in one RETR command with no CWD command. In the case of unix, the
< filename will in fact look the same as the URL path. This must NOT
< be taken to indicate that the URL is a unix filename. In
< practice, as many FTP servers in fact have or emulate unix file
< systems, it may be time-efficient to attempt first a direct
< retrieval guessing unix syntax, and, if that fails, to attempt the
< official sequence of successive directory changes followed by a
< RETR command.
<
< There is no common hierarchical model to the FTP protocol, so if a
< directory change command has been given, it is impossible in
< general to deduce what sequence should be given to navigate to
< another directory for a second retrieval, if the paths are
< different. The only reliable algorithm is to disconnect and
< reestablish the control connection. However, if no directory
< changes have been made, but direct retrieval has been done, then
< the control connection may be kept. Another possible
< uninvestigated method is to use CDUP on the trial assumption of a
< hierarchical structure to return to a point in common between the
< first and second URLs.
<
< (This note previously read: "The adoption of a unix-style syntax
< involves the conversion into non-unix local forms by either the
< client or server. Some non-unix servers do this, but clients
< wishing to access sites which do not have unix-style naming will
< need certain algorithms to enable other file systems to be
< identified and treated. Client software may also have to be
< flexible in terms of the sequence of FTP commands used with
< different varieties of server. In view of a tendency for file
< systems to look increasingly similar, it was felt that the URL
< convention should not be weighed down by extra mechanisms for
< identifying these cases." )
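The official path handling (one CWD per directory segment, then RETR for the last segment) can be sketched as follows. `ftp_commands` is a hypothetical name, and splitting on "/" before decoding %XX escapes is an assumption that keeps an encoded slash inside its segment.

```python
from urllib.parse import unquote


def ftp_commands(path: str) -> list[str]:
    """Turn an FTP URL path into the official CWD ... RETR sequence.

    Every segment except the last becomes a CWD argument; the last one
    is the RETR argument. Segments are split on "/" before %XX decoding,
    so an encoded slash stays inside its segment (an assumption).
    """
    *directories, filename = path.lstrip("/").split("/")
    commands = ["CWD " + unquote(d) for d in directories]
    commands.append("RETR " + unquote(filename))
    return commands
```

Following the note above, a client might first try a single RETR of the whole path on the guess that the server is unix-like, and fall back to this sequence only if that fails.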
<
< Data type
<
303c628
< but it is outside the scope of this paper.
---
> but it outside the scope of this paper.
305,328c630
< An FTP URL may specify the method by which an object is to be
< retrieved. Two of the modes correspond to the FTP "Data Types"
< ASCII and IMAGE for the retrieval of a document, as specified in
< FTP by the TYPE command. One mode indicates directory access.
<
< The data type is specified by a suffix to the URL separated by an
< unencoded exclamation mark (ASCII 21 hex). Possible suffixes are:
<
< !I Use FTP image (I) mode to perform data
< transfer.
<
< !A Use FTP ASCII (A) mode to perform data
< transfer.
<
< !D Use FTP directory list commands to read
< directory.
<
< [suggestion: tenex. reference?]
<
< Transfer Mode
<
< Stream Mode is always used.
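The data type suffix can be sketched as a small lookup. Only an unencoded "!" is treated as the separator, and MODE S is always sent because stream mode is always used; the name `ftp_type` and the choice of LIST for the directory case are assumptions.

```python
# Commands for the "!" suffixes listed above. Stream mode (MODE S) is
# always used; LIST for the directory case is an assumption.
TYPE_COMMANDS = {"I": "TYPE I", "A": "TYPE A", "D": "LIST"}


def ftp_type(path: str) -> tuple[str, list[str]]:
    """Split an unencoded "!X" suffix off an FTP URL path.

    An encoded exclamation mark (%21) is ordinary path data, so it is
    never mistaken for the separator.
    """
    base, bang, code = path.rpartition("!")
    if bang and code in TYPE_COMMANDS:
        return base, ["MODE S", TYPE_COMMANDS[code]]
    return path, ["MODE S"]
```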
<
< HTTP
---
> NEWS
330,343c632,633
< The HTTP protocol specifies that the path is handled transparently
< by those who handle URLs, except for the servers which de-reference
< them. The path is passed by the client to the server with any
< request, but is not otherwise understood by the client. The
< fragmentid part is not sent with the request. The search part, if
< present, is sent. Spaces and control characters in URLs must be
< escaped for transmission in HTTP.
<
< GOPHER
<
< Gopher selector strings may contain any characters other than tab,
< return, or linefeed, so it is important to encode all disallowed
< characters and encode any space characters so these characters are
< not altered during transport of the URL. Note that since gopher
---
> The news locators refer to either news group names or article
> message identifiers which must conform to the rules of RFC 850. A
349,357c639,642
< selector string are opaque and in many cases map to native file
< system of the gopher server, so encoding of disallowed characters
< in the selector string is to map to binary codes rather than ISO
< character sets. In other words, the "%" character followed by two
< hexadecimal digits is used to encode binary data. Clients shall
< not interpret gopher selector strings. While many Gopher servers
< map to Unix file systems, you cannot assume that "/" characters
< imply a hierarchy since Gopher servers on non-Unix file systems may
< use the "/" as part of a file name.
---
> message identifier may be distinguished from a news group name by
> the presence of the commercial at "@" character. These rules imply
> that within an article, a reference to a news group or to another
> article will be a valid URL (in the partial form).
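Telling the two news forms apart needs only the "@" test described above. This sketch maps each form to an NNTP command; `nntp_command` is hypothetical, and it assumes the URL carries the message identifier without the angle brackets that the NNTP ARTICLE command expects.

```python
def nntp_command(url: str) -> str:
    """Map a news URL to an NNTP command (sketch).

    An "@" marks a message identifier; anything else is a group name.
    Assumes the identifier appears without its angle brackets.
    """
    prefix = "news:"
    if not url.startswith(prefix):
        raise ValueError("not a news URL: " + repr(url))
    target = url[len(prefix):]
    if "@" in target:
        return "ARTICLE <" + target + ">"
    return "GROUP " + target
```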
359,361c644,645
<
<
< The format of a gopher URL is:
---
> A news URL may be dereferenced using NNTP or using any other
> protocol for the conveyance of usenet news articles.
363,508c647
< 1. A single-character field to denote the Gopher type of the
< resource to which the URL refers.
<
< 2. The gopher selector string. Note that some gopher selector
< strings begin with a copy of the gopher type character, in which
< case that character will occur twice consecutively. Also note
< that the gopher selector string may be an empty string since
< this is how gopher clients refer to the top-level directory on
< a gopher server.
<
< 3. An encoded tab character (%09) to separate the gopher
< selector string from the optional search string (see 4 below).
<
< If the URL does not refer to a Gopher+ item and if there is
< no gopher search string, then parts 3, 4, 5, and 6 of the URL
< are optional.
<
< 4. The gopher search string. If the URL refers to a search to
< be submitted to a gopher search engine, the search string is
< required. Otherwise this is an empty string.
<
< 5. A question mark [suggestion: an encoded tab character
< (%09)] to separate the gopher search string from the optional
< gopher+ string (see 6 below). [suggestion: Note that if the URL
< refers to a gopher+ item and does not have a gopher search
< string, there will be two encoded tab characters in a row.]
<
< 6. The Gopher+ string. Gopher+ strings consist of one or more
< characters and are used to represent information required for
< retrieval of the Gopher+ item. Gopher+ items may have alternate
< views, arbitrary sets of attributes, and may have electronic
< forms associated with them. To accommodate the various Gopher+
< objects, the Gopher+ string in the URL must accommodate a
< mapping of the information a Gopher+ client sends to the server.
< This makes this section a bit long since we basically cover the
< entire Gopher+ protocol here.
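A sketch of splitting the path of a gopher URL into the numbered parts, assuming the bracketed suggestion that an encoded tab (%09) introduces both the search string and the Gopher+ string. `parse_gopher_path` is hypothetical; %XX escapes decode to raw bytes because selector strings map to binary codes, and the selector is returned without interpretation.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote_to_bytes


class GopherPath(NamedTuple):
    gtype: str               # part 1: one-character Gopher type
    selector: bytes          # part 2: opaque, may be empty
    search: Optional[bytes]  # part 4, if present
    gplus: Optional[bytes]   # part 6, if present


def parse_gopher_path(path: str) -> GopherPath:
    """Parse what follows "gopher://host[:port]/" in a gopher URL.

    Assumes an encoded tab (%09) before both the search string and the
    Gopher+ string (the bracketed suggestion). %XX decodes to raw bytes.
    """
    if not path:
        raise ValueError("missing Gopher type character")
    fields = [unquote_to_bytes(f) for f in path[1:].split("%09", 2)]
    return GopherPath(
        path[0],
        fields[0],
        fields[1] if len(fields) > 1 else None,
        fields[2] if len(fields) > 2 else None,
    )
```

Note how a selector that begins with a copy of the type character, as in "00/pub/README", keeps that copy.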
<
< When a Gopher server returns a directory listing to a client,
< Gopher+ items are tagged with either a "+" (denoting gopher+ items)
< or a "?" (denoting items which have a +ASK form associated with
< them). A Gopher+ string which is only a "+" refers to the default
< view (data representation) of the item. To retrieve this item a
< gopher+ client should send
<
< a_gopher_selector<tab>+<cr><lf>
<
< to the gopher+ server.
<
< Note that items which have a +ASK associated with them (i.e.
< Gopher+ items tagged with a "?") require the client to fetch the
< item's +ASK attribute to get the form definition, and then ask the
< user to fill out the form and return the user's responses along
< with the selector string to retrieve the item. Gopher+ clients
< know how to do this but depend on the "?" tag in the gopher+ item
< description to know when to handle this case. The "?" is used in
< the Gopher+ string to be consistent with Gopher+ protocol's use of
< this symbol.
<
< To refer to the Gopher+ attributes of an item, the Gopher+ string
< might consist of "!" or "$". "!" refers to all of a gopher+
< item's attributes. "$" refers to all the item attributes for all
< items in a Gopher directory. To retrieve an item or directory's
< attributes, a gopher client will send:
<
< a_gopher_selector<tab>!<cr><lf>
<
< for items or
<
< a_gopher_selector<tab>$<cr><lf>
<
< for directories to the gopher+ server.
<
< To refer to specific attributes, the Gopher+ string is
< "!attribute_name" or "$attribute_name". For example, to refer to
< the attribute containing the abstract of an item, the Gopher+
< string would be "!+ABSTRACT". To refer to several attributes,
< clients send the server the attribute names separated by spaces, so
< in the URL the attribute names must be separated by encoded spaces.
< To retrieve a collection of item attributes specified with a
< gopher+ string of "!+ABSTRACT%20+SMELL" a gopher client would send
<
< a_gopher_selector<tab>!+ABSTRACT +SMELL<cr><lf>
<
< to the gopher server.
<
< Gopher+ allows for optional alternate data representations
< (alternate views) of items. To retrieve a Gopher+ alternate view,
< the gopher+ client sends the appropriate view and language
< identifier (found in the item's +VIEW attribute). To refer to a
< specific Gopher+ alternate view, the URL's Gopher+ string would be
< in the form "+view_name%20language_name". For example, a gopher+
< string of "+application/postscript%20Es_ES" refers to the Spanish
< language PostScript alternate view of a gopher+ item. To retrieve
< this alternate view the client would send
<
< a_gopher_selector<tab>+application/postscript Es_ES<cr><lf>
<
< to the gopher server.
<
< The gopher+ string for a URL that refers to an item referenced by
< an ASK form filled out with specific values is essentially an
< encoded version of what the client sends to the server. The
< gopher+ string will be of the form
<
< +%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A
<
< To retrieve this item, the gopher client sends:
<
< a_gopher_selector<tab>+<tab>1<cr><lf>
< +-1<cr><lf>
< ask_item1_value<cr><lf>
< ask_item2_value<cr><lf>
< .<cr><lf>
<
< to the gopher server.
<
< As a more complex example, consider a URL that refers to an
< alternate view of an item that is referenced with a filled-out
< Gopher +ASK form. The gopher+ string will be of the form:
<
< +view_name%20language_name%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A
<
< To retrieve this item, the gopher client sends:
<
< a_gopher_selector<tab>+view_name language_name<tab>1<cr><lf>
< +-1<cr><lf>
< ask_item1_value<cr><lf>
< ask_item2_value<cr><lf>
< .<cr><lf>
<
< to the gopher server.
<
< Summary: gopher+ string part of Gopher URL
---
> Note1:
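The Gopher+ cases above follow one pattern. The client decodes the URL's gopher+ string and sends it after the selector and a tab. It then adds CR LF, unless the decoded string already ends that way, as a filled-out ASK form does. The Python sketch below illustrates this reading. The function name gopher_plus_request is hypothetical, and the selector is assumed to be already decoded.

```python
from urllib.parse import unquote


def gopher_plus_request(selector: str, gplus_string: str) -> str:
    """Build what a Gopher+ client sends for a URL's gopher+ string.

    Hypothetical helper: `selector` is already decoded; `gplus_string`
    is still %-encoded as it appears in the URL.
    """
    # %20 -> space, %09 -> tab, %0D%0A -> CR LF; "+" is left unchanged.
    decoded = unquote(gplus_string)
    request = selector + "\t" + decoded
    # A filled-out ASK form already ends with ".\r\n"; other forms need CR LF.
    if not request.endswith("\r\n"):
        request += "\r\n"
    return request
```

For example, the encoded string "!+ABSTRACT%20+SMELL" yields a_gopher_selector<tab>!+ABSTRACT +SMELL<cr><lf>, matching the request shown for that case.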
510,621c649
<
<
< To refer to an item which has an ASK form associated with it where
< the intent is to allow the user to enter values into the form as
< part of the retrieval process:
<
< %3F [was: ?]
<
< To refer to all or specific attributes of a gopher item:
<
< ![attribute_name][%20attribute_name][%20attribute_name]...
<
<
< To refer to all or specific attributes of a gopher directory:
<
< $[attribute_name][%20attribute_name][%20attribute_name]...
<
<
< To refer to the content of a gopher+ item (including an item
< referred to by specific values in a filled-out ASK form):
<
< +[view_name[%20language_name]]
< [%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value...%0D%0A.%0D%0A]
<
<
<
< Overall summary and examples
<
<
< The general format of a Gopher URL path referring to a gopher type
< "T" item is:
<
< gopher://host [port]/T[gopher_selector]%09[search_string]%09[gopher+_string]
<
<
< Examples:
<
< An example of a URL pointing to a gopher type 0 item (a document)
< is:
<
< gopher://host [port]/0a_gopher_selector
<
<
< An example of a URL pointing to a gopher type 7 item (a search
< engine) where the string foobar is to be submitted to the search
< engine is:
<
< gopher://host [port]/7a_gopher_selector%09foobar
<
<
< An example of a URL pointing to a Gopher+ type 0 item (a document)
< is:
<
< gopher://host [port]/0a_gopher_selector%09%09some_gplus_stuff
<
<
< An example of a URL pointing to a Gopher+ type 0 (document) item's
< attribute information is:
<
< gopher://host [port]/0a_gopher_selector%09%09!
<
<
< An example of a URL pointing to a Gopher+ document's Spanish
< PostScript representation is:
<
< gopher://host [port]/0a_gopher_selector%09%09+application/postscript%20Es_ES
<
< .
<
< MAILTO
<
< This allows a URL to specify an RFC822 addr-spec mail address.
< Note that use of "%", for example as used in forming a gatewayed
< mail address, requires conversion to %25 in a URL.
<
< The semantics may be taken to be that the object referred to
< by the mailto: URL is the set of messages sent to or from that
< address. There is no algorithm to retrieve this set, but the SMTP
< protocol allows messages to be added to it, and any given user may
< be aware of a subset of its members.
<
< NEWS
<
< The news locators refer to either news group names or article
< message identifiers which must conform to the rules for a
< Message-ID of RFC 1036 (Horton 1987). A message identifier may be
< distinguished from a news group name by the presence of the
< commercial at "@" character. These rules imply that within an
< article, a reference to a news group or to another article will be
< a valid URL (in the partial form).
<
< A news URL may be dereferenced using NNTP (RFC977, Kantor 86) (the
< ARTICLE by message-id command) or using any other protocol for the
< conveyance of usenet news articles, or by reference to a body of
< news articles already received.
<
< Note1:
<
< Among URLs the "news" URLs are anomalous in that they are
---
> Among URLs the news: URLs are anomalous in that they are
629,630c657,658
< Note 2:
<
---
> Note 2:
>
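Returning to the Gopher summary, a client could split a Gopher URL into its fields. The examples there separate the selector, search string and gopher+ string with %09. This is a minimal Python sketch, not part of the draft, and it rests on three assumptions:

- The authority has a host[:port] form, with default port 70.
- A missing type defaults to "1", following the GOPHER section quoted in this diff.
- The names parse_gopher_url and GopherURL are hypothetical.

The sketch splits on the encoded tabs before decoding, so a decoded field can never be mistaken for a separator.

```python
from typing import NamedTuple
from urllib.parse import unquote


class GopherURL(NamedTuple):
    host: str
    port: int
    gtype: str
    selector: str
    search: str
    gplus: str


def parse_gopher_url(url: str) -> GopherURL:
    """Split gopher://host[:port]/T[selector]%09[search]%09[gopher+_string].

    Hypothetical helper; host[:port] syntax and default port 70 are assumed.
    """
    prefix = "gopher://"
    if not url.startswith(prefix):
        raise ValueError("not a gopher URL")
    # Split at the first "/" only: the gopher+ string may contain "/".
    hostport, _, path = url[len(prefix):].partition("/")
    host, _, port = hostport.partition(":")
    gtype = path[:1] or "1"  # assumption: missing type defaults to "1"
    # Split on the *encoded* tab before decoding, at most three fields.
    fields = path[1:].split("%09", 2)
    fields += [""] * (3 - len(fields))
    selector, search, gplus = (unquote(f) for f in fields)
    return GopherURL(host, int(port) if port else 70, gtype, selector, search, gplus)
```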
641,643c664,666
< Suggested subject of study in conjunction with NNTP working group.
< Further extension possible may be to allow the naming of subject
< threads as addressable objects.
---
> Suggested subject of study in conjunction with NNTP WG. Further
> extension possible may be to allow the naming of subject threads as
> addressable objects.
645,646c668,669
< NNTP
<
---
> NNTP
>
650,651c673
< message identifier. In all other cases the "news" scheme should be
< used.
---
> message identifier.
655d676
< The NNTP protocol must be used.
657,661c678,684
< Note1.
<
< This form of URL is not globally accessible, as NNTP servers
< typically only allow access from local clients. Note that the
< article numbers within groups vary from server to server.
---
> Note1.
>
> This form of URL is not of global accessiablity, as typically NNTP
> servers only allow access from local clients. This form or URL
> should not be quoted outside this local area. It should not be
> used within news articles for wider circulation than the one
> server.
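The NEWS section's rule is easy to apply mechanically: an "@" distinguishes a message identifier from a news group name. The minimal Python sketch below maps each case to the RFC 977 ARTICLE or GROUP command. It assumes the identifier appears in the URL without the enclosing "<" and ">" of an RFC 1036 Message-ID header. The name news_to_nntp is hypothetical.

```python
from urllib.parse import unquote


def news_to_nntp(url: str) -> str:
    """Map a news: URL to the NNTP command line that would fetch it.

    Hypothetical helper. Assumption: a message identifier is written
    without the "<" and ">" that enclose it in an RFC 1036 header.
    """
    prefix = "news:"
    if not url.startswith(prefix):
        raise ValueError("not a news URL")
    target = unquote(url[len(prefix):])
    if "@" in target:
        # An "@" marks a message identifier: RFC 977 ARTICLE by message-id.
        return "ARTICLE <" + target + ">\r\n"
    # Otherwise it is a news group name, selected with RFC 977 GROUP.
    return "GROUP " + target + "\r\n"
```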
663,668c686,699
< This form of URL should not be quoted outside this local area. It
< should not be used within news articles for wider circulation than
< the one server. This is a local identifier for a resource which is
< often available globally, and so is not recommended except in the
< case in which incomplete NNTP implementations on the local server
< force its adoption.
---
> WAIS
>
> The current WAIS implementation public domain requires that a
> client know the "type" of a object prior to retrieval. This value
> is returned along with the internal object identifier in the search
> response. It has been encoded into the path part of the URL in
> order to make the URL sufficient for the retrieval of the object.
> Within the WAIS world, names do not of course not need to be
> prefixed by "wais:" (by the partial form rules).
679c710
< version number. If present, the version number is separated from
---
> version number. If present, the version number is seperated from
681c712
< zero zero), this being an escaped string terminator (null).
---
> zero zero), this being an escaped string terminator (null).
683c714
< access method and are not represented as Prospero URLs.
---
> access method and are not represented as Prospero URLs.
684a716,740
> GOPHER
>
> The first character of the URL path part (after the initial single
> slash) is a single-character "type" field which is that used by the
> Gopher protocol. The rest of the path is the "selector string",
> with disallowed characters encoded. Note that some selector strings
> begin with a copy of the gopher type character, in which case that
> character will occur twice consecutively in the URL. If the type
> character and selector are omitted, the type defaults to "1".
> Gopher links which refer to non-Gopher protocols are represented
> directly as URLs of the underlying access method and are not
> represented as Gopher URLs.
>
> MAILTO
>
> This allows a URL to specify an RFC822 addr-spec mail address.
> Note that use of % , for example as used in forming a gatewayed
> mail address, requires conversion to %25 in a URL.
>
> This semantics may be considered to be that the object referred to
> by the mailto: URL is the set of messages sent to or from that
> address. There is no algorithm to retrieve this set, but the SMTP
> protocol allows messages to be added to it, and any given user may
> be aware of a subset of its members.
>
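The MAILTO rule about "%" comes down to two steps: one substitution when forming the URL, and ordinary %-decoding when reading it back. A minimal Python sketch follows. The names mailto_url and mailto_address are hypothetical. Only "%" is escaped here, so addresses containing other characters that the URL syntax forbids would need more care.

```python
from urllib.parse import unquote


def mailto_url(addr_spec: str) -> str:
    """Form a mailto: URL from an RFC822 addr-spec.

    Hypothetical helper: converts "%" (e.g. in a gatewayed address)
    to "%25"; no other characters are escaped.
    """
    return "mailto:" + addr_spec.replace("%", "%25")


def mailto_address(url: str) -> str:
    """Recover the addr-spec from a mailto: URL by %-decoding it."""
    prefix = "mailto:"
    if not url.startswith(prefix):
        raise ValueError("not a mailto URL")
    return unquote(url[len(prefix):])
```

A gatewayed address such as user%relay@gateway.example becomes mailto:user%25relay@gateway.example and decodes back unchanged.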
691a748,749
> this is a less desirable, though currently common, solution.
>
697c755,762
< this is a less desirable, though currently common, solution.
---
> X500
>
> The mapping of x500 names onto URLs is not defined here. A decision
> is required as to whether "distinguished names" or "user friendly
> names" (ufn), or both, should be allowed. If any punctuation
> conversions are needed from the adopted x500 representation (such
> as the use of slashes between parts of a ufn) they must be defined.
> This is a subject for study.
699c764
< WAIS
---
> WHOIS
701,707c766,770
< The current public domain WAIS implementation requires that a
< client know the "type" of an object prior to retrieval. This value
< is returned along with the internal object identifier in the search
< response. It has been encoded into the path part of the URL in
< order to make the URL sufficient for the retrieval of the object.
< Within the WAIS world, names do not of course need to be prefixed
< by "wais:" (by the partial form rules).
---
> This prefix describes the access using the "whois++" scheme in the
> process of definition. The host name part is the same as for other
> IP based schemes. The path part can be either a whois handle for a
> whois object, or it can be a valid whois query string. This is a
> subject for further study.
708a772,775
> NETWORK MANAGEMENT DATABASE
>
> This is a subject for study.
>
712,715c779,785
< conforming URL syntax, using a new prefix. Experimental prefixes
< may be used by mutual agreement between parties, and must start
< with the characters "x-". The scheme name "urn:" is reserved for
< the work in progress on a scheme for more persistent names.
---
> conforming URL syntax, using a new scheme identifier. Experimental
> scheme identifiers may be used by mutual agreement between parties,
> and must start with the characters "x-". The scheme name "urn:" is
> reserved for the work in progress on a scheme for more persistent
> names. Therefore URNs (Names) and URLs (Locators) be
> distinguishable. An object which is either a URL or a URN is known
> as a URI (Identifier).
731c801
< retrieval by URL, that the client software have provision for being
---
> retrieval by URI, that the client software have provision for being
735c805
< BNF for specific URL schemes
---
> BNF syntax
739,742c814,817
< [brackets] indicate optional parts. Spaces are represented by the
< word "space", and the vertical line character by "vline". Single
< letters stand for single letters. All words of more than one letter
< below are entities described somewhere in this description.
---
> [brackets] indicate optional parts. Spaces are representated by
> the word "space", and the vertical line character by "vline".
> Single letters stand for single letters. All words of more than one
> letter below are entities described somewhere in this description.
744,745c819,820
< The current IETF URI working group preference is for the
< prefixedurl production. (Nov 1993. July 93: url).
---
> The current IETF URI working group prefereence is for the
> prefiexedurl production. (Nov 1993. July 93: url).
749,754c824
< characters do not appear in any productions and therefore may not
---
> characters fo not appear in any productions and therefore may not
769c839
< | mailtoaddress | midaddress | cidaddress
---
> | mailtoaddress
778c848
< ftpaddress f t p : / / login / path [ ! ftptype ]
---
> ftpaddress f t p : / / login / path
786,789d855
< midaddress m i d : addr-spec
<
< cidaddress c i d : content-identifier
<
839,840d904
< ftptype A | I | D
<
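The new BNF adds ftptype to ftpaddress and moves "!" from extra to reserved. Under that reading, an unescaped "!" in an ftp URL can only introduce the ftptype, so a parser may split on it without ambiguity. The sketch below assumes this reading of the BNF. The name parse_ftp_url is hypothetical, and the login part is returned as-is rather than split into user, password and host.

```python
from typing import Optional, Tuple
from urllib.parse import unquote

FTP_TYPES = {"A", "I", "D"}  # ftptype  A | I | D


def parse_ftp_url(url: str) -> Tuple[str, str, Optional[str]]:
    """Split ftp://login/path[!ftptype] into (login, path, ftptype).

    Hypothetical helper. Relies on "!" being reserved, so an unescaped
    "!" can only be the ftptype delimiter. The path is %-decoded;
    login is returned undecoded.
    """
    prefix = "ftp://"
    if not url.startswith(prefix):
        raise ValueError("not an ftp URL")
    login, slash, rest = url[len(prefix):].partition("/")
    if not slash:
        raise ValueError("ftpaddress requires '/' after login")
    path, bang, ftptype = rest.partition("!")
    if not bang:
        return login, unquote(path), None
    if ftptype not in FTP_TYPES:
        raise ValueError("ftptype must be A, I or D")
    return login, unquote(path), ftptype
```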
851c915
< path void | segment [ / path ]
---
> path void | xpalphas [ / path ]
853,854d916
< segment xpalphas
<
862,865d923
<
< gtype xalpha
<
< xalpha alpha | digit | safe | extra | escape
870a929,932
> gtype xalpha
>
> xalpha alpha | digit | safe | extra | escape
>
885c947
< digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
---
> 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
889c951
< extra " | ' | ( | ) | : | ; | , | space
---
> extra ! | * | " | ' | ( | ) | : | ; | , | space
891,892d952
< reserved ! | *
<
910,911d969
< (end of URL BNF)
<
920,923c978,980
< A URL-related security threat is that it is sometimes possible to
< construct a URL such that an attempt to perform a harmless
< idempotent operation such as the retrieval of the object will in
< fact cause a possibly damaging remote operation to occur. The
---
> The use of URLs containing passwords is clearly unwise.
>
> Conclusion
929,938c987,994
< unsafe URL is typically constructed by specifying a port number
< other than that reserved for the network protocol in question. The
< client unwittingly contacts a server which is in fact running a
< different protocol. The content of the URL contains instructions
< which when interpreted according to this other protocol cause an
< unexpected operation. An example has been the use of gopher URLs
< to cause a rude message to be sent via an SMTP server. Caution
< should be used when using any URL which specifies a port number
< other than the default for the protocol, especially when it is a
< number within the reserved space.
---
> A need has been demonstrated, and a number of requirements have
> been stated for uniform resource locators (URLs). A scheme has been
> proposed which builds on existing conventions to define a syntax
> for URLs. This scheme has been in serious use by World-Wide Web
> (W3) initiative since 1991. Adoption of the scheme in
> correspondence, standards and software will ease the use of
> references to on-line information in a flexible way as the coming
> information age arrives.
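A client could apply the security caution about URLs that name a non-default port before it connects. The Python sketch below is hedged in three ways:

- The default-port table holds commonly used well-known ports, which this draft does not itself list.
- The three-level classification is an assumption for illustration.
- The name port_risk is hypothetical.

```python
from typing import Optional

# Assumed well-known default ports; the draft does not list them itself.
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "gopher": 70, "http": 80, "nntp": 119}


def port_risk(scheme: str, port: Optional[int]) -> str:
    """Classify an explicit port number in a URL (hypothetical helper).

    "ok"      no port given, or the scheme's default port
    "caution" a non-default port at or above 1024
    "high"    a non-default port in the reserved space below 1024,
              which may belong to a different protocol (e.g. SMTP on 25)
    """
    if port is None or port == DEFAULT_PORTS.get(scheme):
        return "ok"
    return "high" if port < 1024 else "caution"
```

A gopher URL pointing at port 25, as in the rude-message example, would be classed "high".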
940,948d995
< Care should be taken when URLs contain embedded encoded delimiters
< for a given protocol (for example, CR and LF characters for telnet
< protocols) that these are not unencoded before transmission. This
< would violate the protocol but could be used to simulate an extra
< operation or parameter, again causing an unexpected and possibly
< harmful remote operation to be performed.
<
< The use of URLs containing passwords is clearly unwise.
<
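The advice not to unencode embedded delimiters before transmission can be enforced when decoding a field meant to travel as one protocol line, such as a plain Gopher selector. The minimal Python sketch below rejects any field whose decoding would produce CR or LF. The name decode_for_transmission is hypothetical. The check would not suit the Gopher+ ASK string, whose encoded CR LF pairs the Gopher section defines deliberately.

```python
from urllib.parse import unquote

FORBIDDEN = ("\r", "\n")  # line delimiters for line-oriented protocols


def decode_for_transmission(field: str) -> str:
    """Decode a URL field that will be sent as a single protocol line.

    Hypothetical helper: refuses fields whose decoding yields CR or LF,
    since sending them unencoded could simulate an extra command.
    """
    decoded = unquote(field)
    if any(c in decoded for c in FORBIDDEN):
        raise ValueError("encoded CR/LF in URL field")
    return decoded
```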
968c1015
< Amsterdam IETF and refined in net discussion.
---
> Amsterdam IETF and refined in net discussion.
970,972d1016
< Draft 03 includes changes made at Houston in November 1993, and on
< the net before Seattle in March 1994.
<
977c1021
< Wrappers for URIs in plain text
---
> Fragment-id
979c1023,1027
< This section does not formally form part of the URL specification.
---
> This represents a part of, fragment of, or a sub-function within,
> an object or object. Its syntax and semantics are defined by the
> application responsible for the object, or the specification of the
> content type of the object. The only definition here is of the
> allowed characters by which it may be represented in a URL.
981c1029,1039
< URIs, including URLs, will ideally be transmitted through protocols
---
> The fragment-id follows the URL of the whole object from which it
> is separated by a hash sign (#). If the fragment-id is void, the
> hash sign may be omitted: A void fragment-id with or without the
> hash sign means that the URL refers to the whole object.
>
> While this hook is allowed for identification of fragments, the
> question of addressing of parts of objects, or of the grouping of
> objects and relationship between contined and containing objects,
> is not addressed by this object.
>
> This object does not address the question of objects which are
986a1045,1111
> different versions of a "living" object, nor of expressing the
> relationships between different versions and the living object.
>
> Partial form
>
> In a certain limited set of cases, generally within a certain
> application, it may be useful to pass only a section of the URL.
> Within a object whose URL is well defined, the URL of another
> object may be given in abbreviated form, where parts of the two
> URLs are the same. This allows objects within a group to refer to
> each other without requiring the space for a complete reference,
> and it incidentally allows the group of objects to be moved
> without changing any references. This is not discussed in detail
> here, it is only mentioned so that the characters required by the
> technique be reserved for that purpose. It must be emphasised that
> when a reference is passed in anything other than a well controlled
> context, the full form must always be used.
>
> The partial form relies on a property of the URL syntax that
> certain characters ("/") and certain path elements ("..", ".") have
> a significance reserved for representing a hierarchical space, and
> must be recognised as such by both clients and servers.
>
> A partial form can be distinguished from a full form in that a full
> form must have a colon and that colon must occur before any slash
> characters.
>
> The rules for the use of a partial name are:
>
> If the scheme parts are different, the whole absolute locator
> must be given. Otherwise, the scheme is omitted, and:
>
> If the host and/or port parts are the different, the host, port
> name and all the rest of the locator must be given.
>
> If the access and host parts are the same, then the path may be
> given in absolute (fully qualified) or relative form. Within the
> path:
>
> If a leading slash is present, the path is absolute. Otherwise,
> a relative path is interpreted as follows:
>
> The last part of the path of the context locator (anything
> following the rightmost slash) is removed, and the given partial
> URL appended in its place.
>
> Within the result, all occurrences of "xxx/../" or "/." are
> recursively removed, where xxx, ".." and "." are complete path
> elements.
>
> Note: If a path of the context locator end in slash, partial URLs
> will be treated differently to their treatment with respect to the
> same path without a slash. Using a trailing slash on a directory
> name is not therefore recommended. The signifcance of a trailing
> slash may be considered as that of the locator of a file with void
> name within that directory.
>
> Wrappers for URIs in plain text
>
> This section does not formally form part of the URL specification.
>
> URIs, including URLs, will ideally be transmitted though protocols
1005,1006c1130,1133
< Yes, Jim, I found it under <ftp://info.cern.ch/pub/www/doc> but
< you can probably pick it up from <ftp://ds.internic.net/rfc>.
---
> Yes, Jim, I found it under <ftp://info.cern.ch/pub> bu
> t
> you can probably pick it up from <ftp://ds.internic.ne
> t/rfc>.
1009d1135
<
1022,1024c1148,1150
< December 1991, as updated from time to time,
< <ftp://info.cern.ch/pub/www/doc/http-spec.txt
< >
---
> December 1991,
> <ftp://info.cer
> n.ch/pub/www/doc/http-spec.txt>
1040,1047d1170
< Horton (1987) M. Horton, R. Adams, "Standard for
< interchange of USENET messages", Internet RFC
< 1036, 12/01/1987.
1062c1185
< transmission of news" , Internet RFC-977,
---
> transmission of news", Internet RFC-977,
1066,1068d1188
< Kunze, 1994 J. Kunze, Requirements for URLs, to be
< published.
<
1092,1094d1211
< Sollins 1994 K. Sollins and L. Masinter, Requirements for
< URNs, to be published.
<
1097d1213
< Performance Systems International, Inc.
1102a1219
> Performance Systems International, Inc.
1109,1112c1226,1228
< .
<
< AUTHOR'S ADDRESS
<
---
> Author's address
>
>
1122a1239
>
1126d1242
<