Appendix B: Relative URLs

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Wed, 27 Jul 1994 23:41:56 -0700

To: uri@bunyip.com
Message-Id: <9407272342.aa24614@paris.ics.uci.edu>

I am resubmitting this because a control error on the mailing list
delayed my last message by 8 hours, so it was not seen by others
until after the 5.2 update. I would like to have this considered at
the Thursday IETF meeting, so I am hoping that someone reads their mail
in the morning and takes it with them.

What follows is my proposed addition to the URL interim draft 5.2
in order to properly specify relative URLs and the procedure for
their consistent parsing. In particular, this addition is necessary
to fulfill the explicit requirements of Internet Resource Locators
regarding parsability (Section 3.2 of IRL Reqs.). Furthermore,
although usability is not an explicit requirement in the IRL document,
it is well known within the WWW community that relative URLs are
required for the usability of URLs as embedded references. It would
be foolish for this group to claim that usability is not an important
aspect of Internet Resource Locators.

Please note that the parsing of relative URLs described in
Appendix B does not exactly match current practice for the WWW.
It describes the _correct_ method of parsing relative URLs such
that the result is consistently correct and/or safe even when the
base URL has a null url-path. I doubt that the WWW community would
hesitate to adopt these rules once a standard exists.

....Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
About Roy: http://www.ics.uci.edu/dir/grad/Software/fielding

================================================================

APPENDIX B: Relative URLs

The body of this document defines the syntax and semantics of URLs
in their absolute form. This appendix defines how relative URLs
(sometimes referred to as partial URLs) can be used within certain
object content types and retrieval contexts in place of some
absolute URLs and, further, how those relative URLs should be
interpreted relative to the object's absolute base URL to obtain
their absolute form.

Within an object whose URL is well defined, the URL of another
object may be given in abbreviated form, where parts of the two
URLs are the same. This allows objects within a group to refer to
each other without requiring the space for a complete reference.
It also allows a group of objects to be moved from one location to
another without changing any of the embedded, relative URLs. In
this way, URLs can be stored in objects without the need for
dramatic change if the higher-order parts of a hierarchical naming
system are modified. Apart from terseness, this gives greater
robustness to practical systems by enabling information hiding
between system components.

To be "well defined," an object's absolute base URL must either be
known from the context of its retrieval (i.e. the absolute URL used
to retrieve the object) or be embedded within the object itself for
the explicit purpose of defining the base URL. The latter case is
dependent upon the content-type of the object and the rules for
obtaining the base URL in each content-type are beyond the scope of
this specification. However, it must be emphasized that relative
URLs cannot be used reliably in situations where the object's base
URL is not well defined. Furthermore, it is recommended that
relative URLs never be used within plain text objects.

The relative form relies on a property of the URL syntax that
certain characters ("/") and certain path segments ("..", ".") have
a significance reserved for representing a hierarchical space.
The URL schemes which do not follow this property (e.g. gopher,
telnet, and wais) or which do not use the common Internet scheme
syntax described in Section 3.1 (e.g. mailto and news) cannot be
used in relative form.
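The two conditions in the preceding paragraph can be checked mechanically. The following Python sketch is illustrative only: the function name `allows_relative_form` is hypothetical, the scheme pattern (letters, digits, "+", ".", "-") is an assumption, and the set of excluded schemes holds just the examples named above, not a complete list.

```python
import re

# Schemes the text names as unable to use the relative form. The text gives
# these only as examples, so this set is illustrative, not exhaustive.
NON_RELATIVE_SCHEMES = {"gopher", "telnet", "wais", "mailto", "news"}


def allows_relative_form(url):
    """Return True if `url` could serve as a base for relative URLs."""
    # Assumed scheme syntax: one or more letters, digits, "+", "." or "-",
    # followed by ":".
    m = re.match(r"([A-Za-z0-9+.\-]+):", url)
    if m is None:
        return False  # no scheme, so this is not an absolute URL
    scheme = m.group(1).lower()
    rest = url[m.end():]
    # The scheme must not be excluded by name, and its scheme-specific part
    # must use the common Internet scheme syntax, which begins with "//".
    return scheme not in NON_RELATIVE_SCHEMES and rest.startswith("//")
```

With this check, http://www.ics.uci.edu/dir/grad/ is accepted, while mailto:fielding@ics.uci.edu is rejected because it lacks the "//" form, and any gopher URL is rejected by name even though it uses "//".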

The rules for the interpretation of an embedded URL relative to the
base URL of its enclosing object are:

1. If the base URL is unknown, the embedded URL is interpreted
as an absolute URL.

2. If the scheme of the embedded URL is different from that of the
base or is one of the schemes that do not use relative forms,
the embedded URL is interpreted as an absolute URL.

3. If the scheme of the embedded URL is omitted or is the same as
that of the base, the embedded URL is interpreted as using that
scheme and the remainder of the embedded URL (after any "scheme:")
is interpreted as follows:

a) If the remainder of the embedded URL is empty, then the
embedded URL is equivalent to the base URL.

b) If the remainder of the embedded URL starts with a double
slash ("//"), the remainder is interpreted as an absolute URL.

c) If the remainder of the embedded URL starts with a single
slash ("/"), the initial scheme-specific part of the base URL
(starting with the double slash "//" and continuing until the
following slash "/", if any) is prepended to the
remainder and the result is interpreted as an absolute URL.

d) Otherwise, the embedded URL is interpreted as having the
same initial scheme-specific part as that of the base URL,
and the url-path (as described in Section 3.1) is interpreted
as follows:

1) If the base URL has no url-path, a single slash "/" is
prepended to the remainder of the embedded URL and the
result is the absolute url-path.

2) Otherwise, the last path segment of the base url-path
(anything following the rightmost slash "/") is removed
and the remainder of the embedded URL is appended in its
place. Within the result, all occurrences of "/." or
"xxx/../", where "xxx", ".." and "." are complete path
segments, are recursively removed. Removal of all "/."
path segments is done prior to removal of all "xxx/../".
Removal of "xxx/../" path segments is performed iteratively,
removing the leftmost matching pattern on each iteration.
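The rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the names `split_scheme`, `remove_dots` and `resolve` are hypothetical; the scheme syntax and the set of non-relative schemes (the examples named earlier) are assumptions; URLs are handled bare, without the <URL:...> wrapper; "?" and "#" get no special treatment; "xxx" in rule 3(d)(2) is assumed to be a non-empty segment other than ".."; and a base URL whose scheme cannot use relative forms, or that lacks the "//" form, is treated as if it were unknown.

```python
import re

# Illustrative set taken from the examples in the text; not exhaustive.
NON_RELATIVE_SCHEMES = {"gopher", "telnet", "wais", "mailto", "news"}
# Assumed scheme syntax: letters, digits, "+", "." or "-", then ":".
_SCHEME_RE = re.compile(r"^([A-Za-z0-9+.\-]+):")


def split_scheme(url):
    """Return (lower-cased scheme or None, remainder after "scheme:")."""
    m = _SCHEME_RE.match(url)
    if m:
        return m.group(1).lower(), url[m.end():]
    return None, url


def remove_dots(path):
    """Rule 3(d)(2) cleanup for a path that begins with "/"."""
    segs = path.split("/")  # segs[0] is the empty string before the first "/"
    # First remove every "/.": a complete "." segment with its leading slash.
    segs = [segs[0]] + [s for s in segs[1:] if s != "."]
    # Then remove "xxx/../" repeatedly, always taking the leftmost match.
    # The pattern needs a slash after "..", so a later segment must exist;
    # a trailing "xxx/.." is therefore left alone.
    i = 1
    while i < len(segs) - 2:
        if segs[i] not in ("", "..") and segs[i + 1] == "..":
            del segs[i:i + 2]
            i = 1  # rescan from the left for the new leftmost match
        else:
            i += 1
    return "/".join(segs)


def resolve(base, embedded):
    """Interpret `embedded` relative to the absolute URL `base`."""
    base_scheme, base_rest = split_scheme(base) if base else (None, "")
    # Rule 1 (extended by assumption): no usable base URL.
    if (base_scheme is None or base_scheme in NON_RELATIVE_SCHEMES
            or not base_rest.startswith("//")):
        return embedded
    scheme, rest = split_scheme(embedded)
    # Rule 2: a different scheme means the embedded URL is absolute.
    if scheme is not None and scheme != base_scheme:
        return embedded
    # Rule 3: scheme omitted or the same as the base scheme.
    if rest == "":                       # (a) same as the base URL
        return base
    if rest.startswith("//"):            # (b) absolute apart from the scheme
        return base_scheme + ":" + rest
    slash = base_rest.find("/", 2)       # end of the "//..." part
    net_loc = base_rest if slash == -1 else base_rest[:slash]
    if rest.startswith("/"):             # (c) keep only the "//..." part
        return base_scheme + ":" + net_loc + rest
    if slash == -1:                      # (d)(1) base has no url-path
        return base_scheme + ":" + net_loc + "/" + rest
    base_path = base_rest[slash:]        # (d)(2) includes the leading "/"
    merged = base_path[:base_path.rfind("/") + 1] + rest
    return base_scheme + ":" + net_loc + remove_dots(merged)
```

For instance, resolve("magic://a/b/c/d", "../../../g") returns "magic://a/../g", and resolve("magic://a", "g") returns "magic://a/g" by rule 3(d)(1).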

Examples:

Within an object with a well defined base URL of

<URL:magic://a/b/c/d>

the relative URLs would expand as follows:

g:h = <URL:g:h>
magic:g = <URL:magic://a/b/c/g>
magic: = <URL:magic://a/b/c/d>
g = <URL:magic://a/b/c/g>
./g = <URL:magic://a/b/c/g>
/g = <URL:magic://a/g>
//g = <URL:magic://g>
../g = <URL:magic://a/b/g>
../../g = <URL:magic://a/g>
../../../g = <URL:magic://a/../g>
./../g = <URL:magic://a/b/g>
/./g = <URL:magic://a/./g>
g/./h = <URL:magic://a/b/c/g/h>
g/../h = <URL:magic://a/b/c/h>

Note that, although the last five examples are not likely
to occur within any object, all URL parsers should be
capable of resolving them consistently.
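
As a worked illustration of rule 3(d)(2), the "../../../g" example
above resolves as follows, starting from the base url-path "b/c/d":

b/c/d            base url-path
b/c/../../../g   last segment "d" replaced by "../../../g"
b/../../g        leftmost "xxx/../" ("c/../") removed
../g             next leftmost "xxx/../" ("b/../") removed

The remaining ".." has no complete path segment before it, so no
further "xxx/../" match exists and the result is <URL:magic://a/../g>.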