Seattle minutes

Alan Emtage (bajan@bunyip.com)
Sat, 2 Apr 1994 02:23:22 -0500

Message-Id: <9404020723.AA17671@mocha.bunyip.com>
From: bajan@bunyip.com (Alan Emtage)
Date: Sat, 2 Apr 1994 02:23:22 -0500
To: uri@bunyip.com, minutes@cnri.reston.va.us
Subject: Seattle minutes

Minutes of the Uniform Resource Identifiers Working Group.

Chair: Jim Fullton/CNIDR (fullton@cnidr.org)
Alan Emtage/Bunyip (bajan@bunyip.com)

Minutes taken by Craig Summerhill (craig@cni.org), edited by Alan Emtage.

I. Minutes from Houston approved

II. Changes to agenda proposed by Chair

Original Agenda (mailed to uri list):

1) Intro
2) Approval of Houston minutes
3) Approval of agenda
4) URI Overview
5) URLs
a) Review of current URL functional requirements draft
b) Review of current URL functional specifications draft
c) Presentation, Mike Schwartz (15 mins)..."URLs and Internet White
Pages"
d) Final URL discussion and closure
6) URNs
a) Review of current URN functional requirements draft
b) Discussion on URNs
7) URCs

Proposal:

1) Overview by Karen Sollins (sollins@lcs.mit.edu) and Larry
Masinter (masinter@parc.xerox.com)
2) Uniform Resource Names (URNs)
3) Uniform Resource Characteristics (URCs)
4) Uniform Resource Locators (URLs)

Simon Spero (ses@unc.edu) proposed that the agenda remain the same as
mailed. Consensus of the working group was to follow the Chair's
proposed agenda.

[Note: there is a debate, yet unresolved, as to the meaning of the 'C'
in URCs. Karen Sollins, Michael Mealling and others have proposed
that this mean "characteristics" while others propose "citations".
Sollins et al. suggest that "citation" has considerable semantic
meaning to librarians and recommend "characteristics" instead. The
group has yet to resolve the issue -- ed]

III. Overview

Karen Sollins made a short presentation.

An overview document needs to be written, and that work should be
authored under the auspices of the IIIR working group. Would like
to have a first cut and then run it by this working group for
review. Approved by the working group.

The Chair proposed a brief enumeration of possible services and
actions of UR*.

Larry Masinter said general enumeration is already taking place in the
IIIR architecture document, and didn't think that should happen here in
this meeting. However, Mitra (mitra@path.net) thought it would be useful
to go over the definitions that we developed at the Houston meeting.
They were at the time fairly brief, but generally agreed upon.

Emtage enumerated three potential services [ --> is a process, mapping
or application in this -- ed]

1) General lookup
     URN --> URLs
             URCs

2) Reverse lookup
     URL --> URNs
             URCs

3) Characteristics/attribute matching
     URC --> URNs
             URLs

Dave Crocker (dcrocker@mordor.stanford.edu) asked if one considers a
word processing document and the Postscript rendering of that document
to be the same thing? Is that part of this overview?

Sollins and the Chair responded that this had been discussed in
several previous discussions and would be addressed in the URN
requirements later.

V. Karen Sollins and Larry Masinter presented the current Internet Draft
on Functional Requirements for URNs.

Functional requirements of URNs in the draft are:

o global scope (applicable globally)
o global uniqueness (unique over the network)
o persistence (should have lasting value - not transient)
o scalability (should be able to scale)
o grandfathering (must grandfather existing systems)
o extensibility (future extensions must be possible)
o independence (sole responsibility of naming authority)
o resolution (will not impede resolution to a URL)

Requirements for encoding:

o single encoding (visible string)
o human transcribability (humans should be able to read/transcribe)
o simple comparison (two URNs should be comparable)
o transport friendliness (can be transported within standard
Internet protocols such as SMTP, FTP etc)
o machine consumption (machines can parse the data)

Several people questioned the removal of the issue of URN "sameness"
from the current draft when it was in previous drafts. Sollins and
Masinter responded:

o This is an issue, but it is kind of fuzzy at this point. So it was
dropped from the current Draft.

o The naming authority gets to decide whether two items are the same
or not. Within their domain, they can assign the same URN to
two different data formats with the same intellectual content, if
they choose to do so [answering Crocker's question -- ed]

Other requirements:

o name assignment is delegated to naming authorities (which are
registered)
o naming authorities are encouraged to provide scalable naming, but
it is not required
o naming authorities should guarantee mapping to URLs, but need not
provide this service themselves
o URNs must be built with a limited character set in order to be
transportable
o naming authority must abide by these constraints

Keith Moore (moore@cs.utk.edu) noted that there was an issue of
caching that has been dropped from the document, but there are going
to be times in electronic publishing when you need to know that a URN
applies to data format "a" but not data format "b".

He noted that the concept of Location Independent File Names (LIFNs)
needs to be addressed. [ This was raised on the URI list previously
--ed]. The issue of "immutability" needs to be introduced either to
URNs or some other construct.

Sollins and Masinter replied that caching did get dropped because it
was felt that it was outside the scope of URNs. However, they noted
that they don't in any way feel that this list is exhaustive. Other
things can be added in the future when the need arises. At this point,
the document is an informational document, not a standards track
document. They also believe that having a URN guarantee a distinction
between mutable and immutable documents is beyond the scope of URNs
unless we have total control over all network space.

Chris Weider (clw@bunyip.com) noted that transcribability and limiting
of character sets may discourage people from using them. Could a document
in Japanese or Chinese have the URN embedded within the document, for
example?

Sollins and Masinter replied that they were describing
transcribability not the ability to generate a URN. Plus, the URN
doesn't have to be in the same character set as the document itself.

Simon Spero proposed an amendment to the section on simple comparison --
would it not be better to say that "it is the goal to make the
algorithm for comparing URNs as simple as possible" ?

Sollins/Masinter noted that this paragraph may have been munged during
the editing process. Spero was tasked to provide alternate wording.

Mitra said that this process has to be simple. It has to be local. We
wanted to avoid it getting munged by mailers. Should be case
insensitive, and should ignore white space and CRLFs.
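
As an illustration only (no algorithm was presented at the meeting), a
comparison with the properties Mitra listed might look like the sketch
below. The function names are hypothetical, and the sample strings
reuse the style of URN discussed elsewhere in these minutes.

```python
def normalize_urn(urn):
    """Fold case and drop all white space, including CR and LF."""
    # str.split() with no arguments splits on any run of white space,
    # so joining the pieces removes spaces, tabs, CRs and LFs.
    return "".join(urn.split()).lower()


def same_urn(a, b):
    """Purely local comparison: no lookup, only string normalization."""
    return normalize_urn(a) == normalize_urn(b)


# A mailer that wraps a line or changes case should not affect equality.
print(same_urn("IANA:Collins.UK.12345.n", "iana:collins.uk.\r\n 12345.N"))
```

The comparison needs no network access, which matches the requirement
that it be simple and local.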

John Curran (jcurran@nic.near.net) noted that "optional punctuation"
is a scary concept. We may have to separate a URN from the environment
it lives in. He wants the difference between encoding and presentation
to be addressed.

Sollins/Masinter said without a specific proposal to judge this, they
didn't want to limit it. The algorithm should be simple, but it should
also be immune to the kinds of transcription errors that people make.
Again, without a specific proposed standard it is difficult to make
judgement on this issue.

Keith Moore again raised the issue of LIFNs and remarked that this
document doesn't come close to that. There is a need for a name such
that a series of octets is immutable. These location independent file
names need to be distinguishable from URNs too.

Sollins and Masinter suggested that the group start with this
document, and if there is a proposal for this set of constraints plus
other added functionality, bring it back to the group to be considered.
They believed that the group wanted something that would describe the
core set of requirements for a URN which may or may not be a LIFN.

The Chair noted that this document doesn't prevent the creation of
immutable URNs either, it simply doesn't require it.

The question of a single "canonical encoding" of a URN was raised by
Mitra.

Larry Masinter noted that if you have a simple comparison function,
you also have the ability for a canonical representation and so it was
left out of this document. Mitra responded that the definition of
sameness would need to be addressed more specifically and John Curran
noted that having a canonical representation is going to be important
when we begin to develop implementations of URNs.

The issue of canonical representation was unresolved.

There was a proposal that "In free text representations, URNs should
be recognizable as such.". The group agreed.

Keith Moore requested that the definition of URC in the Internet Draft
be made clearer.

Erik Huizer (Erik.Huizer@surfnet.nl) [ Area Director with primary
responsibility for this group --ed] suggested a short time limit in
revising this document and having it go through to the RFC editor.
John Curran added that it should be put out on the net as a document
which represents the "rough" consensus of the group, and that we're
looking for wordsmithing and that the concepts stand. Keith Moore
disagreed. However, the group noted that since this was an
informational RFC it can be re-issued easily as other requirements
become clearer. Erik Huizer suggested that in any case the document be
sent to the IESG for a "last call" review [though not required -- ed]
The group agreed.

Chris Weider and Peter Deutsch (peterd@bunyip.com) had previously
proposed a Functional Specification document for URNs and it is an
Internet Draft. Weider was asked by the Chair to present possible elements
of a URN to the group in light of the Requirements document draft.

In his proposed scheme a URN has four elements:
1) wrapper or a tag
2) URN, easily distinguished in text
3) (hierarchical) naming authority identification
4) opaque string

John Curran noted that the Masinter/Sollins document specifies
'centrally registered' authorities and that you can't have both
centrally registered and hierarchical. Mitra suggested that the Domain
Name System (DNS) is in fact both, depending on which part of the
authority is examined. Tim Berners-Lee (timbl@info.cern.ch) [ from the
net --ed] proposed two components: hyperarticle (or opaque string) and
an authority that imagines it is going to resolve the opaque
string into locations. He would prefer something that is hierarchical
like DNS. He suggested that we make the boundary between the naming
authority and the opaque string flexible so that the boundary becomes
invisible.

Chris Weider said that the current [ unrevised --ed] specification has
3 parts:

o scheme id: (e.g. IANA)
o authority
o opaque string

with a visible boundary between elements.

Mitra had some concerns and presented an example where the naming
authority and a hierarchy would not have a boundary with the
opaque string. UK here can be viewed as part of the naming authority
or part of the "opaque" string. Use of DNS to resolve the URN (locate
a resolution service) being part of the scheme.

e.g. IANA:Collins.UK.12345.n

o In this scheme, on day one, IANA doesn't become the sole naming
authority. Initially Collins is the authority, then
IANA:Collins.UK becomes the naming authority as things start to
move around. When the resolution changes, you can't remove the
"Collins", but you can move the place where resolution occurs.

o Simon Spero said that this is already very similar to ASCII solo
representation of distinguished names (RFC1485).
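
To illustrate the flexible boundary in Mitra's example above, the
sketch below lists every way the example string could be read as
scheme, naming authority and opaque string. It assumes, from that one
example only, that ':' ends the scheme id and '.' separates the
remaining components; neither the syntax nor the function name comes
from any draft.

```python
def candidate_splits(urn):
    """Yield every (scheme, authority, opaque) reading of a URN."""
    # Assumption: ':' ends the scheme id, '.' separates the components.
    scheme, _, rest = urn.partition(":")
    parts = rest.split(".")
    # The authority takes the first i components; the rest is opaque.
    for i in range(1, len(parts)):
        yield scheme, ".".join(parts[:i]), ".".join(parts[i:])


for scheme, authority, opaque in candidate_splits("IANA:Collins.UK.12345.n"):
    print(scheme, "authority=" + authority, "opaque=" + opaque)
```

Every reading is equally valid to a parser, so the string alone does
not say where the naming authority ends.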

Some members disagreed with Mitra's proposal:

o Michael Mealling (ccoprmm@gatech.edu) was concerned that DNS might
die if TCP/IP goes away and ATM takes over. He wants URNs to work
after DNS goes away, and doesn't want references to DNS in the
specification.

o Clifford Lynch (calur@uccmvsa.ucop.edu) said that the idea of a
sliding divider in URNs is a bad idea which comes from one
representation of how URNs are going to work. In this example, if
you start with Collins as the naming authority and later decide to
subdivide Collins into multiple smaller naming authorities you're
going to have problems.

Mitra responded with the example:

e.g. IANA:COLLINS_UK:

John Curran disagreed saying that somebody's input form is going to
break because the application is processing for a canonical name (even
if it isn't in the specification).

Further discussion was directed to the mailing list.

VI. The Chair proposed that the current Working Group Charter be reviewed
in the light of the current work. The group agreed. The Chair has
been tasked to propose a revised Charter.

The group decided to start the discussion of URLs rather than URCs.

VII. Functional Requirements for Internet Resource Locators

John Kunze (jak@violet.berkeley.edu) was asked to present his draft
Functional Requirements document on URLs. These requirements were
generally agreed to in Houston.

Before presenting the document he made the following points:

o The document was sent to the list, and received no comments.

o He has purposely avoided the use of any reference to 'uniform
resource locator' in this document since it was supposed to
"exist" before the functional specifications document

o There are a few definitions such as uniqueness, which the author felt
were necessary

The core requirements for URLs:

- transient (they will change)
- global scope
- parsable (machine consumable)
- distinguishable
- transport friendly
- transcribable
- will include service parameters
- extensible
- no information other than that required to access object

Craig Summerhill proposed that wording be brought into line with the
other Functional Requirements documents. General consensus was yes,
where possible.

The general consensus was that the document should go forward, after
any synchronization of wording with the Masinter/Sollins documents that
might be needed. Kunze will send it to the list in the same time frame
as the Masinter/Sollins document.

VIII. URL Functional Specifications draft

Group discussed current draft of URL Functional Specifications
document by Tim Berners-Lee.

As a starting point in the discussion the Chair noted from the mailing
list, these areas of contention:

o Gopher+ protocol '?' type
o CWD slashes in FTP (directory problem)
o file types in FTP

1) Mark McCahill (mpm@boombox.micro.umn.edu) presented his suggested
syntax for the Gopher+ URL:

Gopher URL issue is that we need to be able to refer to Gopher+ within
the URL in order to parse queries that are being done with '?' in the
Web/HTTP implementation. He said that we could code the question mark
as %4F or %09 (whatever the appropriate encoding is) in the protocol,
but would like it to look prettier for obvious reasons.
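
For reference, the hex escapes in question can be checked with Python's
standard urllib.parse module, used here only as a convenient
calculator. In ASCII the question mark is octet 3F and the horizontal
tab (the Gopher field separator) is octet 09, while %4F decodes to the
letter "O".

```python
from urllib.parse import quote, unquote

# safe="" forces every reserved character to be escaped.
print(quote("?", safe=""))   # %3F
print(quote("\t", safe=""))  # %09

# Decoding the value quoted in the discussion gives a letter, not '?'.
print(unquote("%4F"))        # O
```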

o Tim Berners-Lee said that the real question is whether this is
a function of the Gopher Plus protocol or is it part of the URL
syntax? If it is part of the URL syntax, then it should be a
question mark.
o John Kunze had a problem with the question mark having global
meaning, because we have a requirement that the string to the
right of the service identifier is opaque. He also has a concern
about using the '?' to get typing information, and thinks that
is something better left for the URC.
o John Curran said that anything that falls to the right of the
service identifier is opaque, but didn't see anything wrong
with having a mapping that would make the opaque string easier to
work with.
o Tim Berners-Lee said that when you're using a '?' then it is not
opaque. In the WWW, he has generally gone along with this
because that is what people wanted in this group.

General consensus was to go with Mark McCahill's proposal.

Keith Moore raised the concern of having Gopher URLs being able to
spoof other protocols and wanted wording in the document to alert
implementors of this problem.

o Mark McCahill noted this is not a problem specific to the Gopher
URLs since other URLs can do this.

o John Curran didn't want to see anything that addresses a security
concern in the specification for URLs themselves. He suggested
that the group enumerate these security concerns in an
additional document, but didn't want to see constraints placed on
the specification in the RFC.

o Reacting to Moore's comment about disallowing URLs to point at a
port other than that assigned for the particular access method,
Mitra noted that there are perfectly legitimate reasons to point a
Gopher server at other ports.

o Keith Moore maintained that there should be a specific passage
within the Gopher section of the document pertaining to security.
Tim Berners-Lee noted that in fact, there is at the back of the
document, but Moore wanted it in the section where the syntax for
the Gopher URL is presented. He was tasked by the group to
produce appropriate wording for the document within two weeks of
the meeting.

(2) The Chair presented the issue that in the FTP URL, the group long
ago decided that although the forward slashes may look like a UNIX
path, they are really characters that delimit the components in the
directories statement and the terminal object is the file (object)
itself. So, there is an issue:

What does ftp://host/a/b/c mean?

a) CWD a; CWD b; RETR c

OR

b) CWD a/b; RETR c

Larry Masinter was the chief proponent of scheme (a), Keith Moore of
scheme (b).

The following points were made:

o In an Andrew File System, you can't issue CWD a and CWD b, you
must issue the CWD a/b command.

o The slash is a delimiter, it is not part of the path.

o What if some of the directories are nested with security,
and I don't have permission to read it (like directory "b" in
this example)?

o Does anybody have a case where they need to do multiple
CWD commands?

o If we use scheme (a), we have the option of issuing multiple CWD
commands. The other option gives us only the choices of 0 or 1
CWD command.

o Does anybody know of an existing practice? What are existing
applications doing with these things now?

There was rough consensus that the specification should mandate
multiple CWDs. The client should issue one CWD per path component; if
a single CWD with a multi-level argument is wanted, the slashes inside
it need to be encoded or quoted. That is, a slash "in the clear" is
always a delimiter (between directory components, and between the
directory structure and the retrieved object), while a slash that is
part of a name is hex encoded. Wordsmithing will be done by Larry
Masinter and sent to the list for approval.
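
A minimal sketch of how a client might apply this rule is given below.
It is an illustration under assumptions, not wording from the draft:
the function name is hypothetical, and %2F is simply the standard hex
escape for a slash.

```python
from urllib.parse import urlsplit, unquote


def ftp_commands(url):
    """Translate an FTP URL path into CWD/RETR commands, scheme (a)."""
    path = urlsplit(url).path      # still percent-encoded at this point
    segments = path[1:].split("/")  # drop the leading '/', then split
    # Decode only after splitting, so an encoded slash (%2F) stays
    # inside its segment instead of acting as a delimiter.
    *dirs, name = [unquote(s) for s in segments]
    return ["CWD " + d for d in dirs] + ["RETR " + name]


print(ftp_commands("ftp://host/a/b/c"))     # one CWD per component
print(ftp_commands("ftp://host/a%2Fb/c"))   # a single CWD a/b
```

The first call yields CWD a, CWD b, RETR c; the second yields CWD a/b,
RETR c, which is how a URL author could still ask for the single-CWD
behaviour of scheme (b).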

Huizer was asked by a number of group members if he would be part of
the review process. He declined by saying that as part of the IESG he
will be asked to review it there. Tim Berners-Lee was asked to post
the current draft as an Internet Draft and the revised draft, when
finished, as well. Berners-Lee agreed to this.

Harri Salminen (Harri.Salminen@funet.fi) [ from the net --ed] asked
about the proposed URLs for news.

He proposed non-lower case names in Usenet newsgroups. Some of these
newsgroup servers can have spaces and wildcards and all kinds of
characters in these names.

o Larry Masinter said that the URL is supposed to refer to the
object, and be unambiguous. So, we can leave out the wildcard. If
you use a wildcard, you aren't referring to a single object.

o John Curran said that the fact that the character set is a
superset, and people can put these characters in a URL doesn't
mean people *will* use them. We should just leave this alone.

Tim Berners-Lee questioned whether the specification for NNTP should be
in the document, since the functional requirements state that a URL
needs to be globally unique. The group decided:

o The URL is globally unique, but it doesn't have to be globally
accessible.

o The URL doesn't guarantee that it will get you there. We can't
guarantee that it will be globally accessible.

(3) Transfer Type. We need to be able to encode the types of access
required in order for transfer to occur. There are currently four
types: [IMAGE, ASCII and LOCAL in RFC959 --ed] and directory [not in
RFC959 --ed]. The issue here is one of syntax.

Larry Masinter proposed that the directory "type" (since it is not part
of the RFC) be specified as a trailing slash. This had the consensus of
the group.

o Proposal is to deal with the others as "!Type=A" or ";Type=I"

e.g. ftp://host/path/document!type=i

The following points were made in the discussion:

o There is general consensus this approach is OK.
o Should the type information be mandatory in an FTP URL? Should
there be a default?
o Consensus is that default transfer type should be "unspecified."
o Issue of syntax: Will delimiter be the bang (!) or the semi-colon
(;). John Kunze noted that the bang is a problem with the unix C
shell.
o Consensus is that delimiting character will be semi-colon.
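
The outcome of this discussion (semi-colon delimiter, trailing slash
for a directory, no default type) can be sketched as a small parser.
The function name and the "directory" label are hypothetical, and
accepting the parameter name in either case is an assumption of this
sketch.

```python
def split_ftp_type(path):
    """Return (path, transfer_type) for the path part of an FTP URL."""
    if path.endswith("/"):
        # A trailing slash marks a directory; there is no RFC959 type.
        return path, "directory"
    base, sep, param = path.rpartition(";")
    if sep and param.lower().startswith("type="):
        return base, param[len("type="):].upper()
    return path, None  # no type given: transfer type is unspecified


print(split_ftp_type("/path/document;type=i"))
print(split_ftp_type("/path/dir/"))
print(split_ftp_type("/path/document"))
```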

Tim Berners-Lee, Larry Masinter, Mark McCahill and Ned Freed
(ned@innosoft.com) have been tasked to incorporate the required
changes and reply to the list within 2 weeks.

IX. Uniform Resource Characteristics

Michael Mealling has a draft that begins to talk about the Functional
Requirements of URCs. [Erik Huizer asked that it be posted as an
Internet Draft, Michael Mealling has agreed --ed]

Jim Fullton was asked to review the functional specifications for URCs.

o Should they be "characteristic" or "citation"?

o Larry Masinter remarked that the group had this problem with URLs
and URNs because we didn't understand enough about what they
were, and we ended up changing the title of each of those. He
suggested that we're putting the cart ahead of the horse.

In overview the draft Functional Requirements of the URCs are:

o encapsulation
o structure
o scalability
o grandfathering
o caching
o resolution
o human readable
o transport friendly
o machine consumable

A general discussion was held in the remaining time.

o Mitra noted that a URC needs to be able to differentiate the URL
and URN information from the other meta-information within it,
and you need to know what elements of the meta-information
pertain to which URL within it. Mealling noted that this was
covered under the "Structure" in the document.

Several people expressed concern about URCs as currently debated.

o Keith Moore didn't think you can lump all these things into a
single structure and make it have any meaning. It is non-optimal
at best, and won't work at worst.

o Larry Masinter had a concern about lumping these things together,
especially as far as attachments are concerned. He didn't think
that the encapsulation goes nearly far enough towards deciphering
collections of objects.

o Jim Conklin (conklin@cren.net) had a problem with referencing
URLs within a URC: the URL is going to be too much in flux.

o Clifford Lynch was very troubled by the open-ended nature of these
URCs in the absence of any real concrete usages for them. He can
see a need to encapsulate information in order to be able to pass
it around, but is worried that this enumeration of properties is
too abstract to be useful. He is specifically worried about
little micro-bibliographies being shipped around in these things.
He doesn't think we share any common usage scenarios among the
many of us in this group.

Discussion about whether we can build the framework for how to
handle the data until we understand what the things are that we
want to put into a URC skeleton.

o The Chair noted that there wasn't much interest on the list in
talking about scenarios when it was brought up, but it will be
raised again.

o Larry Masinter suggested composing a very general specification,
and proceeding to some scenarios for how these things can be
employed. For example, he would like to deal with file data
formats and have a syntax for talking about that.

o Mitra agreed to repost his previously posted scenarios to the list
as an Internet Draft.

X. New Business

1) Keith Moore wants to look at defining location independent file
names.

o There was consensus that it was important for this group.

2) Karen Sollins noted that the Internet has to start caching or
getting information much closer to the clients. We're starting to
flood the network with queries. She doesn't think that we're putting
the information in the right locations.

o The Chair suggested that this should be discussed at the IIIR
meeting.

3) Considerable amount of overlap between what we're doing and other
groups. Should we schedule joint meetings?

o Erik Huizer suggested that we should request that other groups
formally monitor our mailing lists.

4) John Curran suggested that now that we have URL and URNs on the
track, he thinks we need a container for content type specification.
Also, he thinks we need to begin exploring issues related to mapping
services, etc, but is not sure that this is the right group for doing
this.

5) John Kunze said that he thinks we're trying to use a committee
structure for solving a lot of interoperability problems, and we need a
better structure for providing support to the developers.

XI. Closing Remarks

Given by Erik Huizer.

o The work that this group is doing is getting more and more
important (or rather that more and more people are beginning to
realize how important this work is going to be). He has been
pushing to get the URL and URN specs out, and realizes some people
are feeling put off by this, but believes the group needs to get
more communication out to people that are building applications
with these things already.

o The group lacks a good overview of the architecture. Almost every
person in this room has a different view of what that architecture
should be. So a couple of things that are being discussed:

- start a discussion in the IESG about creating an area for
Internet Integrated Information Architecture activities.

- The IAB holds retreats or workshops a couple of times a year for
invited people. He has applied for an IIIA workshop on this
topic. Not all members of this working group can be invited, but he
wants to make sure that all members of this community are
represented.