Re: (long) sketch of proposed imap: URL syntax and semantics [TRY AGAIN!]

Steven D. Majewski (sdm7g@elvis.med.virginia.edu)
Wed, 6 Jul 1994 01:26:15 -0400

Date: Wed, 6 Jul 1994 01:26:15 -0400
From: "Steven D. Majewski" <sdm7g@elvis.med.virginia.edu>
Message-Id: <199407060526.AA14897@elvis.med.Virginia.EDU>
To: John Gardiner Myers <jgm+@cmu.edu>, imap@cac.washington.edu
Subject: Re: (long) sketch of proposed imap: URL syntax and semantics [TRY AGAIN!]

On Jul 5, 22:51, John Gardiner Myers wrote:
>
> >( But it should be
> > something less raw than just the S-expr returned by FETCH BODY or
> > BODY.STRUCTURE. [...] something humanly readable.)
>
> Why? You're redesigning the IMAP protocol here.
>

I'm NOT redesigning the IMAP protocol.

I'm NOT suggesting any changes to the IMAP protocol. ( except,
perhaps, in the very limited sense of pointing out some language
in the draft that may be ambiguous. Since there are paragraphs
that you seem to read quite differently than I - they perhaps
ARE ambiguous, or at least, not as clear as they could be. But
I'm not asking for a change, just a clarification. If that wasn't
clear, then it's my naturally argumentative style that's to blame. :-)

I'm NOT proposing any change in server implementation at all.

I'm proposing a mapping of an imap: URL *onto* the protocol
that will enable a client program to construct a sequence
of IMAP commands to retrieve an object given a URL.
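
A minimal sketch of the kind of mapping meant here, in Python. The URL shape (imap://host/mailbox#selector#body-part, with uid: and mid: prefixes and a bare number meaning a sequence number) is only an assumption pieced together from this thread, not a settled syntax. The function name imap_url_to_commands is hypothetical, command tags are omitted, and the commands assume an IMAP4 server:

```python
from urllib.parse import unquote

def imap_url_to_commands(url):
    """Map a hypothetical imap: URL onto (host, list of IMAP commands)."""
    prefix = "imap://"
    if not url.startswith(prefix):
        raise ValueError("not an imap: URL")
    rest = url[len(prefix):]
    # Assumption: the first "#" separates the mailbox from the selector.
    location, _, selector = rest.partition("#")
    host, _, mailbox = location.partition("/")
    commands = ['SELECT "%s"' % unquote(mailbox)]
    if not selector:
        return host, commands
    # Assumption: a second "#" introduces a body-part number.
    msg, _, part = selector.partition("#")
    item = "BODY[%s]" % part if part else "RFC822"
    if msg.startswith("uid:"):
        commands.append("UID FETCH %s %s" % (msg[4:], item))
    elif msg.startswith("mid:"):
        # A message-id selector becomes a search, then a fetch of the hit.
        commands.append('SEARCH HEADER Message-ID "%s"' % unquote(msg[4:]))
        # <search-result> is a placeholder for the number SEARCH returns.
        commands.append("FETCH <search-result> %s" % item)
    else:
        # Default selector: a plain sequence number.
        commands.append("FETCH %s %s" % (msg, item))
    return host, commands
```

The point of the sketch is that the client, not the server, does the translation: one URL may expand to two or three commands, which is why the URL need not mirror the protocol one-to-one.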

And, I'm proposing that if use of imap: URL's spreads anywhere
beyond my gateway, then it would be useful if interactive IMAP
clients had a "goto <URL>" capability.

It is important to remember that it is not the purpose of a
URL to have a one-to-one mapping onto a particular protocol.
The URL should map onto the OBJECT. Some URL's require
protocol specific information to be able to retrieve the
object - for example, the ftp scheme has added a ;TYPE
qualifier - not out of any desire to map the entire FTP
protocol onto the URL, but because the object may not be
able to be correctly retrieved by the protocol without
knowledge of the type.

And in another message:
>
> You're making a whole lot of assumptions here.
>

Yes. Well - I know what my requirements are.
The question is whether the mapping that I will implement
is acceptable for a standard.

> It is possible for multiple messages in a mailbox to have the same
> message-id.

mid: SHOULD eventually be the mid: scheme that is mentioned but not actually
defined in some of the URL documents. I assume that this will be
constructed from "Message-Id:" and "Date:" message header lines to
make a unique reference. Using the golden internet protocol rule
(be conservative in what you send, liberal in what you accept),
I'll generate those (when they are defined), but I'll try to accept plain
ordinary "Message-Id:"s if they do happen to be unique in that
folder.

> The semantics of "#mid:<message-id>" is a search. It should have the
> URL syntax of a search.

Again: the purpose of the URL is not to be a one-to-one mapping of
the protocol. The functions of the two are quite different. There is no
reason for the semantics to be identical.

> > It is allowed for Multipart MIME messages to return the parts in some
> > symbolic form that requires further dereferencing.
>
> If you want the symbolic form of a multipart MIME message, you should
> have to ask for it explicitly. If you just ask for a message, you
> should get the text of the message, be it multipart or not.
>

This is one of the rather arbitrary decisions I threw in there for
bait ( I mean, "comment" :-). If there is a consensus, I'll follow.

However, I don't think you actually want to read the Base64 encoding
of an audio file or a GIF inline. And, although an HTTP client/server
can inquire about capabilities for handling multipart messages, that
negotiation is not part of the IMAP protocol. In fact, current IMAP
clients are not required to display the entire message. Pine (I think) only
displays the first part, and displays an indicator for the other parts.
However, if we have to pick something arbitrary, perhaps a better
convention would be to display all Content-type: text parts and display
references to the other parts.
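
As an illustration of that convention only, here is a short Python sketch using the standard email package. It is not how Pine or any other existing client behaves; the function name render and the "[part N: type]" placeholder format are made up for the example, and nested multiparts are deliberately not handled:

```python
from email.message import EmailMessage

def render(msg):
    """Show text parts inline and a short reference for every other part."""
    if not msg.is_multipart():
        parts = [msg]
    else:
        # Only the top-level parts are considered in this sketch.
        parts = list(msg.iter_parts())
    lines = []
    for number, part in enumerate(parts, start=1):
        if part.get_content_maintype() == "text":
            # Text is decoded and displayed as-is.
            lines.append(part.get_content().rstrip())
        else:
            # Anything else (audio, image, ...) becomes a reference that
            # the reader can choose to dereference later.
            lines.append("[part %d: %s]" % (number, part.get_content_type()))
    return "\n".join(lines)
```

The part number in the placeholder is what a follow-up request for that body part would need, so the reader never has to look at Base64 unless they ask for it.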

[ I don't actually know that this behaviour is required to be defined,
though. ( That's why I said "may" ) As I pointed out, presentation is
NOT defined for an IMAP client by the protocol. ]

> > I would then interpret the "hierarchy delimiter, if it has one" to
> > mean that "/" in a mailbox name does not need to be escaped.
>
> I would suggest treating the last "/" (or "#" or "?", depending on
> definition) as separating the mailbox name from the "selector". Leave
> all characters in the mailbox name as uninterpreted.
>
> IMAP mailboxes are also likely to have a "#" character in the name.
>

I think the folks working on the uri draft would say that a "#" in
a mailbox name would HAVE to be escaped. "?" *within* a search string,
as opposed to one *indicating* (delimiting) a search string would
DEFINITELY have to be escaped. IF "/" *IS* defined as a "hierarchy
delimiter" between mailbox and message-selector, THEN it must be
escaped when it's part of the mailbox name.
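
To make the escaping rule concrete, a small Python sketch using urllib.parse.quote. The helper name escape_mailbox and its delimiter_is_reserved flag are hypothetical; whether "/" ends up reserved is exactly the open question above, so the flag just lets both readings be tried:

```python
from urllib.parse import quote, unquote

def escape_mailbox(name, delimiter_is_reserved=False):
    """Percent-escape a mailbox name for use inside a URL.

    "#" and "?" are always escaped. "/" is escaped only when it is
    reserved as the separator between mailbox and message selector.
    """
    safe = "" if delimiter_is_reserved else "/"
    return quote(name, safe=safe)

# "/" left alone: it is treated as part of the mailbox hierarchy.
print(escape_mailbox("lists/#imap?"))
# "/" reserved as a delimiter: it must be escaped too.
print(escape_mailbox("lists/#imap?", delimiter_is_reserved=True))
# Either form unescapes back to the original mailbox name.
print(unquote(escape_mailbox("lists/#imap?", delimiter_is_reserved=True)))
```

Either way the client unescapes the name before handing it to the server, so the server never sees the URL encoding.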

The above is my reading of the *direction* of consensus. Early drafts
seem to state clearly that everything after the scheme: is opaque.
Later drafts seem to hedge and qualify this. Recent discussion on
uri@bunyip.com seems to be favoring making more of the syntax globally
defined. ( e.g. "?" ALWAYS indicates a query string. )

[ My reading may be off. That's the main reason I've been CC-ing
uri@bunyip.com - and to throw in an example of attempting to
follow their guidelines for a protocol they didn't originally
consider when writing them. ]

>
> > So my answer is basically
> > the same I made to John Gardiner Myers concerning UID's:
> > #<uid>#<body-part>
> > will be the preferred way of specifying a message part, especially
> > when a server/gateway constructs a reference.
>
> Your syntax, however, makes "message-number" the preferred form.
>

I'm not entirely happy with that myself -
Another arbitrary decision due mostly to the fact that mid:, cid:
are already mentioned ( if not well defined ) in the URL documents,
and uid: seemed an obvious addition, since that acronym is used in
the IMAP document. Sequence number became the default for lack of
a commonly accepted and previously published acronym. Not a good
reason, I admit.

> Message numbers are explicitly allowed to change between sessions.
> They only appear to remain constant in certain limited cases. Their
> use is dangerous-- if/when they do change, someone using a stale
> "message-number" reference will likely get the wrong message, not an
> indication that the reference is invalid.

URL's, in general, have no guaranteed lifetime and don't guarantee
retrieval of the same object for all time.

> Message numbers in IMAP URL's are analogous to FTP URL's which specify
> "give me the third file in the listing for this directory".

But, yes. The fact that a sequence number will usually get SOMETHING
(your analogy is on target! ) makes them more error prone than other
"volatile" URL's.

But I still think they are acceptable with the proper caveats. And if
it's a URL into *my* archive, and I guarantee that messages will only
be appended to that archive, and I happen to know that my server,
given that restriction, will map it to the same message, then it's not
too unreasonable to generate them from my gateway.
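
A toy Python model of both sides of this argument. It is not IMAP code: the mailbox is just a list of (uid, text) pairs, a sequence number is a 1-based position in that list, and the helper names by_sequence and by_uid are invented for the example:

```python
def by_sequence(mailbox, n):
    """Return the message at 1-based sequence number n."""
    return mailbox[n - 1]

def by_uid(mailbox, uid):
    """Return the message with the given UID, or None if it is gone."""
    for message in mailbox:
        if message[0] == uid:
            return message
    return None

mailbox = [(101, "first"), (102, "second"), (103, "third")]

# Append-only archive: existing sequence numbers keep their meaning.
appended = mailbox + [(104, "fourth")]
assert by_sequence(appended, 2) == by_sequence(mailbox, 2)

# After a deletion, a stale sequence number silently returns a
# different message, while a stale UID reports that it is missing.
expunged = [m for m in mailbox if m[0] != 101]
assert by_sequence(expunged, 2) == (103, "third")
assert by_uid(expunged, 101) is None
```

So the append-only guarantee is what makes sequence numbers tolerable in an archive; without it, the UID form is the only one that fails visibly.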

But: I would prefer UID's in all cases and avoid sequence numbers
entirely, except for:

> > ( Besides the practical problem that the U.Washington imap2bis
> > server does not seem to support UID
>
> The IMAP4 document was only recently completed, not everyone has had
> time to implement it. Within a year, servers which do not support UID
> will be considered obsolete.
>

I'm going to drop things half finished and go to the Outer Banks for
two weeks this month, but I still expect to have a working gateway
by Sept. ( maybe not the whole thing, but enough for my immediate
needs. I may leave MIME support until the server has better MIME
support. )

However, if you can guarantee obsoleteness, then why bother with all
that effort towards IMAP2/IMAP2bis/IMAP4 compatibility ?

I would, though, consider the fact that *I* need sequence numbers
right *NOW* as a weak argument for writing them into a standard, if it
weren't also for the fact that clients don't typically present UID's
to the user. So it is unlikely that someone trying to construct a URL
to reference a message could do so if UID's were the required and only
message selector.

Sequence number and Message-id: *are* usually available and viewable.

I think it's a lot more productive to focus on what an imap: URL
is required to do, and then try to map that functionality onto
both the URL syntax and the IMAP protocol.

-- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU> --
-- UVA Department of Molecular Physiology and Biological Physics --
-- Box 449 Health Science Center Charlottesville,VA 22908 --