From: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
To: "Daniel W. Connolly" <connolly@hal.com>
Subject: Re: <URL:...> considered harmful
In-Reply-To: Your message of "Mon, 12 Sep 1994 20:14:05 EST."
             <9409130114.AA02972@ulua.hal.com>
Date: Mon, 12 Sep 1994 23:20:42 -0500
Message-Id: <16569.779430042@hound.cs.indiana.edu>

Thus wrote: "Daniel W. Connolly"
>In message <199409130042.TAA15490@boombox.micro.umn.edu>, "Mark P. McCahill" writes:
>>In message <9409122013.AA02543@ulua.hal.com> "Daniel W. Connolly" writes:
>>> URL: looks like a URL scheme, but it's not.
>>
>>It MIGHT look like a scheme if you recognize schemes based only
>>on if there is a word followed by a colon, but that seems like a
>>really unreliable/lame way of recognizing a URL in text...
>
>Would you care to suggest an alternative? I didn't write the code that
>picks URLs out by looking for scheme:... . I just observed that it
>exists and is widely deployed (hypermail, python's urlopen module
>etc.)

Naturally; in the absence of well-established conventions, people use
heuristics and hacks to get by. So?

>>In these examples you are using the whitespace to delimit the URL.
>>This limits the length of the URL to one line which is a real problem
>>for any URLs that are longer than a line. Having a wrapper around the
>>URL does not preclude you from having a program that can parse text to
>>find URLs and make them something the user can double click... and the
>>wrapper makes it possible for even long URLs to be automatically detected
>>and parsed.
>
>If this feature is so valuable, why has it not been implemented and
>deployed by now? URLs have been around for 2 years. Nobody seems
>to need anything more reliable than whitespace to delimit URLs in
>actual practice. If they do need more reliability, they use some other
>format besides plain text.

Therefore, we should not create a more reliable scheme, because if it
needed to be done somebody would have done it already? I don't get
this.

>>> "What about long URLs?" you might ask. Well, they don't work in plain
>>> text. They just don't.

Why not? Can they be made to?

>>They don't unless you have an explicit wrapper so you know when the
>>URL begins and ends. That is what <URL:...> provides.
>
>How is this url: <URL:ftp://cnri.reston.va.us/
>internet-drafts/draft-ietf-uri-url-07.txt> better than
>this one: ftp://cnri.reston.va.us/
>internet-drafts/draft-ietf-uri-url-07.txt ? A human reader will
>understand. Computers? We already discussed
>the nightmarish performance implications of parsers looking for the
>closing '>' since this requires arbitrary backtracking. All that
>aside, my point is that if it were really all that valuable, we'd
>have tools that exploit it by now. We don't. So let's not bloat
>the spec with it.

I think you really mean "if people posting plaintext messages used
such a scheme, we would have tools to exploit it by now." Also, "if
URLs for schemes that require lengthy URLs were commonplace, we'd have
tools that worry about it by now."
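
For what it's worth, a tool that exploits the wrapper need not be
elaborate. Here is a rough sketch in Python, not taken from any deployed
program; the name wrapped_urls and the sample text are made up for
illustration. It assumes the URL itself contains no '<' or '>' and that
any whitespace inside the wrapper is just line folding.

```python
import re

# Hypothetical helper, not from hypermail or any other existing tool.
# The negated character class [^<>]* also matches newlines, so a URL
# folded across lines is still captured as one unit.
WRAPPED = re.compile(r"<URL:([^<>]*)>")

def wrapped_urls(text):
    """Return the URLs inside <URL:...> wrappers, with folding whitespace removed."""
    return ["".join(m.group(1).split()) for m in WRAPPED.finditer(text)]

sample = """How is this url: <URL:ftp://cnri.reston.va.us/
internet-drafts/draft-ietf-uri-url-07.txt> better than"""

print(wrapped_urls(sample))
```

One caveat with this sketch: a quote prefix such as '>' at the start of
a continuation line would end the match early, so it only handles
unquoted text.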
>OK, but we agree that such programs do not exist, so we have the
>freedom to choose the delimiters. Why not choose the delimiters that
>cause the least grief for implementors and users? Something like
>regular old RFC-822 header syntax:
>
>URL: ftp://cnri.reston.va.us/
> internet-drafts/draft-ietf-uri-url-07.txt

Makes things a bit brittle (e.g. the URL would presumably be broken if
the text were indented, depending on how precisely you define it) but
could work...

>or something vaguely lispish:
>
>(URL: ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt )

Um... OK, I give up... parentheses are better than angle brackets
because why?

>>> It's a tedious, error-prone situation with no widely deployed
>>> solution. Empirical arguments to the contrary are welcome.

Therefore, no attempt to solve it should be made?

>>1.) E-mail addresses and message IDs use <>. However, it is easy to
>> write a program that can differentiate an e-mail address like
>> <connolly@hal.com> from <URL:gopher://gopher.tc.umn.edu/11/fun>.
>
>Again, argument by assertion. "It is easy..." Exactly what criterion
>would you use to differentiate a URL from a message id? Is the following
>a URL or a message id:
>
> <URL:my-scheme://foo.bar/abc@foo.com>
>
>It satisfies the syntax of both, since the path syntax of my-scheme
>might allow '@' characters.

So, the problem is possible spurious URL identification. Are you
seriously arguing that the <URL:...> wrapper has a greater risk of
this than your implicit "use whitespace as delimiters" approach?
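
To put the comparison in concrete terms, here is a rough Python sketch.
The two patterns and the names bare_candidates, wrapped_candidates, and
classify are invented for illustration; they are not how hypermail or
any other tool actually does it. The classifier assumes that a literal
"URL:" right after the '<' is decisive.

```python
import re

# Illustrative patterns only; not taken from any deployed tool.
BARE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S+")  # word, colon, run of non-whitespace
WRAPPED = re.compile(r"<URL:([^<>]*)>")            # explicit <URL:...> wrapper

def bare_candidates(text):
    """Everything the whitespace-delimited 'scheme:' heuristic would pick up."""
    return BARE.findall(text)

def wrapped_candidates(text):
    """URLs found inside <URL:...> wrappers, with folding whitespace removed."""
    return ["".join(u.split()) for u in WRAPPED.findall(text)]

def classify(token):
    """Classify an angle-bracketed token by looking for the URL: prefix first."""
    inner = token[1:-1]
    if inner.startswith("URL:"):
        return "url"
    if "@" in inner:
        return "address-or-message-id"
    return "other"

text = (r"Edit C:\AUTOEXEC.BAT, then fetch "
        "<URL:gopher://gopher.tc.umn.edu/11/fun> and mail <connolly@hal.com>.")

print(bare_candidates(text))
print(wrapped_candidates(text))
print(classify("<URL:my-scheme://foo.bar/abc@foo.com>"))
print(classify("<connolly@hal.com>"))
```

On this sample the bare pattern also picks up the DOS path, trailing
comma included, and drags the closing '>' along with the gopher URL;
the wrapper pattern returns only the gopher URL.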
-- Marc VanHeyningen <http://www.cs.indiana.edu/hyplan/mvanheyn.html>