URC usage scenarios

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Mon, 26 Sep 1994 11:38:35 -0600

From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Date: Mon, 26 Sep 1994 11:38:35 -0600
Message-Id: <199409261738.LAA24891@idaknow.acl.lanl.gov>
To: connolly@hal.com, masinter@parc.xerox.com
Subject: URC usage scenarios

Larry, Dan -

Here is my first cut at a document detailing a fair number of scenarios of
how the URC service might be used. Your feedback is sought.

Ron Daniel
------------------------------------

URC Service Usage Scenarios
****************************

Abstract
=========

In a mail message to the group
(<http://www.acl.lanl.gov/URI/archive/uri-94q3.messages/291.html>),
Larry Masinter called for scenarios of how the URC service might be used.
Dan Connolly seconded the call and kicked things off with an example of
bibliographic search in
<http://www.acl.lanl.gov/URI/archive/uri-94q3.messages/308.html>.

Here is my stab at responding to their requests. This was scratched out
over a Sunday afternoon, so do not take it as the definitive article on
URC usage. There are cases it does not consider and issues it leaves open
for further discussion. Your feedback is encouraged.

The document is divided into two major sections: User scenarios and
Provider scenarios. The first goes through a few scenarios of how the
service might be used by information consumers. The second covers how
information providers would interact with the system. In addition, a
glossary is provided to define terms.

This document is also available as
<http://www.acl.lanl.gov/URI/Scenarios/>

Table of Contents
==================

1. User Scenarios
   1. URN to URL resolution
   2. Ensuring the veracity of the resource
   3. Ensuring the veracity of the URC
   4. Bibliographic Search
      1. Updating the index
   5. Filtering by Seals of Approval
2. Provider Scenarios
   1. Publishing a new resource
   2. Publishing a new version of a resource
   3. Providing an additional location for a resource
      1. Mirroring of free information
      2. Mirroring of information that is for sale
      3. Mirroring on a regular basis
   4. Removing a location for a resource
   5. Establishing a new publishing authority
   6. Dealing with the demise of a publisher
3. Glossary

User Scenarios
===============

URN to URL resolution
++++++++++++++++++++++

This is the main purpose of the URC service. All the other client-side
operations are, IMHO, gravy.

o User provides a URN to the browser by clicking on an anchor or
by entering text into a dialog box.

o Browser connects to the URC service and gives it the URN

o Service returns a (possibly empty) list of locations to the
browser. Each location must contain a URL. It may also contain
information on Content-Type, Cost, Signatures, etc. The list is
unordered. The means by which the service determines this list
is not an appropriate topic for the usage scenarios since it will
be invisible to the user.

o The browser uses user-configurable preferences to order the
list. For example, a user might prefer HTML to PostScript to
text. Another might prefer PDF to HTML. One user might
prefer locations that carried signature information, another
might not care. Most would prefer the cheapest version of a
resource, and the latest version. Estimated network distance is
another means for ordering the selections. If multiple locations
tie, the browser randomizes them in the list to prevent overload
of any one server.

o Once the list of locations has been put into order, the browser
attempts to retrieve the resource from the first location. If that
fails, the next location is tried. This continues until one of the
following is true:

    o The browser successfully retrieves the resource
    o The list is exhausted
    o The user tells the browser to cancel the retrieval

o The browser displays the resource to the user, perhaps with the
aid of an external viewer.
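
The ordering and retry steps above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the Location
fields, the order_locations and retrieve names, and the use of OSError
to signal a failed fetch are all hypothetical, and user cancellation is
not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Location:
    url: str                 # the only field the scenario requires
    content_type: str = ""   # optional attributes (assumed names)
    cost: float = 0.0


def order_locations(locations, type_rank, rng=None):
    """Order by content-type preference, then by cost; ties are randomized."""
    rng = rng or random.Random()
    shuffled = list(locations)
    rng.shuffle(shuffled)  # shuffle first so equal keys end up in random order

    def key(loc):
        # Content types the user did not list sort after all listed ones.
        if loc.content_type in type_rank:
            rank = type_rank.index(loc.content_type)
        else:
            rank = len(type_rank)
        return (rank, loc.cost)

    # sorted() is stable, so tied locations keep their shuffled order.
    return sorted(shuffled, key=key)


def retrieve(locations, fetch):
    """Try each location in turn; return the first resource fetched, else None."""
    for loc in locations:
        try:
            return fetch(loc.url)
        except OSError:
            continue  # this location failed, fall through to the next one
    return None
```

Shuffling before a stable sort is one simple way to get the "randomize
ties" behavior without a separate grouping pass.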

Ensuring the veracity of the resource
++++++++++++++++++++++++++++++++++++++

An important concern voiced over the URI mailing list and in
discussions with different communities of users has been how to ensure
the veracity of a resource. This concern has been raised on both the user
and provider side. Users want to make sure that they are getting what
they are paying for; providers want to make sure that they are not
haunted by bogus versions of a resource. To ensure the veracity of a
resource, the Location information provided by the URC service can
carry a signature of the resource. There are a couple of ways that the
user could have the browser verify the information. One is for the
browser to provide a graphical indication of the fact that a resource has
some signature information. The user then takes an explicit action to
start the verification procedure. Another approach is detailed below.

o The user starts to retrieve a resource according to the first
scenario.

o As the browser is going through its list of Locations, it notes if
the current location has signature information. The rest of this
scenario assumes that we successfully retrieve a resource which
has signature info.

o When the browser retrieves the resource, it displays it to the
user.

o In the background, the browser verifies the signature on the
information.

o If the signature does not check out, the browser alerts the user.

o If the user goes on to another resource before the signature
computation is complete, it is discarded.
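
A minimal sketch of the background check described above. It assumes
that the "signature" is just a SHA-256 digest carried in the Location
information, which is a stand-in for a real digital signature, and the
BackgroundVerifier class and its show method are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest_matches(content: bytes, expected_hex: str) -> bool:
    # Stand-in for signature verification (assumption): compare a SHA-256 digest.
    return hashlib.sha256(content).hexdigest() == expected_hex


class BackgroundVerifier:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._current = None  # identifier of the resource now on display

    def show(self, ident, content, expected_hex, alert):
        """Display right away (not modeled) and verify in the background."""
        self._current = ident

        def check():
            ok = digest_matches(content, expected_hex)
            # If the user has moved on to another resource, the result is
            # discarded: no alert is raised for a resource no longer shown.
            if not ok and self._current == ident:
                alert(ident)
            return ok

        return self._pool.submit(check)
```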

This assumes that signatures are computed over the contents of a
complete file. Some resources, such as search services, cannot be
treated in such a fashion. To verify services a slightly different
approach is needed. The URC can contain the signature of a constant
header. The service returns that header with its results. The header
contains a public key to use to verify a signature returned at the end of
the information returned to the browser. Discussion on this topic is
highly encouraged.

Ensuring the veracity of the URC
+++++++++++++++++++++++++++++++++

Resources are not the only information that can be tampered with. The
URC service will provide a tempting target for attack. It needs to be
secured against determined attacks and the information it provides
needs to be verifiable. However, security does not come for free, and
we should not impose that cost on all accesses. Therefore it is not
appropriate to make the URC server compute a digital signature for
every query response it generates. Instead, the server keeps two
precomputed signatures for each of its URCs. The first is a signature over
the entire URC; the second is computed over the location information it
would return in response to a URN resolution query.

o User configures the browser to verify URC information.

o The user clicks on a link

o The browser sends a URN resolution request to the URC
service. The request has a flag set so that the URC server will
provide digital signature information.

o The browser receives the list of locations as in the first scenario.
In addition it receives a digital signature of that information
which has been encrypted with the private key of the URC
server.

o The browser retrieves the public key of the server, and uses it to
verify the URC information.

o If there is no problem, the browser continues as before to
retrieve the resource. If there is a problem the browser alerts the
user, who should alert the administrator of the URC server.
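
The idea of keeping two precomputed signatures per URC, so that no
signing happens at query time, might look something like the sketch
below. The UrcStore class, the JSON canonical form, and the
want_signature flag are assumptions made for illustration. HMAC with a
demo key stands in for a real private-key signature, so unlike the
scenario above a client could not check it with a public key.

```python
import hashlib
import hmac
import json

SERVER_KEY = b"demo-key"  # assumption: stands in for the server's private key


def sign(data: bytes) -> str:
    # HMAC is only a stand-in; a real server would use a public-key signature.
    return hmac.new(SERVER_KEY, data, hashlib.sha256).hexdigest()


def canonical(obj) -> bytes:
    # A fixed byte encoding so the same data always signs the same way.
    return json.dumps(obj, sort_keys=True).encode()


class UrcStore:
    def __init__(self):
        self._records = {}
        self.sign_calls = 0  # counts signing work, to show queries add none

    def put(self, urn, urc):
        """Both signatures are computed once, when the URC is stored or changed."""
        self.sign_calls += 2
        self._records[urn] = {
            "urc": urc,
            "urc_sig": sign(canonical(urc)),
            "loc_sig": sign(canonical(urc["locations"])),
        }

    def resolve(self, urn, want_signature=False):
        """URN resolution: returns stored data only, no signing per query."""
        rec = self._records.get(urn)
        if rec is None:
            return [], None
        return rec["urc"]["locations"], (rec["loc_sig"] if want_signature else None)
```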

If a general query is issued, the URCs for all matching resources are
returned in their entirety. The browser then has to verify each of the
URCs in turn. Validating general queries will be an expensive process,
but it is the user's machine paying most of the cost.

Bibliographic Search
+++++++++++++++++++++

The URC provides a convenient place to store bibliographic
information such as author, title, subject, date of publication, etc. The
publication hierarchy makes it possible to find all the registered
publishers. Combining these two properties opens up the possibility of
bibliographic searches across the whole of the web. Exactly how this
should work is not so obvious. At first blush, a scenario like the
following comes to mind:

o User enters author, title, and/or subject information into a form

o Browser passes the query to the URC service.

o Within the URC service, each node is consulted with the query,
the results are collected and passed back to the browser.

o The browser presents the search results to the user.

Of course, the scenario above is unrealistic. If every bibliographic
search of every user consults every URC server, the service as a whole
will soon grind to a halt. The obvious alternative is for some sites to
come forward and carry the burden of these searches, similar to the
current situation with Archie. Most URC servers around the world
would then disallow query forwarding in general bibliographic
searches. This would prevent all the servers from seeing all the traffic.
The usage scenario is now:

o User connects to a URC search site

o Browser puts up the form from that site

o User fills it in and hits "submit".

o The URC search site handles the query over its database and
returns the result to the browser.

o The browser displays the results to the user.

This scenario also has an obvious problem with scale. The web will
certainly grow beyond the capacity of any one site to index. A third
alternative is partway between the two previous suggestions. It relies on
there being multiple URC servers willing to provide the default
information for any publisher. Some sites will provide this information
for many publishers. These major sites could handle the bulk of the
bibliographic searches by forwarding queries amongst themselves. This
is not terribly attractive unless money enters the equation. If publishers
pay major sites to mirror their information and users pay a small
amount for each query they handle then two things happen. It becomes
worthwhile to be a mirror and handle the queries, and the number of
queries will be reduced.

Of course, this money thing is kind of foreign to the Internet - or at
least to its participants in government and educational research labs.
(Damn sure is a stranger to me :-). Also, forwarding the queries amongst a
few big servers is not technically cool. It may be the best we can do, but
I hope not. Something cool needs to be developed here. Suggestions?

Updating the index
-------------------

Unlike the current WWW, the hierarchical nature of the publishers
proposed for URNs means we can determine every registered publisher
so long as publishers are required to remember any sub-publishers they
register.

o Search server starts a depth-first search of the tree of
publishers.

o Search server queries the current server for all URCs that are
new or have been modified since the last time the search server
visited.

o Search server asks for all changes in publication hierarchy since
last visit.

o Search server continues depth-first search using the new
topology
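
As a rough sketch of the crawl above, assume each publisher's server
can be modeled as an in-memory node that knows its URCs (with a
last-modified time) and its registered sub-publishers. The Publisher
class and crawl function are hypothetical, and the "changes in
publication hierarchy" step is simplified to reading the current list
of children.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    name: str
    urcs: dict = field(default_factory=dict)      # URN -> last-modified time
    children: list = field(default_factory=list)  # registered sub-publishers


def crawl(root, last_visit):
    """Depth-first walk; return URNs that are new or modified since last_visit."""
    changed = []
    stack = [root]
    while stack:
        pub = stack.pop()
        changed += [urn for urn, mtime in sorted(pub.urcs.items())
                    if mtime > last_visit]
        # Push in reverse so children are visited in their registered order.
        stack.extend(reversed(pub.children))
    return changed
```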

Of course, the URC service will soon grow beyond the capabilities of
any one site to keep a comprehensive index. The natural course is for
collections of servers to cooperate and divide the load by parts of the
publication hierarchy.

There are a couple of requirements for the service which spring from
considering the search service. First, servers will want to be able to
refuse to answer general search queries from unknown users, while still
answering the simple URN->URL query. Second, there will need to be
a means for determining how to contact the server administrator so that
the administrator will add the central search services to the list of
entities that can launch certain queries. Third, a publisher's server must
keep a complete record of all the sub-publishers authorized by that
publisher.

Filtering by Seals of Approval
+++++++++++++++++++++++++++++++

One of the interesting concepts to come out of the Interpedia effort is
the concept of SOAPs (Seals Of APproval). SOAPs are capsule reviews
of a resource and are implemented using digital signature technology so
that they will be extremely difficult to forge. Critics, professional
organizations, etc. could use SOAPs to carry quick reviews of resources
and to point to more elaborate reviews. For example, the IEEE might
receive a request to "publish" a resource in one of their electronic
journals. The editorial board of the journal lines up the requisite
number of reviewers and sends them the URL of the resource. Each of
the reviewers sends their review back to the editors, who either turn the
author down flat, recommend changes, or accept the resource as it is. If
they accept it, they form a digital signature of the resource, the quick
rating, the URN of the optional full rating, etc. all encrypted with the
private key of the particular journal.

Users could use SOAPs to augment bibliographic searches. For
example, a new physics grad student might ask to see all the abstracts of
all the resources dealing with string theory and quantum
chromodynamics which had been reviewed by the American Physical
Society and received a rating of 9 or above.
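
A query like the grad student's could be sketched as a filter over URCs
that carry SOAPs. The Soap and Urc structures and the numeric rating
are assumptions for illustration, and checking the digital signature on
each SOAP is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Soap:
    issuer: str   # the reviewing organization
    rating: int   # the capsule rating it assigned


@dataclass
class Urc:
    urn: str
    subjects: set
    soaps: list = field(default_factory=list)


def filter_by_soap(urcs, required_subjects, issuer, min_rating):
    """Return URNs covering all required subjects with a good enough SOAP."""
    return [
        u.urn
        for u in urcs
        if required_subjects <= u.subjects
        and any(s.issuer == issuer and s.rating >= min_rating for s in u.soaps)
    ]
```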

Such queries do not necessarily need to proceed in the same fashion as
the general bibliographic search described in the earlier section. Instead,
SOAPs may well become the valuable intellectual property of
professional organizations. It may be that if you wish to do searches on
things with the SOAP of the APS, you have to connect to their server to
do it, presumably paying them for the privilege. Given this
money-making potential, it is doubtful that many professional
organizations will allow authors to include a SOAP in the default URC
for their resource unless the author pays for the privilege.

o User connects to the server of a reviewing organization

o Browser displays the search form of that organization. This will
be a typical bibliographic search form augmented with special
features for the SOAPs issued by the organization.

o User fills out the form and submits it.

o The server does the search and returns the results to the browser.

o The browser displays the results of the search to the user.

Of course, there are some times when people will pay for the privilege
of including a SOAP. Consider the following scenario:

o Fiona D. comes home from a day at school. She turns on the TV
and set-top box.

o The set-top box displays the home page that has been set up by
Fiona's parents. This has links for Fiona's favorite shows, as
well as a link to a general movie selection form.

o Fiona decides to watch a movie instead of one of those stupid
ABC after-school specials. She clicks on the movie browser
link.

o The set-top box forwards the query to the cable company. Since
Fiona did not enter her parents' access code, by default the
movies she will be able to choose from all have an MPAA SOAP
of G or PG-13. This level of filtering was sent by the set-top
browser to the URC server at the cable company. The movie
studios pay the MPAA for the rights to provide the SOAP to all
the cable companies.

o She picks a movie from the New Releases page and watches it.

o Her parents get the bill at the end of the month (or perhaps they
set up a default amount that can be spent for household viewing
and Fiona's choices are charged against that).

Provider Scenarios
===================

Publishing a new resource
++++++++++++++++++++++++++

This is one of the fundamental operations for resource providers.
Consequently it needs to be as simple and as bulletproof as possible. I
don't know that the scenario below meets this requirement. Suggestions
are welcomed.

To prepare and test a resource, the authors request development URNs
from their local server. These URNs are hidden from all but the
developers and server administrator. Using the development URNs, the
author(s) prepare and test the new resource. Development URNs will
have very minimal URCs. Typically they will contain one URL and
some access control information.

When the author(s) are ready to publish the resource to the world, they
will modify the access control information in the development URC to
allow wider access. They may also augment the URC with author, title,
publication date, etc. The amount of information needed in the URC of
a published resource will vary from one publisher to another and can be
enforced by the URC software. If the resource is to be verifiable, the
signature of the resource will be put into the URC at this time. Once all
the material for the URC has been provided, a signature can be
computed over it as well.
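
One way to picture the move from a development URC to a published one
is sketched below. The Urc fields, the "*" marker for public access,
and the publish function are hypothetical; the required-fields check
stands in for whatever a particular publisher's URC software enforces,
and signing is left out.

```python
from dataclasses import dataclass, field

PUBLIC = "*"  # assumption: marker meaning "anyone may resolve this URN"


@dataclass
class Urc:
    urn: str
    urls: list
    readers: set                                   # access control information
    metadata: dict = field(default_factory=dict)   # author, title, etc.

    def can_read(self, user):
        return PUBLIC in self.readers or user in self.readers


def publish(urc, required_fields, **metadata):
    """Add metadata and widen access, if the publisher's required fields are set."""
    merged = {**urc.metadata, **metadata}
    missing = [name for name in required_fields if name not in merged]
    if missing:
        # Nothing is changed when the URC is incomplete.
        raise ValueError("missing URC fields: " + ", ".join(missing))
    urc.metadata = merged
    urc.readers.add(PUBLIC)
```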

Once the URC information has been put onto the local URC server, it
will be propagated to any other servers around the globe that can play
the role of default server for that publisher.

Publishing a new version of a resource
+++++++++++++++++++++++++++++++++++++++

When it is time to revise a resource the authors request a development
version of the URC info. This version object will have restricted access
so that ordinary users only see the older version while the authors and
URC server administrator can see the new version. Once the new
version of the resource is ready, the version object in the URC is made
publicly accessible. Locations for the new version are established, and
the locations for the old version should gradually go away.

Providing an additional location for a resource
++++++++++++++++++++++++++++++++++++++++++++++++

One of the main benefits we are looking for from the URC service is
the ability to have multiple locations for a resource. How are these
additional locations to be established? There are several ways this
might happen; the appropriate model will depend on financial
considerations more than technical ones. We will consider three cases
out of many possible ones. The first is simple mirroring of free
information. The second is a mirror of a small publisher's information
that is sold. The third is a contractual arrangement between sites.

Mirroring of free information
------------------------------

A researcher in Australia comes across a collection of interesting
technical reports on a server in Sweden. He wishes to mirror those
reports as a service to the research community in Australia. He contacts
the administrator of the archive in Sweden. She gives her permission
for a mirror to be established. He pulls over the reports and sets them
up on his HTTP server. Now that they have URLs, he sends a
register_new_url message to the URC service. Since the Swedish
researcher has provided a digital signature for the URCs of all the
reports, a new location cannot just be blindly entered. The URC service
forwards the request to the Swedish researcher. She checks out the new
URLs to make sure that they are faithful versions of her reports, then
signs the register_new_url message with her private key before sending
it back to the URC service. The service verifies the authentication
information, sees that it is good, adds the new location to the URC of
each report and recalculates the signature information. Now when users
attempt to resolve the URN, they can fetch it from either Australia or
Sweden. As a matter of courtesy, the Australian researcher periodically
informs the Swedish researcher about how many times her reports have
been accessed.
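
The approval step in this scenario might be sketched as follows. Only
the register_new_url message name comes from the text; the UrcService
class, the message layout, and the key handling are assumptions. HMAC
with a shared key stands in for the owner's private-key signature, so
in this toy version the service holds the same key as the owner, which
a real design would avoid.

```python
import hashlib
import hmac


def sign(key: bytes, message: str) -> str:
    # HMAC stand-in for a private-key signature (assumption).
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


class UrcService:
    def __init__(self, owner_keys):
        self._owner_keys = owner_keys  # URN -> key used to check the owner's approval
        self.locations = {}            # URN -> list of registered URLs

    def register_new_url(self, urn, url, owner_signature):
        """Add url only if the owner of urn has signed this exact request."""
        key = self._owner_keys.get(urn)
        if key is None:
            return False
        expected = sign(key, f"register_new_url {urn} {url}")
        if not hmac.compare_digest(expected, owner_signature):
            return False
        self.locations.setdefault(urn, []).append(url)
        # Recomputing the URC signature would happen here (not modeled).
        return True
```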

Mirroring of information that is for sale
------------------------------------------

An experimental film maker has been selling avant-garde videos over
the WWW. A film distributor in Germany contacts him to see if they
can serve these up to the European market in exchange for a cut of the
action. The film maker says "sure". The German
distributor puts copies of the videos onto their server and attempts to
register the new location with the URC service. The service forwards
the request to the film maker, who authorizes it by signing the request
with his private key. Periodically the Germans send the film maker a
check to cover the royalties the film maker collects from every
download from the German server. As part of the contract between the
two sites, the film maker can access the logs on the German server to
make sure that he is being paid for all the copies it provides. These logs
are an obvious point of attack for an unscrupulous mirror site
administrator, so there needs to be a secure means for keeping them in
order to answer accusations of fraud. Suggestions?

Mirroring on a regular basis
-----------------------------

Some large sites may set up cooperative mirroring agreements. For
example, Los Alamos National Laboratory might make arrangements with
CERN to provide mirrors of each other's work. When either of these sites
publishes a new resource, it sends a message to the other. The second
site fetches the resource and puts it on their server. It then issues a
register_new_url message to the URC service. It is forwarded to the
publisher of the resource, where it is automatically approved without
human intervention.

Removing a location for a resource
+++++++++++++++++++++++++++++++++++

One of the other strong motivations for the URC service is to allow
administrators of collections of information to rearrange their
collections without breaking pages across the globe. Moving resources
could be accomplished in two steps - establishing the new location then
deleting the original location. Deletion is also necessary when we wish
to remove a resource for any reason.

o Administrator of a resource sends a delete_location message to
the URC server. This will typically require authentication that
is provided through digital signature means.

o The URC service authenticates the request. If the issuer of the
request has permission, the URC is searched for the specified
location. If found, it is removed and a new signature for the
URC is computed.

o If there are other URC servers providing default information for
the particular publisher, they are notified as well so that they
may also modify their databases.
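
A small sketch of the deletion steps above. Only the delete_location
message name is taken from the text. The UrcServer class is
hypothetical, authentication is reduced to a membership test on a set
of administrators, and a revision counter stands in for recomputing the
URC signature.

```python
class UrcServer:
    def __init__(self, name):
        self.name = name
        self.peers = []  # other servers holding default URCs for the publisher
        self.urcs = {}   # URN -> {"locations": [...], "admins": {...}, "revision": int}

    def add(self, urn, locations, admins):
        self.urcs[urn] = {
            "locations": list(locations),
            "admins": set(admins),
            "revision": 0,
        }

    def delete_location(self, urn, url, requester, notify_peers=True):
        """Remove url from the URC if the requester has permission."""
        urc = self.urcs.get(urn)
        if urc is None or requester not in urc["admins"]:
            return False
        if url not in urc["locations"]:
            return False
        urc["locations"].remove(url)
        urc["revision"] += 1  # stand-in for computing a new URC signature
        if notify_peers:
            for peer in self.peers:
                # Peers apply the change locally but do not forward it again.
                peer.delete_location(urn, url, requester, notify_peers=False)
        return True
```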

Establishing a new publishing authority
++++++++++++++++++++++++++++++++++++++++

Publishers are arranged in a hierarchy where new publishers can be
added as children of existing ones.

o Billy Bob Riker, Harley rider, decides to publish his doggerel to
the world. He contacts his friendly neighborhood web publisher
who, in exchange for a modest amount of cash, registers Billy
Bob as a publisher by issuing the following request to the URC
service, signed with the private key of the publisher:

    o register_new_publisher(parent_publisher,
      name_of_new_publisher, signature_of_request)

o Since Billy Bob's nomadic life style is a little hard on disk
drives, he contracts with the parent publisher to also provide
storage, http service, and URC service. These are private
business dealings between the two parties and do not especially
concern us.

o Billy Bob's prose is published to the world using the operations
described earlier.
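
The registration request above, together with the requirement that a
publisher remember its sub-publishers, could be modeled as below. The
PublisherRegistry class, the "parent/name" naming scheme, and the
verify callback (which stands in for checking the parent publisher's
signature) are all assumptions for illustration.

```python
class PublisherRegistry:
    def __init__(self, verify, root="root"):
        self._verify = verify          # callable(parent, name, signature) -> bool
        self._children = {root: []}    # publisher -> its registered sub-publishers

    def register_new_publisher(self, parent_publisher, name_of_new_publisher,
                               signature_of_request):
        """Register a child publisher under an existing parent."""
        if parent_publisher not in self._children:
            raise KeyError(parent_publisher)
        if not self._verify(parent_publisher, name_of_new_publisher,
                            signature_of_request):
            raise PermissionError("request not signed by the parent publisher")
        full_name = f"{parent_publisher}/{name_of_new_publisher}"
        if full_name in self._children:
            raise ValueError("publisher already registered")
        # The parent keeps a record of every sub-publisher it authorizes.
        self._children[parent_publisher].append(full_name)
        self._children[full_name] = []
        return full_name

    def sub_publishers(self, publisher):
        return list(self._children[publisher])
```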

Dealing with the demise of a publisher
+++++++++++++++++++++++++++++++++++++++

Poor Billy Bob. The market for Harley doggerel was not enough to
cover his yearly storage fees and his service provider is about to evict
his bits. While the parent publisher will never reassign Billy Bob's
publication company name, no one is paying for the machine resources,
so it is time to remove the URLs from the URCs for Billy Bob's
resources.

Accomplishing this is a tricky question. Billy will have signed the URC
elements with his own private key, and he is not about to go along
willingly with the eviction proceedings. Of course, he is not the
administrator of the HTTP and URC servers. The administrator of the
servers simply clobbers the old URC and replaces it with one that
contains a "no longer available" element. It can't be signed with Billy
Bob's key, but so what? Once the new URC is in place, the resources
are deleted from the HTTP server.

Glossary
========

Default URC
    The URC that is provided by the publisher of a resource.
Local URC server
    The URC server that a user's browser is configured to connect to
    as a first resort.
Development URN
    A URN used while developing a resource. It starts with very
    tight access controls so that only the resource developers and the
    server administrator can see the URC information and resolve
    the URN to a URL. The access controls can be eased later.
Value-added URC server
    A server that provides more than just the default information on
    a resource. Servers run by professional organizations that
    provide SOAPs are one example; servers that keep full-text
    indices or n-grams of text in order to offer greater search
    capabilities are another.
SOAP
    Seal Of APproval