Re: caching

Alexsander Totic (atotic@ncsa.uiuc.edu)
Fri, 18 Feb 1994 01:19:22 -0600 (CST)

From: atotic@ncsa.uiuc.edu (Alexsander Totic)
Message-Id: <9402180719.AA11617@void.ncsa.uiuc.edu>
Subject: Re: caching
To: uri@bunyip.com
Date: Fri, 18 Feb 1994 01:19:22 -0600 (CST)

Prompted by recent postings, here are some of my
thoughts on caching. This is a perspective of a WWW browser user.

There seem to be two distinct modes of information retrieval going on:

1) Searching.

Here the user starts with metainformation, and proceeds to look for
that piece which best satisfies his needs.

        / URC\                    /URL+format\
Search->| URC |->user picks->URN->|URL+format|->user/client picks 1->doc
        \ URC/                    \URL+format/

In this scenario, it makes sense to allow the user to make a choice,
since he is looking at information with a particular purpose in mind.
For example, on a Macintosh, I would prefer a text file if I
want to search/incorporate the doc, and postscript if I want to print
it. There is no way for the client to determine the best match in the
general case.

2) Browsing. I am using a WWW paradigm here.

                            /URL+format\
user clicks on a link->URN->|URL+format|->user/client picks one->doc
                            \URL+format/

In this scenario, the user is interested in a few things: he should
be able to display the document, and speed, speed, speed.
I believe that the client should be able to choose for the
user which URL is retrieved, because a naive user does not
have the knowledge needed to make the right decision (he does not
know the network topology and addressing very well). If the user
is interested in finding out more about the resource, he can get
the URC and find more info.

So, how does all this relate to caching?
I have been thinking for a while about how caching would work with
URNs. A few things seem to be very useful:

- Local caching: Ability to have a domain-specific cache server.
For example, all the URN users in the domain psych.uiuc.edu
should have a local server that caches the documents that have
been accessed from this domain. The outside world should not know that these
documents have been cached. This is useful as coworkers start shouting
to each other: "Check out the new movie at ucla.edu!" or
"This fractal art is really something."
- Mirroring: Ability to have worldwide mirror sites for documents.
This relieves the network load across the continents.
These URLs should be registered worldwide. This function is really
mirroring, and not caching.

For both local caching and mirroring, workable schemes can
be imagined under some circumstances:

Local caching:

Can be implemented together with a local URN->URL server.
The assumptions are:
- If I have resolved a URN more than x times in the
last hour, it makes sense to try to cache this document on a server.
- When a document is retrieved from the original site, there is a way
to determine what its caching lifespan should be. Each protocol would
have its own method for determining this. HTTP has header info;
an FTP site can be periodically checked via archie.
- A browsing client, if picking a URL automatically, will always pick
the topmost suitable one from the list.
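
The last assumption can be sketched in a few lines of Python. This is
only an illustration: the Candidate record, the pick_topmost_suitable
function, the format names and the example URLs are all hypothetical,
not part of any existing browser or resolver.

```python
from typing import NamedTuple, Optional, Sequence, Set


class Candidate(NamedTuple):
    """One entry of a resolved URN: a URL plus the format it is served in."""
    url: str
    fmt: str


def pick_topmost_suitable(candidates: Sequence[Candidate],
                          displayable: Set[str]) -> Optional[Candidate]:
    """Return the first candidate whose format the client can display."""
    for candidate in candidates:
        if candidate.fmt in displayable:
            return candidate
    return None  # nothing in the list can be displayed


# Hypothetical URL list, in the order the resolving server returned it.
resolved = [
    Candidate("http://rs.example/cache/17", "postscript"),
    Candidate("http://rs.example/cache/18", "text"),
    Candidate("ftp://origin.example/pub/doc.txt", "text"),
]

# A client that can only display text and HTML skips the postscript entry.
choice = pick_topmost_suitable(resolved, displayable={"text", "html"})
```

Because the client never looks past the first match, the order of the
list is the only lever the server has over which copy gets fetched.
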
The way it would work:
When a single URN is resolved for the second time in one hour, a decision
is made to cache it on the resolving server (RS). The RS generates a
unique local URL for this document, and retrieves it from its original site.
Its expiration time is determined in a protocol-dependent way.
If the document is labeled variable, a note is made, and the RS will not
attempt to cache this URN any more.
The RS returns the list of URLs to the client, the topmost of which is the
newly generated local URL. The client, if in browsing mode, picks the
topmost one and retrieves the doc from the RS. Note that the
client still has the option to retrieve other versions. Future requests
to resolve that URN would resolve to the same URL list. Hopefully,
the other local requests would be looking for the same kind of document.
One weakness is that the server has to guess which URL the browser would request.
This could be cured with some fancy redirection, so that the server retrieves
all the URLs that are the result of that particular URN query.
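
A minimal sketch of this resolving server, in Python, might look as
follows. Everything named here is an assumption made for illustration:
the ResolvingServer class, the fetch callback (which stands in for the
protocol-dependent retrieval and returns a lifespan in seconds, or None
for a "variable" document), the rs.example and origin.example hosts, and
the choice to guess the topmost original URL as the one worth caching.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional, Set, Tuple

WINDOW_SECONDS = 3600   # "in one hour"
CACHE_THRESHOLD = 2     # cache on the second resolution within the window

# fetch(url) -> (body, lifespan in seconds); lifespan None means "variable".
Fetcher = Callable[[str], Tuple[bytes, Optional[float]]]


class ResolvingServer:
    def __init__(self, urn_table: Dict[str, List[str]], fetch: Fetcher,
                 clock: Callable[[], float] = time.time) -> None:
        self.urn_table = urn_table            # URN -> original URL list
        self.fetch = fetch
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.cached: Dict[str, Tuple[str, float]] = {}  # URN -> (local URL, expiry)
        self.store: Dict[str, bytes] = {}     # local URL -> document body
        self.never_cache: Set[str] = set()    # URNs labeled variable
        self.next_id = 0

    def resolve(self, urn: str) -> List[str]:
        now = self.clock()
        urls = list(self.urn_table[urn])

        # Drop the local copy once its lifespan has run out.
        entry = self.cached.get(urn)
        if entry is not None and entry[1] <= now:
            self.store.pop(entry[0], None)
            del self.cached[urn]
            entry = None

        # Record this resolution and forget those older than the window.
        hits = self.hits[urn]
        hits.append(now)
        while hits[0] <= now - WINDOW_SECONDS:
            hits.popleft()

        if (entry is None and urls and urn not in self.never_cache
                and len(hits) >= CACHE_THRESHOLD):
            body, lifespan = self.fetch(urls[0])  # the guess: topmost URL
            if lifespan is None:
                self.never_cache.add(urn)         # never try this URN again
            else:
                self.next_id += 1
                local_url = f"http://rs.example/cache/{self.next_id}"
                self.store[local_url] = body
                entry = (local_url, now + lifespan)
                self.cached[urn] = entry

        # The local URL, if any, goes on top of the original list.
        return ([entry[0]] if entry else []) + urls


# Usage with a fake clock and a fake fetcher (10 minute lifespan).
now = [0.0]
rs = ResolvingServer({"urn:demo:1": ["http://origin.example/doc.txt"]},
                     fetch=lambda url: (b"document body", 600.0),
                     clock=lambda: now[0])
first = rs.resolve("urn:demo:1")    # first resolution: nothing cached yet
now[0] = 60.0
second = rs.resolve("urn:demo:1")   # second within the hour: local URL on top
```

The threshold of two resolutions per hour follows the description above;
the "more than x times" form of the assumption would simply make
CACHE_THRESHOLD configurable.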

Mirroring:
This problem has been thoroughly investigated by others (Archie, some
other model), so I will not comment on it right now. We can come up
with many different schemes, and I think others are better qualified to
comment on them.

So, local caching will help with browsing, where speed is the main
concern. Mirroring would help lighten the load resulting from
searching.

Aleks

-- 
Aleksandar Totic   -- lead MacMosaic programmer --         atotic@ncsa.uiuc.edu
Software Development Group      National Center for Supercomputing Applications