Re: AFS URLs or Why Not File?

Mic Bowman (mic+@transarc.com)
Tue, 22 Nov 1994 12:02:49 -0500 (EST)

Message-Id: <9411221703.AA19214@mocha.bunyip.com>
To: "Daniel W. Connolly" <connolly@hal.com>, uri@bunyip.com
Subject: Re: AFS URLs or Why Not File?
In-Reply-To: Your message of "Fri, 18 Nov 1994 15:37:08 -0600."
<9411182137.AA05713@ulua.hal.com>
Date: Tue, 22 Nov 1994 12:02:49 -0500 (EST)
From: Mic Bowman <mic+@transarc.com>

Let me try to present the problem of access to files in any distributed
file system. Two sites, L1 and L2, share access to a file system name
space. The file system can be any distributed file system including
NFS, AFS, DFS, or any of several other distributed file systems. A
third site, R1, does not share access to the file system. All three
sites want to share access to a corpus of documents stored as files in
the file system. Let file F1 reside on a file server at L1 and file F2
reside on a file server at L2. L1 and L2 use some service (assume FTP
for now) to give site R1 access to the files on their file servers.

There are two questions that should be addressed:

1) What URL is used by clients at sites L1, L2, and R1 to access the
files F1 and F2?

2) If file F1 refers to file F2, what URL does it use so that clients
at each site can follow the pointer?

Solutions:

1) All sites use FTP URLs for all documents.

- This is the only solution possible under the current URL
specification. The 'file' scheme would not allow L1 and
L2 to share access to the file unless the URL were based
on 'localhost'. However, that is an improper use of 'file'
for a resource that is accessible beyond the local machine.

- The problem is that FTP is slow for L1 and L2. If possible,
they would prefer to access the files through the file system.

2) L1 and L2 use a file-system-specific URL to access F1; R1 uses FTP to
access F1. F1 uses FTP to refer to F2.

- The problem with 'file' is that it assumes that each host defines
a unique name space. A scheme for wide-area file systems (see below)
might identify a name space rather than a host. Alternatively, each
distributed file system might have its own URL scheme. The key is
to use some new scheme that enables L1 and L2 to identify a shared
file system name space.

- It is still necessary for F1 to refer to F2 using an FTP URL, since
FTP is the only protocol common to all three sites that need to
follow the link. As in the first solution, L1 and L2 must retrieve F2
through FTP even though the file is also available in the local file
system.

3) All sites use FTP URLs for all documents; L1 and L2 share a
configuration file that specifies translations from one scheme to
another.

- Clients use the most general URL to identify a document. The
configuration file specifies the "additional information" necessary
to translate URLs into a preferred form. In this case the
translations shared by L1 and L2 are:

ftp://L1/ftppath/F1 --> file://localhost/filepath/F1
ftp://L2/ftppath/F2 --> file://localhost/filepath/F2

- The advantages are clear: there is one URL for each file, all
clients can access the files, and the file system clients can
retrieve the files through the preferred access method.

- The disadvantage is maintaining the configuration file. This is
not too difficult since L1 and L2 share a common file system. The
configuration file is maintained on behalf of *all* sites in the file
system. One file is sufficient.

- The translation to the 'file' scheme could actually be a
translation to a file-system-specific scheme (e.g., afs: or dfs:).

- Another advantage of URL translation is that it has application
beyond distributed file systems. For example, most documents
available through the NCSA HTTP server are also available by FTP
through the AFS gateway on grand.central.org. URL translation
would enable a certain amount of load balancing through 'mirrors'
like this.

The third solution is actually the one we use for AFS access right now.
We built a URL translation library to augment the CERN library of common
code. It reads a configuration file that describes the translation from
one name space to another. In our case, all translations are to AFS
file names. It is trivial to extend this solution to provide a
file-system-based cache of documents from sites that are not actually
part of the file system. Client-based translation is not a URN
implementation (though it could certainly be used to translate URNs to
URLs in the client), but the configuration file does define URL
equivalency for a particular client.
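
To make the translation idea concrete, here is a minimal sketch in
Python of prefix-based URL translation driven by a configuration file.
It is not the library described above; the rule syntax
('source --> target'), the function names parse_rules and translate,
and the host "elsewhere" are all assumptions made up for illustration.

```python
def parse_rules(text):
    """Parse lines of the form 'source-prefix --> target-prefix'."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, arrow, target = line.partition("-->")
        if not arrow:
            raise ValueError(f"malformed rule: {line!r}")
        rules.append((source.strip(), target.strip()))
    # Try longer (more specific) prefixes first.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def translate(url, rules):
    """Return the preferred form of url, or url unchanged if no rule applies."""
    for source, target in rules:
        if url.startswith(source):
            return target + url[len(source):]
    return url


# Hypothetical configuration shared by L1 and L2 (format is an assumption).
CONFIG = """
# general form  -->  preferred form
ftp://L1/ftppath/ --> file://localhost/filepath/
ftp://L2/ftppath/ --> file://localhost/filepath/
"""

RULES = parse_rules(CONFIG)
print(translate("ftp://L1/ftppath/F1", RULES))     # rewritten to the 'file' form
print(translate("ftp://elsewhere/pub/F3", RULES))  # no rule applies, left as FTP
```

A client at R1 simply has no configuration file, so every URL falls
through unchanged and is fetched by FTP.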

Having said all that, let me propose two possible URL extensions to
overcome deficiencies in 'file'. The first is a general
wide-area/distributed file system URL. The basic idea is to use
something like 'file' but to generalize from a 'host'-specific name
space to a 'file-system'-specific name space. (wafs == wide-area file
system)

SYNTAX:
wafs://{name-space}/path/file

IMPLEMENTATION:
if the client has access to the name space then
    open(path/file);
else
    error.

COMMENTS:
This is basically 'file' except that 'host' is generalized to
'name-space'. For example, the following URL identifies an AFS
file name:

wafs://afs/afs/transarc.com/public/www/Home.html

Or a DFS file name:

wafs://dfs/.../mandos.transarc.com/public/README

Or an NFS file name:

wafs://nfs.{mountstring}/home/sunws0/mic/public/html/Home.html

All that is necessary to share files is an agreement on the
name space string and a way to determine the name spaces to which a
client has access.
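
The IMPLEMENTATION rule can be sketched in Python as follows. This is
only an illustration under stated assumptions: the set of accessible
name spaces is hard-coded here, whereas a real client would need some
agreed way to discover it, and the function name resolve_wafs and the
name space "unknown" are invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical: the name spaces this client can reach.
ACCESSIBLE_NAME_SPACES = {"afs", "dfs"}


def resolve_wafs(url, accessible=ACCESSIBLE_NAME_SPACES):
    """Return the local path for a wafs URL, or raise if it cannot be used."""
    parts = urlsplit(url)
    if parts.scheme != "wafs":
        raise ValueError(f"not a wafs URL: {url!r}")
    name_space = parts.netloc  # takes the place of 'host' in a 'file' URL
    if name_space not in accessible:
        # The 'else error' branch: the client does not share this name space.
        raise LookupError(f"no access to name space {name_space!r}")
    return parts.path  # the caller can now open() this path


print(resolve_wafs("wafs://afs/afs/transarc.com/public/www/Home.html"))
print(resolve_wafs("wafs://dfs/.../mandos.transarc.com/public/README"))
```

A client that gets the error would fall back to another access method,
such as FTP, if one is known for the document.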

As an alternative, the single 'wafs' scheme could be replaced by
individual schemes for each name space:

afs://[user@]{cell}/cell-specific-path/file
dfs://[user@]{cell}/cell-specific-path/file
nfs://{mountstring}/path/file
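
As a rough sketch of how a client might handle the per-name-space
schemes, the Python below maps afs: and dfs: URLs to local path names.
The roots "/afs/" and "/.../" are assumptions inferred from the path
shapes in the wafs examples, the function name local_path is invented,
and nfs: is left out because the form of {mountstring} is not defined
here.

```python
from urllib.parse import urlsplit

# Assumed local roots for each scheme (inferred, not specified).
ROOTS = {"afs": "/afs/", "dfs": "/.../"}


def local_path(url):
    """Split a per-name-space URL into (user, local path)."""
    parts = urlsplit(url)
    root = ROOTS.get(parts.scheme)
    if root is None:
        raise ValueError(f"unsupported scheme in {url!r}")
    cell = parts.hostname  # the {cell} part, with any 'user@' removed
    if not cell:
        raise ValueError(f"missing cell name in {url!r}")
    # username is None when the optional [user@] part is absent.
    return parts.username, root + cell + parts.path


print(local_path("afs://transarc.com/public/www/Home.html"))
print(local_path("dfs://mandos.transarc.com/public/README"))
```

Note that urlsplit lowercases the cell name; whether cell names should
be treated as case-insensitive is left open here.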

--Mic Bowman

-----------------------------------------------------------------
Mic Bowman
Member of Technical Staff
Transarc Corporation
The Gulf Tower, 707 Grant Street      9903 E. Moccasin Trail
Pittsburgh, PA 15219                  Wexford, PA 15090
(412) 338-6752                        (412) 933-0073
(412) 338-4404 (FAX)

WWW: ftp://grand.central.org/afs/transarc.com/public/mic/html/Bio.html
-----------------------------------------------------------------