Global Programs

> Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Mon Sep 14 13:33:32 2009.

These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
> No guarantees given; comments and content criticism welcome.

* C. Mic Bowman, Peter B. Danzig, Michael F. Schwartz, Research problems for scalable Internet resource discovery, CU-CS-643-93, University of Colorado at Boulder, Boulder, Col., pp. 12. March 1993,
Three problems in resource discovery are identified:
     1. data diversity: the information is there, but is in different formats. Tentative solution: operation mapping or data mapping.
     2. number of users: too much access to resource information. Tentative solution: replication of the hot stuff only.
     3. data volume: there is one terabyte already, and users searching for information are flooded with false hits. Tentative solution: Use better indexes, but they get too big; then use indexes over indexes.

* Alan Emtage, Peter Deutsch, archie -- An electronic directory server for the Internet, TR-92-??, McGill University, Montreal, Canada, 1992?,
archie (no capital) consists of a Data Gathering Component, a Database Management Component, and a User Access Component. It is replicated a small number of times.
     Scalability: 1. ftp site managers should restrict replication of files, since the size of the database is the sum of all `number of user files' × `number of places it occurs'. 2. uniformity of file systems would help.
     Effectiveness: 1. file names are not a good indicator of contents; there should be a contents description.
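
The scalability remark above amounts to simple arithmetic: every copy of a file on every site costs one database entry. The following is a minimal sketch of that count, not archie's actual code; the site names, file names and the function `database_size` are all made up for illustration.

```python
from __future__ import annotations

from collections import Counter


def database_size(site_listings: dict[str, list[str]]) -> int:
    """Count index entries: one per (file, site) pair on which the file occurs."""
    # Hypothetical model: a file is identified by its name alone.
    copies = Counter(
        name for files in site_listings.values() for name in set(files)
    )
    # Sum over distinct files of the number of places each one occurs.
    return sum(copies.values())


# Hypothetical listings for three anonymous-ftp sites.
listings = {
    "ftp.a.example": ["gcc.tar", "emacs.tar"],
    "ftp.b.example": ["gcc.tar"],
    "ftp.c.example": ["gcc.tar", "emacs.tar", "notes.txt"],
}
print(database_size(listings))  # 6 entries for only 3 distinct files
```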

* B. Clifford Neuman, Prospero: a tool for organizing Internet resources, Electronic Networking: Research, Application, and Policy, vol. 2, #1, pp. ??-??. 1992????,
Prospero is a world-wide file system akin to Andrew, with the possibility for users to impose their own (virtual) (tree-like) structure. A user's structure can be made public, in which case it can be part of somebody else's virtual structure.

* Matt Blaze, Rafael Alonso, Dynamic hierarchical caching in large-scale distributed file systems, CS-TR-353-91, Princeton University, Princeton, pp. 9. Oct. 1991,
Simple hierarchical caching is no good, since if a file is not in the local cache, chances are large that it is not in the intermediate caches either. A Δ-ary cache tree is proposed, with the following access algorithm: if the file is not in the local cache, consult the owner who, if he has no Δ customers yet, will send you the file and attach you to the tree as a sub-cacher, or refer you to a sub-cacher.
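
The access algorithm can be sketched as follows. This is my own minimal reading of the annotation, not the authors' implementation: the class `CacheNode`, the field `delta` (the fan-out limit), the round-robin choice of sub-cacher and the assumption that a sub-cacher applies the same rule recursively are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CacheNode:
    """A holder of one file's copy; the file's owner is the root of the tree."""

    name: str
    delta: int  # maximum number of direct sub-cachers (must be >= 1)
    children: list[CacheNode] = field(default_factory=list)
    next_referral: int = 0  # round-robin pointer; an assumption, not from the paper

    def request(self, client_name: str) -> tuple[CacheNode, int]:
        """Serve a cache miss; return the client's new node and the hops used."""
        node, hops = self, 1
        while len(node.children) >= node.delta:
            # This node already has delta customers: refer the client onward.
            index = node.next_referral % node.delta
            node.next_referral += 1
            node = node.children[index]
            hops += 1
        # Room left: send the file and attach the client as a sub-cacher.
        child = CacheNode(client_name, self.delta)
        node.children.append(child)
        return child, hops


owner = CacheNode("owner", delta=2)
hops = [owner.request(f"client{i}")[1] for i in range(7)]
print(hops)  # [1, 1, 2, 2, 2, 2, 3]
```

With a fan-out of 2 the owner itself serves only the first two clients; everybody else is handed down the tree, so the owner's load stays bounded while the number of hops grows slowly.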

* David R. Cheriton, Timothy P. Mann, Decentralizing a global naming service for improved performance and fault tolerance, TOCS, vol. 7, #2, pp. 147-183. May 1989,
In this long-winded paper, three levels of naming are distinguished: global (com, edu, gov, etc.), which are uncontrolled and must be highly available; administrational (stanford.dsg, cs.vu, etc.) which are owned by organizations; and managerial (bin, lib, smith, etc.), which are owned by single object managers. Each client caches a number of prefixes of names and uses the longest to map to an object address, or to those of its liaison servers. Each object knows its name and checks this when handling requests, so the cache entries may be wrong. If all cache entries are wrong, local multicast is used to re-establish a reasonable cache. The rest of the paper does calculations on the model and compares them to measurements. Security is covered superficially.
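
The client-side prefix cache can be illustrated with a small sketch. It assumes names written as `/`-separated components and server addresses as plain strings; the class `PrefixCache`, its methods and the example names are hypothetical and not taken from the paper. Dropping an entry stands in for what a client might do after a server rejects a request because the name does not match.

```python
from __future__ import annotations


def split_name(name: str) -> tuple[str, ...]:
    """Split a '/'-separated name into components, ignoring empty ones."""
    return tuple(part for part in name.split("/") if part)


class PrefixCache:
    """Maps cached name prefixes to server addresses (a per-client cache)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, ...], str] = {}

    def add(self, prefix: str, address: str) -> None:
        self._entries[split_name(prefix)] = address

    def drop(self, prefix: str) -> None:
        # Called when the addressed server says the name is not its own.
        self._entries.pop(split_name(prefix), None)

    def lookup(self, name: str) -> tuple[str, str] | None:
        """Return (matched prefix, address) for the longest cached prefix."""
        parts = split_name(name)
        for end in range(len(parts), -1, -1):
            address = self._entries.get(parts[:end])
            if address is not None:
                return "/".join(parts[:end]), address
        return None  # nothing cached: the client would fall back to multicast


cache = PrefixCache()
cache.add("edu", "global-server")
cache.add("edu/stanford/dsg", "dsg-server")
print(cache.lookup("edu/stanford/dsg/bin/ls"))  # ('edu/stanford/dsg', 'dsg-server')
```

Because the server verifies the name itself, the cache needs no consistency protocol: a wrong entry costs one failed request, after which the client drops it and retries with the next-longest prefix.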

* Michael D. Schroeder, Andrew D. Birrell, Roger M. Needham, Experience with Grapevine: the growth of a distributed system, ACM TOCS, vol. 2, #1, pp. 3-23. Feb. 1984,
Some problems are enumerated which developed after the system had been in place for some time.
     Scaling: Distribution lists soon got too big, and structured distribution lists were introduced.
     Too much abstraction/transparency: when something fails, it just hangs and it is hard to find out what is going on. Monitoring software has been added.
     Unintended use: the system was soon used as a general data storage mechanism.
     Fossilization: as the system aged, fewer and fewer people knew the details, and nobody was willing any more to take the risks of a major modification.

* Andrew D. Birrell, Roy Levin, Roger M. Needham, Michael D. Schroeder, Grapevine: an exercise in distributed computing, Commun. ACM, vol. 25, #4, pp. 260-274. April 1982,
Grapevine is a distributed message delivery system on the Xerox research internet. It features naming facilities for such entities as people, machines, services and groups of these; one-way authentication of people and machines; and location of services. A problem was `coordinated software unreliability': all replicas crash simultaneously on the same bug.