Feature Story

Paperless Publishing with a Twist: It May Work

HighWire Press Assistant Director Vicky Reich, supporting "slow, beat-up" LOCKSS hardware.

Sun Microsystems Laboratories and Stanford University are testing a new, real-world approach to preserving access to scientific, technical, and medical journals. It promises to help libraries better exploit Web technologies. In the bargain, it may make good on another promise: using information technology to reduce the flow of paper worldwide.

Using Java[tm] and Linux technologies, Stanford Library and Sun Microsystems Laboratories researchers have adapted a centuries-old model for circulating paper to create one that reduces our reliance on it.

It happens millions of times a day all over the wired world: Search the Web. Find a URL. Go there. Bookmark it ... and then print hardcopy.

A commonplace event at work and home, printing paper is a habit we can't seem to kick. Its persistence has surprised and stumped analysts who once boldly imagined the Paperless Office.

In fact, office and home printing is driving a boom in demand for paper, one that is predicted to double office paper consumption by the year 2003. Two recent workplace surveys confirm the trend and suggest an unlikely culprit: the Web itself.

The surveys found that information workers are increasing their Internet printing volume at home and at the office. The bottom line: we print hardcopy because we are concerned that the information may not be on the Internet the next time that we need it.

Our concern is well founded. The impermanence of Web content is a reliable fact of digital life that affects everyone. It is also the focus of a unique collaboration between Stanford University Library's Vicky Reich and Sun Microsystems Laboratories' David Rosenthal. Their goal is to provide libraries with reliable, persistent access to on-line journals. Using Java technology, Reich and Rosenthal have adapted a centuries-old model for circulating paper to one that reduces our reliance on it.

Lots of Copies Keeps Stuff Safe

This month, with a grant from the National Science Foundation and support from Sun Microsystems, they began alpha testing their solution at six libraries, from Harvard and Columbia to Stanford and Berkeley. The content is Science, a magazine published by the American Association for the Advancement of Science (AAAS). Reich and Rosenthal call their new system LOCKSS, for Lots of Copies Keeps Stuff Safe, a name that belies the inventiveness of their approach.

The Web: Fueling Demand for Paper?

It's no secret that the Information Age has been a boon to the paper pushing industries. It comes as something of a surprise, though, that the Internet has actually increased demand for paper. Pre-cut paper consumption will double from 1996 to 2003 because of the use of office and home printers, according to the Boston Consulting Group, a management consultant that studied the impact of electronic media on paper use. Separate workplace surveys confirm and help explain that trend.

According to the first two Internet Printing Index (IPI) surveys, the Web has actually increased the demand for printing in the workplace. "Ironically, the medium that was predicted to kill print and paper is, at present, fueling it," observed American Printer [Feb 2000 v224 i5 p42]. The printing trade journal noted approvingly, "[i]t's a fine time to be selling paper in the U.S."

The surveys were conducted late last year by Market Tools, Inc., and commissioned by Hewlett-Packard Co., a leading printer manufacturer. They found that regular users of online information are printing 33 pages from the Internet per workday, either for their personal files or to share with colleagues. That was a five-page-per-day increase from the first IPI survey. And it amounts to over 3 reams of paper per worker per year, a total that does not include printing in-house documents. Yet the surveys revealed that people click their print buttons for practical reasons. Chief among them: the concern that the information might be gone the next time they surfed to the URL.

LOCKSS is an open source, Java[tm] and Linux-based distributed system. It is designed to operate on slow, inexpensive hardware without central administration. Running autonomously and deploying a clever system of polling, LOCKSS permanently caches copies of on-line content -- enough copies to assure access around the world in case publishers fold or no longer support user access. So when an issue of an on-line journal is misplaced or damaged, LOCKSS takes notice and replaces it.

Reich and Rosenthal's immediate target is libraries, where the advances of on-line publishing have been held back by libraries' reluctance to subscribe exclusively to on-line editions of journals.

For Reich, assistant director of Stanford Library's HighWire Press, the attraction is irresistible. HighWire Press publishes the on-line editions of approximately 210 STM journals, publishing a new page every few seconds, 24x7. Reich and her colleagues have championed the move away from paper, developing user-friendly techniques that STM audiences take for granted. These, combined with improved searchability and hyperlinks to related articles, bibliographies, and footnotes, make the Web versions easier and faster to access and more useful than paper editions. Many on-line STM journals now publish earlier and contain more information than their paper editions.

On the one hand, libraries such as Stanford's are eager to provide online access to scientific, technical and medical (STM) journals because the Web is a far more effective medium than paper. On the other hand, what happens when the on-line publisher fails, or arbitrarily decides to deny access to its archives? "Preservation is totally at the whim of the publisher," notes Rosenthal. "The publisher may promise 'perpetual access,' but there is no business model to support the promise."

"Paper does have one essential property the Web lacks, permanence," observes Reich. LOCKSS figures to change that with an approach that combines the advantages of on-line publishing with the centuries-old craft of library management.

Affordable Web Cache

"Librarians' technique for preserving access to material published on paper has been honed over the years since 415 AD, when much of the world's literature was lost in the destruction of the Library of Alexandria," Reich and Rosenthal observe in a paper to be presented at Usenix in June. The "fundamental requirement" for LOCKSS was to model the best library techniques as closely as possible for material published on the Web.

A comparable system might have saved much of the world's literature lost in the fire that destroyed the Library of Alexandria in 415 AD.

Those techniques are based on simple rules. Acquire lots of copies. Scatter them around the world so that it is easy to find some of them and hard to find all of them. Lend or copy your copies when other libraries need them. And collaborate only with competent and trusted libraries. These are the design principles that LOCKSS implements, with a further proviso that it runs on cheap, slow, old computers "stolen from the junk heap," says Reich.

Unlike archival systems that preserve copies at any cost, LOCKSS preserves access for circulating journals with a frugality that will make it affordable to perennially cash-strapped libraries. For the alpha test now underway, "we're using really old, beat-up 75 and 100 MHz Pentiums," says Rosenthal.

A sophisticated polling technique and a unique security system complement the modest hardware requirements.

Each participating library behaves as a Web cache. The process begins when a librarian supplies an instance of LOCKSS with a publisher's URL and publishing frequency. The publisher uses the library's IP address for authentication, and LOCKSS then launches a web crawler that navigates and traverses the publisher's sub-trees, fetching a copy of the journal page by page.
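The crawl described above can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the LOCKSS crawler itself: `crawl_subtree`, `LinkParser`, the in-memory `SITE` dictionary, and the `journal.example` URLs are all hypothetical, and the IP-address authentication step is left out. The sketch only shows the idea of fetching a journal page by page while never leaving the publisher's sub-tree.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag found in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_subtree(root_url, fetch):
    """Breadth-first crawl that stays inside the sub-tree under root_url.

    `fetch` is a hypothetical callable that maps a URL to HTML text, or
    returns None when the page is unavailable. The result is a dictionary
    {url: html} holding every page that was cached.
    """
    cache = {}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        if url in cache:
            continue
        html = fetch(url)
        if html is None:
            continue
        cache[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" suffix.
            target, _fragment = urldefrag(urljoin(url, href))
            if target.startswith(root_url) and target not in cache:
                queue.append(target)
    return cache


# Toy in-memory "publisher" standing in for a real journal site.
SITE = {
    "http://journal.example/vol1/":
        '<a href="issue1.html">1</a> <a href="/about.html">About</a>',
    "http://journal.example/vol1/issue1.html":
        '<a href="issue2.html#top">next</a>',
    "http://journal.example/vol1/issue2.html": "<p>last issue</p>",
    "http://journal.example/about.html": "<p>outside the sub-tree</p>",
}

cached = crawl_subtree("http://journal.example/vol1/", SITE.get)
```

Passing `fetch` in as a parameter keeps the sketch free of network access; a real cache would replace `SITE.get` with an HTTP request and would also honor the publishing frequency the librarian supplied.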

LOCKSS in a Matchbox: This pint-size Linux PC runs LOCKSS and features a 486 processor with 16 MB of RAM, 16 MB of flash RAM, a 340 MB IBM Microdrive, and a full set of PC I/O. This system is from Tiqit (http://www.tiqit.com), a firm started by Sun Microsystems alumnus Vaughan Pratt, currently on teaching leave at Stanford University.

The library caches communicate with each other "in wall-clock time," using the Library Cache Auditing Protocol (LCAP), which Reich and Rosenthal created. LCAP is a reliable, scalable IP multicast protocol that continuously polls member libraries to check for missing or damaged copies. LCAP takes advantage of multi-threaded Java code, so a variety of processes can run in the background. Among them are random polls in which the caches run "diffs" on their respective copies, comparing content by walking through directories by journal, volume, and issue. Damaged or missing copies trigger a replication process modeled on the inter-library loan system, but only after LOCKSS conducts an "opinion poll" on the competency of the library with the problem cache.
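A minimal Python sketch may help picture the poll-and-repair cycle. It is not LCAP: there is no multicast, no threading, and no "opinion poll" on competency. As assumptions of this sketch, each copy is fingerprinted with a SHA-256 digest in place of a "diff", the most common digest is treated as the good copy, and the names `run_poll`, `repair`, and `library_a` through `library_d` are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content):
    """Fingerprint one cached copy (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(content).hexdigest()


def run_poll(caches, key):
    """Compare every cache's copy of one (journal, volume, issue) key.

    `caches` maps a library name to a dictionary {key: bytes}. Returns the
    most common digest and the sorted names of the libraries whose copy is
    missing or differs from it.
    """
    votes = {}
    for name, store in caches.items():
        votes[name] = digest(store[key]) if key in store else None
    tally = Counter(d for d in votes.values() if d is not None)
    if not tally:
        raise LookupError("no cache holds a copy of %r" % (key,))
    winner, _count = tally.most_common(1)[0]
    losers = sorted(name for name, d in votes.items() if d != winner)
    return winner, losers


def repair(caches, key, winner, losers):
    """Copy the agreed content from an agreeing peer into each losing cache."""
    donor = next(
        name for name, store in caches.items()
        if key in store and digest(store[key]) == winner
    )
    for name in losers:
        caches[name][key] = caches[donor][key]


# Illustrative data: one journal issue held by four hypothetical libraries.
KEY = ("Example Journal", 1, 2)  # (journal, volume, issue)
caches = {
    "library_a": {KEY: b"article text"},
    "library_b": {KEY: b"article text"},
    "library_c": {KEY: b"article t3xt"},  # damaged copy
    "library_d": {},                      # missing copy
}

winner, losers = run_poll(caches, KEY)
repair(caches, KEY, winner, losers)
```

In this toy run the poll flags `library_c` and `library_d`, and the repair step plays the role of the inter-library loan by copying the agreed content from a peer that voted with the majority.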

Perhaps the most intriguing part of the security system is that it is designed to leverage the unique characteristics of the LOCKSS system. Because LOCKSS is not centrally administered but rather distributed, there is no single point of failure. Because LOCKSS runs very slowly, an attacker "must persist in taking bad actions over a long period of time," according to Reich and Rosenthal. "By operating slowly even on human timescales, the system makes it easier to detect an attacker and limits the damage he can do before being stopped."

The LOCKSS security system is forgiving, too, which is remarkable for an autonomous caching system. For this it relies on maintaining a record of public behavior. Since each cache maintains a registry of every other cache's polling behavior, mistrusted caches are eventually excluded from polling, copying, and lending operations. If the mistrusted cache changes its ways, demonstrating its reliability in a sufficient number of polls over time, it is readmitted to the peer group and then granted voting and lending privileges in the LOCKSS system.
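The forgiving registry described above could be sketched in Python along these lines. The article gives no thresholds or scoring rules, so everything numeric here is an assumption: `exclude_below`, `readmit_after`, the one-point-per-poll scoring, and the `Registry` class itself are hypothetical. The sketch also assumes that an excluded cache's answers are still observed, even though they no longer count.

```python
class Registry:
    """One cache's record of how its peers have behaved in past polls."""

    def __init__(self, exclude_below=-2, readmit_after=3):
        self.exclude_below = exclude_below  # assumed score threshold
        self.readmit_after = readmit_after  # assumed run of good polls
        self.score = {}
        self.good_streak = {}
        self.excluded = set()

    def record(self, peer, agreed):
        """Update a peer's standing after one poll."""
        if agreed:
            self.score[peer] = self.score.get(peer, 0) + 1
            self.good_streak[peer] = self.good_streak.get(peer, 0) + 1
            streak = self.good_streak[peer]
            if peer in self.excluded and streak >= self.readmit_after:
                # Enough consecutive good polls: readmit with a clean slate.
                self.excluded.discard(peer)
                self.score[peer] = 0
        else:
            self.score[peer] = self.score.get(peer, 0) - 1
            self.good_streak[peer] = 0
            if self.score[peer] < self.exclude_below:
                self.excluded.add(peer)

    def is_trusted(self, peer):
        """True when the peer may take part in polling, copying, and lending."""
        return peer not in self.excluded


registry = Registry()
for _ in range(3):
    registry.record("cache_x", agreed=False)
excluded_after_bad = not registry.is_trusted("cache_x")

for _ in range(3):
    registry.record("cache_x", agreed=True)
readmitted = registry.is_trusted("cache_x")
```

With the assumed defaults, three disagreements in a row exclude `cache_x`, and three agreements in a row bring it back.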

Reich and Rosenthal are quick to point out that theirs is not "a general-purpose Web content preservation system." LOCKSS is designed only for journals published by Stanford's HighWire Press. To be sure, LOCKSS's slow, methodical polling and copying system is "clearly not suitable for volatile content" such as that of a CNN news site. But Reich and Rosenthal do allow that "[i]t may be possible to apply the system to other types of content." Its affordability and ease of use are promising.

Adds Reich, "It certainly will reduce the need for paper." And that's a promise that LOCKSS and similarly designed systems may keep.


What It Means To You

Executives: LOCKSS testing may demonstrate a low-cost alternative to saving paper for enterprise computing.
Service Providers: LOCKSS testing could show how to use IP multicasting to preserve access to intranet and Internet content.
Developers: LOCKSS demonstrates an approach to IP multicasting and security on slow, inexpensive distributed systems.
System Admins: LOCKSS uses a very low maintenance, distributed approach to achieve reliable, secure access to Web content.
Investors: LOCKSS may prove a promising technology for licensing to libraries and government institutions.