Archiving WWW Sites




[Tom Smith's project]

BTN = Better Than Nothing archiving. 

The plan is to start with the open-source NEDLIB site-harvester software. We think (and hope) that a combination of: 

  • the files generated by the site harvester 
  • a more structured version of the readme (help from library on this?) 
  • a plan/schedule for regular archiving 

would be a very significant stride in satisfying at least some of our goals with this project.
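One way the "more structured version of the readme" might look is a machine-readable manifest listing every harvested file with its size and a checksum. The following is only a minimal sketch under stated assumptions: the function names, the manifest fields, and the choice of SHA-256 are illustrative and are not part of the NEDLIB harvester or of the existing readme.txt.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def build_manifest(harvest_dir, snapshot_date):
    """Describe every file under harvest_dir (hypothetical manifest layout)."""
    root = Path(harvest_dir)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                # Path relative to the harvest root, with forward slashes.
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                # The checksum lets a later archivist confirm the copy is intact.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return {
        "snapshot_date": snapshot_date.isoformat(),
        "file_count": len(entries),
        "files": entries,
    }


def manifest_to_json(manifest):
    """Serialize the manifest as indented JSON text."""
    return json.dumps(manifest, indent=2)
```

A manifest built this way could be saved alongside each backup disk; re-running the function at each scheduled archiving and comparing checksums would show which files changed between snapshots.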

Click here for the contents of the readme.txt file that accompanied the first backup disk delivered to the Union College Library archives.



[Leo's project]

Click here for release notes on Leo's project.


Objective: To document the use of the Union College web site by capturing it on video as a user navigates through its features. The video might last up to 30 minutes. It could be made either by simply aiming a digital camera at the screen while a user navigates the site or, perhaps better, by capturing the computer's video output directly to DV tape or to a file (although a file would probably be quite large after 30 minutes). The video would be accompanied by an audio narration, either recorded in real time or added afterward during editing.  Ideally the video "tour" would be shot in a single take, to convey real response times.
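To put a rough number on "pretty big", the size of a capture file can be estimated from its data rate. The sketch below assumes the capture is stored as DV, whose stream runs at approximately 3.6 megabytes per second; that rate and the function name are assumptions for illustration, and a different codec or screen-capture tool could give a very different figure.

```python
# Approximate data rate of a DV stream, in megabytes per second (assumption).
DV_MB_PER_SECOND = 3.6


def capture_size_gb(minutes, megabytes_per_second):
    """Estimate file size in gigabytes (1 GB taken as 1000 MB)."""
    seconds = minutes * 60
    return seconds * megabytes_per_second / 1000


thirty_minute_tour_gb = capture_size_gb(30, DV_MB_PER_SECOND)
print(f"Estimated size of a 30-minute DV capture: {thirty_minute_tour_gb:.1f} GB")
```

Under these assumptions a 30-minute tour comes to roughly 6.5 GB, more than a single-layer DVD (about 4.7 GB) can hold as a plain data file.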

Purpose: To convey the structure, look, and method of use of the current Union web site.  The video would give future researchers a glimpse of the content of the web site, and an appreciation for how the site was put together and functioned.

Product: The tour would be saved onto a DVD and/or onto DV tape.  Documentation will describe what was done and give instructions for regularly refreshing the media and, if necessary, migrating the file types, so that the tour remains viewable in 100 years or more.

Example:  Using software called Camtasia, the exact screens are captured in real time (including mouse movements).  Here are two versions of a short tour: one in *.avi format, and one as a self-running executable.


Trusting the Archiving Project is perhaps not the best route to take.  It is fun but hardly foolproof. Click here for some sample retrievals.

Quoting from the site:

The Internet Archive, working with Alexa Internet, has created the Wayback Machine. The Wayback Machine makes it possible to surf more than 10 billion pages stored in the Internet Archive's web archive. The Wayback Machine was unveiled on October 24th, 2001 at U.C. Berkeley's Bancroft Library.
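Sample retrievals like these can be requested through the Wayback Machine's address scheme, in which a 14-digit timestamp (YYYYMMDDhhmmss) and the original page address are appended to a fixed prefix. The following is a minimal sketch: the function name is hypothetical, and the page address used in the usage note is only an illustrative assumption. The code builds the address without contacting the service.

```python
from datetime import datetime

# Public prefix under which Wayback Machine captures are served.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url, when):
    """Build the address of the capture of page_url nearest to `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # e.g. 20011024000000
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"
```

For example, `wayback_url("http://www.union.edu/", datetime(2001, 10, 24))` yields an address asking for the capture closest to the day the Wayback Machine was unveiled. The service redirects to whichever capture it actually holds nearest that time, which may be far from the requested date or missing altogether; this is one reason it is hardly foolproof as a sole archive.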


Sponsored by Union College and the Center for Educational Technology