NOTE: Several members of the Web Archiving project attended the DSpace workshop run by NERCOMP on April 30. 
Here is Leo Geoffrion's report on that workshop.  - JDK


Date: Mon, 5 May 2003 10:15:25 -0400
Subject: Report on DSpace workshop
From: ldg <ldg@skidmore.edu>  [Leo D. Geoffrion]

These are some key ideas gathered from the workshop [on DSpace, sponsored by NERCOMP, April 30, 2003] without repeating the lectures themselves.

In designing DSpace, the developers felt it very important to give as much local control as possible. They found that offices feel strongly about the ability to establish an identity for each individual laboratory and project. Hence, the community feature is an important planning element when working with offices.

  • If we were to roll out DSpace for our schools, we should think about the use of logos and customized spaces as a way to appeal to distinct groups within our schools.

DSpace began as a repository for faculty and researcher electronic publications, but the developers' conversations with faculty quickly indicated that it needed a broader mission. While it continues to be used for its primary function, the faculty were particularly interested in its ability to archive course materials and learning objects. As a result, the developers are now working with MIT's OKI (Open Knowledge Initiative) to strengthen its relevance for archiving educational components in a manner that will integrate with the learning modules themselves.

  • This could be a big plus for our schools, where teaching is a more central function than at the research universities.

Faculty also expressed much interest in DSpace as a repository for unpublished non-text items such as large research datasets, art images, ...

In their first 6 months of public operation, they have 5 units at MIT using it, with another 12 about to start in the coming months. They've had to invest considerable time and work in marketing it to the academic community. My impression is that it's a matter of making it a priority in their busy lives -- not one of active resistance from the scholarly community.

They report that contributors have little trouble entering items and do a very thorough job adding metatags; in fact, the entries have been more verbose than expected. Most of the submissions are being done by surrogates (e.g. graduate assistants, ...) instead of the professors themselves.

Intellectual property ownership is a major worry underlying the DSpace project. It finesses the issue by having contributors state that they own the copyright or have obtained copyright permission for its placement in the archive. At the same time, they recognize that this is an inadequate approach long term.

As one response, the MIT lawyers are starting to advise the faculty on negotiating rights. For example, they have offered sample language for publication agreements that gives them the right to place a copy in the institutional archive (DSTORE) and distribute it online.

  • This could be a useful lesson for our schools.

They are also working with Creative Commons ideas to define new classes of copyright permission and are considering whether to use their position to serve as a broker service for scholars on their campus. Essentially, they might take over copyright communications for individual professors on their campuses since many find it burdensome to respond to the many requests for copyright permission that they receive.

  • (somehow this sounds like a much more serious problem at MIT than on our campuses)

They are strongly committed to the open-source shared development model and are asking for as much help as possible with its future evolution. Some development projects at other schools include:

  • developing an LDAP authentication module (Columbia??)
  • developing a Kerberos authentication module (Cornell)
  • authority control for key fields. For example, making certain that correct names are used for institutional authors.
  • porting it to other databases besides Postgres. (Columbia)
  • improved documentation (DSpace for Dummies at UTenn.)

They are moving to hand over control from MIT to the open-source community, using SourceForge as the primary vehicle.

Some future enhancements that are in various stages of consideration/development:

  • Integration with library catalog to allow DSpace documents to appear in the catalog as individual items automatically.
  • Better methods to mirror contents among clusters of DSpace servers. At present, this can be done via Perl scripts, but it's clunky and in no way "real time".
  • Scalability. They are moving toward archives that can hold millions of documents and petabytes of data. This leads to the need for tools to manage the size of submissions.
  • The use of metatag structures beyond Dublin Core. How can one substitute a different tagging structure and still maintain good search/retrieval ability?
  • Better searching, including full-text searching.

There is interest in expanding it to university archives, but this is a ways off. First, they want to focus on services outside of the library before applying it internally, and second, they've not yet worked through the issues between closed and open archives. The present DSpace is not really intended for closed archives with tight restrictions on who can access which pieces of College records.

In moving from research to production, MIT made an institutional decision that it will support DSpace in perpetuity. That is, it will maintain the documents even if the people leave, technology changes, ... On the other hand, they only guarantee bitstream retrieval: you will get back the same bits you deposited, but they do not guarantee that you'll be able to understand the binary information. At the same time, they are working to develop a repository of information on bit-level definitions of document formats (e.g. JPEG, PDF, ...). Long term, they hope to make this a key Internet resource depository. Their premise is that this will provide the necessary infrastructure to permit the future development of translators for obsolete formats.

  • As we move into web archiving, are our schools prepared to commit to maintaining the archives into the indeterminate future?!

As part of this institutional commitment, they are exploring methods to generate revenue. One option is to offer "premium" services such as help with metadata tagging. Another is to charge a fee to document submitters, ...

MIT's operating policies have made it very easy to deposit documents, and nearly impossible to remove them. They feel it important that items never disappear from the archive.

Another promising future direction: schools are beginning to create federations to share archiving across institutions. They mentioned two examples: UToronto offering services to other Ontario colleges, and Hamilton College offering services to Union and Skidmore.

-------------------

My general impression...

DSpace is still an emerging product, but the core group is enthusiastic and they are evolving it in very promising directions. I believe that it's safe for us to jump on the train while recognizing that parts of this train are still under construction.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Leo D. Geoffrion, Ph.D.
EMAIL: ldg@skidmore.edu
Webmaster - Strategic Communications
VOICE: 518.580.5735     FAX: 518.580.5748
Skidmore College    815 No. Broadway Saratoga Springs, NY 12866
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=