Personal Archiving in the Academic World
Ellysa Cahoy and Scott McDonald from the Pennsylvania State University library discussed some of the issues surrounding personal archiving in the academic world. Libraries are becoming increasingly personal; the physical library is drawing away, and the library on the scholar’s desktop is growing. Mediation is disappearing, and we are becoming our own librarians. Librarians in physical libraries must start thinking about users’ libraries on their computers and develop services to help them curate their materials. Finding and citing articles are the most frequent activities of academic users. Archiving must be embedded in these processes; otherwise, backups will not occur. The slide below shows some of the research tools used by faculty members:
Our content containers consist of many small pieces loosely joined; for example, Scott said he has 2,500 PDF documents on his computer. It is important to remember where you put that brilliant idea that you had. This is a new world for faculty, in which all research data arrives in digital format, and bookshelves are virtually empty.
Some libraries are working with faculty to help them curate their files, including social web data. Tweets, blog posts, and similar material are an increasingly important part of scholarly work and must be drawn into the pipeline. Faculty and graduate students must be taught to become their own archivists. Libraries have an opportunity to fit into this process, and at Penn State, the Krause Innovation Studio is helping faculty to think about this problem.
Applying Product Management Techniques to Individuals’ Archiving
Judith Zissman, an independent consultant, said that applying product management techniques to archiving can be very useful. The Internet is becoming our archive; what does “good enough” archiving look like? In the software area, programmers developed an “agile manifesto” in which they wrote down what they considered important, and it has revolutionized software development. We must decide what is important because storing everything devalues what is precious. Processes are not perfect, and it is important to recognize that individual users have different goals than organizations. The best practices from 10 years ago, or even a year ago, are not the best practices of today.
DIY Personal Archives
Stan James, founder and CTO of Lijit Networks, worked with his father on an extensive project to digitize his family’s archives. He used the following software products:
- Picasa, a “visual workhorse”
- Audacity for editing audio files
- Dragon NaturallySpeaking for capturing the captions of the photos (much faster than typing them)
- LiveMesh for synching photos from laptops and CDs
- Ancestry.com for researching genealogy
- LogMeIn.com for providing remote tech support to family members
- Mozy.com for backups.
This is an extremely useful list for anyone contemplating starting such a project (or anyone in the midst of one, as I am!). It was interesting to note that he used Google Street View to see what places from family history look like now (in some cases, they are radically changed!). He had problems with Picasa’s face tagging feature because it is tied to Gmail contacts, and there is no way to use it for ancestors. He also found that there is no way to get audio files out of Ancestry.com.
What Do We Mean By “Personal”?
People create documents with an understanding of social norms and how things should be done at reunions, etc. Creation of documents is usually done in a family context, and multiple people will have an interest in a photo of which only a single copy exists. This raises questions of ownership, custodianship, etc. Identities are not things we possess, but processes in motion, which has implications for research, tool building and services, archives, and archivists.
Our Rapidly Disappearing Digital Heritage
Jason Scott is an activist. He is a collector of home computer materials from the 1970s and 1980s, and his first urge was to share them with as many people as possible, which led to a website. Now, people come to him for computer history, and he has formed the Archive Team to preserve it and rescue it from imminent deletion. For example, AOL recently gave only 2 months’ notice of closing a site that had existed since 1988, and many people lost their personal data. Yahoo also closed GeoCities, which was the first experience of using web pages for many people. However, in the GeoCities case, the site was copied and replicated (all 900 gigabytes of it!), and it is now available again via BitTorrent.
The Archive Team takes, duplicates, and saves large datasets. They are currently updating Yahoo Video (a 20 terabyte collection) and are also scraping the Delicious site in anticipation of its possible closing. Efforts such as these recognize that the context of history shifts. When using these popular services, people are in an environment where they have no rights, and sites can be deleted with only very short notice. Scott suggested that regular archiving of one’s My Documents folder is an excellent practice and can provide recovery from many disasters.
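For readers who want to try Scott’s suggestion, here is a minimal sketch of one way to do it in Python. It is only an illustration, not something shown at the session: the function name `archive_folder` and the timestamped file naming are my own assumptions, and it simply zips a folder into a backup directory using the standard library.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_folder(source: Path, dest_dir: Path, now: Optional[datetime] = None) -> Path:
    """Zip the contents of `source` into `dest_dir` and return the archive path.

    Hypothetical helper for illustration only. The archive is named after the
    source folder plus a timestamp, so repeated runs do not overwrite each other.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dest_dir.mkdir(parents=True, exist_ok=True)
    base = dest_dir / f"{source.name}-{stamp}"
    # shutil.make_archive appends ".zip" to the base name and returns the full path.
    archive = shutil.make_archive(str(base), "zip", root_dir=str(source))
    return Path(archive)
```

Calling something like `archive_folder(Path.home() / "Documents", Path("E:/backups"))` (both paths are placeholders) would write one dated zip file per run. The backup directory should sit outside the folder being archived, ideally on a different drive, so that the archive does not end up containing earlier copies of itself.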
My previous post and this one complete the summary of the Thursday morning sessions. Watch for further installments.
Columnist, Information Today and Conference Circuit Blog Editor