User Behavior and Archival Practices

Three speakers discussed how people archive their files.  Devin Baker, Digital Librarian, University of Idaho, and Collier Nogues, University of California, Irvine, studied how writers manage their data.  Writers are a useful group to study because so much of their work exists only digitally and is very valuable to them.  Many of them simply save each succeeding version of a file over the previous one, which raises the question, “Is CTRL-S poor archiving practice?”  The study by Baker and Nogues revealed that about half of the respondents primarily save over their files periodically, but only 21% do it all the time.  They have little sense of “file management” and save files in a wide variety of places: thumb drives, laptops, desktop PCs, and e-mail.  Writers are good at backing up files, but the plethora of backups leads to problems with file management; 31% of writers said that they do not keep track of different versions of files saved in more than one location.  In addition, idiosyncratic file-naming practices are prevalent.  About half of the authors said they use e-mail as an archiving method, so a major issue is how to archive writers’ correspondence.  Can we recover some of what has already been lost?

Hong Zhang

Hong Zhang, a doctoral student at the University of Illinois, has also been studying file naming and archiving practices.  File folders are workspaces; old files implicitly become archives.  Problems arise when people forget where they put files and cannot find them again.  Often, one can determine the type of archive from how the user has named the file.  For example, a file called “xxx-old” generally holds something that has been completed or is not expected to be referred to again in the near future.  Incorporating the date into a file name may indicate an implicit archive; for example, “2007 expense forms”.  This naming convention works for a while, but if the files are moved or the user reorganizes the computer, the system may break down.
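As a rough illustration of the naming cues described above, the following Python sketch guesses an archive type from a file name.  It is not Zhang's method: the function name `guess_archive_type`, the three labels, and the two patterns (a trailing “old” marker and a four-digit year) are assumptions made for this example only.

```python
import os
import re

# Hypothetical patterns, chosen only to mirror the two examples in the text.
# A trailing "old" preceded by a separator (or forming the whole name).
OLD_MARKER = re.compile(r"(?:^|[-_ ])old$", re.IGNORECASE)
# A four-digit year starting with 19 or 20, not embedded in a longer number.
YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def guess_archive_type(name: str) -> str:
    """Return 'completed', 'dated', or 'working' for a file or folder name."""
    stem, _ext = os.path.splitext(name)  # drop an extension such as ".doc"
    if OLD_MARKER.search(stem):
        return "completed"  # e.g. "xxx-old": finished or set aside
    if YEAR.search(stem):
        return "dated"      # e.g. "2007 expense forms": implicit archive
    return "working"        # no cue found; assume still in active use


for example in ("xxx-old", "2007 expense forms", "chapter3.doc"):
    print(example, "->", guess_archive_type(example))
```

A heuristic like this depends entirely on the name, so it shares the weakness noted above: once files are renamed or moved into a differently named folder, the cue is lost.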

Jason Zallinger, a graduate student at Rensselaer Polytechnic Institute, has been studying Gmail as an archiving method.  With large amounts of storage available to users, Gmail has become a “storyworld”.  Zallinger interviewed six users between the ages of 27 and 39 about their Gmail accounts.  He concluded that we are now all digital storytellers, historians, and autobiographers of our own lives and have become good at capturing digital data; however, we are not good at making sense of it all.  Thousands of clues to our life stories are sitting in our archives; how do we design systems for the desire to save information but not look at it?

Zallinger suggested a “Forget” button that would flag old e-mails and give the user a gentle reminder in several years to clean out what is no longer useful.  He also mentioned other interesting tools to help e-mail users.  Mail Goggles asks users to solve some simple math questions before a message is sent, which may prevent them from sending messages they regret later.  Zallinger has also created a blog to document his experiences in building an open-source system that turns Gmail archives into a simple game and makes them into a story.  He has also created a Wordle from his e-mails.

Visuals are powerful memory cues, and Cathal Gurrin and Aiden Doherty at Dublin City University have taken the collection of life stories to a whole new level.  They use wearable cameras (called SenseCams) that take about 3 pictures a minute without user interaction and capture everything the wearer does in a day.  The cameras have sensors that trigger the captures, and have been augmented with GPS and Bluetooth devices to identify activities, personal interactions, e-mails, etc.  The cameras do not record audio: the researchers found that although people do not seem to mind cameras, they will stop talking if they are being recorded.  Gurrin now has an archive covering 4.5 years that contains over 7 million photos.

Cathal Gurrin wearing his camera and holding a GPS device

A major need is a search engine for this vast archive.  How can it all be organized?  One way is to designate important events and then search for them.  Another is to search automatically identified activities.  Gurrin’s research group was able to search for one event among the 30,000 stored over the past 2.5 years and retrieve it in about 2 minutes.  They have published about 40 articles about their research on visual lifelogging and their experience with the SenseCam.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


