An Introductory Look At Personal Digital Archiving


Headquarters of the Internet Archive in San Francisco, venue for the PDA Conference



The Conference Drew About 150 Attendees

The Personal Digital Archiving (PDA) conference was held in San Francisco on February 24-25, 2011 at the headquarters of the Internet Archive, a highly appropriate venue. (The building was formerly a church and has many interesting architectural features.) Before I attended the conference, I thought that PDA mainly dealt with digitizing old family photos and documents, and I wondered how it could take 2 days to cover that subject. While preservation of family records is included in PDA, I quickly learned that the subject is much broader and that 2 days can only touch on some of its aspects. One measure of the success of the conference is that about 150 people attended, twice as many as at last year’s conference.

Cathy Marshall

Cathy Marshall from Microsoft Research began with a somewhat apocryphal story that illustrates a widespread lackadaisical attitude towards archiving data. Her laptop got hot and started making noises, but she did nothing about it for 6 months. By that time, it needed a complete overhaul. While she was able to recover all of her files, her tweets had disappeared (fortunately, an urgent appeal to Twitter was successful, and she recovered them). And even though she has documented at least 3 ways to back up tweets, she still has not implemented any of them! Backup programs for data on PCs are widely available, but they are seriously underused. Meanwhile, a Pew Internet study found that about 2/3 of Americans store their data in the cloud.

Here is a brief history of personal archiving, based on a field study in 3 cities:

  • 2005:  benign neglect and its side effects. Typical behavior when acquiring a new computer is to simply copy everything from the old one to it.  Sometimes a few files are deleted, but many more are “put in various miscellaneous nooks and crannies, never to be seen again”.  People tend to put copies of their data in different places for various reasons, but data safety is not one of them.  It is easier to keep data than to cull it, but it is also easier to lose it than to maintain it.   Many people are ambivalent about the value of their data and consider it just nice to have around.
  • 2007:  personal data has a life of its own. A British Library study found that nearly 70% of reported data loss was due to an inability to find the data; only 8% was due to hard drive failure. Copies of data take on lives of their own.
  • 2010:  social media have raised complex issues. Ownership issues are a slippery slope and may extend beyond literal boundaries.  Other issues to consider include reuse, reciprocity, and limits of use.

What should we make of all this?

  • Someone else should be doing the archiving.
  • We won’t know why we have saved all those pictures after a couple of decades have passed.
  • Benign neglect becomes online neglect.
  • Digital information will survive only as long as someone takes care of it.

Preserving Family Digital Records

Gary Wright

Marshall was followed by Gary Wright, formerly with FamilySearch and now Product Manager of digital history for the LDS church. He described a white paper he wrote on preserving family records, which Eastman’s Online Genealogy Newsletter promoted as “definitive”. He noted that digital preservation is not just backing up your data (as I had thought). It is a process of storing digital records for a very long time, in multiple locations and at the highest affordable resolution; migrating the data periodically to new media to prevent loss or the inability to read it; and changing formats before they become obsolete. The goal is to ensure that your posterity can always access their heritage.
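
As a rough illustration of what looking after copies in multiple locations might involve, here is a minimal Python sketch that checks whether several copies of the same record are still identical. Comparing SHA-256 checksums is my own assumption for the example, not something Wright described, and the function names (sha256_of, check_copies, copies_agree) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(copies: list[Path]) -> dict[str, list[Path]]:
    """Group the copies by digest; copies that cannot be found go under 'MISSING'."""
    groups: dict[str, list[Path]] = {}
    for copy in copies:
        key = sha256_of(copy) if copy.is_file() else "MISSING"
        groups.setdefault(key, []).append(copy)
    return groups


def copies_agree(copies: list[Path]) -> bool:
    """True only if every copy exists and all copies have the same digest."""
    groups = check_copies(copies)
    return len(groups) == 1 and "MISSING" not in groups
```

If copies_agree returns False, check_copies shows which copies differ or have gone missing, so a good copy can be used to replace a damaged one before the data is migrated to new media.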

Digital preservation is not a one-time event, nor is it a one-generation project. It is important to prepare posterity to carry on our preservation work. Wright recommended two new file formats for data storage: PDF/A and JPEG 2000, neither of which appears to degrade over long periods of time. He also stressed the importance of good metadata and suggested that incorporating it into the filename is a useful way to save the information. Finally, he mentioned the M-Disc, the only archival disc claiming to be stable for 100 years or more, which is used by the Legacydox personal archiving service for data preservation.
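
To make the suggestion about putting metadata into the filename concrete, here is a minimal Python sketch. The choice of fields (date, people, place, event), their order, and the separators are my own assumptions rather than a scheme Wright prescribed, and the names in the usage example are invented.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace each run of non-alphanumeric characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def archival_filename(taken: date, people: list[str], place: str, event: str, ext: str) -> str:
    """Build a filename such as 1952-06-14_mary-smith_salt-lake-city_wedding.jp2.

    Hypothetical layout: ISO date first so files sort chronologically,
    then people, place, and event, joined by underscores.
    """
    parts = [taken.isoformat(), *(slug(p) for p in people), slug(place), slug(event)]
    # Skip fields that are empty after cleaning, so no double underscores appear.
    return "_".join(p for p in parts if p) + "." + ext.lower().lstrip(".")


name = archival_filename(
    date(1952, 6, 14), ["Mary Smith", "John Smith"], "Salt Lake City", "Wedding", "JP2"
)
print(name)  # 1952-06-14_mary-smith_john-smith_salt-lake-city_wedding.jp2
```

Because the information lives in the name itself, it travels with the file when it is copied to new media or converted to a new format, even if separate catalog records are lost.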

The Digital Beyond

Evan Carroll

What happens to digital information after its owner or creator dies?  Evan Carroll is the co-author of a blog on this subject and has also written a book entitled Your Digital Afterlife.  Among the issues to be considered are:

  • Awareness.  Do the heirs know that the archives exist?  Do they have access to the password?
  • Meaning.  Archives are passed on as an act of preservation.  How can they be imbued with meaning?
  • Value.  Value is relative and changes from one person to another and one time to another.  It is extracted from more than just the content; sometimes the story behind the object is necessary to appreciate it.

Sometimes heirs to a digital collection are not sophisticated technology users, so archives must be designed to ensure awareness and access. The wishes of the creator must be respected as well: why was an object considered important? Metadata can be useful in determining this. Can we attach comments and conversations to our digital objects?

These were some of the opening issues highlighted at the conference. I have divided this report into several posts so that none of them becomes too long. The next one will summarize PDA activities in other areas.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

