What is the proper boundary between public and private data? How far should archivists go when collecting what might be private data? These questions introduced this session. The first presentation was a discussion of archival applications of digital forensics tools and techniques by Kam Woods, a Postdoctoral Research Associate at the School of Information and Library Science, University of North Carolina at Chapel Hill. His research focuses on developing techniques and tools to assist in long-term archiving and educational support for digital forensics datasets.
Woods described the differences between digital forensics and archiving.
The main thing with archiving is to know what you have been given. Archivists are increasingly finding themselves dealing with streams of heterogeneous data. It is important to reduce risk in the acquisition process and maintain the integrity of the data. Private and sensitive data must be appropriately protected, and the authenticity and chain of custody (patterns of use and activity) of the data become important.
Advanced forensic formats include raw streams of data, cryptographic hashing, and metadata. Woods is working on a bulk extractor, which will process data and produce Dublin Core metadata and Digital Forensics XML (DFXML). The open source code and APIs of this and other related programs are available online.
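To make the role of cryptographic hashing concrete, here is a minimal Python sketch of a fixity check: a digest is recorded when the data is acquired and recomputed later to confirm that nothing has changed. It is an illustration only and was not part of the presentation; the function names `sha256_of_stream` and `verify_fixity` are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a binary stream, read in chunks.

    Reading in chunks keeps memory use flat even for large disk images.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, recorded_hex):
    """Return True if the stream still matches the digest recorded at acquisition."""
    return sha256_of_stream(stream) == recorded_hex.lower()


if __name__ == "__main__":
    # Stand-in for an acquired disk image (hypothetical content).
    acquired = io.BytesIO(b"example disk image bytes")
    recorded = sha256_of_stream(acquired)  # stored alongside the accession record
    acquired.seek(0)
    print(verify_fixity(acquired, recorded))
```

In practice the recorded digest would be stored with the accession metadata, so that any later copy of the data can be checked against it.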
The Personal in the Organizational
What happens to personal data that is embedded in company records when the company fails? Sam Meister, Digital Archivist and Consultant to the Sherwood Project at the University of Maryland, has looked at this question.
The Sherwood Archive Project, run in cooperation with Sherwood Partners in Mountain View, CA, offers a private alternative to public bankruptcy. The project investigates the potential to preserve the “abandoned” records of failed companies. When a company goes into bankruptcy, it assigns its data to Sherwood, which takes over ownership and tries to sell the intellectual property. Personal information on employees, suppliers, and users is often found in the records of the company, which raises issues about the disposition of the data. Records of startups tend to be particularly messy and difficult to deal with.
There is often not much regulation of this data, so disposition becomes an ethical issue. Codes of ethics are available from the Society of American Archivists and the International Council on Archives. It is necessary to establish a relationship of trust between the original donor (which is not Sherwood) and the archive, and it is difficult to know what is in the records. If companies did not collect employee records, that would eliminate many concerns, but it would also damage links between those records and others in the collection, because personal identifiers are often a major way records are linked together.
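The linking problem can be illustrated with a short Python sketch. One possible approach, which is an assumption here and not something presented in the session, is to replace each personal identifier with a keyed hash (HMAC), so that the same person receives the same opaque token in every record and the links survive without the raw identifier. The names `pseudonymize` and `redact_records`, the field names, and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map a personal identifier to a stable, opaque token using HMAC-SHA-256.

    The same identifier and key always give the same token, so records
    that shared an identifier remain linked after redaction.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def redact_records(records, id_field, secret_key):
    """Return copies of the records with the identifier field replaced by its token."""
    redacted = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy[id_field] = pseudonymize(record[id_field], secret_key)
        redacted.append(copy)
    return redacted


if __name__ == "__main__":
    key = b"held-by-the-archive-only"  # hypothetical key, kept out of the collection
    payroll = [{"employee": "jane@example.com", "amount": 1200}]
    tickets = [{"employee": "Jane@example.com ", "issue": "laptop"}]
    safe_payroll = redact_records(payroll, "employee", key)
    safe_tickets = redact_records(tickets, "employee", key)
    # The two records still link to each other through the shared token.
    print(safe_payroll[0]["employee"] == safe_tickets[0]["employee"])
```

A sketch like this handles only structured identifier fields. Personal information in free text would still need separate review, and the key would have to be protected as carefully as the original data.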
Technological solutions to these problems are available, but we must be sure that all the personal information is removed. One possibility is an initial period of restricted access until the data has been examined. All these issues involve a private-to-public transition, with a need to establish trust between private collections and repositories.
Columnist, Information Today and Conference Circuit Blog Editor