Our NCAST Advanced Research Partners at the National Center for Supercomputing Applications (NCSA) have been conducting research on the Data Format Description Language (DFDL). DFDL is an XML-based language for describing the format, structure, and metadata of a file so that its content can be viewed without the software that created it or an existing viewer. This research may help NARA address the problem of providing access to electronic records that are stored in thousands of different formats as those formats become obsolete over time. If information about file formats is recorded in DFDL, future DFDL tools can read those files and display their content in a meaningful fashion.
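To make the idea concrete, here is a minimal sketch of a format description driving a generic reader. It is not DFDL syntax and does not use any DFDL tool: the description is a plain Python dictionary standing in for what a DFDL schema would express in XML, and the record layout, field names, and the `parse_record` helper are all invented for illustration.

```python
import struct

# Hypothetical format description (an assumption for illustration only).
# A real DFDL schema would express this kind of information in XML.
RECORD_FORMAT = {
    "byte_order": "big",
    "fields": [
        {"name": "record_id", "type": "uint32"},
        {"name": "year", "type": "uint16"},
        {"name": "title", "type": "ascii", "length": 8},
    ],
}

# Map the description's integer type names to struct format codes.
_STRUCT_CODES = {"uint16": "H", "uint32": "I"}


def parse_record(data: bytes, fmt: dict) -> dict:
    """Read one record from raw bytes using only the format description."""
    prefix = ">" if fmt["byte_order"] == "big" else "<"
    offset = 0
    record = {}
    for field in fmt["fields"]:
        if field["type"] == "ascii":
            # Fixed-length text field, padded with trailing spaces.
            size = field["length"]
            raw = data[offset:offset + size]
            record[field["name"]] = raw.decode("ascii").rstrip()
        else:
            # Fixed-width unsigned integer field.
            code = prefix + _STRUCT_CODES[field["type"]]
            size = struct.calcsize(code)
            (record[field["name"]],) = struct.unpack_from(code, data, offset)
        offset += size
    return record


# Build a sample 14-byte record, then read it back via the description.
sample = struct.pack(">IH8s", 42, 1987, b"MEMO    ")
parsed = parse_record(sample, RECORD_FORMAT)
print(parsed)
```

The reader contains no knowledge of the particular file layout. Everything it needs comes from the description, which is why a preserved description could let future tools interpret files whose original software is gone.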
Here is an excerpt from their announcement:
“The records we preserve today need to be accessible and displayable by future technology. Beyond maintaining the accessibility of the raw bits of the digital data, preservation requires maintaining an ability to interpret the data as meaningful structures, relationships, and visual representations.”
“We are contributing to the development of a preservation system that would dramatically lower the per-file-format effort required for preservation.”
The researchers are currently developing Daffodil, the second-generation DFDL parser. They plan to release it as open source in the near future. You can find a report on the development of Daffodil here.
DFDL has received a great deal of attention beyond the archival community. For example, this research was cited as a priority in the White House’s Draft NITRD Strategic Plan (p. 17).
DFDL has been noted by the cloud computing community as well. At the November 2010 Cloud Computing Forum and Workshop II, Alan Sill, Vice President of Standards of the Open Grid Forum, discussed DFDL developments and their contribution to cloud computing.