As part of the Open Government initiative, NARA recently released six datasets available for the first time as raw data in XML format. The datasets are:
- three editions (2007, 2008, and 2009) of the Code of Federal Regulations (CFR)
- archival descriptions from the Archival Research Catalog (ARC)
- organization descriptions from the Archival Research Catalog (ARC)
To learn more, see the press release.
Clever people have already started working with these data sets, as discussed in a post on Mark Matienzo’s blog, “thesecretmirror” (http://thesecretmirror.com/description/nara-and-data-dot-gov):
“…Obviously, transferring this much data is difficult, and I was quite shocked when I discovered that NARA didn’t bother to compress this data in the first place when I first decided to get my grubby paws on it. Not to be outdone, I corresponded with a few people over Twitter who were just as interested in the data, specifically Simon Spero at the UNC School of Information and Library Science, and Richard Urban, at UIUC’s Graduate School of Library and Information Science. The three of us made a concerted effort to grab the data from NARA’s web server and make a compressed version available.
After 6 hours of so of transferring the files and compressing them, Simon has posted the compressed dataset on ibiblio.org, as part of his Fred2.0 dataset project. Download the whole thing, decompress it, and start crunching – there’s so much you can do with it! Convert the series descriptions to EAD! Convert the organizational descriptions and histories to EAC! Throw Mitchell Whitelaw’s series browser on top of it! The future’s in your hands, people, and now the data is too.”
We are excited to hear that Mark Matienzo, Simon Spero and Richard Urban are eager to work with the data NARA has made available. NARA IT staff kept running into technical issues while working to compress the data, so we decided to go ahead and post it uncompressed rather than hold the data back. It’s great to see that Mark, Simon and Richard took the initiative to compress and share it.
We look forward to hearing more about what mashups and visualizations people are able to create based on the data. We hope they have fun working on it!
– Jill (Admin)
Hi Jill –
I’ve run into some issues with the ARC data sets; so far I’ve fixed some character encoding issues that led to a number of records not being valid XML.
There are also some Oracle-generated errors that I haven’t fixed yet, but there’s more detail, and tentatively fixed tarballs, on the Fred 2.0 blog.
See some of the posts under the archival-research-catalog tag.
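For anyone who wants to find out which records fail to parse, here is a minimal sketch in Python. It assumes each record is available as raw bytes; the function name check_well_formed is purely illustrative, and this is not the procedure that was used to produce the fixed tarballs.

```python
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes):
    """Return None if data parses as XML, else a (line, column, message) tuple."""
    try:
        # The parser reads the encoding from the XML declaration, so bytes
        # that do not match the declared encoding raise a ParseError here.
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

A record that declares UTF-8 but contains a stray Latin-1 byte would be reported with the position of the offending byte, which helps narrow down where the encoding went wrong.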
Simon,
Thanks for bringing these issues to our attention. I will pass them along to the tech staff who help us export and post the data. Please stay tuned.
– Jill (Admin)
I have been looking for my military records from 1976-1988 for the last 20 years. Every time I request them, they can’t seem to find certain medical records and my records from the Enewetak Atoll cleanup project. Can anyone help me with this? I have the beginning stages of cancer.
Hi Dieter,
When you have previously tried to obtain your military records, have you been contacting the National Personnel Records Center in St. Louis, Missouri? If not, you should definitely give them a try. They typically hold personnel service records, including medical information, for retired service members of all military branches. You can request your personnel file by filling out Standard Form 180 (available as a download in PDF format on NARA’s web site at http://www.archives.gov) or by submitting an online request through eVetrecs (an electronic service available to veterans and their next of kin on NARA’s site at http://www.archives.gov/veterans/evetrecs/).
– John