We’d like to introduce you to the National Archives’ online catalog API, a major feature of the revamped catalog. If you are not already familiar with the magic of APIs (or “application programming interfaces”), you can think of an API this way. Underlying both the API and the browser-based catalog is the dataset of archival descriptions, authority records, web pages, and other information. Just as the web site you see in your browser is the interface that allows you, a human, to interact with and search our dataset, the API is the interface by which computer programs can interact with the dataset, following documented methods to retrieve or alter the structured data in the system.
The dataset for our catalog API contains all archival descriptions, authority records, digitized records (the images, videos, and so on) and their file metadata, all NARA web pages, and public contributions (tags, transcriptions, and comments). The API allows developers to retrieve all of this metadata in a specified format (JSON or XML) for any given record or set of search results. This makes it much more flexible than the advanced search or refinement options in the user interface, since the API can search using keywords or any field in the system, filter based on type of record, search within ranges, apply sorts, return only particular fields, or combine any of these options. You can also generate a bulk export of your search results (including digital media), just as you can in the catalog. The API is also writable, which means you can use it to post tags, transcriptions, or comments to records. We believe it is one of the first public write APIs in operation at a cultural institution. To support these functions, there are also methods for user registration and login, though accounts are the same in the UI and API. We rolled out in-catalog transcription only last year and comments this year, and we think building these features into the API from the beginning has the potential to take them to a whole new level.
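To make the search options above a little more concrete, here is a minimal sketch in Python of how a program might assemble a search request. It uses the base URL given at the end of this post. The parameter names (`q`, `rows`, `format`, `resultTypes`, `resultFields`, `sort`) and the helper `build_search_url` are illustrative assumptions, not a statement of the API's actual interface, so please check them against the documentation on GitHub before relying on them.

```python
from urllib.parse import urlencode

# Base URL of the catalog API, as given in this post.
BASE_URL = "https://catalog.archives.gov/api/v1/"


def build_search_url(keywords, result_types=None, fields=None,
                     sort=None, rows=10, fmt="json"):
    """Assemble a search URL (hypothetical helper).

    All query parameter names used here are assumptions made for
    illustration; confirm them in the API documentation.
    """
    params = {"q": keywords, "rows": rows, "format": fmt}
    if result_types:
        # Assumed filter on type of record, e.g. ["item", "fileUnit"].
        params["resultTypes"] = ",".join(result_types)
    if fields:
        # Assumed way to return only particular fields.
        params["resultFields"] = ",".join(fields)
    if sort:
        # Assumed sort expression, passed through unchanged.
        params["sort"] = sort
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Build (but do not send) a keyword search restricted to one record type.
    print(build_search_url("civil war", result_types=["item"], rows=5))
```

The sketch only builds the URL and does not contact the server. The resulting address could then be fetched with any HTTP client, and since basic searching needs no API key or account, no credentials are included.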
In addition to being read–write, the API is open source and follows the principles of REST. In designing our API, we were strongly influenced by the Digital Public Library of America’s API philosophy, especially their principle of a “presumption of openness”. Following this approach, we designed a system not for any particular use case, but one that is as open as possible to accommodate the creativity of the public. No API key or account at all is required to do basic searching. All original API source code has been released under the Creative Commons Public Domain Dedication (CC0), and you can find it in our GitHub account. And, of course, all of our metadata and most of our digitized records are in the public domain, as works of the U.S. federal government, and can be freely reused and remixed without permission for any purpose.
We think this is a big deal. NARA’s recently revised mission statement affirms our commitment to “drive openness, cultivate public participation, and strengthen our nation’s democracy through public access to high-value government records.” Our mission is bigger than just our research rooms and web sites. In a recent essay, museum theorist Ed Rodley writes that the “spread of digital assets is a key factor in delivering on museums’ missions to educate, inform, stimulate, and enrich the lives of the people of the planet we live on.” We believe that our API will become a major way in which users are able to access our records, because the fundamental purpose of open data is to make our data sharable and reusable in many contexts outside of NARA itself. For example, in 2013, OCLC noted that 98% of the usage of their Virtual International Authority File comes via its API. This means they are succeeding in making their data useful to the public where people already go on the web, undertaking projects like linking hundreds of thousands of VIAF identifiers from the Wikipedia articles for their subjects. We think there are several ways we might make use of the API ourselves, like creating programs to gamify transcription of our records, uploading all of our data and digital assets to Wikimedia Commons or Wikidata, or setting up automatically curated social media feeds with our content. However, what excites us most is the potential for creative and unexpected uses of our API by the public, for any purpose.
Our API is still relatively new, and we have documented several known issues that are still being worked out. But we encourage you to give it a try and see what you can create with it. The API is located at https://catalog.archives.gov/api/v1/, but we recommend you start by reading some of our documentation pages on GitHub, or playing in our interactive documentation feature to learn the ropes. Also, be sure to give us feedback (whether questions, bug reports, or ideas for improvement) in a comment below, in our GitHub repo’s issue tracker, or by emailing email@example.com. Let us know what you make!