The new read–write API for our catalog

We’d like to introduce you to the National Archives’ online catalog API, a major feature of the revamped catalog. If you are not already familiar with the magic of APIs (or “application programming interfaces”), you can think of it this way. Underlying both the API and the browser-based catalog is the dataset of archival descriptions, authority records, web pages, and other information. And just as the web site you see in your browser is the interface which allows you, a human, to interact with and search our dataset, the API is the interface by which computer programs can interact with the dataset—by following documented methods to retrieve or alter the structured data in the system.

The dataset for our catalog API contains all archival descriptions, authority records, digitized records (the images, videos, and so on) and their file metadata, all NARA web pages, and public contributions (tags, transcriptions, and comments). The API will allow developers to retrieve all of this metadata in specified formats (JSON or XML) for any given record or search results set. This means it is much more flexible than the advanced search or refinement options in the user interface, since the API can search using keywords or any field in the system, filter based on type of record, search within ranges, apply sorts, specify only particular fields to return, or any combination of these options. You can also generate a bulk export of your search results (including digital media), just as you can in the catalog. The API is also writable, which means you can use it to post tags, transcriptions, or comments to records. We believe it is one of the first public write APIs in operation at a cultural institution. To support these functions, there are also methods for user registration and login, though accounts are the same in the UI and the API. We rolled out in-catalog transcription only last year and comments this year, and we think building these contribution features into the API from the beginning has the potential to take them to a whole new level.
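As a rough illustration of how these search options combine, here is a minimal Python sketch that builds a search URL. The base URL is the one given in this post; the query parameter names (`q`, `resultTypes`, `rows`, `sort`, `resultFields`, `format`) are assumptions made for illustration and should be checked against the documentation on GitHub before use.

```python
from urllib.parse import urlencode

# Base URL as given in this post.
API_BASE = "https://catalog.archives.gov/api/v1/"


def build_search_url(keywords, result_types=None, rows=None, sort=None,
                     fields=None, fmt="json"):
    """Return a search URL combining a keyword query with optional filters.

    Every parameter name sent to the API here is assumed, not confirmed.
    """
    params = {"q": keywords, "format": fmt}
    if result_types:
        # Filter by type of record (assumed comma-separated list).
        params["resultTypes"] = ",".join(result_types)
    if rows is not None:
        # Limit the number of results returned.
        params["rows"] = rows
    if sort:
        params["sort"] = sort
    if fields:
        # Ask for only particular fields in the response.
        params["resultFields"] = ",".join(fields)
    return API_BASE + "?" + urlencode(params)


url = build_search_url("civil war pensions", result_types=["item"], rows=10)
print(url)
```

Fetching the resulting URL, for example with `urllib.request.urlopen`, would be the next step; basic searching of this kind needs no API key or account.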

National Archives API sample

This is what our catalog records look like as structured data! (Formatted by JSONView.)

In addition to being read–write, the API is open source and follows the principles of REST. In designing our API, we were strongly influenced by the Digital Public Library of America’s API philosophy, especially their principle of a “presumption of openness”. Following this approach, we designed a system not for any particular use case, but one that is as open as possible to accommodate the creativity of the public. No API key or account at all is required to do basic searching. All original API source code has been released under the Creative Commons Public Domain Dedication (CC0), and you can find it in our GitHub account. And, of course, all of our metadata and most of our digitized records are in the public domain, as works of the U.S. federal government, and can be freely reused and remixed without permission for any purpose.

We think this is a big deal. NARA’s recently revised mission statement affirms our commitment to “drive openness, cultivate public participation, and strengthen our nation’s democracy through public access to high-value government records.” Our mission is bigger than just our research rooms and web sites. In a recent essay, museum theorist Ed Rodley writes that the “spread of digital assets is a key factor in delivering on museums’ missions to educate, inform, stimulate, and enrich the lives of the people of the planet we live on.” We believe that our API will become a major way in which users are able to access our records, because the fundamental purpose of open data is to make our data sharable and reusable in many contexts outside of NARA itself. For example, in 2013, OCLC noted that 98% of the usage of their Virtual International Authority File comes via its API. This means they are succeeding in making their data useful to the public where people already go on the web, undertaking projects like linking hundreds of thousands of VIAF identifiers from the Wikipedia articles for their subjects. We think there are several ways we might make use of the API ourselves, like creating programs to gamify transcription of our records, uploading all of our data and digital assets to Wikimedia Commons or Wikidata, or setting up automatically curated social media feeds with our content. However, what excites us most is the potential for creative and unexpected uses of our API by the public, for any purpose.

Our API is still relatively new, and we have documented several known issues that are still being worked out. But we encourage you to give it a try and see what you can create with it. The API is located at https://catalog.archives.gov/api/v1/, but we recommend you start by reading some of our documentation pages on GitHub, or by playing with our interactive documentation feature to learn the ropes. Also, be sure to give us feedback (whether questions, bug reports, or ideas for improvement) in a comment below, in our GitHub repo’s issue tracker, or by emailing api@nara.gov. Let us know what you make!

About dominic

I work in the Office of Innovation at the National Archives and Records Administration on digital engagement, including Wikipedia programs and open data.
This entry was posted in Catalog, Databases, Open Government, Social Media (Web 2.0), Wikipedian in Residence.

4 Responses to The new read–write API for our catalog

  1. Peggy Reeves says:

    I am not interested in computer jargon, I am interested in how to find records. Can you give us some examples? In the new system, how do we find records that are already scanned and on NARA’s website? Say, for example, the Civil War Widows’ pensions (RG-15)? Can you walk me through it?

    Another example–what if I want to search the finding aids to see data set descriptions for Record Group 94? Can I do that online?

    Thanks for your time.

    • dominic says:

      Hi Peggy,

      The API doesn’t generally provide access to different data than the regular catalog (https://catalog.archives.gov); it just provides the data in a different structure, without an interface, so that it can be read by computer programs.

      For your specific questions, if you wanted to find the descriptions for the Civil War Widows’ pension files series in the API, you could first find the identifier (NAID), and then use that to construct a query URL. Similarly, for RG 94 you could construct a query using that record group’s NAID. If you are curious about how to construct these queries, I encourage you to check out the links to the GitHub documentation and interactive documentation above.
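To make the idea concrete, a lookup by identifier might be built as in the sketch below. The `naIds` parameter name is an assumption, and the NAID used is a made-up placeholder, not the real identifier for either series mentioned here; the GitHub documentation is the authority on the exact form.

```python
from urllib.parse import urlencode

API_BASE = "https://catalog.archives.gov/api/v1/"  # base URL from the post


def naid_query_url(*naids, fmt="json"):
    """Build a URL requesting one or more records by NAID.

    The 'naIds' parameter name is assumed for illustration.
    """
    if not naids:
        raise ValueError("at least one NAID is required")
    ids = ",".join(str(n) for n in naids)
    return API_BASE + "?" + urlencode({"naIds": ids, "format": fmt})


# 1234567 is a hypothetical placeholder NAID.
print(naid_query_url(1234567))
```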

      As you can see, though, explaining APIs inherently involves “computer jargon,” since it is a technical topic. If that’s not what you are interested in, and you simply need help with navigating the catalog or answering your research question, please feel free to direct your question for assistance.

      Thanks!

  2. ResearchBuzz says:

    Wow! Impressed with a read API, but a read/write API… !!! One question: is there some mechanism by which you’ll do a regular review of data written by the API so comments, tags, and transcripts don’t become littered with spam or errors? Thanks!

    • dominic says:

      Once the data is written via the API, it becomes identical in the system to contributions made via the UI. Users even use the same account and credentials to log in for contributing in either the UI or the API. Contributions via the API are moderated by NARA staff for spam or abuse just like all transcriptions, comments, and tags. Generally, staff aren’t fact-checking, though, since public contributions are considered to be works of our online community (citizen archivists), and NARA does not attempt to guarantee their authoritativeness.
