Tech Tuesday: Applied Research puts NARA "Out in Front" at NAGARA

At the National Association of Government Archives and Records Administrators (NAGARA) plenary address in Nashville a few weeks ago, I was asked to talk about NARA’s new Applied Research Division, a talk that wandered into an explanation of why we haven’t been “ERA Research” for the past two years. Folks were encouraged to attend my 1940 Census session, featuring NARA research partners who are using cool smart tools to make sense out of scanned images. There was not an empty chair in the room, which led to fruitful discussions and promising collaborations…and that’s what you missed at NAGARA!

Here’s the full story.

The National Association of Government Archives and Records Administrators (NAGARA) met in Nashville, TN on July 13-16. One of the best attended technology sessions – in my humble and unbiased opinion – was Friday’s session that I chaired, “The Way We Were: The 1940 Census,” which featured NARA archivist, Ms. Constance “Connie” Potter, and two NARA Applied Research partners, Dr. Kenton McHenry from University of Illinois at Urbana-Champaign (UIUC), and Dr. Richard Marciano from University of North Carolina (UNC), Chapel Hill (who also happened to present on the UNC DCAPE project the day before).

NAGARA Session C-14 panelist, Constance Potter – Connie Potter, the NARA expert who helps researchers access population census data.

Connie, NARA’s expert who helps researchers access the census records, led the session. She explained how the traditional processes for searching through past censuses changed with 1940, along with its new questions and supplemental schedule.

Connie then pointed us to some information posted recently on the NARA website (link provided at the end of this post) to help folks prepare for the 1940 Census release next year on April 2. For example, you can read through the manual given to the census takers, which has detailed instructions about the questions and how to fill out the forms properly.

Dr. Kenton McHenry, our second speaker, gave a lively presentation about the project he and his team are working on at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. His presentation focused on the possibilities for free and searchable access to census records.

NAGARA Session C-14 panelist, Dr. Kenton McHenry, NCSA / University of Illinois at Urbana-Champaign – Dr. Kenton McHenry's presentation, "Towards Free and Searchable Access of the 1940 Census Data"

When the census forms are scanned, they are exactly that: images. They carry no descriptive information about who or what is in them. You get a snapshot of columns filled with questions and handwritten responses, but there is no way to search through the names or addresses (or any other info collected on the form, for that matter) across tens of thousands of scanned images.

After each decennial census is conducted, 72 years must pass before its legal release; the forms are then scanned and made available by NARA. Once the images are released, commercial genealogy companies hire people to input names, addresses, and other data handwritten on the forms – a process which could take several months – then they make the information available through a searchable database on their website for a fee.

Take this illustration here on the left, as an example. It could take 4-7 months for people to read through text and interpret or recognize the information on the form (such as the word “Daughter,” in this example), then “fat-finger” – or manually type – words into a database.

But to be sure the word/meaning is accurate, another person would have to confirm that the word indeed reads “Daughter” and is correctly spelled. Imagine the number of people who would have to intervene if someone had really bad handwriting!

Dr. McHenry’s team uses advanced technologies that enable computers to recognize parts of a scanned form, such as columns (last name, first name, address) or combinations of image patterns (such as handwritten words or numbers), and to assign meaning to them accurately. They use open source tools, which ultimately means that access to the released data would be quickly available at no charge.
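For the technically curious, here is a toy sketch of what “recognizing columns” on a scanned form can look like. To be clear, this is not the NCSA team’s method, which this post does not describe in detail. It is a generic illustration that assumes the page has already been reduced to a grid of 0s (blank) and 1s (ink), and that columns are separated by printed vertical rules. The function names and the tiny sample page are made up for the example.

```python
from typing import List, Tuple


def find_rule_lines(image: List[List[int]], min_fraction: float = 0.9) -> List[int]:
    """Return x-positions where almost every row has ink (a likely printed rule).

    `image` is a hypothetical binarized scan: rows of 0 (blank) and 1 (ink).
    """
    height = len(image)
    if height == 0:
        return []
    width = len(image[0])
    rules = []
    for x in range(width):
        ink = sum(row[x] for row in image)  # ink pixels in this pixel column
        if ink / height >= min_fraction:
            rules.append(x)
    return rules


def column_spans(image: List[List[int]], min_fraction: float = 0.9) -> List[Tuple[int, int]]:
    """Return (first_x, last_x) ranges of the form columns between printed rules."""
    rules = find_rule_lines(image, min_fraction)
    spans = []
    for left, right in zip(rules, rules[1:]):
        if right - left > 1:  # adjacent rule pixels are one thick line, not a column
            spans.append((left + 1, right - 1))
    return spans


# A made-up 4-row "scan": solid rules at x = 0, 4, 7 and scattered handwriting between.
page = [
    [1, 0, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 1],
]
print(column_spans(page))  # [(1, 3), (5, 6)]
```

Once a program knows where each column sits, every handwritten entry can be tied to the question it answers (name, relationship, address), which is the first step toward making the images searchable.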

The third speaker was Dr. Richard Marciano from UNC’s Sustainable Archives & Leveraging Technologies Lab, also called SALT. He described a project where his team uses census and other data to create mashups that allow a researcher to visually explore and combine the data in ways never imagined before. For example, you could examine a map showing the geographic distribution of people of different races in a city, and overlay that with a map showing areas of discrimination.

Last September, Richard presented the SALT team’s “T-RACES” project at NARA’s 1940 Census Workshop. T-RACES provided an analysis of redlining practices conducted between 1932 and 1964, in which home loan finance companies (including the FHA) used census data to exclude non-white families from receiving housing loans.

Click on the picture below to watch a video of the September presentation – but it’s about one hour long, so finish reading this blog post, then watch the video!

NAGARA Session C-14 panelist, Richard Marciano, UNC/SALT – Richard Marciano's research team finds tools that enable various ways to explore and combine census data

We then facilitated a discussion with the session attendees to show how NARA’s research efforts have addressed, and continue to address, the challenges and needs of archives and records management communities like NAGARA. It was clear that the audience, including the genealogy service companies in attendance, was interested in the possibilities for improved access using technologies such as those presented by Kenton and Richard. In addition, participation through social media, such as volunteer crowdsourcing collaborations, can help ensure that future censuses will be quickly and easily accessible at no charge to the public.

Here are some links related to this post:

To prepare for the release of the 1940 Census next year, visit NARA’s web site for the 1940 Census.
Follow all our blog posts on the 1940 Census!
Learn about the National Association of Government Archives and Records Administrators (NAGARA)
The 1940 Census Workshop, hosted by NARA on September 13, 2010
Just posted! Pictures from the NAGARA Session C-14
Join the Applied Research Facebook Group

We’d love to hear your feedback! Please leave your comments and questions below, or send an email to us at appliedresearch@nara.gov

4 thoughts on “Tech Tuesday: Applied Research puts NARA "Out in Front" at NAGARA”

Mark Conrad says:

July 28, 2011 at 12:12 pm

Great post, Rita! Really wish I had been there.

BTW, You mentioned the DCAPE project in your post. Here is the URL for that project:
http://salt.unc.edu/dcape/

Here is the URL for the T-RACES project, too:
http://salt.unc.edu/T-RACES/

Mark

Meredith Stewart says:

July 28, 2011 at 3:10 pm

Hi Rita,

I really appreciate your blog post – since it brings the presentations to those of us who couldn’t make it.

I’m curious about Dr. McHenry’s research. I think using a technological solution to recognize key parts of the Census is a great idea — but the names of people are the most important aspect of the census – what researchers want and their portal into the census. Does the technological solution that he’s worked on try to get at the people names?

Also, I don’t think we should underestimate the people who “key” records on an Ancestry site or Footnote, etc. These folks are passionate about looking at the document, spending the time, and getting the job done. There’s a vibrant volunteer community around this work that keys thousands of records each day, making them accessible. Yes, it’s a lot of work, but we shouldn’t disregard them as a community. Their work is very important and developing tools that help us crowdsource this work would be an important avenue of development too.

Rita Cacas says:

August 10, 2011 at 3:11 pm

Meredith,
Great comment and question!

I completely agree that the work of the volunteers in creating the name index is admirable and heartfelt! Indeed, NARA’s transformation is presenting us with innovative ways to work together. As I highlighted during the NAGARA plenary, it is very exciting that NARA over the last year has introduced social media to engage new and diverse communities. The use of crowdsourcing volunteers and events allows these communities to interact and add value to future access to the records. And through these types of collaborations, we have opportunities to connect technology and people.

Dr. McHenry’s research with his team at NCSA focuses on taking the digitized images of the 1930 census and creating a machine-readable file that will enable people to search on all of the fields, not just the name field. In addition, their work includes forthcoming opportunities for crowdsourcers to contribute to their continued research. The NCSA research report on the results of this work is due to NARA this winter, so stay tuned!

Thanks again for your comment!
Rita

Rita says:

September 23, 2011 at 1:38 pm

This just in! Read this new article by NARA Research partner Kenton McHenry (and his team), “Toward free and searchable historical census images,” which provides additional details about their research and methods. The article is presented by The Society of Photographic Instrumentation Engineers (SPIE), a pre-eminent international society for technology research and engineering in optics and photonics.

http://spie.org/x57241.xml?highlight=x2410&ArticleID=x57241
