AWS registers open data from the National Archives Catalog

NARA, in conjunction with AWS, has publicly released the National Archives Catalog dataset on the AWS Registry of Open Data. This guide helps users find and access the available data.

AWS-Registered Data Catalogue from National Archives Open Data Index

The National Archives and Records Administration (NARA) has made a significant portion of its archival descriptions and authority records available to the public through the AWS Registry of Open Data. This dataset, known as the National Archives Catalog, totals over 261 gigabytes and is organized by record group and collection.

Users can access the dataset with the AWS Command Line Interface (CLI). A single AWS CLI command pulls the full dataset; its directory argument takes one of two values, depending on whether you want descriptions or authority records.

Each record group or collection directory contains a sequence of JSON files, and each file holds up to 10,000 descriptions or authority records. The files within these directories share a common naming pattern.
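Once a record group or collection directory has been downloaded, its files can be read one at a time rather than loaded all at once. The following is a minimal sketch, not NARA's own tooling. It assumes that each file ends in `.json` and holds a JSON array of record objects; neither detail is confirmed here, and the function name `iter_records` is purely illustrative.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(directory: str) -> Iterator[dict]:
    """Yield every record from the JSON files in one downloaded directory."""
    # Sort so the files are read in a stable, repeatable order.
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            payload = json.load(handle)
        # Assumption: each file holds a JSON array of record objects.
        # A file holding a single object is treated as one record.
        records = payload if isinstance(payload, list) else [payload]
        for record in records:
            yield record
```

Because the function is a generator, only one file of up to 10,000 records is held in memory at a time, which matters for a dataset of this size.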

If you need only a specific collection or record group, you can narrow the command. To pull descriptions for a single collection, point the command at that collection's directory; to pull descriptions for a single record group, point it at that record group's directory. The commands for authority records are analogous, with the authority records directory replacing the descriptions directory.
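For scripted downloads, the same kind of pull can be assembled programmatically. The sketch below builds an `aws s3 sync` argument list with the `--no-sign-request` option. This is one reasonable choice and may differ from the exact command the guide intends. The bucket and directory names are placeholders, not the dataset's real values, and the sketch assumes the bucket allows anonymous reads.

```python
import shlex
from typing import List


def build_sync_command(bucket: str, prefix: str, destination: str) -> List[str]:
    """Build an `aws s3 sync` argument list for one directory of the dataset."""
    # Normalise slashes so the S3 URI has exactly one separator per level.
    source = f"s3://{bucket.strip('/')}/{prefix.strip('/')}/"
    return [
        "aws", "s3", "sync",
        source,
        destination,
        # Assumption: the bucket allows anonymous reads, so requests are unsigned.
        "--no-sign-request",
    ]


# Placeholder values only; substitute the real bucket and directory names.
command = build_sync_command(
    "EXAMPLE-BUCKET", "EXAMPLE-DIRECTORY/EXAMPLE-SUBDIRECTORY", "./catalog"
)
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine with the AWS CLI installed. Passing the arguments as a list avoids shell quoting problems.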

Each record conveys the parent/child relationship of series to file units and items through elements such as parentSeries and parentFileUnit within its JSON. Following these elements shows where a record sits in the catalog hierarchy.
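The parent elements can be followed upward to list a record's ancestors. The sketch below assumes that each parent element is a JSON object nested inside its child, so that a parentFileUnit object would itself contain a parentSeries object. That layout is an assumption. The `title` field and the sample record are hypothetical and exist only to make the example run.

```python
from typing import Iterator, Tuple


def ancestor_chain(record: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (element_name, parent_object) pairs, nearest parent first."""
    current = record
    while True:
        # Find the first key naming a parent element, e.g. "parentSeries".
        parent_key = next(
            (key for key, value in current.items()
             if key.startswith("parent") and isinstance(value, dict)),
            None,
        )
        if parent_key is None:
            return  # No further parent element: top of the hierarchy.
        current = current[parent_key]
        yield parent_key, current


# Hypothetical record; real records carry many more fields.
item = {
    "title": "Example item",
    "parentFileUnit": {
        "title": "Example file unit",
        "parentSeries": {"title": "Example series"},
    },
}

for name, parent in ancestor_chain(item):
    print(name, "->", parent["title"])
```

If the real files reference parents by identifier instead of nesting them, the loop would need a lookup step in place of the direct descent.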

In addition to the JSON files, the dataset contains URLs for over 148 million digital objects and data from citizen archivist contributions. Users can download the dataset as zip files from specific locations, or they can use the AWS CLI commands to list the full dataset.

Lastly, the dataset is identified by a specific Amazon Resource Name (ARN), a unique identifier that makes the National Archives Catalog dataset easy to reference and share with others.
