AWS registers open data from the National Archives Catalog
The National Archives and Records Administration (NARA) has made a significant portion of its archival descriptions and authority records available to the public through the AWS Registry of Open Data. This dataset, known as the National Archives Catalog, totals more than 261 gigabytes and is organized by record groups and collections.
Users can access this comprehensive dataset using the AWS Command Line Interface (CLI). To pull the full dataset, simply use the command . Here, can be replaced with either or , depending on whether you're interested in descriptions or authority records.
The dataset is structured so that each record group or collection directory contains a sequence of JSON files, each holding data for up to 10,000 descriptions or authority records. The files in record group or collection directories follow the pattern .
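Once a record group or collection directory has been downloaded, its records can be read file by file. The sketch below assumes, hypothetically, that each JSON file holds a top-level array of record objects; the exact file layout is not stated here. It also shows the arithmetic implied by the 10,000-record cap: a directory with N records needs at least ceil(N / 10,000) files.

```python
import json
import math
from pathlib import Path
from typing import Iterator

# Per the dataset description: each JSON file holds up to 10,000 records.
MAX_RECORDS_PER_FILE = 10_000


def iter_records(group_dir: str) -> Iterator[dict]:
    """Yield every record from the JSON files in one record group or collection directory.

    Hypothetical assumption: each file contains a top-level JSON array of records.
    Files are visited in lexical filename order, which is fine for counting
    but may not match numeric sequence order.
    """
    for path in sorted(Path(group_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records = json.load(f)
        if len(records) > MAX_RECORDS_PER_FILE:
            raise ValueError(
                f"{path.name} holds {len(records)} records, above the expected cap"
            )
        yield from records


def expected_file_count(n_records: int) -> int:
    """Minimum number of files needed for n_records at up to 10,000 records per file."""
    return math.ceil(n_records / MAX_RECORDS_PER_FILE)
```

For example, `sum(1 for _ in iter_records(local_dir))` counts the records in a local copy of one directory, where `local_dir` is whatever path you synced to. A record group with 25,000 descriptions would need at least three files.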
If you're interested in specific collections, record groups, or descriptions, you can use more specific commands. To pull descriptions for a specific collection, use . Similarly, to pull descriptions for a specific record group, use . For authority records, the commands are analogous, with the directory replacing the directory in the commands above.
Each record's JSON conveys the parent/child relationship of series to file units and items through elements such as parentSeries and parentFileUnit. This makes it possible to navigate the catalog's hierarchy across the very large number of records.
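The parent elements can be inverted into a parent-to-children index for navigating the hierarchy. This is a minimal sketch under stated assumptions. It supposes that each record carries its identifier under a `naId` key, and that `parentSeries` and `parentFileUnit`, when present, are objects with their own `naId`. Those key shapes are assumptions for illustration, not confirmed by the text above. A record is attached to its file unit when it has one, and otherwise to its series.

```python
from collections import defaultdict
from typing import Iterable


def build_hierarchy(records: Iterable[dict]) -> dict:
    """Map each parent identifier to the identifiers of its direct children.

    Hypothetical record shape:
      {"naId": ..., "parentSeries": {"naId": ...}, "parentFileUnit": {"naId": ...}}
    The nearest parent wins: parentFileUnit if present, else parentSeries.
    """
    children = defaultdict(list)
    for rec in records:
        parent = rec.get("parentFileUnit") or rec.get("parentSeries")
        if parent is not None:
            children[parent["naId"]].append(rec["naId"])
    return dict(children)
```

With a series (1), a file unit in that series (2), and an item in that file unit (3), the index comes out as `{1: [2], 2: [3]}`.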
In addition to the JSON files, the dataset contains URLs for over 148 million digital objects and data from citizen archivist contributions. Users can download the dataset as zip files from specific locations, or they can use the AWS CLI commands to list the full dataset.
Lastly, the dataset can be accessed with a specific ARN, providing a unique identifier for the dataset. This makes it easy to reference and share the National Archives Catalog dataset with others.