Transformed Text: Data Integration for Text-Based and Word Documents
NARA's Textual and Word Processing Preservation Plan Secures Long-Term Access to Digital Records
The National Archives and Records Administration (NARA) has established a comprehensive approach to preserving digital textual and word processing records through its Digital Preservation Framework. This part of the framework, the Textual and Word Processing Preservation Plan, focuses on the long-term preservation and accessibility of digital records that are vital to our history.
The Textual and Word Processing Preservation Plan identifies specific file formats that are sustainable for preservation, taking into account criteria such as openness, stability, adoption as standards, and interoperability. Common file formats typically included in this plan are:
- Open and standardized text file formats like TXT (plain text)
- Word processing formats such as DOC, DOCX (Microsoft Word), and possibly ODT (OpenDocument Text)
- Additional formats used for textual records that meet preservation criteria, including transparency, stability, and compatibility
In more detail, a format is evaluated on how open and available its specification is, whether libraries, archives, and other memory institutions have formally adopted it, how stable and compatible it is across environments, and how little it depends on proprietary hardware or software.
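The four evaluation factors can be pictured as a simple checklist. The sketch below is purely illustrative: the class name, field names, and the yes/no simplification are assumptions made here, not part of NARA's framework, and the sample values describe a made-up format rather than any real one.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class SustainabilityChecklist:
    """Hypothetical yes/no view of the four factors named in the text."""
    open_specification: bool
    adopted_by_memory_institutions: bool
    stable_across_environments: bool
    independent_of_proprietary_tools: bool

    def unmet(self) -> List[str]:
        # Return the names of the factors that are not satisfied,
        # in the order the fields are declared.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Sample values for an invented format, used only to show the call.
example = SustainabilityChecklist(
    open_specification=True,
    adopted_by_memory_institutions=True,
    stable_across_environments=True,
    independent_of_proprietary_tools=False,
)
print(example.unmet())
```

A real assessment would likely weigh these factors rather than treat them as simple booleans; the checklist only makes the list of criteria concrete.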
NARA's Digital Preservation Framework also covers the Digital Equipment Corp (DEC) WPS Plus file format (.dx, NARA Format ID NF00156), the Extensible Forms Description Language (XFDL) file format (.xfdl), and two American Standard Code for Information Interchange (ASCII) text formats that share the .txt extension: 7-bit Text (NARA Format ID NF00113) and 8-bit Text (NARA Format ID NF00114).
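The formats and identifiers named above can be held in a small lookup table. The following is a minimal sketch, not a NARA tool: the `FormatEntry` class and `candidates_for` function are hypothetical names, and the table contains only the entries mentioned in this article. The XFDL entry has no ID because the text does not give one.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FormatEntry:
    name: str
    extension: str
    nara_format_id: Optional[str]  # None where the article gives no ID


# Only the formats named in the text; not the full framework list.
FORMATS = [
    FormatEntry("DEC WPS Plus", ".dx", "NF00156"),
    FormatEntry("Extensible Forms Description Language (XFDL)", ".xfdl", None),
    FormatEntry("ASCII 7-bit Text", ".txt", "NF00113"),
    FormatEntry("ASCII 8-bit Text", ".txt", "NF00114"),
]


def candidates_for(filename: str) -> List[FormatEntry]:
    """Return every listed format whose extension matches the file name."""
    ext = os.path.splitext(filename)[1].lower()
    return [entry for entry in FORMATS if entry.extension == ext]


for entry in candidates_for("memo.TXT"):
    print(entry.name, entry.nara_format_id)
```

Because both ASCII variants use .txt, the extension alone only narrows the candidates; telling 7-bit from 8-bit text would require inspecting the file's bytes.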
NARA also publishes the Digital Preservation Framework as Linked Open Data, with elements similar to those in the version available on GitHub. The file formats listed above fall under the framework's Textual and Word Processing category. Each format's Linked Open Data record is accessible at its own URL, such as the one for ASCII 7-bit Text (https://www.ourwebsite.gov/files/lod/dpframework/id/NF00113.ttl).
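The example URL suggests a pattern of base path, format ID, and a .ttl suffix. The sketch below builds such a URL from an ID. It assumes that every format follows the same pattern as the single example in the text and that IDs always look like "NF" plus five digits, which is inferred only from the three IDs quoted in this article. The base URL is copied verbatim from the example, and `lod_url` is a hypothetical helper name.

```python
import re

# Base path copied from the example URL in the text.
LOD_BASE = "https://www.ourwebsite.gov/files/lod/dpframework/id/"

# Assumed ID shape: "NF" followed by five digits (inferred, not documented here).
_ID_PATTERN = re.compile(r"NF\d{5}")


def lod_url(format_id: str) -> str:
    """Build the Turtle (.ttl) URL for a NARA Format ID."""
    if not _ID_PATTERN.fullmatch(format_id):
        raise ValueError(f"unexpected NARA Format ID: {format_id!r}")
    return f"{LOD_BASE}{format_id}.ttl"


print(lod_url("NF00113"))
```

Validating the ID before building the URL keeps malformed input from producing a link that merely looks plausible.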
The Textual and Word Processing Preservation Plan also serves as a source of test criteria for tools and processes used in format transformations. Textual and word processing records are created with a variety of tools and include minutes of meetings, organizational charts, diaries, calendars, correspondence, reports, briefing books, legal opinions, and directives.
NARA provides its Linked Open Data in the Resource Description Framework (RDF) Terse RDF Triple Language (Turtle) format. The framework also lists the eXtensible Markup Language (XML) 1.0 and 1.1 file formats, which use the extension .xml, and the eXtensible Markup Language Schema file format, which uses the extension .xsd.
For the precise list of formats, consult NARA’s official Digital Preservation Framework documentation or its website, which are the authoritative sources.
Data and cloud computing technology plays a central role in implementing the plan. Cloud services are used for long-term storage of and access to digital records, and the plan relies on software tools to evaluate file formats for sustainable preservation against factors such as openness, stability, and interoperability.