Why Format Choice Matters for Archival
Digital documents face a unique preservation challenge: unlike paper, which degrades slowly over centuries, digital files can become completely inaccessible within a decade if the software needed to open them disappears. Lotus 1-2-3 spreadsheets from 1990 are already difficult to open. Microsoft Works files from 2000 require specialized recovery tools. WordPerfect documents from the 1980s need dedicated conversion.
Choosing the right format for archival means selecting formats that will remain readable in 10, 25, or 50+ years. The key criteria are:
- Open specification: not dependent on a single company
- Multiple independent implementations: if one tool dies, others remain
- Wide adoption: critical mass ensures community support
- Self-contained files: no external dependencies
This guide covers the best formats for long-term document archival in 2026, from the gold standard (PDF/A) to simple but indestructible options (plain text).
Archival Format Comparison
| Format | Type | ISO Standard | Independent Implementations | Self-Contained | Best For |
|---|---|---|---|---|---|
| PDF/A | Fixed layout | ISO 19005 | 50+ | Yes | Regulatory, legal, official records |
| ODF (.odt, .ods) | Editable | ISO/IEC 26300 | 5+ | Yes | Government, editable archives |
| Plain text (.txt) | Raw text | N/A | Countless | Yes | Maximum longevity, no formatting |
| TIFF | Image | TIFF 6.0 is an Adobe spec; TIFF/IT is ISO 12639 | Hundreds | Yes | Scanned documents, photos |
| EPUB | Reflowable | ISO/IEC | 20+ | Yes | Books, publications |
| Markdown (.md) | Structured text | N/A | Hundreds | Yes | Technical docs, structured content |
| DOCX | Editable | ISO/IEC 29500 | 3-5 (partial) | Mostly | Business (if required) |
PDF/A: The Gold Standard for Archival
PDF/A (ISO 19005) is the international standard for long-term archival of electronic documents. It is a restricted subset of PDF that prohibits features that threaten long-term preservation.
What PDF/A Requires
- All fonts must be embedded (no system font dependencies)
- All color must be device-independent, or device-dependent color must be tied to an embedded ICC output intent
- No encryption or DRM
- No external content references (no linked images or URLs that must resolve)
- XMP metadata must be included
- A logical structure tree (Tagged PDF) is required for the "a" conformance levels (PDF/A-1a, PDF/A-2a, PDF/A-3a) and recommended otherwise
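Font embedding is one requirement that can be spot-checked before conversion. The sketch below assumes poppler's `pdffonts` utility is installed; `unembedded_fonts` is a hypothetical helper that reads `pdffonts` output on standard input and prints each font whose `emb` column says `no`. It is a pre-check only, not PDF/A validation.

```bash
# Print the names of fonts that pdffonts reports as NOT embedded.
# Reads pdffonts output on stdin (hypothetical helper, not part of poppler).
# Columns are counted from the right ("emb sub uni objnum gen") because the
# "type" column can contain spaces, e.g. "CID TrueType" or "Type 1".
unembedded_fonts() {
  # NR > 2 skips the header row and the dashed separator row.
  awk 'NR > 2 && NF >= 5 && $(NF-4) == "no" { print $1 }'
}
```

Typical use would be `pdffonts report.pdf | unembedded_fonts` (file name hypothetical); empty output means every font that `pdffonts` lists is embedded.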
PDF/A Conformance Levels
| Level | Standard | Key Feature |
|---|---|---|
| PDF/A-1a | ISO 19005-1 | Tagged PDF (accessible), based on PDF 1.4 |
| PDF/A-1b | ISO 19005-1 | Visual reproduction (basic), based on PDF 1.4 |
| PDF/A-2a | ISO 19005-2 | Tagged, based on PDF 1.7, JPEG2000 support |
| PDF/A-2b | ISO 19005-2 | Visual, based on PDF 1.7 |
| PDF/A-2u | ISO 19005-2 | Like 2b, plus Unicode mapping for all text |
| PDF/A-3a | ISO 19005-3 | Like 2a, plus embedded files of any format |
| PDF/A-3b | ISO 19005-3 | Like 2b, plus embedded files of any format |
| PDF/A-4 | ISO 19005-4 | Based on PDF 2.0, latest standard |
Recommendation: PDF/A-2b for most archival needs. It offers broad compatibility with modern PDF features while maintaining strict archival requirements. Use PDF/A-3 if you need to embed original source files (e.g., the XLSX behind a financial report).
Converting to PDF/A
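```bash
# Using Ghostscript. Ghostscript's own PDF/A examples also pass a PDFA_def.ps
# file (shipped with Ghostscript, typically edited to point at an ICC profile)
# to define the output intent; check the documentation for your version.
gs -dPDFA=2 -dBATCH -dNOPAUSE -dNOOUTERSAVE \
  -sColorConversionStrategy=UseDeviceIndependentColor \
  -sDEVICE=pdfwrite \
  -dPDFACompatibilityPolicy=1 \
  -sOutputFile=output_pdfa.pdf input.pdf

# Using LibreOffice (from DOCX, ODT, etc.). SelectPdfVersion 2 means PDF/A-2b.
# The JSON filter options need double quotes and a recent LibreOffice release.
libreoffice --headless --convert-to 'pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"2"}}' input.docx
```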
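After conversion, a quick (and deliberately weak) sanity check is to look for the `pdfaid:part` and `pdfaid:conformance` properties that PDF/A files declare in their XMP metadata. This sketch assumes the XMP packet is stored uncompressed, which is common but not guaranteed, and `declared_pdfa` is a hypothetical helper name. A declaration only shows what a file claims; whether it actually conforms still has to be checked with a validator.

```bash
# Print the PDF/A part and conformance level declared in a file's XMP
# metadata (e.g. "2B"), or return non-zero if no declaration is found.
# Assumption: the XMP packet is uncompressed, so its text is greppable.
# XMP may use the attribute form  pdfaid:part="2"
# or the element form            <pdfaid:part>2</pdfaid:part>
declared_pdfa() {
  local f="$1" part conf
  # sed keeps only the last character (the digit or letter after =" or >).
  part=$(LC_ALL=C grep -aoE 'pdfaid:part(="|>)[0-9]' "$f" | head -n 1 | sed 's/.*\(.\)$/\1/')
  conf=$(LC_ALL=C grep -aoE 'pdfaid:conformance(="|>)[A-Za-z]' "$f" | head -n 1 | sed 's/.*\(.\)$/\1/')
  [ -n "$part" ] || return 1
  # PDF/A-4 has no a/b/u level, so conf may legitimately be empty.
  printf '%s%s\n' "$part" "$conf"
}
```

Running `declared_pdfa output_pdfa.pdf` on the Ghostscript output should print the declared part and level (for example `2B`) if the metadata is readable, or fail if no declaration is found.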
For more on PDF/A conversion, see our guide on how to convert PDF to PDF/A.
ODF: Editable Open Archives
ODF (Open Document Format, ISO 26300) is the standard for editable document archives. Unlike PDF/A (which captures a fixed visual representation), ODF preserves the document in a fully editable form — with styles, formulas, and structure intact.
Advantages for Archival
- ISO international standard with full public specification
- Multiple implementations, including LibreOffice and Apache OpenOffice (which share an OpenOffice.org heritage), Calligra, ONLYOFFICE, Google Docs, and Microsoft Office
- Mandated or recommended for government use in a number of jurisdictions (for example, the UK and several EU member states)
- ZIP-based format (standard structure, easy to inspect and repair)
- No proprietary features or DRM
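Because an ODF document is an ordinary ZIP archive, `unzip -l report.odt` lists its parts (such as `content.xml` and `META-INF/manifest.xml`) and `unzip -p report.odt content.xml` prints the raw XML. For a dependency-free identity check, the sketch below relies on the ODF packaging rule that the `mimetype` file comes first in the archive, stored uncompressed and without a ZIP extra field. The helper name `odf_mimetype` is hypothetical.

```bash
# Print the MIME type declared by an ODF package, or fail if the file does
# not follow the packaging rule: first ZIP entry named "mimetype", stored
# uncompressed with no extra field. Its name then starts at byte 30 (right
# after the 30-byte ZIP local file header) and its content at byte 38.
odf_mimetype() {
  local f="$1"
  # ZIP local file header signature begins with "PK"
  [ "$(head -c 2 "$f")" = "PK" ] || return 1
  # Name of the first ZIP entry
  [ "$(dd if="$f" bs=1 skip=30 count=8 2>/dev/null)" = "mimetype" ] || return 1
  # Entry content: stop at the first character that cannot belong to an ODF
  # MIME type (the next ZIP header starts with an uppercase "P")
  dd if="$f" bs=1 skip=38 count=100 2>/dev/null |
    LC_ALL=C grep -m 1 -aoE '^application/vnd\.oasis\.opendocument\.[a-z.-]+'
}
```

A file that fails this check is not necessarily unreadable; it only means the package deviates from the layout the specification asks for, which is worth knowing before archiving it.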
ODF Format Family
- .odt: Text documents
- .ods: Spreadsheets
- .odp: Presentations
- .odg: Drawings
- .odf: Mathematical formulas
Limitations
- Complex Excel VBA macros do not translate
- Some advanced formatting may not round-trip perfectly with Microsoft Office
- Fewer rendering differences between implementations than DOCX, but not zero
Plain Text: The Immortal Format
For maximum longevity, nothing beats plain text. A UTF-8 text file from today will be readable in 100 years with trivially simple tools. No software company needs to exist, no specification needs to be maintained — the format is so simple that it is essentially permanent.
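Plain text only stays simple if its encoding is unambiguous, so before archiving a .txt or .md file it can be worth confirming that it really is UTF-8 and not a legacy encoding such as Latin-1. A minimal sketch, assuming the common `iconv` utility is installed; `is_utf8` is a hypothetical helper name.

```bash
# Return 0 if a file is valid UTF-8, non-zero otherwise.
# iconv stops with an error at the first invalid byte sequence, so
# converting UTF-8 to UTF-8 and discarding the output acts as a validator.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Files that fail can be converted once, for example with `iconv -f ISO-8859-1 -t UTF-8`, provided the original encoding is actually known.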
When to Use Plain Text
- Log files, data records, configuration files
- Reference documentation (in Markdown for structure)
- Source code and technical specifications
- Any content where formatting is secondary to content
- Disaster recovery copies of critical documents
Markdown for Structured Text
If you need headings, lists, links, and basic formatting, Markdown (.md) is plain text with lightweight structure. It is readable as raw text and can be rendered into HTML, PDF, or EPUB with hundreds of tools.
```markdown
# Chapter 1: Introduction

This is a paragraph with **bold** and _italic_ text.

## Section 1.1: Background

- Item one
- Item two
- Item three
```
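Because Markdown's structure is carried by ordinary characters, it stays usable even with no Markdown renderer at hand. As a small illustration, assuming ATX-style `#` headings like the ones above (not the underlined "Setext" style), a few lines of `awk` recover a document outline; `outline` is a hypothetical helper name. For full rendering, a converter such as Pandoc (`pandoc chapter1.md -o chapter1.html`) is one common choice.

```bash
# Print an indented outline of a Markdown file from its ATX headings
# ("# Title", "## Section", ...). Limitation: it does not track fenced code
# blocks, so a "# comment" line inside one would be listed as well.
outline() {
  awk '/^#+ / {
    level = match($0, /[^#]/) - 1        # number of leading hash characters
    if (level > 6) next                  # Markdown headings stop at level 6
    indent = ""
    for (i = 1; i < level; i++) indent = indent "  "
    print indent substr($0, level + 2)   # drop the hashes and one space
  }' "$1"
}
```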
TIFF: Archival Images and Scanned Documents
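TIFF (Tagged Image File Format) is the de facto standard archival format for rasterized documents and photographs. Baseline TIFF 6.0 is a public Adobe specification rather than an ISO standard; ISO 12639 covers TIFF/IT, a related profile for prepress work. Its strengths for archival: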
- Supports lossless LZW or ZIP compression
- Supports multi-page documents (ideal for scanned archives)
- Supports ICC color profiles for accurate color preservation
- Universally implemented across all image software
- 30+ year track record of stability
Scanning for Archival
| Content Type | Recommended Settings |
|---|---|
| Text documents | 300 DPI, 1-bit (B&W), CCITT Group 4 compression |
| Mixed text/photo | 300 DPI, 8-bit grayscale, LZW compression |
| Color photographs | 400-600 DPI, 24-bit color, LZW compression |
| Fine art / maps | 600+ DPI, 48-bit color, no compression |
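These settings have large storage consequences. The uncompressed size of a scan is (width × DPI) × (height × DPI) × bits per pixel ÷ 8 bytes. The sketch below applies that formula to a US Letter page (8.5 × 11 in, an assumed page size) at settings from the table; `raw_scan_bytes` is a hypothetical helper, and real files will be smaller once compression is applied.

```bash
# Uncompressed size in bytes of one scanned page:
#   (width_in * dpi) * (height_in * dpi) * bits_per_pixel / 8
# Page dimensions are given in tenths of an inch so that bash integer
# arithmetic can represent 8.5 x 11 in as 85 x 110.
raw_scan_bytes() {
  local w_tenths=$1 h_tenths=$2 dpi=$3 bpp=$4
  local w_px=$(( w_tenths * dpi / 10 )) h_px=$(( h_tenths * dpi / 10 ))
  echo $(( w_px * h_px * bpp / 8 ))
}

raw_scan_bytes 85 110 300 1    # text, 1-bit:            1051875   (~1 MB)
raw_scan_bytes 85 110 300 8    # mixed, 8-bit grayscale: 8415000   (~8.4 MB)
raw_scan_bytes 85 110 600 24   # color photo at 600 DPI: 100980000 (~101 MB)
raw_scan_bytes 85 110 600 48   # fine art at 600 DPI:    201960000 (~202 MB)
```

The roughly 200-fold gap between a bilevel text scan and an uncompressed 48-bit scan is why the table reserves the heaviest settings for content that needs them.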
Archival Strategy: The Multi-Format Approach
The safest archival strategy uses multiple formats:
- PDF/A-2b — The primary archival copy. Fixed visual representation.
- ODF or DOCX — The editable source. For documents that may need revision.
- Plain text / Markdown — The content extraction. For searchability and maximum longevity.
- TIFF — For scanned or image-based content.
This redundancy means that even if one format becomes difficult to access in the future, the content survives in other forms.
Formats to AVOID for Archival
| Format | Why Not |
|---|---|
| Regular PDF (not PDF/A) | May depend on non-embedded fonts or external content, and may be encrypted |
| DOC (binary Word) | Proprietary binary format, declining tool support |
| XLS (binary Excel) | Proprietary binary, formulas may break |
| Pages / Numbers / Keynote | Apple-proprietary, no public specification; third-party support relies on reverse-engineered import filters |
| Google Docs native | Cloud-dependent, no offline file |
| Password-protected anything | Key loss = permanent data loss |
Quality and Preservation Tips
Validate PDF/A compliance. Use a validator such as veraPDF (open source) to verify that your files meet the standard. A file is not PDF/A just because its name or metadata says so; it must actually conform to ISO 19005.
Embed everything. For any archival format, ensure all dependencies are embedded: fonts, images, color profiles. External references become broken links over time.
Use checksums. Generate SHA-256 checksums for archival files and store them separately. This allows you to verify file integrity decades later.
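A minimal sketch of that workflow, assuming GNU coreutils' `sha256sum` (on macOS, `shasum -a 256` plays the same role); `make_manifest` and `verify_manifest` are hypothetical helper names. Paths are recorded relative to the archive directory, so the manifest still verifies after the archive is copied to new storage.

```bash
make_manifest() {   # usage: make_manifest ARCHIVE_DIR MANIFEST_FILE
  # Hash every file under ARCHIVE_DIR; paths are relative and sorted so the
  # manifest is stable. Keep MANIFEST_FILE outside the archive directory.
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

verify_manifest() { # usage: verify_manifest ARCHIVE_DIR MANIFEST_FILE
  local manifest
  # Resolve the manifest to an absolute path before changing directory.
  manifest="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
  # --check re-hashes each listed file; --quiet prints only failures.
  ( cd "$1" && sha256sum --check --quiet "$manifest" )
}
```

A non-zero exit from `verify_manifest` means at least one file has changed or gone missing since the manifest was written.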
Test readability. Periodically open archival files with current software to verify they are still readable. Migration to newer formats may be needed every 10-15 years.
Store metadata. Record what the document is, when it was created, who authored it, and why it matters. Metadata is as important as the content for future users who lack the original context.
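Metadata does not have to live inside the file itself. One low-tech option, in keeping with the plain-text advice above, is a human-readable sidecar stored next to each archived document. The helper name `write_sidecar`, the `.meta.txt` suffix, and the field names in this sketch are hypothetical choices, not a standard; formal schemes such as Dublin Core define richer vocabularies.

```bash
# Write a plain-text sidecar describing an archived file (hypothetical
# layout). Fields mirror the questions above: what, when, who, and why.
write_sidecar() {   # usage: write_sidecar FILE TITLE CREATED AUTHOR PURPOSE
  local f="$1"
  {
    printf 'file: %s\n'    "$(basename "$f")"
    printf 'title: %s\n'   "$2"
    printf 'created: %s\n' "$3"
    printf 'author: %s\n'  "$4"
    printf 'purpose: %s\n' "$5"
  } > "$f.meta.txt"
}
```

Because the sidecar is itself plain text, it can be checksummed alongside the document it describes.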
For more on document formats, see our PDF vs DOCX comparison and our PDF/A archival format guide.
Conclusion
For most archival needs, PDF/A-2b is the right choice — it is an ISO standard, self-contained, widely supported, and designed specifically for long-term preservation. For editable archives, ODF offers the best combination of openness and editability. For absolute maximum longevity, plain text is indestructible. The safest approach is to archive in multiple formats: PDF/A for the definitive visual record, ODF or DOCX for editability, and plain text for content that must survive no matter what.
Need to convert? Try our free Document Converter to convert between PDF, DOCX, ODS, and other formats — no registration required.



