What Is PDF/A and Why Does It Matter?
PDF/A is a subset of PDF standardized as ISO 19005, designed specifically for long-term archival of electronic documents. A regular PDF can rely on external fonts and other external content, include JavaScript, embed multimedia, and use encryption; PDF/A prohibits all of these. Every element needed to faithfully reproduce the document must be embedded within the file itself.
This matters for legal compliance (courts require archival-grade documents), regulatory requirements (financial records must be preserved for decades), government mandates (many agencies require PDF/A for official records), corporate archival policies, and library and museum digital preservation programs.
A regular PDF is like a house that depends on the neighborhood — it references external resources. A PDF/A file is a self-contained capsule — everything needed to render it is inside the file, with no external dependencies.
PDF/A Conformance Levels Explained
| Level | ISO Standard | PDF Base | Key Requirements |
|---|---|---|---|
| PDF/A-1b | ISO 19005-1 | PDF 1.4 | Visual reproduction guaranteed |
| PDF/A-1a | ISO 19005-1 | PDF 1.4 | Visual + tagged structure + Unicode |
| PDF/A-2b | ISO 19005-2 | PDF 1.7 | Visual, JPEG2000, transparency |
| PDF/A-2a | ISO 19005-2 | PDF 1.7 | Visual + tagged + Unicode |
| PDF/A-2u | ISO 19005-2 | PDF 1.7 | Visual + Unicode text mapping |
| PDF/A-3b | ISO 19005-3 | PDF 1.7 | Allows embedding any file type |
| PDF/A-3a | ISO 19005-3 | PDF 1.7 | Tagged + Unicode + embedded files |
| PDF/A-4 | ISO 19005-4 | PDF 2.0 | Latest, most flexible |
Which Level to Choose?
- PDF/A-2b — The most common choice. Broad compatibility with modern PDF features (transparency, JPEG2000 compression) while meeting core archival requirements. Most validation tools accept it.
- PDF/A-1b — When you need maximum backward compatibility with older archival systems.
- PDF/A-3b — When you need to embed original source files (e.g., the Word document or spreadsheet behind the PDF).
- PDF/A-2a or PDF/A-3a — When accessibility is required (tagged structure enables screen reader compatibility).
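The selection rules above can be condensed into a small helper for scripts. This is only a sketch of the guidance in this section: the function name `choose_pdfa_level` and its three yes/no arguments are hypothetical and not part of any tool.

```shell
# Hypothetical helper: map three yes/no requirements to a PDF/A level,
# following the guidance in the list above.
# Usage: choose_pdfa_level <accessible> <embed_sources> <legacy_system>
choose_pdfa_level() {
    if [ "$2" = yes ]; then
        # Embedding arbitrary source files calls for PDF/A-3
        if [ "$1" = yes ]; then echo "3a"; else echo "3b"; fi
    elif [ "$1" = yes ]; then
        # Accessibility calls for a tagged ("a") level
        echo "2a"
    elif [ "$3" = yes ]; then
        # Older archival systems: fall back to PDF/A-1b
        echo "1b"
    else
        # General-purpose default
        echo "2b"
    fi
}
```

For example, `choose_pdfa_level no yes no` selects the level for a document that must carry its original spreadsheet but has no accessibility requirement.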
Method 1: Using Ghostscript
Ghostscript is one of the most widely used open-source tools for PDF/A conversion:
gs -dPDFA=2 -dBATCH -dNOPAUSE -dNOOUTERSAVE \
-sColorConversionStrategy=UseDeviceIndependentColor \
-sDEVICE=pdfwrite \
-dPDFACompatibilityPolicy=1 \
-sOutputFile=output_pdfa.pdf input.pdf
Ghostscript Parameters Explained
- -dPDFA=2: Targets PDF/A-2b conformance (change to 1 for PDF/A-1b or 3 for PDF/A-3).
- -sColorConversionStrategy=UseDeviceIndependentColor: Converts all colors to device-independent color spaces, as PDF/A requires.
- -dPDFACompatibilityPolicy=1: If a feature is not permitted in the target level (e.g., transparency in PDF/A-1), Ghostscript drops it so the output stays compliant. Policy 0 keeps the feature and produces a non-compliant file; policy 2 aborts the conversion.
- -dNOOUTERSAVE: Prevents save errors with complex PDFs.
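To avoid retyping these flags, the command can be wrapped in a function. This is a minimal sketch, assuming `gs` is on your PATH. The function name `to_pdfa` and its argument order are illustrative; the flags are the ones explained above.

```shell
# Sketch: wrap the Ghostscript invocation shown above.
# Usage: to_pdfa <input.pdf> <output.pdf> [level]   (level: 1, 2 or 3; default 2)
to_pdfa() {
    level=${3:-2}
    case $level in
        1|2|3) ;;
        *) echo "to_pdfa: level must be 1, 2 or 3" >&2; return 2 ;;
    esac
    # Refuse to run if the input file does not exist
    [ -f "$1" ] || { echo "to_pdfa: no such file: $1" >&2; return 1; }
    gs -dPDFA="$level" -dBATCH -dNOPAUSE -dNOOUTERSAVE \
       -sColorConversionStrategy=UseDeviceIndependentColor \
       -sDEVICE=pdfwrite \
       -dPDFACompatibilityPolicy=1 \
       -sOutputFile="$2" "$1"
}
```

A call such as `to_pdfa input.pdf output_pdfa.pdf 1` would then target PDF/A-1b with the same settings.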
With an Output Intent (Required for Some Validators)
Some strict validators require an output intent ICC profile:
gs -dPDFA=2 -dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=UseDeviceIndependentColor \
-dPDFACompatibilityPolicy=1 \
-sOutputICCProfile=sRGB.icc \
-sOutputFile=output_pdfa.pdf \
PDFA_def.ps input.pdf
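The PDFA_def.ps file ships with Ghostscript, usually in its lib directory. It names the ICC profile to embed as the output intent, so edit a copy so that it points to the profile you want to use; an sRGB profile is often included in the installation's iccprofiles directory. Recent Ghostscript versions run in safer mode by default, so you may also need to grant read access to the profile with --permit-file-read.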
Method 2: Using LibreOffice
LibreOffice can convert documents directly to PDF/A during export:
# From DOCX to PDF/A (the filter options are JSON, so the inner quotes must be double quotes)
libreoffice --headless --convert-to 'pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"2"}}' input.docx
SelectPdfVersion accepts 1 for PDF/A-1b, 2 for PDF/A-2b, and 3 for PDF/A-3b.
From Other Formats
# ODT to PDF/A
libreoffice --headless --convert-to 'pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"2"}}' input.odt
# XLSX to PDF/A
libreoffice --headless --convert-to 'pdf:calc_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"2"}}' input.xlsx
# PPTX to PDF/A
libreoffice --headless --convert-to 'pdf:impress_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"2"}}' input.pptx
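Because the export filter name depends on the document type, a script that handles mixed inputs has to pick it per file. The sketch below assumes the file extension reliably identifies the type (lowercase only). `pdfa_filter_for` is a hypothetical helper; the filter names and the SelectPdfVersion option are the ones used in the commands above, written as JSON with double quotes.

```shell
# Hypothetical helper: print the --convert-to argument for a given file,
# choosing the export filter from the file extension.
# Usage: pdfa_filter_for <file> [SelectPdfVersion]   (default 2 = PDF/A-2)
pdfa_filter_for() {
    case $1 in
        *.docx|*.odt) filter=writer_pdf_Export ;;
        *.xlsx)       filter=calc_pdf_Export ;;
        *.pptx)       filter=impress_pdf_Export ;;
        *) echo "pdfa_filter_for: unsupported type: $1" >&2; return 1 ;;
    esac
    printf 'pdf:%s:{"SelectPdfVersion":{"type":"long","value":"%s"}}\n' \
        "$filter" "${2:-2}"
}
```

It could then be used as `libreoffice --headless --convert-to "$(pdfa_filter_for report.xlsx)" report.xlsx`.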
Method 3: Using QPDF
QPDF cannot convert a PDF to PDF/A, but it can linearize and optimize existing PDF/A files, which makes it useful for post-processing Ghostscript output. Re-validate the file after any post-processing step:
# Linearize PDF/A for fast web access
qpdf --linearize output_pdfa.pdf output_pdfa_linearized.pdf
Method 4: Online Conversion
Use the PDF tools online for quick PDF/A conversion without installing any software. Upload your PDF and select PDF/A as the output format.
Validating PDF/A Compliance
Creating a file that claims to be PDF/A is easy. Creating one that actually conforms to the standard requires validation.
Using veraPDF (Gold Standard)
veraPDF is the open-source reference validator endorsed by the PDF Association:
# Install veraPDF
# Download from verapdf.org
# Validate PDF/A-2b
verapdf --flavour 2b output_pdfa.pdf
veraPDF reports every conformance violation with specific clause references to ISO 19005.
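In scripts it helps to reduce the report to a pass/fail result. The sketch below assumes veraPDF's default XML report marks a passing file with an `isCompliant="true"` attribute; check the output of your veraPDF version before relying on it. The function name `is_pdfa` is illustrative.

```shell
# Sketch: succeed only if veraPDF reports the file as compliant.
# Assumes the default XML report contains isCompliant="true" for a passing file.
# Usage: is_pdfa <file.pdf> [flavour]   (default flavour: 2b)
is_pdfa() {
    verapdf --flavour "${2:-2b}" "$1" 2>/dev/null | grep -q 'isCompliant="true"'
}
```

This allows constructs such as `is_pdfa output_pdfa.pdf && echo "OK to archive"`.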
Using JHOVE
jhove -m PDF-hul output_pdfa.pdf
Common Validation Failures
| Error | Cause | Fix |
|---|---|---|
| Font not embedded | PDF references a system font | Re-convert with font embedding enabled |
| Device-dependent color | Uses RGB/CMYK without profile | Add -sColorConversionStrategy=UseDeviceIndependentColor |
| Transparency present | PDF/A-1 does not allow transparency | Use PDF/A-2 or flatten transparency |
| JavaScript present | Scripts prohibited in PDF/A | Re-convert with Ghostscript, whose rewritten output generally drops scripts |
| Encryption present | DRM prohibited in PDF/A | Decrypt before converting |
| Missing XMP metadata | XMP is required by PDF/A | Ghostscript adds it automatically |
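The encryption failure in particular can be caught before conversion. This is a minimal sketch, assuming a qpdf version that supports `--is-encrypted` and a file that opens without a password. `decrypt_if_needed` is a hypothetical name.

```shell
# Sketch: write an unencrypted copy if the input is encrypted,
# otherwise copy it unchanged. Assumes no password is needed to open the file.
# Usage: decrypt_if_needed <input.pdf> <output.pdf>
decrypt_if_needed() {
    # qpdf --is-encrypted exits with status 0 when the file is encrypted
    if qpdf --is-encrypted "$1"; then
        qpdf --decrypt "$1" "$2"
    else
        cp "$1" "$2"
    fi
}
```

The resulting copy can then be passed to the Ghostscript conversion as usual.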
Batch Converting to PDF/A
All PDFs in a Directory
mkdir -p pdfa
for file in *.pdf; do
[ -f "$file" ] || continue
gs -dPDFA=2 -dBATCH -dNOPAUSE -dNOOUTERSAVE \
-sColorConversionStrategy=UseDeviceIndependentColor \
-sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 \
-sOutputFile="pdfa/${file}" "$file"
done
Validate After Conversion
for file in pdfa/*.pdf; do
echo "Validating: $file"
verapdf --flavour 2b "$file" | grep -E "(compliant|nonCompliant)"
done
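For larger batches, a tally and a separate folder for failures make the results easier to act on. The sketch below is one possible arrangement: the `failed/` subdirectory and the function name `validate_dir` are hypothetical, and the pass check assumes veraPDF's XML report marks compliant files with `isCompliant="true"`.

```shell
# Sketch: validate every PDF in a directory, move failures aside, print a tally.
# Usage: validate_dir <dir> [flavour]   (default flavour: 2b)
validate_dir() {
    dir=$1; flavour=${2:-2b}; pass=0; fail=0
    mkdir -p "$dir/failed"
    for file in "$dir"/*.pdf; do
        [ -f "$file" ] || continue
        if verapdf --flavour "$flavour" "$file" 2>/dev/null \
                | grep -q 'isCompliant="true"'; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
            mv "$file" "$dir/failed/"
        fi
    done
    echo "passed=$pass failed=$fail"
    # Non-zero exit status when anything failed, so callers can react
    [ "$fail" -eq 0 ]
}
```

Running `validate_dir pdfa` after the batch conversion leaves only the compliant files in `pdfa/`.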
Quality and Settings Tips
Font embedding is critical. The most common PDF/A validation failure is missing embedded fonts. If the source PDF uses system fonts (Arial, Times New Roman, Calibri), Ghostscript will attempt to embed them during conversion. If it cannot find the fonts on the system, the conversion may succeed but with font substitution — which changes the visual appearance.
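Before converting, it can help to list the fonts that the source PDF does not embed. This sketch assumes `pdffonts` from Poppler is installed and that its table ends each row with the `emb`, `sub`, and `uni` columns followed by the two-part object ID. `list_unembedded_fonts` is a hypothetical name, and font names containing spaces would be truncated.

```shell
# Sketch: print the names of fonts that are referenced but not embedded.
# pdffonts prints two header lines, then one row per font ending in:
#   emb sub uni <object number> <generation>
# so the "emb" value is the fifth field from the right.
list_unembedded_fonts() {
    pdffonts "$1" | awk 'NR > 2 && $(NF-4) == "no" { print $1 }'
}
```

Any font it prints must be available on the conversion system for Ghostscript to embed it.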
Color profiles matter. PDF/A requires device-independent color specifications. If your source PDF uses raw CMYK or RGB without an ICC profile, the conversion must add one. Ghostscript's UseDeviceIndependentColor strategy handles this automatically.
Transparency flattening. PDF/A-1 does not support transparency (common in modern documents with drop shadows, semi-transparent elements). Either use PDF/A-2 (supports transparency) or let Ghostscript flatten it to opaque elements.
File size increase. PDF/A files are typically 10-30% larger than regular PDFs because fonts and color profiles are embedded. For documents with many fonts or large images, the increase can be more significant. This is the expected trade-off for self-contained archival.
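The overhead for a particular file is easy to measure. This is a small sketch using only `wc` and shell arithmetic. The function name `size_increase_pct` is illustrative, and the result is truncated to a whole percent.

```shell
# Sketch: percentage by which <converted> is larger than <original>.
# Usage: size_increase_pct <original.pdf> <converted.pdf>
size_increase_pct() {
    # Arithmetic expansion also strips any padding that wc adds
    orig=$(( $(wc -c < "$1") ))
    conv=$(( $(wc -c < "$2") ))
    [ "$orig" -gt 0 ] || { echo "size_increase_pct: empty original" >&2; return 1; }
    echo $(( (conv - orig) * 100 / orig ))
}
```

For a 1000-byte original and a 1250-byte conversion, the function prints 25.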
Metadata completeness. Include meaningful XMP metadata (title, author, creation date, subject) in your PDF/A files. Future archivists will rely on this metadata to identify and categorize documents. Add metadata during conversion:
gs -dPDFA=2 -dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=UseDeviceIndependentColor \
-dPDFACompatibilityPolicy=1 \
-sOutputFile=output_pdfa.pdf input.pdf \
-c "[ /Title (Quarterly Report 2026) /Author (Finance Dept) /DOCINFO pdfmark"
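Titles that contain parentheses or backslashes need escaping, because those characters have special meaning inside PostScript strings. The sketch below builds the pdfmark argument from arbitrary ASCII text. `ps_escape` and `docinfo_pdfmark` are hypothetical names, and non-ASCII text would need additional encoding that is not handled here.

```shell
# Escape backslashes and parentheses for use inside a PostScript (...) string.
ps_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/(/\\(/g' -e 's/)/\\)/g'
}

# Build a DOCINFO pdfmark from a title and an author.
# Usage: docinfo_pdfmark <title> <author>
docinfo_pdfmark() {
    printf '[ /Title (%s) /Author (%s) /DOCINFO pdfmark' \
        "$(ps_escape "$1")" "$(ps_escape "$2")"
}
```

The output of `docinfo_pdfmark "Report (Q1)" "Finance Dept"` can be passed as the argument to Ghostscript's `-c` option.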
For more on PDF formats, see our PDF vs PDF/A guide and our best formats for archiving documents guide.
Common Issues and Troubleshooting
"Cannot embed font" error
The source PDF references fonts not available on the conversion system. Install the required fonts, or use Ghostscript's font substitution. For missing Windows fonts on Linux:
# Install Microsoft core fonts
apt install ttf-mscorefonts-installer
Validation passes but document looks different
Font substitution occurred during conversion. The file is technically PDF/A compliant, but a substitute font replaced the original. Check for substitution warnings in the Ghostscript output.
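One way to catch this is to keep Ghostscript's log and search it. The sketch assumes substitution messages contain the text "substitut" (as in "Substituting font ..."), which may vary between Ghostscript versions. `convert_and_check_fonts` and the `.log` file name are hypothetical.

```shell
# Sketch: convert, keep the log, and warn if it mentions font substitution.
# Usage: convert_and_check_fonts <input.pdf> <output.pdf>
# Returns 1 if Ghostscript fails, 3 if the log mentions substitution.
convert_and_check_fonts() {
    log="${2%.pdf}.log"
    gs -dPDFA=2 -dBATCH -dNOPAUSE \
       -sColorConversionStrategy=UseDeviceIndependentColor \
       -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 \
       -sOutputFile="$2" "$1" > "$log" 2>&1 || return 1
    if grep -qi 'substitut' "$log"; then
        echo "warning: possible font substitution, see $log" >&2
        return 3
    fi
}
```

A return status of 3 signals that the output should be compared visually against the original before archiving.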
Scanned PDF fails validation
Scanned (image-based) PDFs contain no searchable text and may lack required metadata. Run OCR first (see our OCR guide), then convert to PDF/A. OCR adds a searchable text layer with Unicode mappings, which is what PDF/A-2u requires of any text in the file.
PDF/A-3 embedded files not recognized
When embedding files in PDF/A-3, the embedded file must be described with an AFRelationship entry. Ghostscript handles this automatically for simple cases, but complex embeddings may require a dedicated PDF/A library.
Conclusion
Converting to PDF/A is the standard approach for long-term document preservation. Use Ghostscript for the most reliable conversion with the broadest format support. Choose PDF/A-2b for general archival, PDF/A-3b when you need to embed original source files, and always validate with veraPDF after conversion. The process adds embedded fonts and color profiles, resulting in slightly larger but completely self-contained files that will remain readable for decades.



