What Is PDF Redaction?
Redaction is the permanent, irreversible removal of sensitive information from a document. Unlike hiding text behind a black box (which can be removed), proper redaction deletes the underlying data entirely. Once redacted, the information cannot be recovered by any means -- not by removing the black box, not by copying the text, not by examining the file's internal code.
This distinction between permanent redaction and cosmetic hiding is critical. Many high-profile data breaches have occurred because someone drew a black rectangle over sensitive text in a PDF without actually deleting the underlying data. The text was still there, hidden behind the visual overlay, waiting for anyone who knew to select all, copy, and paste into a text editor.

Proper redaction permanently removes:
- The visible text or image from the page
- The underlying text data from the PDF's content stream
- Any metadata associated with the redacted content
- Leftover copies of the content elsewhere in the PDF's internal structure (object streams, unreferenced objects, earlier saved revisions)
After proper redaction, the removed information is replaced with a black (or colored) rectangle or blank space that contains nothing underneath.
Why Redaction Matters
Legal Requirements
Laws like GDPR (Europe), CCPA (California), HIPAA (US healthcare), and FOIA (US government) require organizations to protect sensitive information when sharing documents:
- GDPR: Personal data must be anonymized or redacted before sharing with unauthorized parties. Fines reach up to €20 million or 4% of global annual turnover, whichever is higher.
- HIPAA: Protected health information (PHI) must be redacted from medical records shared outside the treatment team. Penalties can reach roughly $1.9 million per violation category per year (the cap is adjusted annually for inflation).
- FOIA: Government agencies must redact classified and exempt information from public records releases.
- Court orders: Legal proceedings frequently require redaction of personally identifiable information (PII), trade secrets, and minors' names.
Professional Obligations
- Law firms redact privileged information, client identifiers, and case-sensitive details
- Financial institutions redact account numbers, Social Security numbers, and transaction details
- Healthcare providers redact patient names, diagnoses, and medical record numbers
- Real estate transactions require redaction of buyer/seller financial details in shared documents
- HR departments redact salary information, performance data, and personal details from shared reports
The Danger of Improper Redaction
Drawing a black rectangle over text in a PDF is not redaction. It is a cosmetic overlay that leaves the original text fully intact in the file's data. Here is why this fails:
How Hidden Text Is Exposed
- Select and copy: A user opens the PDF, selects all text (Ctrl+A), copies it (Ctrl+C), and pastes into a text editor. The "hidden" text appears in full.
- PDF editing tools: Any PDF editor can select and delete the black rectangle, revealing the text underneath.
- Text extraction: Command-line tools like pdftotext extract all text from the PDF, including text under visual overlays.
- Accessibility tools: Screen readers read all text in the content stream, regardless of visual overlays.
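To see why an overlay changes nothing, it helps to look at what a page's content stream contains. The following is a minimal sketch that uses a hand-written, uncompressed content stream (an assumption made for illustration; real PDFs usually compress their streams and may encode text as hex strings or glyph IDs). The `extract_shown_text` helper is hypothetical and only handles the simple `(...) Tj` form.

```python
import re

# A hand-written, uncompressed page content stream (illustrative only).
# The text is drawn first, then a filled black rectangle is painted on top.
CONTENT_STREAM = (
    "BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n"  # draw the text
    "0 0 0 rg\n"                                          # fill color: black
    "70 695 160 16 re f\n"                                # rectangle over the text
)

def extract_shown_text(stream: str) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    return re.findall(r"\((.*?)\)\s*Tj", stream)

recovered = extract_shown_text(CONTENT_STREAM)
print(recovered)
```

The rectangle is just one more drawing instruction appended after the text. Nothing about it alters or removes the `Tj` operand, which is why extraction tools still see the string.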
Real-World Failures
Improper redaction has caused significant data exposures:
- A US government agency released documents with social security numbers "redacted" by black highlighting -- all numbers were recoverable
- A law firm submitted court filings with opposing party's financial data hidden behind black boxes -- the data was extracted and reported by journalists
- A company released a report with employee names under black rectangles -- a simple copy-paste revealed every name
| Method | Permanently Removes Data? | Text Selectable Underneath? | Safe for Sensitive Data? |
|---|---|---|---|
| Proper redaction tool (Acrobat, dedicated tool) | Yes | No -- data is deleted | Yes |
| Black highlight/annotation | No | Yes -- fully recoverable | No -- dangerous |
| Black rectangle (drawing tool) | No | Yes -- fully recoverable | No -- dangerous |
| White text on white background | No | Yes -- fully recoverable | No -- dangerous |
| Print and re-scan | Yes (if physically covered) | No (image-based) | Partially -- metadata may survive |
| Flatten after overlay | Depends on tool | Sometimes -- depends on implementation | Unreliable -- verify carefully |
Pro Tip: After redacting a document, always test the result before sharing. Open the redacted PDF, press Ctrl+A to select all text, paste into a text editor, and search for any sensitive terms. If you find any, the redaction was incomplete. Also run pdftotext (a free command-line tool) on the file to extract all text from the content stream. This catches data that visual selection might miss.
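The search step can be scripted once the text has been extracted. The sketch below assumes the text was already produced by something like `pdftotext redacted.pdf out.txt` and that pages are separated by form feed characters, as in pdftotext's default output. The `find_leaks` function and the sample strings are hypothetical.

```python
import re

def find_leaks(extracted_text: str, sensitive_terms: list[str]) -> dict[str, list[int]]:
    """Map each sensitive term to the 1-based page numbers where it still appears.

    Assumes pages are separated by form feed characters, which is how
    pdftotext delimits pages in its default output.
    """
    pages = extracted_text.split("\f")
    leaks: dict[str, list[int]] = {}
    for term in sensitive_terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        hits = [n for n, page in enumerate(pages, start=1) if pattern.search(page)]
        if hits:
            leaks[term] = hits
    return leaks

# Hypothetical sample standing in for the contents of out.txt.
sample = "Quarterly report\fPrepared by Jane Doe\fSummary\fContact: jane doe, ext. 12"
print(find_leaks(sample, ["Jane Doe", "123-45-6789"]))
```

An empty result means none of the listed terms were found in the extracted text. It does not cover metadata, attachments, or text stored inside images.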
Method 1: Redact PDFs with Adobe Acrobat Pro
Adobe Acrobat Pro is the gold standard for PDF redaction. Its redaction tool properly removes underlying data.
Step-by-Step Instructions
- Open the PDF in Adobe Acrobat Pro
- Go to Tools > Redact
- Click Mark for Redaction in the toolbar
- Select text, images, or areas to redact:
  - Text selection: Click and drag to highlight text
  - Area selection: Draw a rectangle over a region
  - Search and redact: Use Find Text & Redact to search for patterns (SSN, phone numbers, specific names) across the entire document
- After marking all items, click Apply Redactions
- Confirm the permanent removal
- Save the redacted document as a new file (do not overwrite the original)
Search and Redact (Pattern Matching)
Acrobat's Find Text & Redact feature is powerful for large documents:
- Search for specific text (names, addresses)
- Search for patterns (Social Security numbers: \d{3}-\d{2}-\d{4})
- Review all matches and select which to redact
- Apply all redactions at once
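The same pattern can be tried outside Acrobat to preview what a search is likely to catch. This is a rough sketch using Python's `re` module on text that has already been extracted. It is not how Acrobat implements its search, and the `preview_matches` helper, the added word boundaries, and the sample text are assumptions for illustration.

```python
import re

# SSN-style pattern from the list above, with word boundaries added
# so that longer digit runs are not partially matched.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def preview_matches(text: str, context: int = 15) -> list[tuple[str, str]]:
    """Return (match, surrounding snippet) pairs for manual review."""
    results = []
    for m in SSN_PATTERN.finditer(text):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(text))
        results.append((m.group(), text[start:end]))
    return results

sample = "Applicant SSN: 123-45-6789. Order ref 2024-123-45-67890 is not an SSN."
for match, snippet in preview_matches(sample):
    print(f"{match!r} in ...{snippet}...")
```

Showing each match with its surrounding text mirrors the review step: a pattern hit is only a candidate until a person confirms it should be redacted.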
Remove Hidden Information
After redacting visible content, use Tools > Redact > Remove Hidden Information to clean:
- Metadata (author, creation date, software used)
- Embedded file attachments
- Bookmarks and comments
- Hidden layers
- Form field data
- JavaScript code
- Cross-reference data
Method 2: Redact PDFs Online
Our online tools provide redaction capabilities through a combination of editing and flattening.
Using the PDF Editor for Redaction
- Open our PDF editor
- Upload the PDF containing sensitive information
- Use the Rectangle tool to draw black rectangles over sensitive areas
- Set the fill color to black and opacity to 100%
- After covering all sensitive content, use our flatten PDF tool to merge the overlays into the page content
- The flattened result removes the ability to move or delete the rectangles
Important: Flattening removes the interactivity of annotations but may not remove underlying text from the content stream in all cases. For documents with the highest sensitivity (legal, medical, financial), use a dedicated redaction tool like Adobe Acrobat Pro that explicitly deletes the underlying data.
Additional Steps for Security
After redacting and flattening:
- Remove metadata using a metadata stripping tool
- Password-protect the redacted file using our password protection tool
- Test the redaction by attempting to select and copy text in the redacted areas

Method 3: Command-Line Redaction
For batch processing or automated redaction workflows:
Using qpdf and Ghostscript
# Flatten annotations (removes editability but may not remove underlying text)
qpdf --flatten-annotations=all input.pdf flattened.pdf
# Rewriting with Ghostscript's pdfwrite device generally keeps text that sits
# under an overlay. To discard it, rasterize each page to an image instead
# (the result has no selectable text at all).
gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage24 -r200 \
  -sOutputFile=clean.pdf input_with_overlays.pdf
Using Python (PyMuPDF)
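import fitz  # PyMuPDF

doc = fitz.open("input.pdf")
for page in doc:
    # Areas to redact as (x0, y0, x1, y1) in points; these example
    # coordinates are applied to every page
    areas = [(100, 200, 300, 220), (100, 250, 400, 270)]
    for area in areas:
        page.add_redact_annot(fitz.Rect(area))
    page.apply_redactions()  # Permanently removes content under the marked areas
# garbage=4 drops unreferenced objects; deflate compresses streams
doc.save("redacted.pdf", garbage=4, deflate=True)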
PyMuPDF's apply_redactions() method performs true redaction -- it deletes the underlying text data rather than just covering it.
What to Redact: Common Sensitive Data Types
| Data Type | Examples | Regulation | Risk If Exposed |
|---|---|---|---|
| Social Security Numbers | 123-45-6789 | Various US federal/state laws | Identity theft, fraud |
| Financial account numbers | Bank accounts, credit cards | GLBA, PCI-DSS | Financial fraud, unauthorized charges |
| Medical information | Diagnoses, prescriptions, MRNs | HIPAA | Privacy violation, discrimination |
| Personal contact info | Home address, phone, email | GDPR, CCPA | Stalking, harassment, spam |
| Minor's information | Children's names, ages, schools | COPPA, FERPA | Child safety risks |
| Trade secrets | Formulas, processes, pricing | Trade secret law | Competitive disadvantage |
| Authentication credentials | Passwords, API keys, tokens | Security best practices | System compromise |
| Biometric data | Fingerprints, facial scans | BIPA, GDPR | Cannot be changed if compromised |
Pro Tip: When redacting documents for legal proceedings, create a redaction log that lists every redaction made, the page and location, the reason for redaction, and the legal basis (e.g., attorney-client privilege, HIPAA exemption). Courts often require this log to accompany redacted documents, and it demonstrates that redactions were made deliberately and with proper justification, not to hide unfavorable information.
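A redaction log is easy to keep as structured data. The sketch below shows one possible shape, with the four pieces of information listed above written out as CSV. The `RedactionEntry` field names, the `write_log` helper, and the sample rows are all hypothetical. A court or agency may prescribe its own format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class RedactionEntry:
    """One row of a redaction log (field names are illustrative)."""
    page: int
    location: str      # e.g. "paragraph 2"
    reason: str        # why the content was removed
    legal_basis: str   # e.g. "attorney-client privilege"

def write_log(entries: list[RedactionEntry]) -> str:
    """Serialize the log as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(RedactionEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()

log = [
    RedactionEntry(3, "paragraph 2", "client communication", "attorney-client privilege"),
    RedactionEntry(7, "table row 5", "patient diagnosis", "HIPAA exemption"),
]
print(write_log(log))
```

Recording each entry at the moment the redaction is marked, rather than reconstructing the log afterwards, keeps the log and the document in step.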
Redacting Images and Non-Text Content
Sensitive information is not limited to text. Images, charts, and embedded files may also need redaction.
Photographs
Photos of people, ID documents, license plates, or location-identifying features need redaction. With a proper redaction tool, the redaction rectangle covers the image area and the underlying image data is removed.
Signatures
If a document contains signatures that should not be shared, redact them with a proper redaction tool rather than a drawn black rectangle, so that the signature image data is permanently deleted and not merely covered.
QR Codes and Barcodes
QR codes and barcodes may contain sensitive data (URLs with tokens, encoded personal information). Redact them like any other visual element.
Embedded Files
PDFs can contain embedded file attachments (spreadsheets, images, other PDFs). Use Adobe Acrobat's Remove Hidden Information tool or manually check for and remove embedded files that contain sensitive data.
Metadata
PDF metadata can reveal:
- Author's name and email
- Software used to create the document
- Creation and modification dates
- Company name
- Document title and subject
- Revision history
Always strip metadata after redaction. In Acrobat: File > Properties > Description -- clear all fields. Or use Tools > Redact > Remove Hidden Information for a thorough cleanup.
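A quick scripted check can flag obvious leftovers before a file goes out. The sketch below scans raw PDF bytes for plain-text entries of the document information dictionary. It is deliberately crude: it will miss metadata stored in compressed object streams, XMP packets, or hex-encoded strings, so an empty result is not proof that the file is clean. The `find_info_entries` helper and the sample bytes are hypothetical.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = [b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate"]

def find_info_entries(pdf_bytes: bytes) -> dict[str, str]:
    """Find non-empty plain-text entries such as /Author (Jane Doe)."""
    found = {}
    for key in INFO_KEYS:
        m = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", pdf_bytes)
        if m and m.group(1):
            found[key.decode()] = m.group(1).decode("latin-1")
    return found

# Hypothetical fragment standing in for the bytes of a real file,
# e.g. pathlib.Path("redacted.pdf").read_bytes().
sample = b"1 0 obj\n<< /Author (Jane Doe) /Producer (ExampleWriter 1.0) /Title () >>\nendobj\n"
print(find_info_entries(sample))
```

Anything this reports is worth clearing with a proper metadata tool. Anything it does not report still needs the full cleanup described above.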
Redaction for Specific Industries
Legal
Legal redaction follows strict rules. Courts require specific redaction practices:
- Redact only what is legally required (over-redaction can be challenged)
- Maintain a redaction log with legal justifications
- Keep an unredacted copy in secure storage
- Use proper redaction tools (courts have rejected documents with improper redaction)
- For multi-party cases, different redaction levels may apply for different parties
For managing large legal document sets, our merge PDF tool can combine case files, and our extract pages tool can pull out specific pages for targeted redaction.
Healthcare (HIPAA)
HIPAA permits two de-identification methods: Expert Determination and "Safe Harbor." The Safe Harbor method requires removing 18 categories of identifiers:
Names, geographic data, dates, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number.
Financial Services
Financial document redaction must be thorough because partial numbers can still be useful to attackers. Redact the complete account number rather than leaving the last four digits visible: even a partial number like "XXXX-XXXX-1234" can be combined with other data for fraud.
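Finding every full number is the first step toward redacting it completely. As one possible approach for payment card numbers specifically, the sketch below looks for 13 to 19 digit runs in extracted text and keeps those that pass the Luhn checksum. The Luhn filter is an addition for illustration, not something a particular redaction tool is known to use, and it does not apply to bank account numbers. The function names and sample text are hypothetical.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Check a digit string against the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return 13-19 digit runs (spaces or hyphens allowed) that pass Luhn."""
    candidates = re.findall(r"\b\d(?:[ -]?\d){12,18}\b", text)
    return [c for c in candidates if luhn_valid(re.sub(r"[ -]", "", c))]

sample = "Card 4111 1111 1111 1111 on file; invoice 1234 5678 9012 3456 attached."
print(find_card_numbers(sample))
```

The checksum reduces false positives from invoice or order numbers, but every hit should still be reviewed by a person before the redaction is applied.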
Government (FOIA)
FOIA redactions must cite the specific exemption, (b)(1) through (b)(9), that justifies each redaction. Excessive redaction is subject to judicial review.

Verification Checklist
After redacting a document, verify the redaction before sharing:
- Visual check -- Scroll through every page to confirm all intended areas are covered
- Select-all test -- Press Ctrl+A, then Ctrl+C, paste into a text editor, and search for sensitive terms
- Text extraction test -- Run pdftotext on the command line to extract all text
- Metadata check -- Open File Properties and verify no sensitive metadata remains
- Attachment check -- Verify no sensitive embedded files remain
- Layer check -- If the PDF has layers, ensure redacted content is not on a hidden layer
- Version check -- If the PDF has multiple versions (incremental saves), ensure older versions do not contain the sensitive data
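The version check can be roughed out in a few lines. Each incremental save normally appends a new section ending in its own `%%EOF` marker, so counting those markers gives a hint about earlier revisions. This is a heuristic sketch only: linearized PDFs can legitimately contain two markers, so a count above one is a prompt for closer inspection, not proof of a leak. The function names and sample bytes are hypothetical.

```python
def count_revisions(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each incremental save normally appends one."""
    return pdf_bytes.count(b"%%EOF")

def has_earlier_revisions(pdf_bytes: bytes) -> bool:
    """True when the file appears to contain more than one revision."""
    return count_revisions(pdf_bytes) > 1

# Hypothetical byte fragments standing in for real files,
# e.g. pathlib.Path("redacted.pdf").read_bytes().
single_save = b"%PDF-1.7\n...objects...\nstartxref\n123\n%%EOF\n"
incremental = single_save + b"...updated objects...\nstartxref\n456\n%%EOF\n"

print(count_revisions(single_save), has_earlier_revisions(single_save))
print(count_revisions(incremental), has_earlier_revisions(incremental))
```

If earlier revisions are present, saving the redacted document as a fully rewritten new file, rather than as another incremental update, avoids carrying the old data along.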
Common Mistakes to Avoid
- Using black highlighting instead of redaction -- The most dangerous mistake; text remains fully accessible
- Changing font color to white -- Text is invisible but fully selectable and extractable
- Cropping instead of redacting -- Cropped areas may still exist in the PDF data
- Overwriting the original file -- Always save the redacted version as a new file; keep the original in secure storage
- Forgetting metadata -- The author name, revision history, and comments may contain sensitive information
- Partial redaction -- Redacting a name on page 3 but leaving it on pages 7 and 12
- Not testing the result -- Assuming the redaction worked without verification
Redaction Alternatives
Sometimes redaction is not the best approach:
- Create a new document -- Instead of redacting a document with extensive sensitive data, write a new version that includes only the sharable information. This eliminates redaction risk entirely.
- Print and scan -- Printing the document (with physical covers over sensitive areas) and scanning the result creates an image-based PDF that does not contain the original text. However, OCR could re-extract visible text.
- Summary documents -- Create a summary or abstract that conveys the necessary information without including the original document.
- Access control -- Instead of redacting and sharing widely, restrict access to the full document using password protection. See our guide on how to password-protect PDFs.
Proper PDF redaction is a skill that protects individuals, organizations, and data. The technical difference between drawing a black box over text and truly deleting the underlying data is the difference between security and a data breach waiting to happen. Use proper redaction tools, verify your results, and maintain a rigorous process -- especially when legal compliance, patient privacy, or financial data is at stake.
For related security topics, explore our guides on how to password-protect PDFs, data privacy in file conversion, and how to flatten PDFs.



