PDF to DOCX API: Free Tier + Pay-As-You-Go
Convert PDF to editable Word DOCX via API. Preserves layout, fonts, tables, and images. pdf2docx + OCR fallback.
What it does
The PDF to DOCX API converts PDF documents into editable Word .docx files with preserved layout, font choices, embedded images, and table structure. The pipeline uses pdf2docx for vector-based PDFs (those with a real text layer) and falls back to Tesseract OCR plus layout reconstruction for scanned, image-based PDFs, which are auto-detected via the `hasText` heuristic. For OCR jobs, the language defaults to English; pass a single code from the 100+ languages Tesseract supports, or join several with `+` for multi-language OCR, as in `ocrLanguage=eng+fra+deu`.
Output DOCX files open natively in Microsoft Word, Google Docs, LibreOffice Writer, and Apple Pages. Tables in the source PDF reconstruct as native Word tables (editable cells, configurable borders) rather than text blocks. Per-job parameters include `pageRange=1-10`, `mode=fast` (vector-only, fastest) vs `mode=hybrid` (vector + OCR fallback for any pages where text extraction fails), `password=<secret>` for encrypted sources, and `preserveLayout=true|false`.
Setting `preserveLayout=false` produces a flowing-text DOCX where pagination is rebuilt rather than mirrored from the source, which is useful for content reuse where you want Word to repaginate naturally.
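As a rough illustration of how the per-job parameters above combine, here is a minimal Python sketch that assembles the non-file form fields for a request. The curl example sends `target` and `mode` as multipart form fields; sending `pageRange`, `ocrLanguage`, `password`, and `preserveLayout` the same way is an assumption, and `build_convert_fields` is a hypothetical helper, not part of any SDK.

```python
from typing import Dict, Optional, Sequence

VALID_MODES = {"fast", "hybrid"}


def build_convert_fields(
    target: str = "docx",
    mode: str = "fast",
    page_range: Optional[str] = None,
    ocr_languages: Sequence[str] = (),
    password: Optional[str] = None,
    preserve_layout: Optional[bool] = None,
) -> Dict[str, str]:
    """Assemble form fields for POST /v1/convert (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    fields = {"target": target, "mode": mode}
    if page_range:
        fields["pageRange"] = page_range  # e.g. "1-10"
    if ocr_languages:
        # Tesseract-style multi-language string, e.g. "eng+fra+deu"
        fields["ocrLanguage"] = "+".join(ocr_languages)
    if password:
        fields["password"] = password
    if preserve_layout is not None:
        # Only sent when set explicitly; the server default is not assumed here.
        fields["preserveLayout"] = "true" if preserve_layout else "false"
    return fields


fields = build_convert_fields(
    mode="hybrid",
    page_range="1-10",
    ocr_languages=["eng", "fra", "deu"],
    preserve_layout=False,
)
print(fields)
```

The resulting dictionary could be sent next to the file in a multipart request, for example as the `data` argument of an HTTP client that also uploads `file` and sets the X-Api-Key header.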
Supported formats
Source formats (1)
- pdf
Target formats (2)
- docx
- doc
Quick start
The curl, Node.js, and Python examples show the same conversion: a single POST to /v1/convert with your API key in the X-Api-Key header.
# curl: upload the PDF as multipart form data
curl -X POST https://api.convertintomp4.com/v1/convert \
  -H "X-Api-Key: ck_your_api_key" \
  -F "file=@input.pdf" \
  -F "target=docx" \
  -F "mode=hybrid"

// Node.js: submit the job, then wait for it to finish
import { ConvertIntoMP4 } from "@convertintomp4/sdk";
import fs from "node:fs";

const client = new ConvertIntoMP4({ apiKey: process.env.CIM4_API_KEY });
const job = await client.convert({
  file: fs.createReadStream("input.pdf"),
  target: "docx",
});
const result = await client.waitForJob(job.id);
console.log("Output URL:", result.outputUrl);

# Python: submit the job, then wait for it to finish
from convertintomp4 import Client

client = Client(api_key="ck_your_api_key")
with open("input.pdf", "rb") as f:
    job = client.convert(file=f, target="docx")
result = client.wait_for_job(job.id)
print("Output URL:", result.output_url)

Features
- pdf2docx for vector PDFs, Tesseract OCR for scans
- 100+ OCR languages
- Table reconstruction as native Word tables
- Page-range selection
- Two modes: fast (vector-only) or hybrid (OCR fallback)
- Layout-preserving or flowing-text output
- Password-protected PDF support
Pricing
From $9.99/mo (Pro) or $24.99/mo (Business), or pay-as-you-go on the API plan.
Free tier: 5 conversions/day, 100 MB file size, no API key required (IP-gated). Pro $9.99/mo: 100/day (2,000/month), 2 GB files. Business $24.99/mo: 1,000/day (20,000/month), 10 GB files, GPU encoding, dedicated support.
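To make the limits above easier to compare, the following Python sketch picks the cheapest listed plan that covers a given workload. The limits and prices are taken from the paragraph above; the helper name `cheapest_plan`, treating 1 GB as 1024 MB, and treating the free tier as having no monthly cap (none is stated) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: float
    per_day: int
    per_month: Optional[int]  # None: no monthly cap is stated on this page
    max_file_mb: int          # assumes 1 GB = 1024 MB


# Ordered cheapest first, using the figures listed above.
PLANS = [
    Plan("Free", 0.0, 5, None, 100),
    Plan("Pro", 9.99, 100, 2_000, 2 * 1024),
    Plan("Business", 24.99, 1_000, 20_000, 10 * 1024),
]


def cheapest_plan(per_day: int, per_month: int, largest_file_mb: int) -> Optional[Plan]:
    """Return the cheapest plan covering the workload, or None if none does."""
    for plan in PLANS:
        if per_day > plan.per_day:
            continue
        if plan.per_month is not None and per_month > plan.per_month:
            continue
        if largest_file_mb > plan.max_file_mb:
            continue
        return plan
    return None  # beyond the listed tiers; pay-as-you-go may apply


# 80 jobs/day fits Pro's daily limit, but 2,500/month exceeds its monthly cap.
choice = cheapest_plan(per_day=80, per_month=2_500, largest_file_mb=50)
print(choice.name if choice else "no listed plan fits")
```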
See full pricing breakdown
Built for production
99.9% uptime SLA
Multi-region failover, transparent status page, 60-second response-time guarantee on Business.
Encryption + auto-delete
TLS 1.2+ in transit, AES-256 at rest. Files deleted after 1h / 24h / 7d depending on plan, or instantly via DELETE endpoint. See the security page.
~7s median latency
Most sub-100 MB jobs complete in 6-9 seconds. Webhook-driven async for heavier workloads; waitForJob for synchronous flows.
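For synchronous flows, a wait helper usually amounts to polling with a deadline. The Python sketch below shows that general pattern only; it is not the SDK's implementation. The status values `processing`, `completed`, and `failed`, the `outputUrl` key, and the `fetch_job` callable are all hypothetical.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = ("completed", "failed")  # hypothetical status names


def poll_until_done(
    fetch_job: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> Dict:
    """Call fetch_job repeatedly until it reports a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(interval_s)


# Stand-in for a real status lookup: two pending responses, then success.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "outputUrl": "https://example.invalid/output.docx"},
])
job = poll_until_done(lambda: next(responses), interval_s=0.0)
print(job["status"])
```

For heavier workloads, a webhook avoids holding a connection or a polling loop open for the whole conversion.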
Frequently Asked Questions
How accurate is PDF to DOCX conversion?
For PDFs generated from Word, InDesign, or LaTeX, fidelity is typically 95-99%. Hand-scanned PDFs depend on OCR quality; expect 85-95% character recognition for clean scans at 300 DPI, lower for low-resolution or skewed scans. Multi-column layouts sometimes reflow incorrectly; use `preserveLayout=false` for cleaner output in those cases.
Do scanned PDFs work?
Yes. Set `mode=hybrid` and the API runs OCR on any page where the embedded-text layer is empty or unrealistically small. The default, `mode=fast`, skips OCR and produces an empty DOCX for image-only pages.
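The "empty or unrealistically small" check can be pictured as a per-page character count. The Python sketch below is only an illustration of that idea: the API's actual detection logic is not documented here, and both the `pages_needing_ocr` helper and the 20-character threshold are assumptions.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose text layer is empty or very small."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # ignore whitespace-only content
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


# Text already extracted per page (page 2 is a scan, page 3 has a stray digit).
sample = [
    "Quarterly report: revenue grew 12% year over year.",
    "",
    " \n 3 ",
]
print(pages_needing_ocr(sample))  # prints [2, 3]
```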
Are tables in the PDF converted to editable Word tables?
Yes. pdf2docx's table detector reconstructs row/column structure and emits native Word table XML. Cells are individually editable, with borders and shading preserved when possible. Heavily nested or merged-cell tables sometimes flatten, so check the output before bulk-processing.
What about PDFs with custom fonts?
If the PDF embeds the font, the output DOCX references it by name and Word substitutes if the font is missing locally. For maximum portability, pass `embedFonts=true` and the API embeds the source font directly into the DOCX (subject to font licensing).
Can the API convert encrypted PDFs?
Yes. Pass `password=<secret>` and the API decrypts before conversion. Both user and owner passwords work. Without a password, encrypted PDFs fail with `PDF_ENCRYPTED` rather than producing garbled output.
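One way to handle `PDF_ENCRYPTED` in client code is to try without a password first and ask for one only when that error comes back. The Python sketch below shows that flow under stated assumptions: `ConversionError`, `convert_with_password_fallback`, and the injected `convert` and `ask_password` callables are hypothetical, and how the error code actually surfaces in the SDKs is not specified on this page.

```python
from typing import Callable, Optional


class ConversionError(Exception):
    """Hypothetical error type carrying the API's error code."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(message or code)
        self.code = code


def convert_with_password_fallback(
    convert: Callable[[str, Optional[str]], str],
    path: str,
    ask_password: Callable[[str], Optional[str]],
) -> str:
    """Try without a password; on PDF_ENCRYPTED, retry once with one."""
    try:
        return convert(path, None)
    except ConversionError as err:
        if err.code != "PDF_ENCRYPTED":
            raise  # unrelated failure: do not mask it
    password = ask_password(path)
    if password is None:
        raise ConversionError("PDF_ENCRYPTED", "no password supplied")
    return convert(path, password)


# Stand-in for a real conversion call, so the flow can be exercised offline.
def fake_convert(path: str, password: Optional[str]) -> str:
    if password != "s3cret":
        raise ConversionError("PDF_ENCRYPTED")
    return f"{path}.docx"


output = convert_with_password_fallback(fake_convert, "report.pdf", lambda p: "s3cret")
print(output)
```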
Related APIs
- MP4 to MP3 API: Extract MP3 audio from MP4 video programmatically. Per-job bitrate and sample rate, ID3 tag injection, fast remux mode.
- PDF to JPG API: Render PDF pages to JPG images programmatically. Per-page DPI, page-range selection, batch ZIP output, ImageMagick + pdftoppm pipeline.
- JPG to PNG API: Convert JPG to PNG with optional transparency, lossless compression, and EXIF preservation via API. Sharp-backed.
- MP4 to GIF API: Convert MP4 video to animated GIF programmatically. Per-job width, fps, palette mode, and start/end trim controls.
- DOCX to PDF API: Convert Word DOCX to PDF programmatically. LibreOffice-rendered with font fallback, headers/footers, and PDF/A archival output.
- HEIC to JPG API: Convert iPhone HEIC and HEIF photos to JPG via API. EXIF rotation, ICC profile, Live Photo frame selection.
Or browse the full catalogue of 23 API products.
Get an API key
Start integrating the PDF to DOCX API in five minutes. Read the docs, grab a key, and ship your first conversion before the trial coffee cools.