PDF to DOCX API — Free Tier + Pay-As-You-Go

Brand: ConvertIntoMP4

Convert PDF to editable Word DOCX via API. Preserves layout, fonts, tables, and images. pdf2docx + OCR fallback.


What it does

The PDF to DOCX API converts PDF documents into editable Word .docx files with preserved layout, font choices, embedded images, and table structure. The pipeline uses pdf2docx for vector-based (i.e., natively-typed) PDFs and falls back to Tesseract OCR plus layout reconstruction for scanned image-based PDFs (auto-detected via the `hasText` heuristic). For OCR jobs, language defaults to English; pass `ocrLanguage=eng+fra+deu` for multi-language OCR or any of the 100+ Tesseract-supported languages.

Output DOCX files open natively in Microsoft Word, Google Docs, LibreOffice Writer, and Apple Pages. Tables in the source PDF reconstruct as native Word tables (editable cells, configurable borders) rather than text blocks. Per-job parameters include `pageRange=1-10`, `mode=fast` (vector-only, fastest) vs `mode=hybrid` (vector + OCR fallback for any pages where text extraction fails), `password=<secret>` for encrypted sources, and `preserveLayout=true|false`.

Setting `preserveLayout=false` produces a flowing-text DOCX where pagination is rebuilt rather than mirrored from the source — useful for content reuse where you want Word to repaginate naturally.
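
The per-job parameters described above can be gathered into one set of form fields before upload. The following is a minimal sketch: the field names (`target`, `mode`, `pageRange`, `ocrLanguage`, `password`, `preserveLayout`) come from this page, while the helper `build_convert_fields` and its validation rules are illustrative assumptions and not part of the official SDK.

```python
import re
from typing import Dict, Optional, Sequence


# Hypothetical helper: assembles the form fields described on this page.
# Field names are taken from the page; the validation rules are assumptions.
def build_convert_fields(
    mode: str = "hybrid",
    page_range: Optional[str] = None,
    ocr_languages: Sequence[str] = (),
    password: Optional[str] = None,
    preserve_layout: bool = True,
) -> Dict[str, str]:
    if mode not in ("fast", "hybrid"):
        raise ValueError("mode must be 'fast' or 'hybrid'")
    fields = {
        "target": "docx",
        "mode": mode,
        "preserveLayout": "true" if preserve_layout else "false",
    }
    if page_range is not None:
        # Assumes the "first-last" form shown on this page, e.g. "1-10".
        if not re.fullmatch(r"\d+-\d+", page_range):
            raise ValueError("page_range must look like '1-10'")
        fields["pageRange"] = page_range
    if ocr_languages:
        # Tesseract language codes are joined with "+", e.g. "eng+fra+deu".
        fields["ocrLanguage"] = "+".join(ocr_languages)
    if password is not None:
        fields["password"] = password
    return fields


fields = build_convert_fields(page_range="1-10", ocr_languages=["eng", "fra", "deu"])
print(fields)
```

The resulting dictionary maps one-to-one onto the `-F` form fields of a multipart POST, so the same values can be sent with any HTTP client.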

Supported formats

Source formats (1)

pdf

Target formats (2)

docx
doc

Quick start

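The three examples below run the same conversion: a POST to /v1/convert with your API key in the X-Api-Key header. The curl example also sets mode=hybrid, and the SDK examples wait for the job to finish before printing the output URL.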

curl

curl -X POST https://api.convertintomp4.com/v1/convert \
  -H "X-Api-Key: ck_your_api_key" \
  -F "file=@input.pdf" \
  -F "target=docx" \
  -F "mode=hybrid"

Node.js (@convertintomp4/sdk)

import { ConvertIntoMP4 } from "@convertintomp4/sdk";
import fs from "node:fs";

const client = new ConvertIntoMP4({ apiKey: process.env.CIM4_API_KEY });

const job = await client.convert({
  file: fs.createReadStream("input.pdf"),
  target: "docx",
});

const result = await client.waitForJob(job.id);
console.log("Output URL:", result.outputUrl);

Python (convertintomp4)

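import os

from convertintomp4 import Client

# Read the key from the environment rather than hard-coding it.
client = Client(api_key=os.environ["CIM4_API_KEY"])

# Upload the PDF and start a conversion job.
with open("input.pdf", "rb") as f:
    job = client.convert(file=f, target="docx")

# Block until the job finishes, then print where to download the result.
result = client.wait_for_job(job.id)
print("Output URL:", result.output_url)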

Features

pdf2docx for vector PDFs, Tesseract OCR for scans
100+ OCR languages
Table reconstruction as native Word tables
Page-range selection
Two modes: fast (vector-only) or hybrid (OCR fallback)
Layout-preserving or flowing-text output
Password-protected PDF support

Pricing

From $9.99/mo (Pro) or $24.99/mo (Business) — or pay-as-you-go on the API plan.

Free tier: 5 conversions/day, 100 MB file size, no API key required (IP-gated).
Pro ($9.99/mo): 100 conversions/day (2,000/month), 2 GB files.
Business ($24.99/mo): 1,000 conversions/day (20,000/month), 10 GB files, GPU encoding, dedicated support.
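
To see which subscription tier fits a given workload, the daily and file-size limits above can be compared directly. The sketch below is illustrative only: the `Plan` class and `cheapest_plan` function are hypothetical names, 1 GB is treated as 1,000 MB, and the monthly caps and the pay-as-you-go API plan are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: float
    per_day: int
    max_file_mb: int


# Limits copied from the pricing text on this page.
# Assumption: 1 GB is treated as 1,000 MB.
PLANS = [
    Plan("Free", 0.00, 5, 100),
    Plan("Pro", 9.99, 100, 2_000),
    Plan("Business", 24.99, 1_000, 10_000),
]


def cheapest_plan(conversions_per_day: int, largest_file_mb: int) -> Optional[Plan]:
    """Return the lowest-priced plan that covers both limits, or None."""
    for plan in sorted(PLANS, key=lambda p: p.monthly_usd):
        if conversions_per_day <= plan.per_day and largest_file_mb <= plan.max_file_mb:
            return plan
    return None


choice = cheapest_plan(conversions_per_day=40, largest_file_mb=250)
print(choice.name if choice else "no subscription tier fits")
```

A workload of 40 conversions a day with files up to 250 MB exceeds the free tier on both counts and lands on Pro; anything above 1,000 conversions a day returns `None`, which is the point at which the pay-as-you-go plan would be worth pricing out.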


Built for production

99.9% uptime SLA

Multi-region failover, transparent status page, 60-second response-time guarantee on Business.

Encryption + auto-delete

TLS 1.2+ in transit, AES-256 at rest. Files deleted after 1h / 24h / 7d depending on plan, or instantly via DELETE endpoint. See the security page.

~7s median latency

Most sub-100 MB jobs complete in 6-9 seconds. Webhook-driven async for heavier workloads; waitForJob for synchronous flows.
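
For synchronous flows, a wait-for-job helper generally amounts to polling the job status until it reaches a terminal state. The sketch below illustrates that pattern only. It is not the SDK's implementation, and the status strings (`"done"`, `"failed"`), the `fetch_status` callable, and the timing defaults are all assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() with exponential backoff until a terminal status."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in ("done", "failed"):  # hypothetical terminal states
            return status
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Usage with a fake status source that finishes on the third poll.
statuses = iter(["queued", "processing", "done"])
final_status = wait_for_job(lambda: next(statuses), sleep=lambda _s: None)
print(final_status)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. For long-running jobs, a webhook avoids holding a connection or a worker open while polling.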

Frequently Asked Questions

How accurate is PDF to DOCX conversion?

For PDFs generated from Word, InDesign, or LaTeX, fidelity is typically 95-99%. Scanned PDFs depend on OCR quality: expect 85-95% character recognition for clean scans at 300 DPI, and lower for low-resolution or skewed scans. Multi-column layouts sometimes reflow incorrectly; in those cases, use `preserveLayout=false` for cleaner output.

Do scanned PDFs work?

Yes. Set `mode=hybrid` and the API runs OCR on any page where the embedded text layer is empty or implausibly small. The default, `mode=fast`, skips OCR, so image-only pages come out blank in the DOCX.
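
The service's actual detection logic is not documented on this page. The sketch below only illustrates the kind of per-page check described above, flagging pages whose extracted text is empty or very short; the function name and the 20-character threshold are made up for the example.

```python
from typing import List

# Hypothetical threshold: pages with fewer extracted characters than this
# are treated as image-only. The real service's cutoff is not documented here.
MIN_CHARS_PER_PAGE = 20


def pages_needing_ocr(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE) -> List[int]:
    """Return 1-based page numbers whose text layer looks empty or too small."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


extracted = [
    "Quarterly report\nRevenue grew in all three regions.",  # typed page
    "",  # scanned page with no text layer
    " 3 ",  # only a stray page number was extracted
]
ocr_pages = pages_needing_ocr(extracted)
print(ocr_pages)
```

A check like this can also run client-side to decide whether a document is worth sending with `mode=hybrid` or whether `mode=fast` is enough.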

Are tables in the PDF converted to editable Word tables?

Yes — pdf2docx's table detector reconstructs row/column structure and emits native Word table XML. Cells are individually editable, with borders and shading preserved when possible. Heavily-nested or merged-cell tables sometimes flatten — check the output before bulk-processing.

What about PDFs with custom fonts?

If the PDF embeds the font, the output DOCX references it by name and Word substitutes if the font is missing locally. For maximum portability, pass `embedFonts=true` and the API embeds the source font directly into the DOCX (subject to font licensing).

Can the API convert encrypted PDFs?

Yes. Pass `password=<secret>` and the API decrypts before conversion. Both user and owner passwords work. Without a password, encrypted PDFs fail with `PDF_ENCRYPTED` rather than producing garbled output.
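
Because encrypted inputs fail with a distinct `PDF_ENCRYPTED` code, a client can attempt the conversion without a password and prompt for one only when needed. The sketch below shows that flow under stated assumptions: `ConversionError`, `convert_with_password_fallback`, and the fake converter are hypothetical, and how the real SDK or HTTP response exposes the error code is not specified on this page.

```python
from typing import Callable, Optional


class ConversionError(Exception):
    """Hypothetical error type carrying the API's error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def convert_with_password_fallback(
    convert: Callable[[Optional[str]], str],
    ask_password: Callable[[], str],
) -> str:
    """Try without a password; on PDF_ENCRYPTED, ask for one and retry once."""
    try:
        return convert(None)
    except ConversionError as err:
        if err.code != "PDF_ENCRYPTED":
            raise  # unrelated failures are not retried
    return convert(ask_password())


# Usage with a fake converter standing in for the real API call.
def fake_convert(password: Optional[str]) -> str:
    if password != "s3cret":
        raise ConversionError("PDF_ENCRYPTED")
    return "output.docx"


output = convert_with_password_fallback(fake_convert, lambda: "s3cret")
print(output)
```

The retry happens at most once, so a wrong password surfaces as an error instead of looping.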


Get an API key

Start integrating the PDF to DOCX API in five minutes. Read the docs, grab a key, and ship your first conversion before the trial coffee cools.


