OCR API — Free Tier + Pay-As-You-Go
Optical Character Recognition for PDFs and images via API. 100+ languages, searchable PDF output, plain-text extraction.
What it does
The OCR API runs Tesseract 5 over scanned PDFs, photographs of documents, and image files, and produces plain text, hOCR (XML with bounding boxes), TSV, or a searchable PDF (the original document with an invisible text layer added for full-text search). Tesseract supports 100+ languages, with strong accuracy for Latin scripts (typically 95-99% character recognition on clean 300 DPI scans) and acceptable accuracy for Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, and Korean (roughly 85-95% on sources of equivalent quality, depending on script and font). Per-job parameters expose the full Tesseract surface: language (`ocrLanguage=eng+fra+deu` for multilingual documents), PSM (page segmentation mode, 1-13), OEM (engine mode, 0-3), confidence threshold, output format, dewarping (`autoRotate=true` corrects skew), and pre-processing (`enhance=true` applies adaptive thresholding before OCR, which improves results on noisy scans).
For PDFs, OCR is applied page-by-page with parallel processing; multi-language detection runs first to pick the optimal language model per page when `ocrLanguage=auto` is set.
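As a rough illustration of how the per-job parameters fit together, the sketch below assembles the form fields for a single request. Only `ocrLanguage`, `output`, `autoRotate`, and `enhance` are named on this page; the `psm` and `oem` field names and the `build_ocr_fields` helper are assumptions made for the example, so check the API reference for the real names.

```python
def build_ocr_fields(languages, output="searchable-pdf", psm=None, oem=None,
                     auto_rotate=False, enhance=False):
    """Assemble form fields for an OCR job (illustrative helper, not part of the SDK)."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Multiple languages are joined with '+', e.g. eng+fra+deu.
    fields = {"ocrLanguage": "+".join(languages), "output": output}
    if psm is not None:
        if not 1 <= psm <= 13:  # range quoted in the description above
            raise ValueError("psm must be between 1 and 13")
        fields["psm"] = str(psm)  # hypothetical field name
    if oem is not None:
        if not 0 <= oem <= 3:
            raise ValueError("oem must be between 0 and 3")
        fields["oem"] = str(oem)  # hypothetical field name
    if auto_rotate:
        fields["autoRotate"] = "true"
    if enhance:
        fields["enhance"] = "true"
    return fields


fields = build_ocr_fields(["eng", "fra", "deu"], psm=6, enhance=True)
print(fields)
```

The resulting dictionary could be sent as the form data of a multipart POST alongside the file.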
Supported formats
Source formats (9)
- pdf
- jpg
- jpeg
- png
- tiff
- tif
- bmp
- webp
- heic
Target formats (4)
- pdf
- txt
- hocr
- tsv
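A client can check a file's extension against the source list before uploading, which avoids spending a conversion on a file that would be rejected. This is a minimal sketch: the `is_supported_source` helper is hypothetical, and `pdf` is included on the assumption that PDFs are accepted as sources, as the `scan.pdf` quick-start example suggests.

```python
from pathlib import Path

# Source extensions from the list above, plus "pdf" (assumed, see lead-in).
SOURCE_EXTENSIONS = {"pdf", "jpg", "jpeg", "png", "tiff", "tif", "bmp", "webp", "heic"}


def is_supported_source(filename):
    """Return True if the file extension is in the supported source list."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return ext in SOURCE_EXTENSIONS


print(is_supported_source("scan.PDF"))    # True: the check is case-insensitive
print(is_supported_source("notes.docx"))  # False: not an OCR source format
```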
Quick start
The three examples below (cURL, Node.js, Python) make the same request: a single POST to /v1/ocr with your API key in the X-Api-Key header.
```shell
curl -X POST https://api.convertintomp4.com/v1/ocr \
  -H "X-Api-Key: ck_your_api_key" \
  -F "file=@scan.pdf" \
  -F "ocrLanguage=eng" \
  -F "output=searchable-pdf"
```

```javascript
import { ConvertIntoMP4 } from "@convertintomp4/sdk";
import fs from "node:fs";

const client = new ConvertIntoMP4({ apiKey: process.env.CIM4_API_KEY });

const job = await client.ocr({
  file: fs.createReadStream("scan.pdf"),
  ocrLanguage: "eng",
  output: "searchable-pdf",
});
const result = await client.waitForJob(job.id);
console.log("Searchable PDF:", result.outputUrl);
```

```python
from convertintomp4 import Client

client = Client(api_key="ck_your_api_key")

with open("scan.pdf", "rb") as f:
    job = client.ocr(file=f, ocr_language="eng", output="searchable-pdf")
result = client.wait_for_job(job.id)
print("Searchable PDF:", result.output_url)
```

Features
- Tesseract 5 with 100+ language packs
- Output: searchable PDF, plain text, hOCR, TSV
- Auto-rotation and dewarping
- Adaptive thresholding for noisy scans
- Per-page parallel processing
- Auto-language detection (multi-page PDFs)
- Confidence-threshold filtering
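Confidence filtering can also be applied on the client side to TSV output. The sketch below assumes the API returns Tesseract's standard TSV layout (a header row with `conf` and `text` columns, and `conf = -1` on non-word rows); the `words_above_confidence` helper and the sample data are made up for illustration.

```python
import csv
import io


def words_above_confidence(tsv_text, min_conf=60.0):
    """Return (text, conf) pairs for recognised words at or above min_conf."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if row.get("conf") is None:
            continue  # malformed or truncated row
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page, block, and line rows carry conf = -1 and no text, so they drop out here.
        if text and conf >= min_conf:
            words.append((text, conf))
    return words


sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t1\t0\t36\t92\t500\t24\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tInvoice\n"
    "5\t1\t1\t1\t1\t2\t104\t92\t40\t24\t41.2\tN0.\n"
)
print(words_above_confidence(sample))  # [('Invoice', 96.5)]
```

Filtering after the fact keeps the raw output available, so the threshold can be tuned without re-running the job.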
Pricing
From $9.99/mo (Pro) or $24.99/mo (Business) — or pay-as-you-go on the API plan.
Free tier: 5 conversions/day, 100 MB file size, no API key required (IP-gated). Pro $9.99/mo: 100/day (2,000/month), 2 GB files. Business $24.99/mo: 1,000/day (20,000/month), 10 GB files, GPU encoding, dedicated support.
See full pricing breakdown →
Built for production
99.9% uptime SLA
Multi-region failover, transparent status page, 60-second response-time guarantee on Business.
Encryption + auto-delete
TLS 1.2+ in transit, AES-256 at rest. Files deleted after 1h / 24h / 7d depending on plan, or instantly via DELETE endpoint. See the security page.
~7s median latency
Most sub-100 MB jobs complete in 6-9 seconds. Webhook-driven async for heavier workloads; waitForJob for synchronous flows.
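For synchronous flows, a wait helper usually amounts to polling with a capped backoff. The sketch below shows that pattern in isolation and is not the SDK's implementation: `poll_until_done`, the `status` key, and the `"completed"` / `"failed"` values are assumptions, and the status source is injected as a callable so the example runs without network access.

```python
import time


def poll_until_done(fetch_status, timeout=120.0, initial_delay=1.0,
                    max_delay=10.0, sleep=time.sleep):
    """Call fetch_status() until the job finishes, backing off between attempts."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"OCR job failed: {job.get('error', 'unknown error')}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError("OCR job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay


# Simulated status sequence standing in for real HTTP calls.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "outputUrl": "https://example.com/out.pdf"},
])
result = poll_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(result["status"])  # completed
```

For heavier workloads, a webhook avoids holding a connection or a polling loop open for the duration of the job.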
Frequently Asked Questions
How accurate is OCR for English documents?
95-99% character recognition on clean 300 DPI scans of printed text in standard fonts. Accuracy drops for handwriting (50-70%), skewed scans (80-90%), low-DPI mobile photos (75-90%), and stylised fonts (85-95%). Use `enhance=true` to pre-process noisy sources for better results.
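To put those percentages in concrete terms, the short calculation below converts a character accuracy into an expected number of misrecognised characters per page. The 2,000-character page size is an assumption chosen for illustration, and `expected_char_errors` is a hypothetical helper.

```python
def expected_char_errors(char_count, accuracy):
    """Expected number of misrecognised characters at a given accuracy (0 to 1)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(char_count * (1.0 - accuracy))


page_chars = 2000  # assumed size of a dense typed page
scenarios = [
    ("clean 300 DPI scan, upper bound", 0.99),
    ("clean 300 DPI scan, lower bound", 0.95),
    ("handwriting, upper bound", 0.70),
]
for label, accuracy in scenarios:
    errors = expected_char_errors(page_chars, accuracy)
    print(f"{label}: about {errors} errors per page")
```

Even at 99% accuracy, a page of this size would contain around 20 wrong characters, which is worth keeping in mind when the output feeds exact-match search or data extraction.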
What's a searchable PDF?
The original PDF with an invisible text layer placed behind the image content. Visually identical to the source, but full-text searchable in any PDF reader and indexable by search engines and document management systems. The most common OCR output mode for document archival.
Can the API detect language automatically?
Yes. Set `ocrLanguage=auto` and the API runs language detection on a per-page basis, then applies the optimal Tesseract language model. Slower than explicit language selection but invaluable for mixed-language archives where the language varies page-by-page.
Which languages have the best OCR accuracy?
Latin-script languages (English, French, German, Spanish, Italian, Portuguese, Dutch) — typically 95-99% on clean scans. Cyrillic (Russian, Bulgarian, Ukrainian) — 90-95%. Arabic, Hebrew, Chinese, Japanese, Korean — 85-95% depending on font and source quality. See Tesseract's per-language quality matrix in our docs.
Are tables preserved in OCR output?
For plain-text output, table structure is approximated via whitespace: cells are space-separated and rows are line-separated. For hOCR output, each recognised word and line carries an explicit bounding box that you can use to reconstruct the table. For native table extraction with row/column structure, use the PDF to DOCX API with OCR mode.
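The bounding boxes in hOCR output can be read with the Python standard library. This sketch assumes the API returns Tesseract-style hOCR, in which each word is a `span` with class `ocrx_word` and a `title` attribute such as `bbox 36 92 96 116; x_wconf 96`; the `HocrWordParser` class and the sample markup are illustrative only.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            # The title holds ';'-separated properties; pick out the bbox.
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if len(parts) == 5 and parts[0] == "bbox":
                    self._bbox = tuple(int(p) for p in parts[1:])
                    self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


sample = (
    "<div class='ocr_page' title='bbox 0 0 800 600'>"
    "<span class='ocr_line' title='bbox 36 92 300 116'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 96'>Qty</span> "
    "<span class='ocrx_word' title='bbox 120 92 200 116; x_wconf 93'>Price</span>"
    "</span></div>"
)
parser = HocrWordParser()
parser.feed(sample)
print(parser.words)
```

Grouping the collected words by their left coordinate (columns) and top coordinate (rows) is one way to rebuild a table grid from this output.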
Related APIs
- Compression API: Compress video, image, PDF, and audio files programmatically. Per-file-type presets and target-size mode.
- Merge API: Merge PDFs, videos, images, and audio files programmatically. Order-preserving, format-aware concatenation.
- Split API: Split PDFs by page range, videos by duration, audio by silence detection via API. Per-output naming and ZIP delivery.
- Watermark API: Add text or image watermarks to PDFs, videos, and images via API. Configurable position, opacity, rotation, and tiling.
- File Conversion API: One unified file conversion API for video, audio, image, document, ebook, archive, and font formats (255 formats, 2,290+ conversion pairs).
- Convert API: Production-grade file conversion API with 9 language SDKs, async webhooks, and cloud-to-cloud workflows. Free tier available.
Or browse the full catalogue of 23 API products →
Get an API key
Start integrating the OCR API in five minutes. Read the docs, grab a key, and ship your first conversion before the trial coffee cools.