Why Convert PDF to EPUB?
PDFs are designed for fixed-layout printing — every element has an exact position on a fixed-size page. This works well for desktop screens and printers, but it is a terrible reading experience on e-readers and phones. You are constantly zooming, panning, and struggling with text that is either too small or too large.
EPUB (Electronic Publication) is a reflowable format designed specifically for e-readers. Text reflows to fit any screen size, font size is adjustable, and the reading experience adapts to the device — whether that is a 6-inch Kindle, a 10-inch iPad, or a 4-inch phone screen.
The catch is that PDF-to-EPUB conversion is one of the hardest format conversions to do well. PDFs do not contain semantic structure — they store text as positioned characters on a canvas. A heading in a PDF is just text rendered in a larger font at specific coordinates. The converter must reconstruct paragraphs, headings, lists, and chapters from this raw positioning data. The results range from excellent (for simple text-based PDFs) to unusable (for complex, multi-column, image-heavy layouts).
What Converts Well (and What Does Not)
| PDF Type | Conversion Quality | Notes |
|---|---|---|
| Text-only documents | Excellent | Clean paragraph extraction |
| Simple books (fiction) | Good | Minor formatting quirks |
| Technical books with code | Fair | Code blocks may lose formatting |
| Multi-column layouts | Poor | Columns get interleaved |
| Scanned PDFs (image-based) | Requires OCR first | See note below |
| Forms and interactive PDFs | Very poor | Interactivity is lost |
| PDFs with complex tables | Poor | Tables rarely survive intact |
Scanned PDFs: If your PDF is a scan (the pages are images, not text), you must run OCR first to extract the text. See our guide on how to OCR scanned documents.
Method 1: Using Calibre (Best Results)
Calibre is the gold standard for ebook conversion. Its conversion pipeline includes optional heuristic processing that detects paragraph boundaries, headings, and chapter breaks:
ebook-convert input.pdf output.epub \
--enable-heuristics \
--title "Book Title" \
--authors "Author Name"
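To apply the same conversion to a whole folder of PDFs, a small wrapper loop is one option. The following is a minimal sketch, assuming ebook-convert is on the PATH; the function names epub_name_for and convert_dir are made up for this example, and per-book metadata such as title and author is left out.

```shell
#!/bin/sh
# Sketch: batch-convert every PDF in a directory with Calibre.
# Assumes ebook-convert is on PATH; helper names are hypothetical.

# Map "dir/book.pdf" to "dir/book.epub".
epub_name_for() {
    printf '%s\n' "${1%.pdf}.epub"
}

# Convert each *.pdf in the given directory, skipping finished ones.
convert_dir() {
    for pdf in "$1"/*.pdf; do
        [ -e "$pdf" ] || continue      # the glob matched nothing
        epub=$(epub_name_for "$pdf")
        [ -e "$epub" ] && continue     # already converted earlier
        ebook-convert "$pdf" "$epub" --enable-heuristics
    done
}

# Run only when a directory is given: sh convert-all.sh ./books
if [ "$#" -gt 0 ]; then
    convert_dir "$1"
fi
```

Skipping files that already have an EPUB next to them makes the script safe to re-run after an interrupted batch.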
Key Calibre Options
ebook-convert input.pdf output.epub \
--enable-heuristics \
--unwrap-factor 0.45 \
--no-images \
--title "Book Title" \
--authors "Author Name" \
--language en \
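--chapter "//*[(name()='h1' or name()='h2' or name()='p') and re:test(., '^\s*chapter\s', 'i')]" \
--page-breaks-before "//*[(name()='h1' or name()='h2' or name()='p') and re:test(., '^\s*chapter\s', 'i')]"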
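- --enable-heuristics: activates heuristic processing (paragraph detection and line unwrapping)
- --unwrap-factor 0.45: sets the line length, as a fraction between 0.0 and 1.0, above which a line is treated as hard-wrapped and joined to the next one; 0.45 is the default, and lower values join more lines
- --no-images: skips images for a text-only conversion (faster, cleaner)
- --chapter: XPath expression to detect chapter headings
- --page-breaks-before: inserts page breaks before the elements matched by the XPath expression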
Fine-Tuning Output
If Calibre merges lines that should stay separate (common with poetry or code), raise the unwrap factor so that only the longest lines are joined:
ebook-convert input.pdf output.epub \
--enable-heuristics --unwrap-factor 0.7
If headings are not detected, specify them manually:
ebook-convert input.pdf output.epub \
--chapter "//h:h1|//h:h2" \
--level1-toc "//h:h1" \
--level2-toc "//h:h2"
Method 2: Using Pandoc
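Pandoc cannot read PDF files directly (PDF is only an output format for it), so extract the text with pdftotext first and convert the result:
pdftotext input.pdf input.txt
pandoc input.txt -f markdown -o output.epub \
--metadata title="Book Title" \
--metadata author="Author Name" \
--toc --toc-depth=2
This route is simpler than Calibre's. It works well for straightforward text documents, but it discards images and most formatting, and the table of contents stays empty unless you add Markdown headings to input.txt before running Pandoc.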
Method 3: Online Conversion
Use the PDF to EPUB converter for quick conversions without installing any software. Upload your PDF and download a reflowable EPUB. For more ebook options, check our ebook format guide.
Improving Conversion Quality
Pre-Processing the PDF
For best results, prepare the PDF before conversion:
- Extract text first to verify it is selectable (not scanned):
pdftotext input.pdf - | head -20
- Remove headers and footers that would otherwise appear on every page in the EPUB:
# Calibre can handle this with a regex search-and-replace
ebook-convert input.pdf output.epub \
--sr1-search "running header text" --sr1-replace ""
- Check for embedded fonts (pdffonts input.pdf lists them): unusual fonts may cause character mapping issues
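The first check above can be automated when you have many files. The sketch below counts the non-whitespace characters pdftotext finds in the first five pages and labels the PDF accordingly. The helper names and the 100-character threshold are assumptions chosen for illustration, not values taken from any tool.

```shell
#!/bin/sh
# Sketch: guess whether a PDF has a text layer or is a scan.
# Assumes poppler's pdftotext is installed; names and the
# 100-character threshold are hypothetical.

# Classify a character count as "text" or "scanned".
classify_text_layer() {
    if [ "$1" -ge 100 ]; then
        echo "text"
    else
        echo "scanned"
    fi
}

# Count non-whitespace characters in the first 5 pages.
count_text_chars() {
    pdftotext -l 5 "$1" - | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Run only when a file is given: sh check-text.sh input.pdf
if [ "$#" -gt 0 ]; then
    classify_text_layer "$(count_text_chars "$1")"
fi
```

A result of "scanned" means the file most likely needs OCR before conversion. A scan with a poor OCR layer can still report "text", so it is worth reading a sample of the extracted output as well.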
Post-Processing the EPUB
After conversion, open the EPUB in Calibre's editor (calibre → right-click → Edit Book) to:
- Fix misdetected chapters
- Remove duplicate headers/footers that survived conversion
- Clean up formatting artifacts
- Add a proper cover image
- Edit the table of contents
Quality and Settings Tips
Font embedding: If the converted book carries embedded fonts, pass --subset-embedded-fonts to keep only the glyphs actually used in the text. This reduces file size without changing how the text looks.
Image quality: If the PDF contains images you want to preserve, Calibre extracts them during conversion. Choose an output profile that matches the target device so images are sized for its screen:
ebook-convert input.pdf output.epub \
--output-profile kindle_oasis
Cover image: PDFs do not have a designated cover. Extract the first page as an image and use it:
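# Extract first page as cover.png (-singlefile avoids the page-number suffix,
# whose zero-padding otherwise depends on the document's page count)
pdftoppm -f 1 -l 1 -png -singlefile input.pdf cover
# Use in conversion
ebook-convert input.pdf output.epub --cover cover.png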
Table of contents: Calibre generates a TOC from detected chapters. If your PDF has a table of contents page, Calibre may detect it — but for best results, specify chapter detection patterns manually with --chapter.
For more on PDF processing, see our guide on how to convert PDF to Word.
Common Issues and Troubleshooting
Text runs together without paragraph breaks
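Calibre is joining lines across paragraph boundaries, usually because the PDF separates paragraphs with plain line breaks rather than extra spacing. Raise the unwrap factor so that fewer lines are joined: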
ebook-convert input.pdf output.epub \
--enable-heuristics --unwrap-factor 0.6
Paragraphs are split mid-sentence
Calibre is not joining enough hard-wrapped lines. Lower the unwrap factor below the 0.45 default (for example, 0.3) so that shorter lines are unwrapped as well.
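Because the right value depends on the document, it can be quicker to convert with several factors and compare the results side by side. The following is a minimal sketch, assuming ebook-convert is on the PATH; the function names and the three sample factors are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: convert one PDF with several unwrap factors for comparison.
# Assumes ebook-convert is on PATH; helper names are hypothetical.

# Name each output after its factor: book.pdf + 0.3 -> book-uf0.3.epub
sweep_output_name() {
    printf '%s-uf%s.epub\n' "${1%.pdf}" "$2"
}

# Run one conversion per factor.
sweep_unwrap_factor() {
    for factor in 0.3 0.45 0.6; do
        ebook-convert "$1" "$(sweep_output_name "$1" "$factor")" \
            --enable-heuristics --unwrap-factor "$factor"
    done
}

# Run only when a file is given: sh sweep.sh input.pdf
if [ "$#" -gt 0 ]; then
    sweep_unwrap_factor "$1"
fi
```

Open the same chapter in each output and keep the factor that produces the cleanest paragraphs.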
Gibberish characters in output
The PDF uses non-standard character encoding or embedded fonts with custom character mappings. Try extracting text with pdftotext -layout input.pdf first. If the text is garbled there too, the issue is in the PDF itself — not the converter.
Headers and footers repeated on every page
PDF headers/footers are positioned text, not metadata. The converter cannot automatically distinguish them from body text. Use Calibre's search-and-replace during conversion to remove known header/footer text, or manually remove them in the EPUB editor after conversion.
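To find the header and footer text worth removing, one approach is to look for lines that repeat many times in the extracted text. The sketch below is an assumption-laden helper, not a Calibre feature: repeated_lines is a made-up name, and the default threshold of 10 occurrences is arbitrary.

```shell
#!/bin/sh
# Sketch: list lines that repeat often in a PDF's extracted text,
# which are likely running headers or footers.
# Assumes poppler's pdftotext is installed; names are hypothetical.

# Print lines from stdin that occur at least $1 times,
# most frequent first, with surrounding whitespace trimmed.
repeated_lines() {
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -v '^$' |
        sort | uniq -c | sort -rn |
        awk -v min="$1" '$1 >= min { sub(/^ *[0-9]+ /, ""); print }'
}

# Run only when a file is given: sh find-headers.sh input.pdf 10
if [ "$#" -gt 0 ]; then
    pdftotext "$1" - | repeated_lines "${2:-10}"
fi
```

Each line it prints is a candidate for a search-and-replace pattern. Page numbers differ on every page, so they will not show up here and need a separate pattern.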
Images are missing or low quality
PDF images may be compressed in formats the converter does not handle well. Try converting with explicit image quality settings. For image-heavy PDFs (textbooks, art books), a fixed-layout EPUB may be more appropriate than a reflowable one.
Conclusion
PDF-to-EPUB conversion works best with simple, text-heavy documents — novels, reports, articles, and documentation. Calibre with heuristic processing gives the best results for most content. For complex, multi-column, or image-heavy PDFs, expect to do some post-conversion cleanup in Calibre's EPUB editor. The effort is worth it: a properly converted EPUB provides a dramatically better reading experience on e-readers and phones than a PDF ever can.
Ready to convert? Try our free PDF to EPUB converter — no registration required.



