What MHT Is For
MHT (also called MHTML) is the "MIME HTML" format. Internet Explorer used it as the default "Save Page As" format for many years. The file packages an entire web page with images, CSS, and other resources into a single .mht file.
In 2026, MHT is mostly historical:
- Internet Explorer is retired (Microsoft ended support in June 2022)
- Firefox doesn't support MHT, and Chrome treats it as a secondary save option ("Webpage, Single File")
- Modern web archiving uses WARC (Web Archive)
But organizations still have MHT archives from 2005-2015. This post covers conversion to modern formats.
For broader document conversion, see our document converter.
What's in an MHT File
An MHT file is a MIME multipart message:
```
MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_000_0001"

------=_NextPart_000_0001
Content-Type: text/html; charset=utf-8
Content-Location: index.html

<html>...</html>
------=_NextPart_000_0001
Content-Type: image/jpeg
Content-Location: image.jpg
Content-Transfer-Encoding: base64

[base64 image data]
------=_NextPart_000_0001--
```
The HTML and its resources travel in one file, which is convenient for preservation. The cost is size: base64 encoding makes each binary resource about a third larger than the original.
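To see this structure in a real file, the parts can be listed with Python's standard `email` package. The following is a minimal sketch, not a full MHT reader: the function name `list_mht_parts` and the in-memory sample are illustrative assumptions, and the sample mirrors the layout shown above.

```python
import email
from email import policy


def list_mht_parts(raw: bytes):
    """Return (content_type, location, transfer_encoding, decoded_size) per leaf part."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        payload = part.get_payload(decode=True) or b""
        rows.append((
            part.get_content_type(),
            str(part.get("Content-Location", "")),
            str(part.get("Content-Transfer-Encoding", "7bit")),
            len(payload),
        ))
    return rows


# A tiny in-memory MHT (hypothetical content) so the sketch runs without a real file.
SAMPLE_MHT = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="----=_NextPart_000_0001"\r\n'
    b'\r\n'
    b'------=_NextPart_000_0001\r\n'
    b'Content-Type: text/html; charset=utf-8\r\n'
    b'Content-Location: index.html\r\n'
    b'\r\n'
    b'<html><body><img src="image.jpg"></body></html>\r\n'
    b'------=_NextPart_000_0001\r\n'
    b'Content-Type: image/jpeg\r\n'
    b'Content-Location: image.jpg\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'/9j/4AAQSkZJRg==\r\n'
    b'------=_NextPart_000_0001--\r\n'
)

for row in list_mht_parts(SAMPLE_MHT):
    print(row)
```

For a file on disk, pass `open("input.mht", "rb").read()` instead of the sample bytes.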
Modern Browser Support
| Browser | MHT support |
|---|---|
| Internet Explorer | Yes (native) |
| Microsoft Edge | Yes (IE mode) |
| Chrome | Yes (opens .mht/.mhtml; saves as "Webpage, Single File") |
| Firefox | No (legacy add-ons only, no longer supported) |
| Safari | No |
To view a legacy MHT file, open it in Edge (IE mode if the page depends on IE rendering) or Chrome, or convert it to a modern format.
Conversion to Single HTML
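To view an MHT page in any modern browser, extract its HTML part into a standalone file:
```python
# Python script: extract the main HTML part from an MHT file
import email

with open("input.mht", "rb") as f:
    msg = email.message_from_bytes(f.read())

html = None
for part in msg.walk():
    if part.get_content_type() == "text/html":
        # Use the charset declared in the part's headers, falling back to UTF-8
        charset = part.get_content_charset() or "utf-8"
        html = part.get_payload(decode=True).decode(charset, errors="replace")
        break  # the first text/html part is normally the main document

if html is None:
    raise SystemExit("No text/html part found in input.mht")

with open("output.html", "w", encoding="utf-8") as f:
    f.write(html)
```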
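The extracted HTML keeps its structure, but its references to embedded resources (images, stylesheets, scripts) no longer resolve. Those parts have to be extracted separately and the references re-pointed.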
For batch processing, see Batch Processing Files Guide.
Conversion to PDF
For archival PDF:
Option 1: Open in IE (or Edge IE mode), Print to PDF
- Open the MHT file in Internet Explorer or in Edge IE mode
- File > Print
- Select "Microsoft Print to PDF"
- Save
This preserves the visual layout, but it only works on Windows machines where IE or Edge IE mode is available.
Option 2: Convert via wkhtmltopdf
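```bash
# Convert HTML extracted from MHT
wkhtmltopdf output.html output.pdf
```
wkhtmltopdf is a standalone command-line tool that renders HTML to PDF with good output quality. Recent releases block local file access by default, so an extracted page that references local images may need `--enable-local-file-access`.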
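When many extracted pages need converting, the same command can be driven from Python. This is a rough sketch under a few assumptions: `wkhtmltopdf` is installed and on `PATH`, the installed version accepts `--enable-local-file-access` (older builds may reject it), and the helper names are made up for illustration.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(html_path: str, pdf_path: str) -> list:
    """Build the argument list for one HTML-to-PDF conversion."""
    # --enable-local-file-access lets the renderer load images saved next to the HTML.
    return ["wkhtmltopdf", "--enable-local-file-access", html_path, pdf_path]


def html_to_pdf(html_path: str, pdf_path: str) -> None:
    """Run wkhtmltopdf and raise if it is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")
    subprocess.run(build_wkhtmltopdf_command(html_path, pdf_path), check=True)


print(build_wkhtmltopdf_command("output.html", "output.pdf"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.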
Option 3: Use an online converter
Various online services convert MHT to PDF. Uploading means handing the page contents to a third party, so avoid this route for sensitive content.
For PDF context, see our PDF converter.
Conversion to Single-File HTML
For "self-contained HTML" (similar to MHT but modern):
```bash
# Use SingleFile (browser extension or CLI)
single-file https://example.com saved-page.html
# For an existing MHT, extract first then re-save
```
SingleFile produces an HTML file with all resources embedded as data URLs, which makes it the modern equivalent of MHT.
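For an MHT file that already exists, a similar self-contained result can be approximated without a browser by swapping each embedded part's reference for a data URL. The sketch below is not how SingleFile itself works; it is a deliberately naive textual replacement that only rewrites exact, double-quoted matches of a part's `Content-Location`. The function name and sample bytes are illustrative assumptions.

```python
import base64
import email
from email import policy


def inline_resources(raw_mht: bytes) -> str:
    """Return the main HTML with quoted references swapped for data: URLs."""
    msg = email.message_from_bytes(raw_mht, policy=policy.default)
    html = None
    data_urls = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # the multipart/related container has no payload of its own
        payload = part.get_payload(decode=True) or b""
        location = part.get("Content-Location")
        if part.get_content_type() == "text/html" and html is None:
            charset = part.get_content_charset() or "utf-8"
            html = payload.decode(charset, errors="replace")
        elif location:
            encoded = base64.b64encode(payload).decode("ascii")
            data_urls[str(location)] = f"data:{part.get_content_type()};base64,{encoded}"
    if html is None:
        raise ValueError("no text/html part found")
    for location, data_url in data_urls.items():
        # Only exact, double-quoted matches are rewritten (e.g. src="image.jpg").
        html = html.replace(f'"{location}"', f'"{data_url}"')
    return html


# Tiny in-memory MHT (hypothetical content) so the sketch runs without a file.
SAMPLE = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="b1"\r\n'
    b'\r\n'
    b'--b1\r\n'
    b'Content-Type: text/html; charset=utf-8\r\n'
    b'Content-Location: index.html\r\n'
    b'\r\n'
    b'<html><body><img src="image.jpg"></body></html>\r\n'
    b'--b1\r\n'
    b'Content-Type: image/jpeg\r\n'
    b'Content-Location: image.jpg\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'/9j/4AAQSkZJRg==\r\n'
    b'--b1--\r\n'
)

print(inline_resources(SAMPLE))
```

References inside CSS `url(...)`, single-quoted attributes, and relative URLs that differ from the stored `Content-Location` are not handled here; a real conversion would need an HTML parser and URL resolution.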
Conversion to WARC
WARC (Web Archive) is the international standard for web preservation:
- Format: ISO 28500
- Tools: wget, Heritrix, Browsertrix
- Use: Internet Archive, national libraries
- Advantage: industry standard, broad tool support
For batch MHT to WARC conversion, extract the HTML and resources from each file and re-package them as WARC records. This is a manual or scripted job.
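The re-packaging step can be sketched with the standard library alone. The function below serializes one WARC/1.0 `resource` record, the record type intended for content stored without its original HTTP headers, which matches what an MHT part provides. The function name and sample values are assumptions for illustration, and a production pipeline would more likely rely on a dedicated WARC library.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, content_type: str,
                         payload: bytes, captured: datetime) -> bytes:
    """Serialize one WARC/1.0 'resource' record; `captured` must be timezone-aware."""
    stamp = captured.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {stamp}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF, followed by one empty line before the content block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after the content block.
    return head + payload + b"\r\n\r\n"


# Hypothetical values standing in for one part extracted from an MHT file.
record = warc_resource_record(
    "http://example.com/index.html",
    "text/html",
    b"<html></html>",
    datetime(2010, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

Records for all parts of a page can be concatenated into a single `.warc` file. The capture date is passed in explicitly because the conversion date is not the date the page was saved; the MHT file's own `Date` header, when present, is a better source.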
Conversion Pipeline
A typical legacy MHT conversion workflow:
- Extract MHT contents: Python email module or specialized tools
- Parse HTML: BeautifulSoup or similar
- Resolve resource references: extract images, CSS, scripts
- Reassemble: as single HTML, PDF, or WARC
Code example:
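```python
import email
import os
from email import policy

# Parse MHT
with open("input.mht", "rb") as f:
    msg = email.message_from_bytes(f.read(), policy=policy.default)

# Extract resources
resources = {}
html_content = None
for part in msg.walk():
    if part.is_multipart():
        continue  # container parts carry no payload of their own
    content_id = part.get("Content-Location") or part.get("Content-ID")
    if part.get_content_type() == "text/html" and html_content is None:
        # Decode with the declared charset rather than assuming UTF-8
        charset = part.get_content_charset() or "utf-8"
        html_content = part.get_payload(decode=True).decode(charset, errors="replace")
    elif content_id:
        resources[str(content_id)] = part.get_payload(decode=True)

# Save resources to disk
os.makedirs("resources", exist_ok=True)
for index, (cid, data) in enumerate(resources.items()):
    # Last path segment without query string; numbered fallback if that is empty
    name = cid.strip("<>").split("?")[0].rstrip("/").split("/")[-1]
    filename = name or f"resource_{index}"
    with open(f"resources/{filename}", "wb") as f:
        f.write(data)

# Save HTML (with re-mapped resource paths)
# ... process html_content to reference local files ...
```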
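The final re-mapping step is left open above. One simple way to approach it, shown here as a sketch rather than a complete solution, is to replace each quoted original reference with the path of the file saved on disk. The helper name `remap_resource_paths` and the sample mapping are assumptions; the mapping should reflect whatever file names were actually written.

```python
def remap_resource_paths(html: str, local_paths: dict) -> str:
    """Replace each quoted original reference with its quoted local path."""
    for location, local_path in local_paths.items():
        for quote in ('"', "'"):
            html = html.replace(f"{quote}{location}{quote}",
                                f"{quote}{local_path}{quote}")
    return html


# Hypothetical page and mapping from original location to saved file.
page = '<img src="http://example.com/img/a.png"><link href="http://example.com/s.css">'
mapping = {
    "http://example.com/img/a.png": "resources/a.png",
    "http://example.com/s.css": "resources/s.css",
}
print(remap_resource_paths(page, mapping))
```

This only catches references that match the stored location exactly. Pages that use relative URLs while the MHT stores absolute ones need the references resolved against the page's base URL first.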
Common Issues
Images not displaying: resource references not preserved. Re-map paths in HTML to local files.
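Encoding issues: MHT parts are MIME-encoded, usually as base64 (binary) or quoted-printable (text). Decode each part according to its Content-Transfer-Encoding header, and decode text using the charset declared in its Content-Type rather than assuming UTF-8.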
Layout broken: CSS not extracted or modified. Verify CSS files are present.
Forms don't work: JavaScript may have been altered. MHT preserves at point-of-save; live functionality (forms, dynamic content) is frozen.
Large file size: MHT inflates due to base64 encoding. Single HTML with data URLs is similar size.
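The size of the base64 overhead is easy to check. The snippet below is a small illustration, using random bytes as a stand-in for an already-compressed image; the function name is made up for this example.

```python
import base64
import os


def mime_base64_ratio(size: int) -> float:
    """Ratio of MIME-style base64 size to raw size for `size` random bytes."""
    raw = os.urandom(size)             # stand-in for an already-compressed image
    encoded = base64.encodebytes(raw)  # base64 with a newline after every 76 characters
    return len(encoded) / len(raw)


print(f"{mime_base64_ratio(300_000):.3f}")
```

Base64 turns every 3 bytes into 4 characters, so the ratio is at least 4/3; the line breaks used in MIME bodies push it to roughly 1.35. Data URLs skip the line breaks, which is why a single HTML file with data URLs ends up about the same size.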
When to Just Re-Capture
For some workflows, re-capturing the page is easier than converting old MHT:
```bash
# Save page as PDF
wkhtmltopdf https://example.com page.pdf

# Save page as single HTML (with all resources embedded)
single-file https://example.com page.html

# Capture as WARC
wget --warc-file=archive --recursive --level=1 https://example.com
```
For active sites: re-capture is fresh. For deleted sites: MHT may be the only record.
Web Archive Standards
| Format | Year | Status |
|---|---|---|
| MHT/MHTML | 1999 | Legacy, IE-era |
| WARC | 2009 | Current standard |
| Wayback HTML | n/a | Internet Archive's format |
| HAR (HTTP Archive) | 2012 | Network-level capture |
| SingleFile HTML | 2018 | Modern alternative |
For new web archiving: WARC (institutional) or SingleFile (personal). For converting old MHT: extract to modern format.
For archival considerations, see FFV1 Archival Codec (video equivalent).
Privacy and Copyright
MHT files preserve a snapshot of a website at point-of-save:
- Personal data may be included (logged-in views, user content)
- Site terms of service may restrict redistribution
- Copyright applies to the captured content
For organizational archives: review what's in MHT files before sharing or processing.
Other Issues
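Large size doesn't compress much: the embedded images are usually in already-compressed formats such as JPEG or PNG. Compressing the MHT can win back roughly the base64 overhead, but it won't shrink the file much below the size of the original binary resources.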
Cannot open MHT with passwords: rare but possible (some MHT writers added password protection). No standard way to recover.
Different MHT readers show differently: rendering inconsistencies. Check with multiple tools.
For broader file format conversions, see Searchable PDF With OCR.
Frequently Asked Questions
Should I convert my MHT archive?
For active research: yes, modern formats (WARC, single HTML) are easier to use. For passive archive: keep MHT until needed.
Can I read MHT on a Mac?
Some viewer apps support MHT. Or convert to PDF/HTML for native viewing.
Is MHT a security risk?
Old MHT files can contain stale or malicious JavaScript. For untrusted sources: convert to PDF or static HTML to neutralize scripts.
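As one illustration of the static-HTML route, the sketch below re-emits a page while dropping `<script>` elements, inline `on*` event handlers, `javascript:` URLs, and comments, using only the standard library. It is not a complete sanitizer (it ignores things like meta refresh, iframes, and CSS tricks), and the class and function names are assumptions for this example. For genuinely untrusted files, PDF conversion or a maintained sanitizing library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML without <script> elements, on* handlers, or javascript: URLs."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <script> element

    def _emit_tag(self, tag, attrs, closing=""):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):
                continue  # inline event handlers such as onclick
            if value and value.strip().lower().startswith("javascript:"):
                continue  # script URLs in href/src
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + closing + ">")

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skip_depth += 1
        elif not self.skip_depth:
            self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self.skip_depth:
            self._emit_tag(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        if tag == "script":
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_scripts(html: str) -> str:
    """Return `html` with script content and script-bearing attributes removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_scripts('<p onclick="x()">Hi<script>alert(1)</script></p>'))
```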
What about Edge's IE mode?
Microsoft Edge can open MHT files in IE mode. For Windows users, this is the easiest viewing path.
How big are MHT files?
A typical news article: 200-500 KB. A page with many images: 2-10 MB. Larger pages with video: 50+ MB.
Can I extract just the text?
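Use Python's email module to extract the HTML, then BeautifulSoup or html2text to get plain text:
```python
from bs4 import BeautifulSoup

# html_content is the decoded HTML string extracted from the MHT file
soup = BeautifulSoup(html_content, "html.parser")
for tag in soup(["script", "style"]):
    tag.decompose()  # drop code and styling so only readable text remains
text = soup.get_text(separator="\n", strip=True)
```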
Bottom Line
For MHT/MHTML in 2026: convert to PDF (for viewing), WARC (for institutional archives), or SingleFile HTML (the modern personal equivalent). Open MHT in Edge IE mode if you only need occasional viewing. For new web archiving, use WARC or SingleFile, not MHT. Our document converter handles related document conversions.



