Three Formats, Three Different Jobs
SubRip (SRT), Web Video Text Tracks (VTT), and Advanced SubStation Alpha (ASS) all describe text overlaid on video. They're not interchangeable.
SRT is the oldest, simplest, and most universally supported. VTT is what HTML5 video expects. ASS is what anime fan-translators built when SRT couldn't handle karaoke timing and complex positioning.
Picking the right format depends on three questions:
- Where does the video play? (Web, native player, video editor, hardware device)
- Do you need styling? (Color, position, font, effects)
- Are these closed captions or subtitles? (For deaf/HoH viewers vs translation)
Our video editing tools support all three. This post is about choosing between them.
What Each Format Looks Like
SRT (SubRip):
```
1
00:00:00,000 --> 00:00:03,500
The simplest possible format.

2
00:00:03,500 --> 00:00:06,000
Index, timecode, text, blank line.
```
That's the entire spec. Bold, italic, underline, and color are available through HTML-like tags (<b>, <i>, <u>, <font color="red">), but player support for them is inconsistent, so don't rely on them. The format is text-and-timing, period.
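To make "index, timecode, text, blank line" concrete, here is a minimal Python sketch of an SRT reader. It's an illustration, not a complete parser: it assumes well-formed cues with a numeric index and two-digit hours, and the names `parse_srt` and `Cue` are hypothetical.

```python
import re
from dataclasses import dataclass

# An SRT timing line, e.g. "00:00:03,500 --> 00:00:06,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(data: str) -> list[Cue]:
    """Split SRT text into cues: index line, timing line, text lines, blank line."""
    data = data.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", data):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue  # malformed timing line; a real parser would report it
        g = match.groups()
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), text))
    return cues
```

Fed the two-cue sample above, it returns two `Cue` objects, the second starting at 3500 ms and ending at 6000 ms.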
VTT (Web Video Text Tracks):
```
WEBVTT

NOTE Comment lines start with NOTE

00:00:00.000 --> 00:00:03.500 line:90% align:center
The HTML5 standard format.

00:00:03.500 --> 00:00:06.000 position:50% size:40%
With CSS-like positioning.
```
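VTT is what the <track> element expects in HTML5 video. It supports CSS styling (via the ::cue pseudo-element), cue positioning, regions, and chapters. Timestamps use a period before the milliseconds (.500) where SRT uses a comma (,500), and every file must start with the WEBVTT header.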
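Here is a Python sketch of how a VTT cue timing line breaks down into timestamps and cue settings. The function names are hypothetical, and the parser only handles the simple `name:value` settings shown in the sample above. Note that it rejects SRT-style comma timestamps outright.

```python
import re

# Hours are optional in WebVTT timestamps: "00:03.500" and "00:00:03.500" are both valid.
STAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING_LINE = re.compile(rf"^({STAMP})\s+-->\s+({STAMP})(.*)$")


def vtt_time_to_ms(stamp: str) -> int:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to milliseconds."""
    parts = stamp.split(":")
    seconds, millis = parts[-1].split(".")
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis)


def parse_timing_line(line: str) -> tuple[int, int, dict[str, str]]:
    """Return (start_ms, end_ms, settings) for one WebVTT cue timing line."""
    match = TIMING_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timing line: {line!r}")
    start, end, rest = match.groups()
    # Cue settings are whitespace-separated name:value pairs, e.g. "line:90% align:center".
    settings = dict(item.split(":", 1) for item in rest.split())
    return vtt_time_to_ms(start), vtt_time_to_ms(end), settings
```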
ASS (Advanced SubStation Alpha):
```
[Script Info]
Title: Example
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,40,&H00FFFFFF,&H00000000,&H00000000,&H00000000,...

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:03.50,Default,,0,0,0,,Karaoke {\k50}with {\k60}timing
```
ASS is structured: it defines styles up front and references them by name on each Dialogue line. It supports per-syllable karaoke timing (the {\k} tag, measured in centiseconds), animation, rotation, blur, vector drawing primitives, and embedded fonts.
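The {\k} arithmetic is easy to get wrong, so here is a Python sketch that turns the Dialogue line above into per-syllable start and end times. It assumes plain `{\kN}` tags only (no `\kf`, `\ko`, or other override tags inside the same braces) and treats any text before the first tag as untimed. `karaoke_syllables` is a hypothetical helper name.

```python
import re

# {\kN}: the following syllable is highlighted for N centiseconds.
K_TAG = re.compile(r"\{\\k(\d+)\}")


def ass_time_to_ms(stamp: str) -> int:
    """ASS timestamps are H:MM:SS.cc, where cc is centiseconds."""
    hours, minutes, rest = stamp.split(":")
    seconds, centis = rest.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(centis) * 10


def karaoke_syllables(dialogue: str) -> list[tuple[str, int, int]]:
    """Return (syllable, start_ms, end_ms) for each {\\k} segment of a Dialogue line."""
    # "Dialogue: " then 10 comma-separated fields; only Text (the last) may contain commas.
    fields = dialogue.split(": ", 1)[1].split(",", 9)
    cursor = ass_time_to_ms(fields[1])  # the line's Start field
    # re.split keeps the captured durations: [untimed prefix, dur1, syl1, dur2, syl2, ...]
    pieces = K_TAG.split(fields[9])
    syllables = []
    for duration, syllable in zip(pieces[1::2], pieces[2::2]):
        end = cursor + int(duration) * 10  # centiseconds -> milliseconds
        syllables.append((syllable, cursor, end))
        cursor = end
    return syllables
```

For the sample line, "with " is highlighted from 0 to 500 ms and "timing" from 500 to 1100 ms.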
Compatibility Matrix
| Player | SRT | VTT | ASS |
|---|---|---|---|
| HTML5 <track> | No | Yes | No |
| YouTube | Yes | Yes | No (uploads convert) |
| Vimeo | Yes | Yes | Limited |
| VLC | Yes | Yes | Yes |
| MPV | Yes | Yes | Yes |
| QuickTime | Yes | Yes | No |
| Windows Media Player | Yes (via codec) | No | No |
| Plex | Yes | Yes | Yes (transcoded) |
| Netflix (consumer) | n/a | Yes | n/a |
| Premiere Pro | Yes | Yes | Limited |
| DaVinci Resolve | Yes | Yes | No |
| iPhone Camera Roll | Embedded only | n/a | n/a |
| Android default player | Yes | Limited | No |
For broadest reach, and as the default for everything else: SRT. For HTML5 video on the web: VTT. For karaoke or heavily styled anime subtitles: ASS.
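The same rule of thumb, written as a tiny Python function. The `Target` categories are a hypothetical simplification of the matrix above, and the burn-in branch reflects the advice to burn styled subtitles into the frames when the destination can't render ASS.

```python
from enum import Enum


class Target(Enum):
    """Hypothetical, simplified playback targets drawn from the matrix above."""
    HTML5_TRACK = "HTML5 <track> on your own site"
    ASS_PLAYER = "a player that renders ASS (VLC, MPV)"
    OTHER = "platforms, hardware players, editors, or unknown"


def pick_format(target: Target, needs_rich_styling: bool) -> str:
    """SRT by default, VTT for the web, ASS for karaoke and rich styling."""
    if target is Target.HTML5_TRACK:
        return "vtt"  # <track> only loads WebVTT
    if needs_rich_styling:
        # Where ASS won't render, burn it into the frames instead.
        return "ass" if target is Target.ASS_PLAYER else "ass, burned in"
    return "srt"
```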
When to Use SRT
Use SRT when:
- Your video plays on multiple platforms (some you don't control)
- You're providing translation subtitles (no styling needed)
- You're submitting to streaming services (most accept SRT)
- You want maximum compatibility with consumer hardware
- You want a format you can edit in any text editor
SRT is the lingua franca. Nearly every desktop, mobile, and hardware player handles it; the HTML5 <track> element is the notable exception. The lack of styling is a feature, not a bug.
For the encoding side of subtitles, see Subtitles Burn vs Soft Mux.
When to Use VTT
Use VTT when:
- You're embedding subtitles in HTML5 video on your website
- You need CSS-style positioning or coloring
- You want chapter markers tied to a video player
- Your accessibility audit requires WCAG-compliant captions
- You're building a custom video player
The HTML5 <track> element doesn't accept SRT directly. Convert SRT to VTT for web embedding:
```bash
ffmpeg -i input.srt output.vtt
```
That's it. The conversion is mechanical. The two formats describe the same thing in slightly different syntax.
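To show how mechanical it is, here is a Python sketch of the same SRT-to-VTT conversion without FFmpeg. It assumes well-formed UTF-8 input, drops the numeric SRT indices (cue identifiers are optional in VTT), and passes cue text through untouched: <b>, <i>, and <u> are valid in VTT too, but <font> isn't and would need stripping. `srt_to_vtt` is a hypothetical name, not a library function.

```python
import re

# SRT puts a comma before the milliseconds; WebVTT uses a period.
SRT_STAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Mechanically rewrite SRT as WebVTT; cue text is passed through unchanged."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = ["WEBVTT", ""]  # required header, then a blank line
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        # Drop the numeric SRT index; cue identifiers are optional in WebVTT.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue  # not a cue
        lines[0] = SRT_STAMP.sub(r"\1.\2", lines[0])
        out.extend(lines)
        out.append("")  # blank line ends the cue
    return "\n".join(out)
```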
When to Use ASS
Use ASS when:
- You're producing anime-style karaoke with per-syllable timing
- You need positioning that varies per line
- You want embedded fonts in the subtitle file
- You're doing motion typography integrated with video
- Your viewers will play the video in MPV or VLC
ASS handles styling that other formats can't. The trade-off is much narrower player support. If you ship ASS on YouTube, the platform converts it to a flat caption format and loses your styling.
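Roughly what "flat" means, as a Python sketch: delete the `{...}` override blocks that carry ASS styling and map ASS's line-break escapes to plain text. This approximates what any ASS-to-SRT flattening does; it is not a description of YouTube's actual pipeline, and `flatten_ass_text` is a hypothetical name.

```python
import re

# Override blocks like {\k50}, {\pos(320,50)} or {\c&H0000FF&} carry ASS styling.
OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def flatten_ass_text(text: str) -> str:
    """Reduce the Text field of an ASS Dialogue line to plain caption text."""
    plain = OVERRIDE_BLOCK.sub("", text)
    # In ASS, \N is a hard line break, \n a soft one (usually rendered as a space), \h a hard space.
    return plain.replace(r"\N", "\n").replace(r"\n", " ").replace(r"\h", " ")
```

Position, color, and karaoke timing all vanish; only the words survive.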
Pro Tip: For ASS-styled subtitles destined for platforms that strip styling (YouTube, social), burn the subtitles into the video frames before upload. The styling becomes part of the picture. See Subtitles Burn vs Soft Mux for the mechanics.
Closed Captions vs Subtitles vs SDH
These three terms refer to different things:
- Subtitles: translation of dialogue; assumes the viewer can hear the audio
- Closed Captions (CC): full transcription of the audio for deaf/HoH viewers, including sound effects ("[door slams]"), speaker identification, and music descriptions
- SDH (Subtitles for the Deaf and Hard of Hearing): a hybrid; presented like subtitles, but with sound-effect notation and speaker identification for viewers who can't hear the audio, in the source language or a translation
All three formats (SRT, VTT, ASS) can hold any of these. The format doesn't determine the type. The content does.
For ADA compliance and broadcast standards, CC needs sound effect descriptions, music notation ("♪ Energetic music ♪"), and speaker identification. A plain translation subtitle track leaves these out whatever its file format, so mark them up explicitly.
Format Conversion
| From | To | Tool |
|---|---|---|
| SRT | VTT | FFmpeg or our SRT to VTT converter |
| VTT | SRT | FFmpeg or our VTT to SRT converter |
| ASS | SRT | FFmpeg (loses styling) |
| ASS | VTT | FFmpeg (loses styling) |
| SRT | ASS | Aegisub or manual restyling |
| Any (SRT, VTT, ASS) | Burned into video | FFmpeg with subtitles filter |
FFmpeg conversion examples:
```bash
ffmpeg -i input.srt output.vtt
ffmpeg -i input.ass -c:s subrip output.srt
ffmpeg -i input.vtt -c:s subrip output.srt
```
Our video converter handles batch subtitle extraction from MKV containers, where multiple subtitle tracks coexist.
Embedded vs Sidecar
Subtitles can be:
- Sidecar files (.srt next to the .mp4): the player loads them at playback time
- Embedded in container (subtitle stream inside MKV or MP4): travels with the video
- Burned in (rendered into the video frames): part of the picture
| Method | Pros | Cons |
|---|---|---|
| Sidecar | Editable, viewer can toggle | Must travel with video |
| Embedded | Self-contained, viewer can toggle | Some players don't support |
| Burned | Universal compatibility | Permanent, viewer can't toggle |
For YouTube and Vimeo: upload subtitles separately. Don't burn them. The platform's accessibility tools work with their internal subtitle representation.
For social platforms (TikTok, Instagram): burn the subtitles. Many viewers watch with the sound off, and the platform's auto-captions are unreliable.
For DVD-style delivery: embed in MKV. Players that support MKV (VLC, MPV, Plex) handle multi-track subtitles cleanly.
Common Mistakes
Mistake 1: Uploading SRT to a player expecting VTT. Most web players won't load SRT directly.
Mistake 2: Saving VTT with comma-decimal timecodes (00:00:00,000). VTT requires period-decimal (00:00:00.000). Text editors won't catch this for you, and renaming an .srt file to .vtt leaves every comma in place; convert with a tool instead.
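If you inherit a batch of suspect VTT files, a small Python check can find and fix comma timecodes. The helpers are hypothetical and only touch lines containing `-->`, so a comma in cue text such as "Well, 10,000 people" is left alone.

```python
import re

# A timestamp written SRT-style, with a comma before the milliseconds.
COMMA_STAMP = re.compile(r"(\d{2}:\d{2}(?::\d{2})?),(\d{3})")


def find_comma_timecodes(vtt_text: str) -> list[int]:
    """Return 1-based line numbers of '-->' lines that contain comma timestamps."""
    return [
        number
        for number, line in enumerate(vtt_text.split("\n"), start=1)
        if "-->" in line and COMMA_STAMP.search(line)
    ]


def fix_comma_timecodes(vtt_text: str) -> str:
    """Replace the comma with a period in timestamps, touching only '-->' lines."""
    fixed = [
        COMMA_STAMP.sub(r"\1.\2", line) if "-->" in line else line
        for line in vtt_text.split("\n")
    ]
    return "\n".join(fixed)
```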
Mistake 3: Encoding subtitles as Windows-1252 instead of UTF-8. Accented letters come out garbled when a player reads the file as UTF-8, and emoji and non-Latin scripts can't be stored in Windows-1252 at all. Always save as UTF-8 with the BOM disabled.
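A minimal Python sketch of the fix, assuming the file is either already UTF-8 (with or without a BOM) or Windows-1252. Encodings can't be detected reliably in general, so the fallback is a guess you should confirm; `reencode_to_utf8` is a hypothetical helper.

```python
from pathlib import Path


def reencode_to_utf8(path: Path, fallback_encoding: str = "cp1252") -> str:
    """Rewrite a subtitle file as UTF-8 without a BOM; return the decoded text."""
    raw = path.read_bytes()
    try:
        # "utf-8-sig" accepts UTF-8 with or without a BOM and strips the BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that isn't valid UTF-8 is in the fallback encoding.
        text = raw.decode(fallback_encoding)
    path.write_bytes(text.encode("utf-8"))  # plain UTF-8: no BOM is written
    return text
```

Decoding as UTF-8 first means running it twice is harmless: an already-converted file passes through unchanged.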
Mistake 4: Burning subtitles, then later wanting to edit them. The text is now part of the video. There's no undo. Always keep the source SRT/VTT/ASS file as a master.
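Mistake 5: Forgetting to verify subtitle timing after re-encoding video. SRT timecodes are absolute clock times, not frame numbers, so a frame-rate conversion that changes playback speed (for example, a 23.976 fps to 25 fps PAL speed-up) leaves the subtitles drifting further out of sync as the video plays.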
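A worked example in Python, assuming a PAL speed-up: every frame of 23.976 fps (exactly 24000/1001) video is kept and played at 25 fps, so the program runs about 4% faster. A conversion that drops or duplicates frames to reach the new rate keeps the original duration, and then timestamps need no rescaling. `rescale_ms` is a hypothetical name.

```python
from fractions import Fraction


def rescale_ms(ms: int, source_fps: Fraction, target_fps: Fraction) -> int:
    """Move a timestamp for a speed change that keeps every frame.

    Frame n plays at n / source_fps seconds before the change and at
    n / target_fps after it, so every time scales by source_fps / target_fps.
    """
    return round(ms * source_fps / target_fps)


NTSC_FILM = Fraction(24000, 1001)  # "23.976" fps
PAL = Fraction(25)
```

A cue at one minute moves from 60,000 ms to 57,542 ms; by the 90-minute mark, uncorrected subtitles are about 3.7 minutes late.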
Frequently Asked Questions
Why does my SRT file show garbled characters?
Encoding mismatch. The file was saved as Windows-1252 or another non-UTF-8 encoding. Reopen it in its original encoding (VS Code: Reopen with Encoding; Notepad++: the Encoding menu), save it as UTF-8 without BOM, and reload it in the player.
Can I have multiple subtitle tracks in one MP4?
Technically yes (MP4 supports text tracks), but compatibility is poor. Most players handle multi-track subtitles only in MKV containers. For multi-language work: MKV with embedded SRT tracks, one per language.
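A sketch of building that MKV with FFmpeg from Python, one SRT input per language. It assumes the source video's own subtitle streams should be dropped (only its video and any audio are mapped) and that language codes are ISO 639-2 (`eng`, `fra`); `mkv_mux_command` is a hypothetical helper that only builds the argument list.

```python
def mkv_mux_command(video: str, subtitles: dict[str, str], output: str) -> list[str]:
    """Build an ffmpeg argument list that adds one SRT track per language to an MKV.

    `subtitles` maps an ISO 639-2 language code (e.g. "eng") to an .srt path.
    """
    cmd = ["ffmpeg", "-i", video]
    for srt_path in subtitles.values():
        cmd += ["-i", srt_path]
    # Map video and (if present) audio from the source; skip its own subtitle streams.
    cmd += ["-map", "0:v", "-map", "0:a?"]
    for input_index in range(1, len(subtitles) + 1):
        cmd += ["-map", str(input_index)]  # each .srt input holds one subtitle stream
    cmd += ["-c", "copy"]  # no re-encoding; MKV stores SubRip text natively
    for track, language in enumerate(subtitles):
        cmd += [f"-metadata:s:s:{track}", f"language={language}"]
    cmd.append(output)
    return cmd
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it. Stream copy works here because Matroska stores SRT tracks as-is, so nothing is re-encoded.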
What does "soft" vs "hard" subtitles mean?
Soft = sidecar or embedded; the viewer can toggle it on or off. Hard = burned into the video frames; always visible. In some contexts the same distinction is described as "closed" (toggleable) versus "open" (always visible) captions.
Are auto-generated subtitles OK?
For accessibility legal compliance: no. ADA Title III lawsuits have targeted organizations that relied on inaccurate machine-generated captions. Treat auto-generated subtitles as a starting point that needs human review and correction.
How do I extract subtitles from an MKV?
```bash
ffmpeg -i input.mkv -map 0:s:0 -c:s copy output.srt
```
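The -map 0:s:0 selects the first subtitle track. Use 0:s:1 for the second, and so on. -c:s copy only works when that track is already SRT (SubRip); for an ASS or WebVTT track, use -c:s subrip instead, and image-based tracks such as PGS can't become SRT without OCR. Our video extraction tools handle similar extraction patterns for audio tracks.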
Should I include CC for music videos?
Yes. Music descriptions help deaf viewers understand the mood and instrumentation. Common notation: "♪ slow melancholic piano ♪" or "[guitar riff]" for instrumental moments. Some platforms have specific music notation standards; check before submitting.
Bottom Line
For broadest compatibility: SRT. For HTML5 web embed: VTT. For styled or animated subtitles: ASS. Convert SRT to VTT mechanically with FFmpeg when you need it for the web. Our video converter handles subtitle extraction from MKV containers and our audio extractor covers the audio-track equivalent.



