What an Audiogram Actually Is
An audiogram is a short video clip that combines a static or semi-animated background image with a waveform animation driven by the audio, plus optionally a synchronized caption track. The result is a video file — typically MP4 — that can be posted to Instagram, Twitter/X, LinkedIn, or YouTube Shorts without requiring any actual video footage.
The format fills a specific gap in podcast promotion. Audio-only files cannot be natively uploaded to most social platforms, and a static image with a "listen here" caption is easy to scroll past. An audiogram with a moving waveform, a strong quote as a caption, and clear podcast branding gives the clip visual motion that earns the pause and the click.
The mechanics are simple: take a 30-90 second clip from your episode, layer it over a static background image (episode artwork, guest photo, branded card), add a waveform animation that responds to the audio's amplitude, optionally burn in captions, and export as MP4.
Why Podcasters Use Audiograms
The promotion math is straightforward. A 45-minute podcast episode contains dozens of quotable, shareable moments. Most listeners will never share the full episode link — but a 60-second clip with a provocative insight, a funny exchange, or a useful piece of advice is far more likely to get reshared.
Audiograms also work differently than text quotes. A quoted tweet or LinkedIn text post can be rephrased, screenshotted, and shared without attribution. An audiogram is inherently tied to its source: the waveform is moving, the guest's face or episode art is visible, and your podcast name is in frame. It functions as a branded clip that drives attribution back to the show even when reshared.
For podcasters with limited video production capacity, audiograms are particularly valuable because they require no camera, no studio setup, and no video editing skill — just a good clip selection and basic tool knowledge.
Step-by-Step Workflow
Step 1: Select and Extract the Right Clip
The strongest audiogram clips are 30-90 seconds long and contain a single, clear insight or exchange. Avoid clips that require context from earlier in the episode to be understood — the audiogram will often be the first contact a listener has with your show.
Good clip types:
- A surprising statistic or counterintuitive claim
- A guest's direct answer to a pointed question
- A short story with a clear setup and punchline
- A practical tip that stands alone
If your episode is already in MP3 or WAV format, you only need to isolate the target clip before creating the audiogram. For video interview recordings, the extract audio from your video tool pulls out the audio track first, and the MP3 converter can prepare audio files in the right format for most audiogram tools.
Once you have the source audio file, use a DAW or an online trimmer to cut your clip precisely. Clean, tight edit points — no dead air at the start or end — make the audiogram feel professional.
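If you prefer the command line to a DAW, the trim can be sketched with FFmpeg. The helper below only prints the command so you can inspect it before running it. The function name trim_cmd, the file names, and the timestamps are placeholders for illustration, and the sketch assumes your FFmpeg build includes the libmp3lame encoder and that file names contain no spaces.

```shell
# trim_cmd (hypothetical helper): print an FFmpeg command that cuts a clip
# between two timestamps and re-encodes it to MP3.
# Usage: trim_cmd INPUT START END OUTPUT
trim_cmd() {
    printf 'ffmpeg -i %s -ss %s -to %s -c:a libmp3lame -b:a 192k %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example: a 60-second clip starting at 12:30 (placeholder values).
trim_cmd episode.mp3 00:12:30 00:13:30 clip.mp3
```

Re-encoding rather than stream-copying costs a little time but avoids cut points snapping to frame boundaries. Listen to the first and last second of the result to confirm there is no dead air.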
Step 2: Choose Your Waveform Style
The waveform is the visual centerpiece of an audiogram, and the style you choose affects both the aesthetic and the perceived energy of the clip.
| Waveform Style | Visual Appearance | Best For | Energy Level |
|---|---|---|---|
| Bar | Vertical frequency bars that pulse with amplitude | Music-forward podcasts, high energy topics | High |
| Line | Continuous waveform like an oscilloscope trace | Minimalist design, interview shows | Medium |
| Radial | Circular waveform emanating from center | Square format, visually bold brands | High |
| Blob | Organic animated shape that deforms with audio | Wellness, creative, lifestyle shows | Low–Medium |
| Mirror | Two mirrored waveforms top and bottom | Wide format, symmetry-focused design | Medium |
The bar waveform is the most recognizable and widely used. The line waveform reads as more editorial and calm — appropriate for long-form interview shows or narrative podcasts. The radial style works well in 1:1 square format where there is no natural "side" to place a horizontal waveform.
Color matters significantly. A waveform in your brand color against a dark background is far more readable than a waveform that blends into a busy background image. High-contrast waveform-to-background combinations read well at small sizes in feeds.
Step 3: Add Captions
Captions are the single biggest factor in audiogram performance. Research across social platforms consistently shows higher engagement on captioned video versus uncaptioned video, and the effect is especially pronounced on mobile where videos autoplay muted.
For audiograms, burned-in captions (embedded in the video rather than as a separate subtitle track) are preferred because they are visible on every platform without requiring the viewer to activate subtitles.
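For readers working outside a dedicated tool, burning captions into an existing video can be sketched with FFmpeg's subtitles filter. This assumes an FFmpeg build with libass and a caption file you have already prepared. The helper name burn_captions_cmd, the file names, and the style values are illustrative assumptions, and the helper prints the command rather than running it.

```shell
# burn_captions_cmd (hypothetical helper): print an FFmpeg command that
# renders an SRT file into the video frames (burned-in captions).
# Usage: burn_captions_cmd INPUT_VIDEO CAPTIONS_SRT OUTPUT_VIDEO
burn_captions_cmd() {
    # force_style overrides ASS style fields; Alignment=2 is bottom center.
    printf '%s\n' "ffmpeg -i $1 -vf \"subtitles=$2:force_style='FontSize=28,Alignment=2'\" -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a copy $3"
}

burn_captions_cmd audiogram.mp4 captions.srt audiogram_captioned.mp4
```

The video is re-encoded because the text becomes part of the picture, while the audio is copied unchanged.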
Tools that generate automatic captions from your audio (Headliner, Descript, Recast.studio) have improved significantly — word-level accuracy on clear speech is typically 90%+ with modern models, and you can correct the remaining errors manually before exporting. Always check proper nouns, technical terms, and names, which are where automated captions most often fail.
Caption styling for audiograms: keep word count per caption card low (1-5 words), use large font size, and place captions consistently (center bottom or center of frame, not overlapping the waveform animation).
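The low word count per card can be illustrated with a small text-splitting sketch. It only groups words and does not handle timing, which your caption tool or transcript provides. The function name caption_cards is hypothetical, and the input is assumed to be plain words without shell wildcard characters.

```shell
# caption_cards (hypothetical helper): split text into caption cards of at
# most MAX words each, printing one card per line.
# Usage: caption_cards MAX "text ..."
caption_cards() {
    max=$1
    count=0
    line=""
    for word in $2; do
        # Flush the current card once it holds MAX words.
        if [ "$count" -eq "$max" ]; then
            printf '%s\n' "$line"
            line=""
            count=0
        fi
        if [ -z "$line" ]; then line=$word; else line="$line $word"; fi
        count=$((count + 1))
    done
    # Print the final, possibly shorter, card.
    [ -n "$line" ] && printf '%s\n' "$line"
    return 0
}

caption_cards 4 "The best clips stand alone without any extra context"
```

With a limit of four, the sample sentence becomes three cards: "The best clips stand", "alone without any extra", and "context".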
Step 4: Choose Platform and Export Format
Platform requirements vary enough that a single audiogram file is rarely optimal for all destinations. The minimum viable set is a 9:16 (vertical) version and a 1:1 (square) version, which covers most platforms with minimal additional work.
| Platform | Aspect Ratio | Duration | Resolution | Max File Size | Notes |
|---|---|---|---|---|---|
| Instagram Reels | 9:16 | 15–90 sec | 1080×1920 | 250 MB | Best reach for podcast clips |
| Instagram Feed | 1:1 or 4:5 | Up to 60 sec | 1080×1080 | 250 MB | Older format, still used |
| Twitter/X | 16:9 or 1:1 | Up to 2:20 | 1920×1080 | 512 MB | Widescreen reads well in timeline |
| LinkedIn | 1:1 or 16:9 | 3 sec–10 min | 4096×4096 max | 5 GB | Professional audience, longer clips work |
| YouTube Shorts | 9:16 | Under 60 sec | 1080×1920 | None stated | Shorts feed, searchable |
| Facebook | 16:9, 1:1, or 9:16 | Up to 240 min | 1080p | 10 GB | Lower organic reach vs. other platforms |
| TikTok | 9:16 | 15 sec–10 min | 1080×1920 | 287 MB (over 1 min) | Music-forward, younger audience |
Pro Tip: Shoot for Instagram Reels first, then reformat. Reels has the highest organic reach potential for podcast clips, the strictest aspect ratio requirement (9:16), and the most demanding caption legibility constraints (small screen, fast scroll). A Reels-optimized audiogram is easy to adapt to other formats; the reverse is harder.
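When scripting exports, the aspect ratios in the table can be turned into concrete frame sizes with a small lookup. This is a sketch only. The function name frame_size is hypothetical, and the 4:5 value (1080x1350) is derived from the ratio rather than taken from the table.

```shell
# frame_size (hypothetical helper): map an aspect ratio to a frame size
# whose short side is 1080 pixels.
# Usage: frame_size RATIO
frame_size() {
    case "$1" in
        9:16) echo 1080x1920 ;;
        1:1)  echo 1080x1080 ;;
        4:5)  echo 1080x1350 ;;   # derived from the ratio, not from the table
        16:9) echo 1920x1080 ;;
        *)    echo "unknown aspect ratio: $1" >&2; return 1 ;;
    esac
}

size=$(frame_size 9:16)
echo "width=${size%x*} height=${size#*x}"
```

Keeping the sizes in one place makes it harder for the vertical and square versions of a clip to drift apart between episodes.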
Step 5: Export as MP4
All major audiogram tools produce MP4 output (H.264 video, AAC audio) by default, which is the correct format for every platform in the table above. Key export settings to verify:
- Video codec: H.264 (not H.265 — social platforms still primarily ingest H.264)
- Frame rate: 30fps (some tools default to 24fps, which is fine but 30fps is more universal)
- Resolution: Platform-specific (see table above)
- Audio: AAC at 44.1 kHz or 48 kHz, 192 kbps minimum
- Bitrate: 5-15 Mbps for 1080p is more than adequate — audiogram content has low motion complexity so even lower bitrates look clean
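One way to verify these settings on an exported file is to inspect it with ffprobe. The sketch below prints an ffprobe command and provides a small checker for its key=value output. The helper names probe_cmd and check_export are hypothetical, and the checker only covers the codec and frame rate items from the list above.

```shell
# probe_cmd (hypothetical helper): print an ffprobe command that lists
# codec, frame size, frame rate, and sample rate as key=value lines.
probe_cmd() {
    printf 'ffprobe -v error -show_entries stream=codec_type,codec_name,width,height,r_frame_rate,sample_rate -of default=noprint_wrappers=1 %s\n' "$1"
}

# check_export (hypothetical helper): read that key=value output on stdin
# and report whether H.264 video, AAC audio, and 30 fps are all present.
check_export() {
    report=$(cat)
    status=0
    for want in codec_name=h264 codec_name=aac r_frame_rate=30/1; do
        if printf '%s\n' "$report" | grep -qx "$want"; then
            echo "ok: $want"
        else
            echo "MISSING: $want"
            status=1
        fi
    done
    return $status
}

probe_cmd output_audiogram.mp4
```

In practice you would run the printed ffprobe command and pipe its output into check_export. A 24 fps export will be reported as missing the 30 fps line, which is a prompt to check the tool's settings rather than a hard failure.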
Audiogram Tools
Headliner
Headliner (headliner.app) is the most widely used dedicated audiogram tool. It offers automatic transcription, multiple waveform styles, caption editing, and direct publishing to social platforms. The free tier generates a limited number of audiograms per month with a Headliner watermark; paid plans remove the watermark and increase output limits.
Headliner's automatic workflow — upload audio, select clip, choose template, add captions, export — takes 10-15 minutes for a polished audiogram. The caption editor is particularly good, allowing word-level timing adjustments.
Recast.studio
Recast.studio targets podcasters specifically, with features built around episode repurposing. Its clip suggestion feature analyzes the full episode audio and recommends moments likely to perform well as audiograms — useful for long-form shows where manually reviewing every minute is impractical. Recast.studio also handles multi-clip batch processing, so you can create 5-10 audiograms from a single episode upload in one session.
Canva / CapCut (Manual Approach)
Canva and CapCut both support basic audiogram creation through their video editor interfaces. Neither offers the dedicated podcast workflow of Headliner or Recast.studio, but both are free with generous limits and produce clean output.
In Canva, you can upload a podcast clip as audio, add a background image, and add the waveform element from the elements panel. Captions require manual typing (no auto-transcription in the free tier). In CapCut, the "Auto Caption" feature is strong and the template library includes several audiogram-adjacent formats.
These tools are the right choice for podcasters who create audiograms occasionally and do not want a subscription commitment.
The FFmpeg Approach for Power Users
For podcasters who batch-create audiograms or need full control over the output, FFmpeg can generate a basic audiogram from the command line. This requires a background image and the audio clip. The waveform animation is handled by FFmpeg's showwaves or showfreqs filter.
Basic Audiogram With a Center-Line Waveform
# Waveform strip is 1080x300; the overlay places it 100 px above the bottom edge.
ffmpeg -loop 1 -i background.jpg -i clip.mp3 \
-filter_complex "[1:a]showwaves=s=1080x300:mode=cline:rate=30:colors=white[waves];
[0:v][waves]overlay=0:H-h-100" \
-c:v libx264 -crf 20 -preset slow -pix_fmt yuv420p \
-c:a aac -b:a 192k -shortest output_audiogram.mp4
This generates a 9:16 audiogram (assuming background.jpg is 1080×1920) with a white center-line waveform in a 300-pixel-tall strip near the bottom of the frame. Adjust the 100 in overlay=0:H-h-100 to control how far the strip sits above the bottom edge.
Square Format Audiogram
ffmpeg -loop 1 -i background_square.jpg -i clip.mp3 \
-filter_complex "[1:a]showwaves=s=1080x200:mode=p2p:rate=30:colors=#FF6B35[waves];
[0:v][waves]overlay=0:440" \
-c:v libx264 -crf 20 -preset slow -pix_fmt yuv420p \
-c:a aac -b:a 192k -shortest output_square.mp4
The mode=p2p option draws a point for each sample and connects the points with a line (point-to-point), which reads as a more traditional waveform than the cline (centered line) mode.
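The overlay offsets in these commands are plain arithmetic on the frame height (H) and the waveform strip height (h). The sketch below evaluates the two expressions used above. The helper names overlay_y and centered_y and the 300-pixel strip height are illustrative assumptions.

```shell
# overlay_y (hypothetical helper): evaluate H-h-margin, the y offset that
# leaves MARGIN pixels between the waveform strip and the bottom edge.
# Usage: overlay_y FRAME_HEIGHT WAVE_HEIGHT MARGIN
overlay_y() {
    echo $(( $1 - $2 - $3 ))
}

# centered_y (hypothetical helper): y offset that centers the strip
# vertically, (H-h)/2.
# Usage: centered_y FRAME_HEIGHT WAVE_HEIGHT
centered_y() {
    echo $(( ($1 - $2) / 2 ))
}

overlay_y 1920 300 100   # 300-pixel strip in a 1080x1920 frame
centered_y 1080 200      # 200-pixel strip in a 1080x1080 frame
```

The second call prints 440, which is the overlay=0:440 value in the square example: a 200-pixel strip centered in a 1080-pixel-tall frame. The first prints 1520. Note that if the strip is as tall as the frame itself, H-h-100 becomes negative and the waveform shifts upward instead of sitting near the bottom.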
Pro Tip: The FFmpeg showwaves filter does not produce the polished animated bar waveforms you see from dedicated tools like Headliner. It is better suited for raw technical output, batch scripts, or situations where you are already in a custom FFmpeg pipeline. For audience-facing audiograms, use a dedicated tool for the waveform and caption work, then post-process with FFmpeg if you need specific format adjustments.
Converting GIF Waveform Animations
Some audiogram templates use animated GIF waveform overlays rather than real-time audio visualization. If you have a GIF waveform animation and want to composite it over a static background with audio, FFmpeg handles this cleanly. The convert GIF to MP4 tool is also useful for converting animated GIF elements to a format that composites more cleanly in video editors.
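A compositing command for this case might look like the sketch below. It assumes the GIF is authored to loop indefinitely (the -ignore_loop 0 input option tells FFmpeg to honor the GIF's own loop setting) and that file names contain no spaces. The helper name gif_audiogram_cmd and all file names are placeholders, and the helper prints the command rather than running it.

```shell
# gif_audiogram_cmd (hypothetical helper): print an FFmpeg command that
# overlays a looping GIF on a still background for the length of the audio.
# Usage: gif_audiogram_cmd BACKGROUND GIF AUDIO OUTPUT
gif_audiogram_cmd() {
    # Inputs: 0 = looped still image, 1 = looping GIF, 2 = audio clip.
    # The GIF is centered horizontally, 100 px above the bottom edge.
    printf '%s\n' "ffmpeg -loop 1 -i $1 -ignore_loop 0 -i $2 -i $3 \
-filter_complex \"[0:v][1:v]overlay=(W-w)/2:H-h-100[v]\" \
-map \"[v]\" -map 2:a -c:v libx264 -crf 20 -pix_fmt yuv420p \
-c:a aac -b:a 192k -shortest $4"
}

gif_audiogram_cmd background.jpg waveform.gif clip.mp3 output_gif_audiogram.mp4
```

Because both visual inputs loop, -shortest is what ends the output when the audio finishes. A GIF waveform is decorative, so it will not track the actual amplitude of the clip.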
Compressing Final Audiograms
Audiogram files are often smaller than typical video content because they have static or near-static background frames — only the waveform region has motion. This means even aggressive compression settings preserve quality well.
For a 60-second 9:16 audiogram at 1080×1920, target:
- H.264, CRF 22-24
- Expected output size: 3-8 MB depending on waveform complexity
- AAC audio at 192 kbps (audio quality matters here — it is the whole point of the clip)
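The size range above follows from simple bitrate arithmetic: size is roughly total bitrate times duration, divided by 8 to convert bits to bytes. CRF encoding does not fix a bitrate, so the 500 kbps video figure below is an assumed average for illustration, and both helper names are hypothetical.

```shell
# estimate_kb (hypothetical helper): approximate file size in kilobytes.
# size_kB = (video_kbps + audio_kbps) * seconds / 8
# Usage: estimate_kb VIDEO_KBPS AUDIO_KBPS SECONDS
estimate_kb() {
    echo $(( ($1 + $2) * $3 / 8 ))
}

# max_video_kbps (hypothetical helper): video bitrate budget that keeps a
# clip under SIZE_MB megabytes, after reserving AUDIO_KBPS for the audio.
# Usage: max_video_kbps SIZE_MB SECONDS AUDIO_KBPS
max_video_kbps() {
    echo $(( $1 * 8000 / $2 - $3 ))
}

estimate_kb 500 192 60      # 60 s clip, assumed 500 kbps video average
max_video_kbps 8 60 192     # budget for an 8 MB, 60 s clip
```

The first call prints 5190 (about 5.2 MB), inside the 3-8 MB range. The second prints 874, meaning a 60-second clip stays under 8 MB as long as the video averages below roughly 874 kbps.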
If an audiogram still exceeds a platform's file size limit, you can also shrink the audio source with the MP3 compression tool before generating the audiogram. The audio converter hub covers format conversions if your clip source is in a format (OGG, FLAC, M4A) that your audiogram tool does not directly accept.
For more context on how audio formats and bitrates affect quality in production contexts, the best audio format for podcasts guide and how to convert audio for podcasts guide are useful companion reads. The podcast audio to video repurposing guide covers the broader strategy of turning your back catalog into social content.
Caption Strategy for Audiogram Performance
The caption style you choose affects watch time and shareability. A few tested approaches:
Word-by-Word Highlight: Each word appears in sequence as it is spoken, with the current word highlighted in your brand color. High-energy, feels dynamic. Works best for confident, fast-paced speaking. Common on TikTok and Instagram Reels.
Subtitle-Style: 3-5 words appear at a time, centered below the waveform. Clean, readable, works on all platform sizes. The default in most audiogram tools.
Pull-Quote Style: The entire quote is displayed as a text overlay in large type, readable before playback begins. Good for long-form LinkedIn posts where the viewer may not immediately click play.
No Captions: Only appropriate if the clip works as silent video (ambient audio, music), which is rare for podcast content. Generally avoid for interview-style audiograms.
Caption accuracy verification matters more than most podcasters expect. Automated tools will mis-transcribe proper nouns, technical terms, brand names, and any word with unusual pronunciation. Spend 2-3 minutes reviewing and correcting before publishing. Incorrect captions visible on-screen read as low-effort and can undermine the professional impression of your show.
Building an Audiogram Workflow
For consistent audiogram production, the most efficient approach is a repeatable template system rather than recreating from scratch each episode.
Create 2-3 master templates (9:16, 1:1, and 16:9) with your podcast branding, consistent font choices, and waveform style locked in. Each new audiogram only requires:
- Uploading the audio clip
- Verifying and correcting the auto-captions
- Exporting in each required format
This reduces per-audiogram production time from 30-45 minutes (design + caption + export) to 10-15 minutes (clip select + caption review + export). For shows publishing weekly, that difference is meaningful.
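For scripted pipelines, the same template idea can be sketched as a loop that plans one output per clip and per format. The function name plan_exports, the format labels, and the naming scheme are assumptions for illustration. The sketch only lists the planned files and leaves the actual rendering to your tool of choice.

```shell
# plan_exports (hypothetical helper): list one output file per clip and
# per template format, e.g. ep42_quote.mp3 -> ep42_quote_9x16.mp4.
# Usage: plan_exports CLIP...
plan_exports() {
    for clip in "$@"; do
        base=$(basename "$clip")
        base=${base%.*}              # strip the file extension
        for fmt in 9x16 1x1 16x9; do
            printf '%s -> %s_%s.mp4\n' "$clip" "$base" "$fmt"
        done
    done
}

plan_exports clips/ep42_quote.mp3 clips/ep42_story.mp3
```

A consistent naming scheme like this makes it easy to see at a glance whether every clip has all of its platform versions before you schedule posts.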
Frequently Asked Questions
How long should an audiogram be?
Platform constraints are the starting point: Instagram Reels allows 15-90 seconds, Twitter/X allows up to 2:20, YouTube Shorts requires under 60 seconds. Within those limits, the practical sweet spot for most podcast audiograms is 45-75 seconds. Short enough to hold attention without audio context, long enough to deliver a complete insight. Clips under 30 seconds often feel incomplete; clips over 90 seconds rarely see completion rates above 20%.
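A quick duration check against these limits can be scripted. The sketch below uses only the figures quoted in this answer, treats the Shorts limit as 60 seconds inclusive for simplicity, and expects whole seconds. The helper names and platform keys are hypothetical.

```shell
# max_seconds (hypothetical helper): upper duration limit per platform,
# using the figures quoted in this article.
max_seconds() {
    case "$1" in
        reels)  echo 90 ;;
        x)      echo 140 ;;   # 2:20
        shorts) echo 60 ;;
        *)      echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# fits_platform (hypothetical helper): succeed if a clip of SECONDS whole
# seconds is within the platform's limit.
# Usage: fits_platform PLATFORM SECONDS
fits_platform() {
    limit=$(max_seconds "$1") || return 2
    [ "$2" -le "$limit" ]
}

for p in reels x shorts; do
    if fits_platform "$p" 75; then echo "$p: ok"; else echo "$p: too long"; fi
done
```

For a 75-second clip this reports ok for Reels and X and too long for Shorts. Platform limits change over time, so confirm the numbers in max_seconds against current platform documentation before relying on them.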
Do audiograms actually drive podcast listeners?
The data is indirect — social platforms do not report podcast app opens as a conversion metric. What is measurable is link clicks, and audiograms with a clear show identity and a compelling clip do drive link clicks. The more important frame, though, is that audiograms build name recognition over time. A listener who sees your podcast name three times in their feed before encountering it in search is more likely to subscribe than a cold first-contact.
Can I create an audiogram from a video interview recording?
Yes. Extract the audio track from the video first using the extract audio from your video tool, then use the audio clip in your audiogram workflow. If you want to use the video footage itself (guest face on screen rather than static image), most audiogram tools support video background — import the video clip and add the waveform overlay and captions on top.
What is the best waveform style for professional/corporate podcasts?
For B2B or corporate podcasts targeting a LinkedIn audience, the line or mirror waveform styles read as more refined than the bar or radial styles. Pair them with a clean, uncluttered background — episode art or a branded card with restrained typography. Bold, colorful waveform animations work well on consumer-facing shows but can feel out of place in a professional context.
Do I need to own the copyright to the audio in an audiogram?
Yes. If you are clipping your own podcast, you own the content. If you are including guest audio, most podcast recording agreements cover clip sharing for promotion — but verify this with guests if you have not explicitly covered it. Music in the background of an audiogram is a copyright risk: use royalty-free music licensed for video use, your own original music, or no background music at all.
Conclusion
Audiograms are one of the highest-leverage content formats available to podcasters. A single episode can generate 5-10 audiogram clips, each capable of reaching a new audience that would never have searched for your RSS feed. The production overhead, once you have a template system in place, is small relative to the potential distribution.
Start with one platform — Instagram Reels or YouTube Shorts for audio-visual formats, LinkedIn for longer professional clips — and build a consistent cadence before expanding. The format and workflow decisions covered here (9:16 for Reels, H.264 MP4 output, burned-in captions, bar or line waveform) give you a technically sound foundation that works without adjustment across all major social platforms.
Use the extract audio from your video and MP3 converter tools to prepare your source clips, and the audio converter hub when you encounter format compatibility issues with your audiogram tool. If you are building a broader content repurposing strategy around your podcast, the podcast audio to video repurposing guide covers the full landscape of formats beyond the audiogram.



