When Static Images Need to Become Video
Wedding photos for an anniversary party. Product shots for a launch video. A portfolio that needs to play on a TV during an event. Travel photos from a trip someone wants to share. All of these start as a folder of still images and need to become a video file.
The routes to get there range from simple (one FFmpeg command) to sophisticated (precise timing, custom transitions, background music). This guide covers the full range, from getting something working in five minutes to producing a polished result with professional-looking transitions.
The Simplest Approach: FFmpeg
FFmpeg can turn a folder of images into a video in a single command. No GUI, no export dialogs — images in, a video file out.
Basic Slideshow (Fixed Duration per Image)
# Each image displays for 3 seconds
ffmpeg -framerate 1/3 -pattern_type glob -i '*.jpg' \
-c:v libx264 -r 30 -pix_fmt yuv420p output.mp4
Breaking down the flags:
- -framerate 1/3 — input rate of one frame every 3 seconds
- -pattern_type glob -i '*.jpg' — use all JPG files in alphabetical order
- -c:v libx264 — encode with H.264 (maximum compatibility)
- -r 30 — output at 30fps (without it, the output inherits the 1/3fps input rate, which many players handle poorly)
- -pix_fmt yuv420p — standard pixel format for compatibility
Important: Image files are processed in alphabetical order. Name your images with zero-padded numbers if order matters: 001_photo.jpg, 002_photo.jpg, etc.
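If filenames aren't already sequential, a short Python sketch can batch-rename them in their current sort order. The zero_pad_rename helper name is hypothetical; it assumes the folder contains only images you want renamed, and does not guard against collisions with files already named in the target pattern.

```python
import glob
import os

def zero_pad_rename(folder, width=3, ext='jpg'):
    """Rename images to 001_photo.jpg, 002_photo.jpg, ... preserving sort order.

    Caution: a sketch only -- it assumes no existing file already uses the
    NNN_photo naming pattern, or an os.rename could overwrite it.
    """
    images = sorted(glob.glob(os.path.join(folder, f'*.{ext}')))
    for i, path in enumerate(images, start=1):
        new_name = os.path.join(folder, f'{i:0{width}d}_photo.{ext}')
        if path != new_name:
            os.rename(path, new_name)
    return len(images)
```

Run it once on a copy of the folder first, then feed the renamed files to the glob or %03d patterns above.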
Controlling Image Order
# Name images in sequence and process in order
ffmpeg -framerate 1/4 -i 'img_%03d.jpg' -c:v libx264 -r 30 -pix_fmt yuv420p output.mp4
The %03d pattern matches img_001.jpg, img_002.jpg, etc. — zero-padded 3-digit numbers.
Setting a Consistent Output Resolution
Source photos often have mixed orientations and sizes. Force a consistent output:
# Force 1920x1080 (landscape)
ffmpeg -framerate 1/4 -pattern_type glob -i '*.jpg' \
-vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1" \
-c:v libx264 -r 30 -pix_fmt yuv420p output_1080p.mp4
The filter chain:
- scale=1920:1080:force_original_aspect_ratio=decrease — scales to fit within 1920x1080 while maintaining aspect ratio
- pad=1920:1080:(ow-iw)/2:(oh-ih)/2 — adds black bars to fill the remaining space
- setsar=1 — normalizes the sample aspect ratio
For portrait photos in a landscape slideshow, this produces black letterboxing. For a blur-fill background instead of black bars, see the Ken Burns section below.
Adding Background Music
# Add audio, trim to video length
ffmpeg -framerate 1/4 -pattern_type glob -i '*.jpg' \
-i background_music.mp3 \
-vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" \
-c:v libx264 -r 30 -pix_fmt yuv420p \
-c:a aac -b:a 192k \
-shortest \
output_with_music.mp4
The -shortest flag stops the output when the shorter of the two inputs (video or audio) ends. If the music is longer than the slideshow, it gets cut. If the slideshow is longer than the music, the audio ends first and the remainder is silent — add -af "afade=t=out:st=LAST_FEW_SECONDS:d=2" for a fade-out.
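The fade-out start time is just the slideshow's total length minus the fade length. A small helper that builds the afade argument for a fixed-duration slideshow (the afade_out_arg name is hypothetical):

```python
def afade_out_arg(num_images, seconds_per_image, fade_len=2.0):
    """Build an afade filter string that fades the audio out over the
    slideshow's final fade_len seconds."""
    total = num_images * seconds_per_image
    start = max(total - fade_len, 0)  # never start the fade before 0
    return f"afade=t=out:st={start:g}:d={fade_len:g}"
```

For a 10-image slideshow at 4 seconds per image, this yields afade=t=out:st=38:d=2, which you pass to FFmpeg via -af.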
Variable Duration per Image
Different images deserve different amounts of screen time. The cleanest way to handle this in FFmpeg is through a concat demuxer input file:
Create a File List with Custom Durations
Create a file named filelist.txt:
file 'opening.jpg'
duration 5
file 'photo01.jpg'
duration 3
file 'photo02.jpg'
duration 3
file 'photo03.jpg'
duration 4
file 'photo04.jpg'
duration 2
file 'closing.jpg'
duration 5
file 'closing.jpg'
Note: the concat demuxer ignores the duration directive on the final entry, so the last image is listed a second time (without a duration) to make it display for its full 5 seconds.
Then generate the video:
ffmpeg -f concat -safe 0 -i filelist.txt \
-vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" \
-c:v libx264 -r 30 -pix_fmt yuv420p output_variable.mp4
A Python script to generate filelist.txt from a folder with custom timings:
import glob

# Map filenames to durations (seconds)
timing = {
    'opening.jpg': 5,
    'default': 3,   # Default for all others
    'closing.jpg': 6,
}

images = sorted(glob.glob('*.jpg'))
with open('filelist.txt', 'w') as f:
    for img in images:
        duration = timing.get(img, timing['default'])
        f.write(f"file '{img}'\n")
        f.write(f"duration {duration}\n")
    if images:
        # Repeat the last file so the concat demuxer honors its duration
        f.write(f"file '{images[-1]}'\n")
print(f"Generated filelist.txt for {len(images)} images")
Adding Transitions
FFmpeg doesn't have built-in "dissolve between slides" transitions in a single pass. The xfade filter handles transitions, but requires either:
- Pre-encoded segment files, or
- The xfade filter with carefully timed inputs
Cross-Fade Transition Using xfade
# Two same-sized images with a 1-second crossfade between them
# (xfade requires both inputs to have matching dimensions)
ffmpeg \
-loop 1 -t 4 -i photo1.jpg \
-loop 1 -t 4 -i photo2.jpg \
-filter_complex "[0:v][1:v]xfade=transition=fade:duration=1:offset=3,format=yuv420p" \
-r 30 output_fade.mp4
For multiple images, chain the xfade filters:
# Three images with crossfades
ffmpeg \
-loop 1 -t 5 -i photo1.jpg \
-loop 1 -t 5 -i photo2.jpg \
-loop 1 -t 5 -i photo3.jpg \
-filter_complex "
[0:v]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];
[1:v]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];
[2:v]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v2];
[v0][v1]xfade=transition=fade:duration=1:offset=4[xf1];
[xf1][v2]xfade=transition=fade:duration=1:offset=8,format=yuv420p[out]
" -map "[out]" -r 30 output_transitions.mp4
Available xfade transitions include: fade, wipeleft, wiperight, wipeup, wipedown, slideleft, slideright, slideup, slidedown, circlecrop, circleopen, circleclose, horzopen, horzclose, vertopen, vertclose, dissolve, pixelize, diagtl, diagtr, diagbl, diagbr.
Scripting Multi-Image Transitions
For slideshows with many images, writing xfade chains manually isn't practical. A Python script:
import glob
import os
import subprocess

def create_slideshow_with_transitions(image_folder, output_file,
                                      duration_per_image=4,
                                      transition_duration=1,
                                      transition_type='fade',
                                      size='1920:1080'):
    images = sorted(glob.glob(os.path.join(image_folder, '*.jpg')))
    if not images:
        images = sorted(glob.glob(os.path.join(image_folder, '*.png')))
    n = len(images)
    if n < 2:
        raise ValueError('need at least two images for transitions')
    w, h = size.split(':')

    # Build FFmpeg input arguments
    inputs = []
    for img in images:
        inputs.extend(['-loop', '1', '-t', str(duration_per_image), '-i', img])

    # Scale and pad every input to a uniform frame (setsar keeps xfade happy)
    scale_filter = (f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
                    f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2,setsar=1")
    filter_parts = []
    for i in range(n):
        filter_parts.append(f"[{i}:v]{scale_filter}[v{i}]")

    # Chain xfade transitions: the i-th transition starts at
    # i * (duration_per_image - transition_duration) into the running output
    prev = 'v0'
    for i in range(1, n):
        offset = i * (duration_per_image - transition_duration)
        out = f"xf{i}" if i < n - 1 else "out"
        filter_parts.append(
            f"[{prev}][v{i}]xfade=transition={transition_type}:"
            f"duration={transition_duration}:offset={offset}[{out}]"
        )
        prev = out
    filter_parts.append("[out]format=yuv420p[final]")

    cmd = inputs + [
        '-filter_complex', ';'.join(filter_parts),
        '-map', '[final]',
        '-r', '30',
        '-c:v', 'libx264', '-preset', 'fast',
        output_file,
    ]
    subprocess.run(['ffmpeg'] + cmd, check=True)

# Usage
create_slideshow_with_transitions('./photos', 'slideshow.mp4')
Ken Burns Effect (Pan and Zoom)
The Ken Burns effect — slow zooms and pans on still images — adds motion and visual interest. FFmpeg's zoompan filter handles this:
# Zoom in slowly on a single image (5 seconds)
ffmpeg -loop 1 -i photo.jpg -t 5 \
-vf "zoompan=z='min(zoom+0.0015,1.5)':d=125:x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)',scale=1920:1080,format=yuv420p" \
-r 25 output_kenburns.mp4
Breaking down the zoompan filter:
- z='min(zoom+0.0015,1.5)' — zoom increases by 0.0015 per frame, capped at 1.5x
- d=125 — duration in frames (125 frames ÷ 25fps = 5 seconds)
- x/y — keep the zoom centered on the image center
For a pan-right effect:
-vf "zoompan=z=1.2:d=125:x='(iw-iw/zoom)*on/125':y='ih/2-(ih/zoom/2)',scale=1920:1080"
The x range of a panned zoom is 0 to iw-iw/zoom, so the expression scales that span by the frame counter (on) to pan smoothly across the full clip length.
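Hand-tuning the zoom step for each clip length is fiddly. A small builder, following the centered zoom-in shown above (the kenburns_filter name is hypothetical), derives the per-frame step from the clip duration:

```python
def kenburns_filter(duration_s, fps=25, zoom_to=1.5, size='1920:1080'):
    """Build a zoompan filter string for a slow centered zoom-in.

    The zoom step is chosen so the clip reaches zoom_to by its last frame.
    """
    frames = int(duration_s * fps)
    step = (zoom_to - 1.0) / frames  # per-frame zoom increment
    return (f"zoompan=z='min(zoom+{step:.6f},{zoom_to})':d={frames}"
            f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
            f",scale={size},format=yuv420p")
```

kenburns_filter(5) reproduces the 5-second example above with a 0.004 step; pass the result to FFmpeg via -vf.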
Platform-Specific Settings
Different platforms have specific requirements for slideshow videos:
| Platform | Resolution | Aspect Ratio | Frame Rate | Format |
|---|---|---|---|---|
| Instagram Feed | 1080x1080 | 1:1 | 30fps | MP4, H.264 |
| Instagram Reels | 1080x1920 | 9:16 | 30fps | MP4, H.264 |
| TikTok | 1080x1920 | 9:16 | 30fps | MP4, H.264 |
| YouTube | 1920x1080 | 16:9 | 24/30fps | MP4, H.264 |
|  | 1280x720+ | 16:9 | 30fps | MP4, H.264 |
|  | 1920x1080 | 16:9 | 25/30fps | MP4, H.264 |
Use the video converter to adjust aspect ratio and resolution after creating the base slideshow, or the crop video tool to reframe a 16:9 slideshow for vertical platforms.
Adding Text Overlays
# Add title text to the first 3 seconds
ffmpeg -i slideshow.mp4 \
-vf "drawtext=fontfile=/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf:text='Our Trip to Spain':x=(w-text_w)/2:y=(h-text_h)/2:fontsize=72:fontcolor=white:enable='between(t,0,3)'" \
-c:v libx264 -c:a copy output_with_title.mp4
For multiple text overlays at different times, chain drawtext filters or apply them in sequence.
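Building a long drawtext chain by hand is error-prone. A sketch of a helper (the chained_drawtext name is hypothetical; real-world text needs fuller escaping of FFmpeg's special characters than shown here):

```python
def chained_drawtext(captions, fontfile, fontsize=72, fontcolor='white'):
    """Chain one drawtext filter per caption.

    captions is a list of (text, start_s, end_s) tuples; the result is a
    comma-joined filter chain suitable for -vf.
    """
    parts = []
    for text, start, end in captions:
        # Minimal escaping sketch: escape colons, drop single quotes
        safe = text.replace(':', '\\:').replace("'", '')
        parts.append(
            f"drawtext=fontfile={fontfile}:text='{safe}'"
            f":x=(w-text_w)/2:y=(h-text_h)/2:fontsize={fontsize}"
            f":fontcolor={fontcolor}:enable='between(t,{start},{end})'"
        )
    return ','.join(parts)
```

For example, chained_drawtext([('Intro', 0, 3), ('Credits', 27, 30)], '/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf') produces a two-filter chain showing a title for the first 3 seconds and credits over the last 3.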
Photo Quality Considerations
Before creating the slideshow, ensure source photos are sized appropriately:
- Too small: Photos smaller than 1920x1080 will be upscaled, which reduces sharpness
- Too large: 50MP camera files add unnecessary processing overhead — resize to 3840x2160 (4K) at most before creating the slideshow
- Mixed orientations: Script the portrait/landscape handling if your photos include both
Use the image compressor to reduce large source images before processing, or the resize image tool to standardize dimensions. For converting HEIC phone photos (common from iPhones) before using them in slideshows, the HEIC to JPG converter handles batch conversion.
Frequently Asked Questions
What image formats does FFmpeg accept for slideshows?
FFmpeg handles JPEG, PNG, BMP, TIFF, WebP, and most common image formats. RAW camera files (CR2, NEF, ARW) require conversion to JPEG or PNG first. HEIC files from iPhones also require conversion. Use a batch conversion tool before running the slideshow command.
Why does my slideshow output have inconsistent timing?
This usually happens when image durations don't align with the output frame rate. Use durations that are whole multiples of the frame interval (1/framerate). At 30fps output, 2.5 seconds is safe (exactly 75 frames), but 2.25 seconds is not (67.5 frames, which forces rounding and produces uneven timing).
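To vet a duration list before encoding, a quick check (the frame_exact name is hypothetical) flags values that don't land on a frame boundary:

```python
def frame_exact(duration_s, fps=30):
    """True if the duration is a whole number of frames at the given fps."""
    frames = duration_s * fps
    # Small tolerance absorbs floating-point noise like 2.3 * 30 = 68.999...
    return abs(frames - round(frames)) < 1e-9
```

Run it over every duration in your filelist; any False result is a candidate for uneven timing in the output.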
Can I add custom audio per image section?
Yes, but it requires separate audio tracks with precise timing. The cleanest approach: create the video without audio, create an audio file with the exact same total duration and proper music/voiceover timing, then combine them:
ffmpeg -i video_no_audio.mp4 -i audio_track.mp3 -c:v copy -c:a aac -shortest output_final.mp4
How do I handle mixed portrait and landscape photos?
Use the scale + pad filter combination shown earlier. For portrait photos, this creates black bars on the sides in a landscape slideshow. Alternatively, use a blurred version of the photo as the background fill:
-vf "split[base][over];[base]scale=1920:1080,boxblur=10,setsar=1[bg];[over]scale=1920:1080:force_original_aspect_ratio=decrease,setsar=1[fg];[bg][fg]overlay=(W-w)/2:(H-h)/2"
This creates a blurred background from the photo itself instead of solid black bars.
Conclusion
Creating a video slideshow from images spans from a single FFmpeg command for a basic result to a scripted pipeline for a polished production. The core workflow is always the same: assemble images in order, set duration, optionally add transitions and audio, encode to H.264 MP4 for maximum compatibility.
For photos that need format conversion before the slideshow process, the image converter handles batch conversion of any image format. The how to convert images to PDF guide covers the parallel use case when you need a document rather than a video, and the GIF maker is the right tool when you need a short looping animation instead of a full video file.



