Convert audio to Whisper-compatible format


// what it does

Decodes any input ffmpeg can read and re-encodes the audio to the shape speech models expect: -ar 16000 resamples to 16 kHz, -ac 1 downmixes to mono, and -c:a pcm_s16le writes uncompressed signed 16-bit little-endian PCM into a WAV container. Because the WAV muxer only carries audio, the video stream of an mp4 is dropped automatically. Reach for this before feeding clips to Whisper, whisper.cpp, or other ASR tools that want clean 16 kHz mono PCM rather than a compressed source.

// shell

$ ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

$ ffmpeg -i audio.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
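For more than a couple of files, the same flags can be wrapped in small shell functions. This is a minimal sketch, not part of ffmpeg: the names to_whisper_wav and is_whisper_wav and the ".16k.wav" output suffix are made up for illustration. The second function uses ffprobe to confirm the result really is pcm_s16le at 16000 Hz with 1 channel.

```shell
# Hypothetical helper: convert each argument to 16 kHz mono 16-bit PCM WAV.
# The output name is the input name plus ".16k.wav" (an assumed convention),
# so the source file is never overwritten.
to_whisper_wav() {
  local src dst
  for src in "$@"; do
    dst="${src}.16k.wav"
    # -nostdin keeps ffmpeg from reading the terminal inside a loop,
    # -vn skips video explicitly, -y overwrites an existing output file.
    ffmpeg -nostdin -hide_banner -loglevel error -y \
      -i "$src" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$dst" || return 1
    printf '%s\n' "$dst"
  done
}

# Hypothetical check: succeeds only if the first audio stream of "$1"
# is pcm_s16le, 16000 Hz, 1 channel.
is_whisper_wav() {
  local probe
  probe=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels \
    -of default=noprint_wrappers=1 "$1") || return 1
  printf '%s\n' "$probe" | grep -qx 'codec_name=pcm_s16le' &&
    printf '%s\n' "$probe" | grep -qx 'sample_rate=16000' &&
    printf '%s\n' "$probe" | grep -qx 'channels=1'
}
```

Typical use would be something like `to_whisper_wav *.mp4 *.mp3`, followed by `is_whisper_wav input.mp4.16k.wav && echo ok`.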

// gotcha

Whisper already resamples internally, so this mainly normalizes the format and strips video rather than improving accuracy. Also, a standard RIFF WAV stores its data size in a 32-bit header field, so with ffmpeg's default (rf64=never) a single file that grows past ~4 GB (roughly 37 hours at 16 kHz mono 16-bit) overflows that field and ends up with an invalid header that readers may truncate or reject.
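The 37-hour figure follows from the byte rate: 16000 samples/s × 1 channel × 2 bytes = 32000 bytes/s, and 4 GiB divided by that is about 134217 seconds. The sketch below reproduces that arithmetic and shows one possible workaround, the WAV muxer's -rf64 auto option, which lets ffmpeg switch to RF64 when the file outgrows the 32-bit field. The function names are made up for illustration, and it is an assumption that the downstream ASR tool can read RF64; splitting long recordings into shorter files is the more conservative route.

```shell
# Whole hours of audio that fit under the 32-bit RIFF size limit (4 GiB).
# Arguments: sample rate in Hz, channel count, bytes per sample.
wav_limit_hours() {
  echo $(( 4294967296 / ($1 * $2 * $3) / 3600 ))
}

# Hypothetical wrapper: same conversion as above, but allow RF64 for
# very long recordings. $1 is the input file, $2 the output WAV.
to_whisper_wav_long() {
  ffmpeg -nostdin -i "$1" -vn -ar 16000 -ac 1 -c:a pcm_s16le -rf64 auto "$2"
}
```

`wav_limit_hours 16000 1 2` prints 37, which matches the estimate above.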
