FFmpeg is an extremely versatile and powerful command-line tool for manipulating audio and video files. One of its many capabilities is extracting the audio track from video files for use as standalone audio files in another format.
In this comprehensive guide, I'll demonstrate professional techniques for extracting audio using FFmpeg, including installation, codec handling, format conversion, surround sound handling, batch processing, and more.
Installing and Compiling FFmpeg
Before using FFmpeg, we need to ensure it is installed on our system. Here are overview instructions for some common operating systems:
Linux
On Debian-based distributions such as Ubuntu, we can install FFmpeg with apt; Fedora and other distributions offer it through their own package managers:
sudo apt update
sudo apt install ffmpeg
However, for the most up-to-date FFmpeg with all features enabled, I recommend compiling from source. While more complex, this builds customized FFmpeg binaries optimized for your system.
macOS
On macOS, we can install FFmpeg through Homebrew:
brew install ffmpeg
As an alternative, we can compile FFmpeg on macOS for specific feature enablement.
Windows
For Windows, I suggest using a recent full FFmpeg build. These ship as statically linked executables without external dependencies. Avoid stripped-down builds that leave out encoders you need.
Confirm FFmpeg installation by checking the version:
ffmpeg -version
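Scripts that depend on FFmpeg can fail early with a clear message when a tool is missing. The sketch below is one way to do that; the helper name missing_tools is made up for this example, and it relies only on the standard command -v shell builtin.

```shell
# Print the name of every requested tool that is not on PATH.
# missing_tools is a hypothetical helper, not part of FFmpeg.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools ffmpeg ffprobe)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
else
    echo "ffmpeg and ffprobe are available"
fi
```

Checking for ffprobe as well as ffmpeg is useful because some minimal packages ship one without the other.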
Now we're ready to use FFmpeg's powerful audio extraction capabilities!
FFmpeg vs Libav
Before we dive in, I want to clarify FFmpeg vs alternatives like Libav. In some Linux distributions, you may notice avconv instead of ffmpeg.
The split between FFmpeg and Libav goes back to a 2011 fork by a group of FFmpeg developers. In practical terms:
- avconv is provided by Libav
- ffmpeg is from the FFmpeg project
I recommend sticking with FFmpeg rather than the Libav fork's avconv. FFmpeg receives significantly more development activity, and major distributions such as Debian and Ubuntu that once shipped avconv have since switched back to FFmpeg.
Now let's explore using FFmpeg for professional audio extraction!
Checking Audio Codecs
When extracting audio from video files, we should first check the audio codec of the input video stream. This codec info helps determine the optimal output format and conversion options.
To check audio codecs, use the ffprobe command provided by FFmpeg:
ffprobe -v error -select_streams a:0 -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 input.mp4
This prints the codec of the first audio stream in the input video file. For AAC audio, it displays:
aac
Below are some of the most common audio codecs seen in videos:
| Audio Codec | Description |
|---|---|
| AAC | Advanced Audio Coding |
| AC3 | Dolby Digital audio |
| E-AC3 | Dolby Digital Plus |
| DTS | DTS audio codec |
| MP3 | MPEG-1/2 Audio Layer III |
| PCM | Pulse Code Modulation uncompressed audio |
So checking the audio codec is always a good first step before extracting to understand the audio properties.
Extracting Audio from Video – Direct Stream Copy
The simplest way to extract audio from a video with FFmpeg is copying the existing audio track directly to a new file without re-encoding. This avoids any quality loss in the audio signal.
For example, to extract the AAC audio stream from an MP4 video into an AAC file:
ffmpeg -i input.mp4 -vn -acodec copy output.aac
Breaking down the parameters:
- -i input.mp4 – The input MP4 video file
- -vn – Disables copying video to the output
- -acodec copy – Stream copies the audio without re-encoding
- output.aac – The output AAC audio file
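The key is -acodec copy (the same option as the newer -c:a copy spelling), which copies the source AAC packets as they are instead of transcoding them.
Stream copy works for codecs other than AAC as well, as long as the output container supports the codec: MP3 audio can be copied to .mp3 and AC3 to .ac3, for example, but AAC cannot be copied into a .mp3 file. Stream copy preserves the audio exactly, whereas converting formats requires a decode/encode cycle and, with lossy codecs, costs some quality.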
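Since a stream copy only succeeds when the output container can hold the source codec, it helps to choose the extension from the ffprobe result. The following is a minimal sketch of that idea; ext_for_codec and copy_audio are hypothetical helper names, the mapping covers only a few common codecs, and falling back to Matroska audio (.mka) is my assumption of a reasonable default because that container accepts most codecs.

```shell
# Map an ffprobe codec name to a file extension whose container can hold it.
# ext_for_codec is a hypothetical helper; the mapping covers common cases only.
ext_for_codec() {
    case "$1" in
        aac)      echo aac ;;
        mp3)      echo mp3 ;;
        ac3)      echo ac3 ;;
        eac3)     echo eac3 ;;
        flac)     echo flac ;;
        opus|vorbis) echo ogg ;;
        pcm_*le)  echo wav ;;    # little-endian PCM fits in WAV
        *)        echo mka ;;    # fallback: Matroska audio
    esac
}

# Stream-copy the first audio track of "$1" into a matching container.
copy_audio() {
    codec=$(ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
    ffmpeg -i "$1" -vn -acodec copy "${1%.*}.$(ext_for_codec "$codec")"
}
```

Calling copy_audio input.mp4 on a file with AAC audio would then write input.aac next to the source.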
Extracting Audio to Other Formats
In addition to direct audio stream copies, we may want to extract video audio and simultaneously convert it to another format like MP3 or Ogg Vorbis.
Here is an example command extracting MP4 audio and encoding to MP3:
ffmpeg -i input.mp4 -vn -acodec libmp3lame output.mp3
Instead of -acodec copy, this uses the libmp3lame encoder to produce MP3 rather than just stream copying.
By swapping -acodec we can encode audio to formats like:
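- libmp3lame – MP3
- libfdk_aac – AAC (only in builds compiled with libfdk-aac; FFmpeg's native aac encoder is the usual alternative)
- libvorbis – Ogg Vorbis
- libopus – Opus
Without further options, the quality and bitrate depend on each encoder's defaults. Next, let's see how to control audio coding parameters when converting formats.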
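Not every FFmpeg build includes every encoder, so it can be worth checking before relying on one. The sketch below filters the output of ffmpeg -encoders; encoder_listed is a hypothetical helper, and it assumes the encoder name is the second whitespace-separated column of that listing, as it is in the releases I have used.

```shell
# Succeed if the encoder named "$1" appears in `ffmpeg -encoders` output on stdin.
# encoder_listed is a hypothetical helper; it assumes the encoder name is the
# second whitespace-separated column of the listing.
encoder_listed() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example use (requires ffmpeg on PATH):
#   ffmpeg -hide_banner -encoders | encoder_listed libfdk_aac \
#       || echo "libfdk_aac not available, falling back to aac"
```

Reading the listing from standard input keeps the helper easy to try against saved output from another machine.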
Setting Custom Audio Bitrate
Many audio encoders support options for customizing parameters like bitrates or sample rates during extraction.
As one example, when encoding to MP3 we can set a target bitrate in kbps using -b:a:
ffmpeg -i input.mp4 -vn -acodec libmp3lame -b:a 192k output.mp3
Here -b:a 192k configures 192 kbps constant bitrate encoding for libmp3lame.
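With constant bitrate encoding, the output size follows directly from the bitrate and the duration, which makes it easy to pick a bitrate for a storage budget. The arithmetic can be sketched as below; estimate_bytes is a hypothetical helper, and the result ignores container overhead and metadata tags, so treat it as an approximation.

```shell
# Estimate the size in bytes of a constant-bitrate audio file.
# estimate_bytes is a hypothetical helper:
#   $1 = bitrate in kbps, $2 = duration in whole seconds.
estimate_bytes() {
    echo $(( $1 * 1000 / 8 * $2 ))
}

estimate_bytes 192 240   # a 4-minute track at 192 kbps: prints 5760000
```

At 192 kbps, a 4-minute track therefore comes to roughly 5.8 MB.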
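Audio filters are a separate mechanism, passed with -af as a filter chain. For example, the loudnorm filter normalizes loudness to an EBU R128 target, here -23 LUFS, while encoding with FFmpeg's native aac encoder. The ebur128 filter, by contrast, only measures loudness and does not change it. Because loudnorm can raise the sample rate internally, -ar 48000 sets the output rate explicitly:
ffmpeg -i input.mp4 -vn -acodec aac -af "loudnorm=I=-23" -ar 48000 output.m4a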
So beyond just formats, we have deep control over audio coding parameters for professional quality output.
Extracting Audio from Multiple Videos
A handy FFmpeg technique is batch extracting audio from multiple videos in a directory through glob patterns and scripting.
Here is an example script to process all MP4s in a folder:
#!/bin/bash
for f in *.mp4; do
ffmpeg -y -i "$f" -vn -acodec copy "${f%.mp4}.aac"
done
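This iterates over every file matching *.mp4 and runs FFmpeg to stream copy its audio into an .aac file, so it assumes the sources carry AAC audio. The ${f%.mp4}.aac expansion builds each output filename by replacing the .mp4 extension, and -y overwrites existing outputs without asking.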
We can combine this with custom scripting to automate converting entire video collections for offline playback.
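Before running a batch over a large collection, I like to preview the commands first. Here is one possible variation on the loop above as a sketch; batch_extract and the DRY_RUN variable are names invented for this example, and like the original script it assumes the sources carry AAC audio.

```shell
# Extract audio from every .mp4 in a directory into an output directory.
# batch_extract is a hypothetical helper. With DRY_RUN=1 it only prints
# the commands it would run, which helps check paths before committing.
batch_extract() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.mp4; do
        [ -e "$f" ] || continue              # the glob matched nothing
        out="$out_dir/$(basename "$f" .mp4).aac"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ffmpeg -nostdin -n -i $f -vn -acodec copy $out"
        else
            # -n refuses to overwrite existing files; -nostdin keeps
            # ffmpeg from reading the script's standard input.
            ffmpeg -nostdin -n -i "$f" -vn -acodec copy "$out"
        fi
    done
}
```

Running DRY_RUN=1 batch_extract ~/Videos ~/Audio lists the planned commands, and dropping DRY_RUN=1 performs the extraction. Using -n instead of -y is a deliberate choice here so that a rerun does not overwrite earlier results.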
Trimming Audio Extraction Duration
Rather than extracting the full audio stream, we can extract only a region by giving start and end positions.
Pass both as input options, before -i, so that they refer to timestamps in the source video:
ffmpeg -ss 00:01:00 -to 00:02:00 -i input.mp4 -vn -acodec copy output.aac
- -ss 00:01:00 – Seek to 1 minute as the start position
- -to 00:02:00 – Stop reading at 2 minutes
This extracts the 1 minute of audio between 1:00 and 2:00 in the video. Placement matters: if -to comes after -i while -ss stays before it, output timestamps restart at zero and -to 00:02:00 yields a 2 minute clip instead.
The duration options are very flexible for controlling start, end, and length.
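When we know the start and end positions but want to pass a length with -t instead, the conversion is simple arithmetic. The helpers below are a sketch of it; to_seconds and clip_length are hypothetical names, and they handle whole seconds only.

```shell
# Convert HH:MM:SS, MM:SS, or plain seconds into seconds (whole seconds only).
# to_seconds and clip_length are hypothetical helpers.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Length in seconds of the region between a start ($1) and an end ($2) position.
clip_length() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Example use (requires ffmpeg and an input file):
#   ffmpeg -ss 00:01:00 -i input.mp4 -t "$(clip_length 00:01:00 00:02:00)" \
#       -vn -acodec copy output.aac
```

Because -t is a duration rather than a position, it gives the same clip whether it is placed before or after -i.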
Changing Audio Stream Selection
Videos can contain multiple audio tracks, such as different languages. By default FFmpeg picks a single audio stream, the one with the most channels, which is not necessarily the track we want.
We can control which audio stream gets extracted using the -map option:
ffmpeg -i input.mp4 -map 0:a:1 -acodec copy spanish.aac
Here -map 0:a:1 selects the second audio stream of the first input; audio streams are counted from 0:a:0. A bare specifier such as 0:1 counts every stream in the file, video included, so in a typical file where the video comes first it points at the first audio track. Because only an audio stream is mapped, -vn is not needed. Adjust the index to choose which track ends up in the output.
Running ffprobe on the input confirms the order of its audio streams before mapping.
Extracting Multiple Audio Tracks
Building on custom -map selection, if a video contains multiple audio tracks we may want to extract them each into their own standalone audio files.
For instance, to split out 3 languages we can run one FFmpeg command per track, using audio-relative stream specifiers:
# Extract English audio track (first audio stream)
ffmpeg -i input.mp4 -map 0:a:0 -c copy english.aac
# Extract Spanish audio track (second audio stream)
ffmpeg -i input.mp4 -map 0:a:1 -c copy spanish.aac
# Extract French audio track (third audio stream)
ffmpeg -i input.mp4 -map 0:a:2 -c copy french.aac
Now from one source video, we end up with multiple audio output files separated cleanly by language. Each instance maps only a single audio stream.
We could extend this with scripting to automate large-scale multi-language track separation across libraries of videos.
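One way to script this is to let ffprobe list the audio streams and generate a command per track. The sketch below only prints the commands so they can be reviewed first. track_commands is a hypothetical helper; it assumes ffprobe's CSV output has one "codec,language" line per audio stream with the language missing when untagged, and it writes Matroska audio (.mka) files so that the copy works whatever the codec.

```shell
# Read "codec,language" lines (one per audio stream, in stream order) on stdin
# and print one ffmpeg command per track for the input file "$1".
# track_commands is a hypothetical helper; untagged streams are named "und".
track_commands() {
    input=$1
    n=0
    while IFS=, read -r codec lang; do
        [ -n "$codec" ] || continue
        echo "ffmpeg -i $input -map 0:a:$n -c copy track${n}_${lang:-und}.mka"
        n=$((n + 1))
    done
}

# Example use (requires ffprobe and an input file):
#   ffprobe -v error -select_streams a \
#       -show_entries stream=codec_name:stream_tags=language \
#       -of csv=p=0 input.mp4 | track_commands input.mp4
```

The printed commands can be inspected and then piped to sh once they look right.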
Converting Surround Sound to Stereo
Source videos sometimes contain multi-channel surround sound audio. We may want to downmix this to standard stereo when extracting audio.
Let's discuss common surround sound configurations:
| Channels | Name |
|---|---|
| 6 | 5.1 – Left/Right/Center/Subwoofer + Surrounds |
| 7 | 6.1 – 5.1 + Back Surround |
| 8 | 7.1 – 5.1 + Side Surrounds |
We can downmix these surround configurations to stereo with:
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ac 2 output.wav
The key is -ac 2, which sets 2 output audio channels and forces a downmix from the surround input to stereo.
-acodec pcm_s16le gives us uncompressed PCM 16-bit signed integer samples for pristine quality. Vary sample rate and bit depth encoding parameters as needed for your target playback environment.
Surround sound handling gets more advanced dealing with multichannel inputs, but this covers the primary technique for stereo downmixing.
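In a script that handles mixed sources, we may only want to add -ac 2 when the input really has more than two channels. A minimal sketch of that decision follows; stereo_args is a hypothetical helper that takes the channel count reported by ffprobe's channels field.

```shell
# Print the channel-related ffmpeg arguments for a stereo target.
# stereo_args is a hypothetical helper: $1 = channel count from ffprobe.
stereo_args() {
    if [ "$1" -gt 2 ]; then
        echo "-ac 2"     # surround input: downmix to stereo
    fi                   # mono or stereo input: leave the layout alone
}

# Example use (requires ffmpeg/ffprobe and an input file):
#   channels=$(ffprobe -v error -select_streams a:0 \
#       -show_entries stream=channels \
#       -of default=noprint_wrappers=1:nokey=1 input.mp4)
#   ffmpeg -i input.mp4 -vn -acodec pcm_s16le $(stereo_args "$channels") output.wav
```

The command substitution is left unquoted on purpose so that "-ac 2" splits into two arguments, and an empty result adds nothing to the command line.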
Normalizing Volume Levels
A handy audio post-processing capability of FFmpeg is volume normalization when extracting from videos with mismatched loudness. This helps output uniform, compliant audio volume levels.
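As one example, we can normalize the peak level with the volumedetect and volume filters. Filters work on decoded audio, so the output has to be re-encoded; -af cannot be combined with -c:a copy. First measure the current levels without writing a file:
ffmpeg -i input.mp4 -vn -af "volumedetect" -f null -
Then apply the gain that brings the peak to the target. If the report shows max_volume: -5.0 dB, a gain of 4 dB raises the peak to -1 dBFS:
ffmpeg -i input.mp4 -vn -acodec aac -af "volume=4dB" output.m4a
Breaking this down:
- volumedetect – Measures and reports mean_volume and max_volume without changing the audio
- volume=4dB – Applies a fixed gain of 4 dB; a negative value such as -20dB attenuates by that amount rather than normalizing to -20 dBFS
We can choose a different target level by changing the gain passed to the volume filter.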
There are also filters like loudnorm for advanced loudness matching and correction to standards like EBU R128. This keeps loudness consistent across large libraries of varied content when extracting audio.
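For simple peak normalization, the gain can be computed from the volumedetect report instead of by hand. The following sketch assumes the report contains a line ending in "max_volume: <value> dB", which is what FFmpeg prints in my experience; peak_gain is a hypothetical helper, and the default target of -1 dBFS is just an illustrative choice.

```shell
# Read ffmpeg's volumedetect report on stdin and print the gain in dB that
# would bring the peak to a target level.
# peak_gain is a hypothetical helper: $1 = target peak in dBFS (default -1).
peak_gain() {
    awk -v target="${1:--1}" \
        '/max_volume:/ { printf "%.1f\n", target - $(NF - 1); exit }'
}

# Example use (requires ffmpeg and an input file); ffmpeg logs to stderr:
#   gain=$(ffmpeg -i input.mp4 -vn -af volumedetect -f null - 2>&1 | peak_gain -1)
#   ffmpeg -i input.mp4 -vn -acodec aac -af "volume=${gain}dB" output.m4a
```

This adjusts peaks only; for perceived loudness across a library, loudnorm remains the better fit.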
Troubleshooting Audio Extract Issues
When working with FFmpeg audio extraction, here are some common troubleshooting issues I've encountered:
No audio detected in output
If FFmpeg reports that the output contains no streams, or the output file is silent, first confirm with ffprobe that the input actually has an audio stream and that any -map option points at it. Testing with -c copy can then rule out encoder issues.
Glitching, clicking, or gaps
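Glitches, pops, clicks, or gaps usually come from timestamp discontinuities in the source or from cutting a stream copy at packet boundaries, not from buffer sizes: -bufsize applies to video rate control, and -b:a sets the audio bitrate rather than a buffer. Try re-encoding instead of stream copying, and if the source has timestamp gaps, add -af "aresample=async=1" so that FFmpeg fills them with silence.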
Can't directly copy audio codec
Stream copy fails when the output container cannot hold the source codec, for example AC3 audio into a .mp3 file. In these cases either pick a container that supports the codec or re-encode; choosing a lossless format like FLAC or WAV avoids further quality loss.
Errors mapping streams
Double-check that the -map stream indexes match the ffprobe output, especially if FFmpeg reports that a stream map matches no streams.
No audio duration trimming
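For -ss/-to trimming problems, confirm that positions use the HH:MM:SS[.ms] or plain seconds syntax, and check whether each option is placed before or after -i, since that changes what the position is measured against. An output that is longer or shorter than expected usually points to a placement problem.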
Learning typical failure modes helps diagnose and fix problems extracting audio with FFmpeg!
Conclusion & Best Practices
FFmpeg is an incredibly versatile tool for extracting audio tracks from video files to standalone media like MP3s.
To recap key techniques:
- Check the source audio codec with ffprobe
- Extract via -acodec copy for pristine copies
- Set a custom output encoding like -acodec libmp3lame
- Control bitrates, loudness, sample rate for output quality
- Use -map to isolate a specific audio stream
- Downmix surround sound to standard stereo
- Process collections of video to audio using scripts
- Normalize volume levels across extracted audio
- Troubleshoot issues with unknown codecs or channel mapping
Learning to leverage FFmpeg through the command line may seem intimidating initially, but in practice makes easy work of many complex media conversion and extraction tasks.
I hope this guide served as both an FFmpeg audio extraction tutorial and a demonstration of real insights from a media professional fluent in video and audio encoding. Extracting maximum quality media is both science and art!
Feel free to reach out with any other questions on working with FFmpeg, audio engineering, or media formats in general via Twitter or my website. I'm always happy to chat more about optimizing FFmpeg solutions for your specific use case.


