OWolf

2024-09-24 Video Production

Automating Video Metadata Extraction with Bash: A Step-by-Step Guide

By O. Wolfson

Managing a large collection of video files can quickly become cumbersome, especially when you need to gather technical details like codecs, resolutions, frame rates, and audio properties. Luckily, with a simple Bash script and FFmpeg’s ffprobe tool, you can automate this process and store the metadata in a JSON file. In this post, I’ll walk you through a custom script I developed for just this purpose.

The Problem

Whether you’re archiving videos, prepping them for editing, or simply need an inventory, extracting metadata for each file by hand is tedious. You might have dozens (or hundreds!) of videos to check, and doing that one file at a time doesn’t scale. This is where automation shines.

The Solution

The script I’ve created gathers key information about each video file in a specified directory and outputs it into a well-formatted JSON file. It even calculates the total runtime of all videos and includes it at the top of the output.

Here’s the breakdown of the process.

What the Script Does:

  1. Scans a directory for .mp4 and .mov files.
  2. Uses ffprobe (from FFmpeg) to gather:
    • Video codec
    • Resolution (width x height)
    • Frame rate
    • Audio codec, channels, and sample rate
    • Duration (in seconds)
  3. Calculates the total duration of all videos and adds it to the output.
  4. Generates a JSON file with all this data, which can be used for further processing or documentation.

The Script

Here’s the full script:

```bash
#!/bin/bash

# Specify the directory where your video files are located
input_directory="/path/to/your/videos"

# Check if the directory exists
if [ ! -d "$input_directory" ]; then
  echo "Directory $input_directory does not exist."
  exit 1
fi

# Check that ffprobe (part of FFmpeg) is installed and on the PATH
if ! command -v ffprobe > /dev/null 2>&1; then
  echo "ffprobe not found. Please install FFmpeg first."
  exit 1
fi

# nullglob: a pattern with no matches expands to nothing instead of the literal pattern
# nocaseglob: also match upper-case extensions such as .MP4 and .MOV
shopt -s nullglob nocaseglob

# Extract the folder name from the input directory path
folder_name=$(basename "$input_directory")

# Output JSON file name based on folder name, saved in the same directory as the videos
output_file="$input_directory/${folder_name}_video_data.json"

# Initialize total duration (in seconds)
total_duration=0

# First pass: loop through all .mp4 and .mov files to calculate the total duration
for file in "$input_directory"/*.mp4 "$input_directory"/*.mov; do
  # Get the video duration, truncated to whole seconds
  duration=$(ffprobe -v quiet -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" | awk '{print int($1)}')

  # Add duration to total (treat a missing value as 0)
  total_duration=$((total_duration + ${duration:-0}))
done

# Convert total duration into hours, minutes, and seconds
hours=$((total_duration / 3600))
minutes=$(( (total_duration % 3600) / 60 ))
seconds=$((total_duration % 60))

# Start JSON object with total duration
echo "{
  \"total_duration\": \"${hours}h ${minutes}m ${seconds}s\",
  \"videos\": [" > "$output_file"

# Initialize the 'first' variable to manage comma separation
first=true

# Second pass: loop through the same files to gather video details
for file in "$input_directory"/*.mp4 "$input_directory"/*.mov; do
  # Get video properties using ffprobe and assign them to variables
  video_info=$(ffprobe -v quiet -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate -of default=noprint_wrappers=1:nokey=1 "$file")
  audio_info=$(ffprobe -v quiet -select_streams a:0 -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1:nokey=1 "$file")
  duration=$(ffprobe -v quiet -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" | awk '{print int($1)}')

  # If ffprobe finds no video stream in a file, skip it
  if [ -z "$video_info" ]; then
    echo "No video stream found for $file, skipping..."
    continue
  fi

  # Parse the video information. With nokey=1, ffprobe prints fields in its own
  # internal order (not the -show_entries order): codec_name, width, height, r_frame_rate
  codec=$(echo "$video_info" | sed -n '1p')
  width=$(echo "$video_info" | sed -n '2p')
  height=$(echo "$video_info" | sed -n '3p')
  frame_rate=$(echo "$video_info" | sed -n '4p')

  # Parse the audio information (left empty if there is no audio stream).
  # For audio streams ffprobe prints sample_rate before channels.
  audio_codec=$(echo "$audio_info" | sed -n '1p')
  audio_sample_rate=$(echo "$audio_info" | sed -n '2p')
  audio_channels=$(echo "$audio_info" | sed -n '3p')

  # Get the filename without the directory part
  filename=$(basename "$file")

  # Append information to the JSON array
  if [ "$first" = true ]; then
    first=false
  else
    echo "," >> "$output_file"
  fi

  echo "{
    \"filename\": \"$filename\",
    \"codec\": \"$codec\",
    \"resolution\": \"$width x $height\",
    \"frame_rate\": \"$frame_rate\",
    \"duration\": ${duration:-0},
    \"audio_codec\": \"$audio_codec\",
    \"audio_channels\": \"$audio_channels\",
    \"audio_sample_rate\": \"$audio_sample_rate\"
  }" >> "$output_file"
done

# End JSON array
echo "]}" >> "$output_file"

echo "Data saved to $output_file"
```
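One fragile spot is parsing ffprobe's output by line position. With nokey=1, ffprobe prints the requested fields in its own internal order rather than the order given to -show_entries; for audio streams it prints sample_rate before channels, so positional parsing has to follow that order exactly. A more robust option, sketched below under the assumption that you drop nokey=1 and keep ffprobe's key=value lines, is to look each field up by name. The get_field helper is a hypothetical name introduced here, not part of FFmpeg or the original script.

```bash
# Hypothetical helper: print the value of the field named $1 from
# ffprobe key=value lines on stdin (prints nothing if the key is absent).
# Assumes ffprobe was run with -of default=noprint_wrappers=1 (keys kept).
get_field() {
  awk -v key="$1" '
    # Match lines that start with the key followed by "="
    index($0, key "=") == 1 {
      # Print everything after the first "=" and stop
      print substr($0, length(key) + 2)
      exit
    }'
}

# Example use inside the script's loop (requires ffprobe):
#   audio_info=$(ffprobe -v quiet -select_streams a:0 \
#     -show_entries stream=codec_name,channels,sample_rate \
#     -of default=noprint_wrappers=1 "$file")
#   audio_channels=$(printf '%s\n' "$audio_info" | get_field channels)
#   audio_sample_rate=$(printf '%s\n' "$audio_info" | get_field sample_rate)
```

Looking fields up by name also keeps working if you later add or remove entries in -show_entries.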

How the Script Works

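  1. Setting up the environment: The script first checks whether the specified directory exists and exits early if it doesn't. It then derives the output file name from the folder name.

  2. Calculating total duration: A first pass over the .mp4 and .mov files asks ffprobe for each file's duration, adds the values up, and formats the total as hours, minutes, and seconds. This pass runs first because the total is written at the top of the JSON file, before any per-video entries.

  3. Extracting metadata: A second pass loops over the same files and uses FFmpeg's ffprobe tool to read the properties of the first video stream and the first audio stream (if present). Files without a video stream are skipped. The values are parsed line by line into variables.

  4. Generating the JSON output: Each video's metadata is appended to the JSON array, with a comma between entries, and the array and object are closed at the end. The file is saved inside the video folder and named after it: a folder called "interviews", for example, produces interviews_video_data.json.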
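The total-duration step truncates each file's duration to whole seconds before adding it, so a large folder can come up a few seconds short. A minimal alternative, assuming the raw decimal durations from ffprobe's format=duration are collected one per line, is to sum the fractional values in awk first and round only the final total. The function names sum_durations and format_duration are illustrative, not from the original script.

```bash
# Illustrative helper: sum decimal durations (one per line on stdin)
# and round the total to the nearest whole second.
sum_durations() {
  awk '{ total += $1 } END { printf "%d\n", total + 0.5 }'
}

# Illustrative helper: format a whole number of seconds as "Xh Ym Zs",
# using the same arithmetic as the script.
format_duration() {
  local total=$1
  local hours=$((total / 3600))
  local minutes=$(( (total % 3600) / 60 ))
  local seconds=$((total % 60))
  echo "${hours}h ${minutes}m ${seconds}s"
}
```

For example, durations of 3600.5, 1800.25, and 24.25 seconds sum to 5425, which format_duration turns into 1h 30m 25s; truncating each value first would give 5424.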

Output Example

Here’s an example of what the JSON output looks like:

```json
{
  "total_duration": "1h 30m 0s",
  "videos": [
    {
      "filename": "video1.mp4",
      "codec": "h264",
      "resolution": "1920 x 1080",
      "frame_rate": "30/1",
      "duration": 3600,
      "audio_codec": "aac",
      "audio_channels": "2",
      "audio_sample_rate": "44100"
    },
    {
      "filename": "video2.mov",
      "codec": "h264",
      "resolution": "1280 x 720",
      "frame_rate": "30/1",
      "duration": 1800,
      "audio_codec": "aac",
      "audio_channels": "2",
      "audio_sample_rate": "44100"
    }
  ]
}
```
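The script inserts values into the JSON text as-is, so a filename containing a double quote or a backslash would produce an invalid file. A minimal escaping helper, assuming filenames contain no newlines or other control characters, could look like this (json_escape is a hypothetical name, not part of the original script):

```bash
# Hypothetical helper: escape backslashes and double quotes so a value can
# sit inside a JSON string. Assumes no newlines or other control characters.
json_escape() {
  # Escape backslashes first, then quotes, so the added backslashes are not doubled
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

In the script you would then write $(json_escape "$filename") in place of $filename on the filename line. If jq is installed, building each entry with jq -n --arg is a more complete option, since jq handles all JSON escaping itself.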

Why Use This Script?

  • Automation: No need to check each file’s metadata by hand; the script does it for you.
  • Easy Integration: The JSON output can be used in further automation workflows, such as cataloging videos, creating reports, or feeding into another system for processing.
  • Customizable: You can easily modify the script to gather more (or less) information, depending on your needs.
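As one example of customizing the output, the frame_rate field holds ffprobe's r_frame_rate as a fraction such as 30/1 or 30000/1001. A small helper, sketched here with the hypothetical name fraction_to_fps, could turn that into a decimal value before it is written to the JSON:

```bash
# Hypothetical helper: convert a fraction like "30000/1001" to frames per
# second with two decimals. A zero or missing denominator (ffprobe can
# report "0/0") falls back to printing the numerator unchanged.
fraction_to_fps() {
  printf '%s\n' "$1" | awk -F'/' '{
    if ($2 + 0 == 0) print $1
    else printf "%.2f\n", $1 / $2
  }'
}
```

Applied as frame_rate=$(fraction_to_fps "$frame_rate"), the example output's 30/1 would become 30.00. Keeping the raw fraction is lossless, while the decimal form is easier to read in reports, so which one to store is a matter of taste.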

Final Thoughts

This script makes it easy to manage and document large collections of video files. By using ffprobe from FFmpeg, you can extract detailed metadata quickly and accurately, making your workflow much more efficient. Whether you’re a video editor, content creator, or just someone who works with a lot of media files, automating these tasks can save you a lot of time.