Gary Simmons
add YouTube video analysis tools and audio transcription capabilities, including documentation and test scripts
5e5f9d1

metadata
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Agent with YouTube Video Analysis

This agent can analyze YouTube videos: it downloads them with yt-dlp and uses OpenCV to extract frames for analysis.

Features

YouTube Video Analysis Tools

The agent provides two YouTube video analysis tools:

1. analyze_youtube_video(video_url, max_frames=6, interval_seconds=45.0)

  • Purpose: Downloads a YouTube video and extracts frames at regular intervals for detailed analysis
  • Parameters:
    • video_url: YouTube video URL (e.g., https://www.youtube.com/watch?v=VIDEO_ID)
    • max_frames: Maximum number of frames to extract (1-10, default: 6)
    • interval_seconds: Time interval between extractions (minimum: 10s, default: 45s)
  • Returns: JSON with video metadata, frame timestamps, and detailed descriptions of each frame
  • Use cases: Content analysis, scene detection, video summarization, accessibility descriptions
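
The parameter limits above can be illustrated with a minimal sketch of how sampling timestamps might be planned. The helper name `plan_frame_timestamps` is hypothetical, and two behaviors are assumptions rather than documented facts: out-of-range values are clamped (not rejected), and sampling starts at 0 seconds.

```python
from typing import List


def plan_frame_timestamps(
    duration_seconds: float,
    max_frames: int = 6,
    interval_seconds: float = 45.0,
) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be sampled."""
    # Clamp to the documented ranges: 1-10 frames, at least 10 s apart.
    # Assumption: the real tool may reject bad values instead of clamping.
    max_frames = max(1, min(10, int(max_frames)))
    interval_seconds = max(10.0, float(interval_seconds))

    timestamps: List[float] = []
    t = 0.0  # Assumption: sampling starts at the beginning of the video.
    while len(timestamps) < max_frames and t < duration_seconds:
        timestamps.append(t)
        t += interval_seconds
    return timestamps


# A 5-minute video with the default settings yields six sample points.
print(plan_frame_timestamps(300))  # [0.0, 45.0, 90.0, 135.0, 180.0, 225.0]
```

With the defaults, a video shorter than 225 seconds would produce fewer than six frames under this scheme, because sampling stops at the end of the video.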

2. get_youtube_video_info(video_url)

  • Purpose: Retrieves video metadata without downloading the video itself
  • Parameters:
    • video_url: YouTube video URL
  • Returns: JSON with title, duration, uploader, view count, description, and resolution
  • Use cases: Video verification, content filtering, metadata collection
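
A metadata-only lookup of this kind could be sketched as follows, using yt-dlp's `extract_info` with `download=False`. The function names `summarize_info` and `fetch_video_info` are hypothetical, and the exact JSON layout the tool returns is an assumption; only the listed fields come from the description above.

```python
import json
from typing import Any, Dict


def summarize_info(info: Dict[str, Any]) -> str:
    """Pick the documented fields out of a yt-dlp info dict and return JSON."""
    width, height = info.get("width"), info.get("height")
    summary = {
        "title": info.get("title"),
        "duration": info.get("duration"),  # seconds
        "uploader": info.get("uploader"),
        "view_count": info.get("view_count"),
        "description": info.get("description"),
        # Assumption: resolution is reported as "WIDTHxHEIGHT".
        "resolution": f"{width}x{height}" if width and height else None,
    }
    return json.dumps(summary, ensure_ascii=False)


def fetch_video_info(video_url: str) -> str:
    """Fetch metadata only; download=False skips the media download."""
    import yt_dlp  # imported lazily so summarize_info works without yt-dlp

    with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
        info = ydl.extract_info(video_url, download=False)
    return summarize_info(info)
```

Keeping the field selection in its own function means it can be exercised with a plain dictionary, without network access.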

Technical Implementation

  • Video Processing: Uses yt-dlp for robust YouTube video downloading
  • Frame Extraction: OpenCV for efficient frame extraction and processing
  • Image Processing: PIL and numpy for frame manipulation and encoding
  • Analysis Ready: Frames are prepared for image analysis (base64 encoded, resized)
  • Error Handling: Handles network issues, invalid URLs, and processing failures
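
The extraction and encoding steps above might look roughly like the sketch below. It is not the project's actual code: the function names and the 512-pixel size limit are hypothetical, and it uses OpenCV's own JPEG encoder (`cv2.imencode`) instead of PIL to keep the example short.

```python
import base64
from typing import List, Tuple


def fit_within(width: int, height: int, max_side: int = 512) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    scale = min(1.0, max_side / max(width, height))  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


def extract_frames_b64(
    path: str, timestamps: List[float], max_side: int = 512
) -> List[str]:
    """Grab one frame per timestamp and return base64-encoded JPEGs."""
    import cv2  # opencv-python; imported lazily so fit_within has no dependency

    capture = cv2.VideoCapture(path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {path}")
    frames: List[str] = []
    try:
        for t in timestamps:
            capture.set(cv2.CAP_PROP_POS_MSEC, t * 1000.0)  # seek by time
            ok, frame = capture.read()
            if not ok:
                continue  # skip timestamps past the end or unreadable frames
            height, width = frame.shape[:2]
            frame = cv2.resize(frame, fit_within(width, height, max_side))
            ok, buffer = cv2.imencode(".jpg", frame)
            if ok:
                frames.append(base64.b64encode(buffer.tobytes()).decode("ascii"))
    finally:
        capture.release()  # always free the video handle
    return frames
```

Resizing before encoding keeps the base64 payloads small, which matters when several frames are passed to an image analysis model in one request.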

Example Usage

The agent can answer questions like:

  • "Analyze this YouTube video and tell me what happens: [URL]"
  • "Extract 5 frames from this video every 60 seconds: [URL]"
  • "What is the title and duration of this video: [URL]"
  • "Describe the visual content of this tutorial video: [URL]"

Dependencies

  • yt-dlp: YouTube video downloading
  • opencv-python: Computer vision and frame extraction
  • Pillow (PIL): Image processing
  • numpy: Numerical operations
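
Since the package names and import names differ for some of these dependencies (yt-dlp imports as `yt_dlp`, opencv-python as `cv2`, Pillow as `PIL`), a quick availability check can be sketched like this. The helper `missing_modules` is hypothetical and not part of the project.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names for yt-dlp, opencv-python, Pillow, and numpy.
REQUIRED = ["yt_dlp", "cv2", "PIL", "numpy"]
print("missing:", missing_modules(REQUIRED))
```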

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference