Agents_Course_Final_Assignment


Gary Simmons

add YouTube video analysis tools and audio transcription capabilities, including documentation and test scripts

5e5f9d1 15 days ago


	---
	title: Template Final Assignment
	emoji: 🕵🏻‍♂️
	colorFrom: indigo
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.25.2
	app_file: app.py
	pinned: false
	hf_oauth: true
	# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
	hf_oauth_expiration_minutes: 480
	---

	# Agent with YouTube Video Analysis

	This agent can analyze YouTube videos, using yt-dlp to download them and OpenCV to extract frames for analysis.

	## Features

	### YouTube Video Analysis Tools

	The agent provides two YouTube video analysis tools:

	#### 1. `analyze_youtube_video(video_url, max_frames=6, interval_seconds=45.0)`
	- Purpose: Downloads a YouTube video and extracts frames at regular intervals for detailed analysis
	- Parameters:
	  - `video_url`: YouTube video URL (e.g., https://www.youtube.com/watch?v=VIDEO_ID)
	  - `max_frames`: Maximum number of frames to extract (1-10, default: 6)
	  - `interval_seconds`: Time interval between extractions (minimum: 10s, default: 45s)
	- Returns: JSON with video metadata, frame timestamps, and detailed descriptions of each frame
	- Use cases: Content analysis, scene detection, video summarization, accessibility descriptions
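
As a rough illustration of how `max_frames` and `interval_seconds` interact, the sketch below plans which timestamps would be sampled for a video of a given length. The function name `plan_frame_timestamps` is hypothetical, and two behaviors are assumptions rather than documented facts: out-of-range values are clamped to the documented limits, and sampling starts at 0 seconds.

```python
def plan_frame_timestamps(duration_seconds, max_frames=6, interval_seconds=45.0):
    """Return the timestamps (in seconds) at which frames would be sampled.

    Hypothetical helper: the real tool may choose timestamps differently.
    """
    # Assumption: clamp to the documented ranges (1-10 frames, at least 10 s apart)
    # instead of rejecting out-of-range values.
    max_frames = max(1, min(10, int(max_frames)))
    interval_seconds = max(10.0, float(interval_seconds))

    timestamps = []
    t = 0.0  # Assumption: the first frame is taken at the start of the video.
    while t < duration_seconds and len(timestamps) < max_frames:
        timestamps.append(t)
        t += interval_seconds
    return timestamps
```

With the defaults, a five-minute video yields six timestamps spaced 45 seconds apart. A video shorter than `max_frames * interval_seconds` yields fewer frames than requested.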

	#### 2. `get_youtube_video_info(video_url)`
	- Purpose: Retrieves video metadata without downloading the video
	- Parameters:
	  - `video_url`: YouTube video URL
	- Returns: JSON with title, duration, uploader, view count, description, and resolution
	- Use cases: Video verification, content filtering, metadata collection
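
A metadata-only lookup of this kind can be sketched with yt-dlp's `extract_info(..., download=False)`. This is a minimal sketch, not the agent's actual code: the function names `summarize_info` and `get_video_info_sketch` are hypothetical, and the exact shape of the returned JSON is an assumption based on the field list above.

```python
import json


def summarize_info(info):
    """Pick the documented fields out of a yt-dlp style info dict."""
    width, height = info.get("width"), info.get("height")
    return {
        "title": info.get("title"),
        "duration": info.get("duration"),  # seconds
        "uploader": info.get("uploader"),
        "view_count": info.get("view_count"),
        "description": info.get("description"),
        # Assumption: resolution is reported as "WIDTHxHEIGHT" when both are known.
        "resolution": f"{width}x{height}" if width and height else None,
    }


def get_video_info_sketch(video_url):
    """Fetch metadata for video_url without downloading the media (hypothetical)."""
    import yt_dlp  # imported lazily so summarize_info works without yt-dlp installed

    options = {"quiet": True, "skip_download": True}
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(video_url, download=False)
    return json.dumps(summarize_info(info))
```

Keeping the field selection in a separate function makes it testable without network access.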

	### Technical Implementation

	- Video Processing: yt-dlp downloads the YouTube video
	- Frame Extraction: OpenCV extracts and processes frames
	- Image Processing: PIL and numpy handle frame manipulation and encoding
	- Analysis Ready: Frames are resized and base64-encoded so they can be passed to image analysis
	- Error Handling: Network issues, invalid URLs, and processing failures are caught and reported
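
The extract, resize, and base64-encode steps above might look roughly like the following. This is a sketch under stated assumptions: it uses OpenCV alone for encoding (the agent itself is described as also using PIL and numpy), the names `fit_within` and `frame_to_base64` are hypothetical, and the 512-pixel size limit and JPEG format are illustrative choices not stated in this README.

```python
import base64


def fit_within(width, height, max_side=512):
    """Scale (width, height) down so the longer side is at most max_side."""
    scale = min(1.0, max_side / max(width, height))  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


def frame_to_base64(video_path, timestamp_seconds, max_side=512):
    """Return the frame at timestamp_seconds as a base64 JPEG string, or None."""
    import cv2  # imported lazily so fit_within works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    try:
        # Seek by time, then read the next decoded frame.
        capture.set(cv2.CAP_PROP_POS_MSEC, timestamp_seconds * 1000.0)
        ok, frame = capture.read()
        if not ok:
            return None
        height, width = frame.shape[:2]
        resized = cv2.resize(frame, fit_within(width, height, max_side))
        ok, buffer = cv2.imencode(".jpg", resized)
        if not ok:
            return None
        return base64.b64encode(buffer.tobytes()).decode("ascii")
    finally:
        capture.release()
```

Returning `None` on a failed read or encode is one simple way to surface processing failures to the caller; the real tool may report errors differently.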

	### Example Usage

	The agent can answer questions like:
	- "Analyze this YouTube video and tell me what happens: [URL]"
	- "Extract 5 frames from this video every 60 seconds: [URL]"
	- "What is the title and duration of this video: [URL]"
	- "Describe the visual content of this tutorial video: [URL]"

	### Dependencies

	- yt-dlp: YouTube video downloading
	- opencv-python: Computer vision and frame extraction
	- PIL (Pillow): Image processing
	- numpy: Numerical operations

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference