# YouTube Video Analysis Tools Documentation

## Overview

This project includes YouTube video analysis tools that allow the agent to:

1. **Extract metadata** from YouTube videos without downloading them
2. **Download videos** and extract frames at specified intervals
3. **Analyze visual content** of video frames
4. **Provide timestamped descriptions** of video content

## Tools Available

### 1. `get_youtube_video_info(video_url)`

**Purpose**: Quick metadata retrieval without downloading the video.

**Parameters**:
- `video_url` (str): YouTube video URL

**Returns**: JSON string containing:
- Video title, duration, uploader
- View count, upload date
- Resolution and description excerpt
- Status (success/error)

**Example Usage**:
```python
result = get_youtube_video_info("https://www.youtube.com/watch?v=VIDEO_ID")
```
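
Because the tool returns a JSON string rather than a dict, callers usually parse it before use. The sketch below assumes the payload carries a `status` field plus `title`, `duration`, and `uploader` keys on success. The exact key names, the `error` key, and the helper `summarize_video_info` are illustrative assumptions, so check them against the real output.

```python
import json


def summarize_video_info(result_json: str) -> str:
    """Turn the JSON string from get_youtube_video_info into a one-line summary.

    Key names ("title", "duration", "uploader", "error") are assumed for
    illustration and may differ from the real payload.
    """
    data = json.loads(result_json)
    if data.get("status") != "success":
        return f"Lookup failed: {data.get('error', 'unknown error')}"
    return (
        f"{data.get('title', '?')} by {data.get('uploader', '?')} "
        f"({data.get('duration', '?')})"
    )


# Stand-in payload so the example runs without network access.
sample = json.dumps(
    {
        "status": "success",
        "title": "Video Title",
        "duration": "3:33",
        "uploader": "Channel Name",
    }
)
print(summarize_video_info(sample))
```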

### 2. `analyze_youtube_video(video_url, max_frames=6, interval_seconds=45.0)`

**Purpose**: Full video analysis with frame extraction and description.

**Parameters**:
- `video_url` (str): YouTube video URL
- `max_frames` (int): Maximum number of frames to extract (1-10, default: 6)
- `interval_seconds` (float): Seconds between extracted frames (minimum: 10, default: 45)

**Returns**: JSON string containing:
- Video metadata
- Frame analyses with timestamps
- Detailed descriptions of visual content
- Extraction summary

**Example Usage**:
```python
result = analyze_youtube_video(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    max_frames=5,
    interval_seconds=30.0,
)
```
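
To see how `max_frames` and `interval_seconds` interact, it can help to work out which timestamps a call would sample. The sketch below assumes sampling starts at 0 seconds and stops at whichever comes first, the end of the video or the frame limit. That behavior and the helper name `plan_timestamps` are assumptions, not the tool's confirmed logic.

```python
from typing import List


def plan_timestamps(
    duration_seconds: float,
    max_frames: int = 6,
    interval_seconds: float = 45.0,
) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be sampled.

    Assumes the first frame is taken at t=0 and sampling stops at the end of
    the video or once max_frames is reached.
    """
    timestamps = []
    t = 0.0
    while t < duration_seconds and len(timestamps) < max_frames:
        timestamps.append(t)
        t += interval_seconds
    return timestamps


# A 3:33 video (213 seconds) sampled every 30 seconds, up to 5 frames.
print(plan_timestamps(213, max_frames=5, interval_seconds=30.0))
```

With the default settings, the same 213-second video yields only five timestamps, because a sixth frame at 225 seconds would fall past the end.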

## Agent Integration

The tools are integrated into the `BasicAgent` and can be used through natural language queries:

### Example Queries

1. **Video Information**:
   - "What is the title and duration of this video: [URL]?"
   - "Get information about this YouTube video: [URL]"
   - "How many views does this video have: [URL]?"

2. **Content Analysis**:
   - "Analyze this YouTube video and tell me what happens: [URL]"
   - "Describe the visual content of this tutorial: [URL]"
   - "What can you see in this video: [URL]?"

3. **Frame Extraction**:
   - "Extract 5 frames from this video every 60 seconds: [URL]"
   - "Show me frames from the beginning, middle, and end of this video: [URL]"
   - "Analyze key moments in this video: [URL]"

## Technical Details

### Dependencies
- **yt-dlp**: YouTube video downloading
- **opencv-python**: Frame extraction and processing
- **Pillow (PIL)**: Image processing and encoding
- **numpy**: Numerical operations for image arrays

### Processing Pipeline
1. **Video Download**: yt-dlp downloads the video at the best available quality up to 720p
2. **Frame Extraction**: OpenCV extracts frames at the specified intervals
3. **Image Processing**: Frames are resized to a width of 512 px and encoded as base64
4. **Analysis Ready**: The encoded frames are ready to be passed to image analysis models
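
The pipeline steps above can be sketched roughly as follows. This is not the project's actual implementation: the function names are hypothetical, the yt-dlp format selector is one of several that would cap quality at 720p, and the sketch encodes frames as JPEG with OpenCV's `imencode` instead of Pillow to stay short. The third-party imports are deferred so the pure helpers run without them.

```python
import base64
from typing import List, Tuple

# One possible yt-dlp format selector that caps resolution at 720p (assumed).
YDL_FORMAT = "best[height<=720]"


def scaled_size(width: int, height: int, target_width: int = 512) -> Tuple[int, int]:
    """Compute (width, height) for a resize to target_width, keeping aspect ratio."""
    scale = target_width / width
    return target_width, max(1, round(height * scale))


def encode_image_bytes(image_bytes: bytes) -> str:
    """Base64-encode compressed image bytes as ASCII text."""
    return base64.b64encode(image_bytes).decode("ascii")


def download_video(url: str, out_path: str) -> None:
    """Download a video with yt-dlp, capped at 720p."""
    import yt_dlp  # deferred import: only needed when actually downloading

    opts = {"format": YDL_FORMAT, "outtmpl": out_path, "quiet": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])


def extract_frames(video_path: str, timestamps: List[float]) -> List[str]:
    """Grab one frame per timestamp, resize it, and return base64 JPEG strings."""
    import cv2  # deferred import: only needed when actually reading video

    cap = cv2.VideoCapture(video_path)
    frames = []
    try:
        for t in timestamps:
            cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000.0)  # seek to the timestamp
            ok, frame = cap.read()
            if not ok:
                continue  # past the end of the video or an unreadable frame
            height, width = frame.shape[:2]
            resized = cv2.resize(frame, scaled_size(width, height))
            ok, buf = cv2.imencode(".jpg", resized)
            if ok:
                frames.append(encode_image_bytes(buf.tobytes()))
    finally:
        cap.release()
    return frames


# A 1280x720 frame scaled to 512 px wide keeps its 16:9 aspect ratio.
print(scaled_size(1280, 720))
```

Note that `scaled_size` as written would also upscale frames narrower than 512 px; a real implementation may prefer to leave small frames untouched.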

### Performance Considerations
- **Download Limits**: Downloads are capped at 720p to reduce bandwidth
- **Frame Limits**: At most 10 frames are extracted per video to bound processing time
- **Interval Limits**: Frames are at least 10 seconds apart to avoid near-duplicate frames
- **Timeout Handling**: Network failures, including timeouts, are caught and reported as errors
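
One way to enforce the frame and interval limits is to clamp the requested values into the documented ranges. Whether the tool clamps out-of-range values or rejects them is not stated here, so treat the hypothetical `clamp_params` helper below as a sketch of one reasonable policy.

```python
from typing import Tuple

MAX_FRAMES_LIMIT = 10      # documented upper bound on frames
MIN_INTERVAL_SECONDS = 10.0  # documented lower bound on spacing


def clamp_params(max_frames: int, interval_seconds: float) -> Tuple[int, float]:
    """Clamp requested values into the documented ranges (assumed policy)."""
    frames = min(max(int(max_frames), 1), MAX_FRAMES_LIMIT)
    interval = max(float(interval_seconds), MIN_INTERVAL_SECONDS)
    return frames, interval


# A request for 25 frames every 2 seconds is reduced to 10 frames every 10 seconds.
print(clamp_params(25, 2))
```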

### Error Handling

The tools handle the following failure cases:

- Invalid YouTube URLs
- Network connectivity issues
- Video download failures
- Processing errors
- Unsupported video formats
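
A minimal sketch of how these cases could be turned into an error payload instead of an exception is shown below. The host allow-list, the `error` key, and the helper names (`is_youtube_url`, `error_result`, `guarded`) are assumptions for illustration; the real tools may validate URLs and report errors differently.

```python
import json
from urllib.parse import urlparse

# Hosts treated as YouTube for this sketch (assumed, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL uses http(s) and points at a known YouTube host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS


def error_result(message: str) -> str:
    """Build a JSON error payload with the documented status field."""
    return json.dumps({"status": "error", "error": message})


def guarded(tool, video_url: str, **kwargs) -> str:
    """Call a tool, converting bad URLs and exceptions into error payloads."""
    if not is_youtube_url(video_url):
        return error_result(f"Invalid YouTube URL: {video_url}")
    try:
        return tool(video_url, **kwargs)
    except Exception as exc:  # download, network, or processing failure
        return error_result(f"{type(exc).__name__}: {exc}")


print(guarded(lambda url: "unreachable", "https://example.com/watch?v=VIDEO_ID"))
```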

## Usage Examples

### In Agent Conversations

**User**: "Can you analyze this YouTube video and tell me what it's about? https://www.youtube.com/watch?v=dQw4w9WgXcQ"

**Agent Response**: The agent will:
1. First get video metadata to understand duration and title
2. Extract frames at intervals throughout the video
3. Analyze each frame for visual content
4. Provide a comprehensive summary with timestamps
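
The same sequence can be driven directly from code. The sketch below takes the two tools as parameters so it runs here with stand-in functions; `analyze_in_two_steps`, `fake_info`, and `fake_analyze` are hypothetical names, and the real agent's orchestration may differ.

```python
import json
from typing import Callable


def analyze_in_two_steps(
    video_url: str,
    get_info: Callable[[str], str],
    analyze: Callable[..., str],
    max_frames: int = 6,
) -> dict:
    """Fetch metadata first and run the full analysis only if that succeeds."""
    info = json.loads(get_info(video_url))
    if info.get("status") != "success":
        return info  # surface the metadata error without downloading anything
    return json.loads(analyze(video_url, max_frames=max_frames))


# Stand-ins for the real tools so the example runs offline.
def fake_info(url: str) -> str:
    return json.dumps({"status": "success", "title": "Video Title"})


def fake_analyze(url: str, max_frames: int = 6) -> str:
    return json.dumps({"status": "success", "frames_extracted": max_frames})


result = analyze_in_two_steps(
    "https://www.youtube.com/watch?v=VIDEO_ID", fake_info, fake_analyze, max_frames=3
)
print(result["frames_extracted"])
```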

### Sample Output Structure

```json
{
  "status": "success",
  "video_info": {
    "title": "Video Title",
    "duration": "3:33",
    "uploader": "Channel Name"
  },
  "analysis_summary": "Analyzed 6 frames from 'Video Title' (Duration: 3:33) at 30s intervals.",
  "frames_extracted": 6,
  "frame_analyses": [
    {
      "timestamp_seconds": 0,
      "timestamp_formatted": "0:00",
      "description": "Description of what's visible in the frame"
    }
  ]
}
```
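
A short sketch of consuming this structure: the `timeline` helper below reads the `frame_analyses` entries shown above and renders one line per frame, and `format_timestamp` shows one way a value like `"3:33"` could be derived from seconds. Both helper names are hypothetical.

```python
import json
from typing import List


def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS for videos of an hour or longer."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timeline(result_json: str) -> List[str]:
    """Render each frame analysis as '[timestamp] description'."""
    data = json.loads(result_json)
    return [
        f"[{frame['timestamp_formatted']}] {frame['description']}"
        for frame in data.get("frame_analyses", [])
    ]


# Trimmed-down payload in the shape of the sample output.
sample_result = json.dumps(
    {
        "status": "success",
        "frame_analyses": [
            {
                "timestamp_seconds": 0,
                "timestamp_formatted": format_timestamp(0),
                "description": "Description of what's visible in the frame",
            }
        ],
    }
)
for line in timeline(sample_result):
    print(line)
```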

## Best Practices

1. **Start with video info** for unknown videos to check duration and content
2. **Use appropriate intervals** - shorter for action videos, longer for static content
3. **Limit frame count** for long videos to avoid excessive processing
4. **Handle errors gracefully** - network issues are common with video downloads
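
As an illustration of the first three practices, the sketch below derives an interval from the video's duration so that the frames spread across the whole video. It assumes the duration arrives as an `M:SS` or `H:MM:SS` string, as in the sample output; `parse_duration` and `choose_interval` are hypothetical helpers.

```python
def parse_duration(text: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def choose_interval(
    duration_seconds: float,
    max_frames: int = 6,
    min_interval: float = 10.0,
) -> float:
    """Spread max_frames evenly over the video, never below min_interval."""
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    return max(min_interval, duration_seconds / max_frames)


duration = parse_duration("3:33")
print(duration, choose_interval(duration, max_frames=6))
```

For very short clips the computed spacing falls below the 10-second minimum, so the helper returns the minimum and fewer frames fit in the video.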

## Limitations

- Requires internet connection for video access
- Processing time depends on video length and quality
- Geographic restrictions may apply to some videos
- Rate limiting may occur with excessive usage
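
Transient network failures and rate limiting are often handled on the caller's side by retrying with increasing delays. The sketch below is a generic exponential backoff wrapper, not something the tools are documented to do themselves; `retry_with_backoff` and its parameters are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with delays of base_delay * 2**attempt.

    The last failure is re-raised once all attempts are used up. The sleep
    function is injectable so the wrapper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

A call such as `retry_with_backoff(lambda: analyze_youtube_video(url))` would then wait 1 second and 2 seconds between its three attempts. Note that this only helps if the wrapped call raises on failure; a tool that returns an error payload instead would need its `status` checked inside `fn`.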

## Future Enhancements

Potential improvements include:
- Integration with image analysis models for automated descriptions
- Audio transcription combined with visual analysis
- Scene change detection for intelligent frame selection
- Batch processing for multiple videos
- Caching mechanisms for frequently accessed videos