Agents_Course_Final_Assignment

Sleeping

App Files Files Community

Gary Simmons commited on 10 days ago

Commit

37d54d8

1 Parent(s): 63c91ce

implement YouTube access fallback strategy with enhanced error handling and new tools

Browse files

Files changed (7) hide show

FIX_SUMMARY.md +57 -0
QUICK_REFERENCE.md +82 -0
YOUTUBE_FIX_DOCUMENTATION.md +169 -0
app.py +42 -1
libs/youtube/youtube_video_analyzer.py +17 -2
libs/youtube/youtube_web_fallback.py +187 -0
test_youtube_fallback.py +42 -0

FIX_SUMMARY.md ADDED Viewed

	@@ -0,0 +1,57 @@

+# Fix Summary: YouTube Video Access Issues
+## Problem
+Agent failed to analyze YouTube video due to network/DNS resolution errors: "Failed to resolve 'www.youtube.com'"
+## Solution Implemented
+### 1. Created Fallback Tools (`libs/youtube/youtube_web_fallback.py`)
+- **extract_youtube_video_id**: Extract video ID and generate search queries
+- **search_youtube_video_info**: Provide strategic fallback search suggestions
+- **get_youtube_noembed_info**: Use alternative noembed.com API for metadata
+### 2. Enhanced Error Handling (`libs/youtube/youtube_video_analyzer.py`)
+- Added retry logic and timeout configuration
+- Improved error messages with actionable suggestions
+- Better DNS/network error detection
+### 3. Updated Agent (`app.py`)
+- Added 3 new fallback tools to agent toolset
+- Increased max_steps from 20 to 25
+- Added comprehensive system prompt guiding fallback strategy:
+  1. Try direct YouTube access
+  2. Fall back to noembed API
+  3. Use web search with video ID
+  4. Visit relevant webpages
+## Results
+✅ **Video metadata now accessible** via noembed API even when YouTube is blocked
+✅ **Agent automatically switches strategies** when tools fail
+✅ **Multiple fallback paths** ensure persistence
+✅ **Better error handling** prevents immediate failure
+## Test Results
+```bash
+$ python test_youtube_fallback.py
+```
+- ✅ Video ID extraction: SUCCESS
+- ✅ Search strategies: Generated successfully
+- ✅ Noembed API: Retrieved title "Penguin Chicks Stand Up To Giant Petrel...With The Help of a Friend!"
+## Next Steps for Your Question
+For the bird species counting question, the agent will now:
+1. Get video title via noembed API ✅
+2. Search for "L1vXCYZAYYM bird species count"
+3. Search for "Penguin Chicks Giant Petrel reddit"
+4. Look for detailed video analyses/discussions
+5. Extract bird species information from search results
+## Files Changed
+- ✅ `libs/youtube/youtube_video_analyzer.py` (enhanced)
+- ✅ `libs/youtube/youtube_web_fallback.py` (new)
+- ✅ `app.py` (updated)
+- ✅ `test_youtube_fallback.py` (new test)
+**The agent is now significantly more robust and won't give up when YouTube access fails!**

QUICK_REFERENCE.md ADDED Viewed

	@@ -0,0 +1,82 @@

+# Quick Reference: YouTube Fallback Strategy
+## When YouTube Access Fails
+### Tools Available (in order of preference):
+1. **get_youtube_noembed_info(video_url)**
+   - Alternative API that bypasses YouTube directly
+   - Returns: title, author, thumbnail URL
+   - Use first when yt-dlp fails
+2. **extract_youtube_video_id(video_url)**
+   - Extracts video ID from URL
+   - Provides search query suggestions
+   - Use to prepare for web search
+3. **search_youtube_video_info(video_url)**
+   - Generates targeted search strategies
+   - Provides Reddit, Twitter, general web queries
+   - Use to plan web search approach
+### Search Query Templates
+For video ID `L1vXCYZAYYM`:
+```
+General:
+- "youtube L1vXCYZAYYM video information"
+- "youtube L1vXCYZAYYM description"
+- "youtube L1vXCYZAYYM transcript"
+Social Media:
+- "site:reddit.com L1vXCYZAYYM"
+- "site:twitter.com L1vXCYZAYYM"
+- "reddit discussion L1vXCYZAYYM"
+Specific Content:
+- "L1vXCYZAYYM [specific detail]"
+- "[video title] detailed analysis"
+- "[video title] frame by frame"
+```
+### Example Workflow
+```python
+# 1. Try direct access
+result = get_youtube_video_info(video_url)
+# → Fails with DNS error
+# 2. Try alternative API
+result = get_youtube_noembed_info(video_url)
+# → SUCCESS: Get title "Penguin Chicks Stand Up To Giant Petrel..."
+# 3. Extract video ID
+result = extract_youtube_video_id(video_url)
+# → Get ID: L1vXCYZAYYM
+# 4. Search with context
+result = web_search("Penguin Chicks Giant Petrel bird species count")
+# or
+result = web_search("L1vXCYZAYYM reddit bird species")
+# 5. Visit promising results
+result = visit_webpage("https://reddit.com/r/...")
+```
+## For Your Specific Question
+**Q:** "What is the highest number of bird species on camera at the same time?"
+**Strategy:**
+1. Get title via noembed → "Penguin Chicks Stand Up To Giant Petrel...With The Help of a Friend!"
+2. Identify species: Penguins (chicks + adults) and Giant Petrel
+3. Search: "Penguin Chicks Giant Petrel video L1vXCYZAYYM bird species"
+4. Search: "L1vXCYZAYYM reddit ornithology"
+5. Look for: Comments mentioning species count, bird identification discussions
+6. Visit: Nature documentary websites, bird watching forums
+**Expected Answer:** Based on title alone:
+- Minimum 2 species: Penguins and Giant Petrel
+- Could be more if other birds appear
+**Next Steps:** Search for detailed frame-by-frame analyses or community discussions that enumerate all bird species visible.

YOUTUBE_FIX_DOCUMENTATION.md ADDED Viewed

	@@ -0,0 +1,169 @@

+# YouTube Access Fix - Implementation Guide
+## Problem Summary
+Your agent was failing to access YouTube videos due to network/DNS resolution errors when using `yt-dlp`. The error "Failed to resolve 'www.youtube.com'" prevented the agent from analyzing video content.
+## Solution Implemented
+### 1. Enhanced Error Handling in YouTube Tools
+**File: `libs/youtube/youtube_video_analyzer.py`**
+Added better error handling and retry logic:
+- Socket timeout configuration (30 seconds)
+- Retry attempts (3 retries)
+- Source address binding to help with DNS issues
+- Clearer error messages that suggest alternative approaches
+### 2. Created Fallback Tools
+**File: `libs/youtube/youtube_web_fallback.py`** (NEW)
+Three new tools to handle YouTube access failures:
+#### a) `extract_youtube_video_id`
+- Extracts video ID from any YouTube URL format
+- Provides alternative URL formats
+- Suggests search queries for finding video information
+#### b) `search_youtube_video_info`
+- Provides strategic search suggestions when direct access fails
+- Generates multiple search query variations
+- Guides the agent to use web search as a fallback
+#### c) `get_youtube_noembed_info`
+- Uses noembed.com API as alternative metadata source
+- Can retrieve basic video info (title, author, thumbnail) without accessing YouTube directly
+- Works as a lightweight alternative when yt-dlp fails
+### 3. Updated Agent Configuration
+**File: `app.py`**
+Changes made:
+- Added all three fallback tools to the agent's toolset
+- Increased max_steps from 20 to 25 (more steps for fallback strategies)
+- Added comprehensive system prompt with fallback strategy guidance
+- Imported new fallback tools
+### 4. System Prompt Strategy
+The agent now follows this escalation path:
+```
+1. Try direct access (get_youtube_video_info/analyze_youtube_video)
+   ↓ (if fails)
+2. Try alternative API (get_youtube_noembed_info)
+   ↓ (if fails)
+3. Extract video ID (extract_youtube_video_id)
+   ↓
+4. Get search strategies (search_youtube_video_info)
+   ↓
+5. Use web_search with suggested queries
+   ↓
+6. Visit relevant webpages for detailed information
+```
+## How It Works Now
+### For the Bird Video Example:
+**Before:** Agent tried `get_youtube_video_info` → Failed with DNS error → Gave up
+**After:**
+1. Agent tries `get_youtube_video_info` → Gets DNS error
+2. Agent tries `get_youtube_noembed_info` → SUCCESS! Gets title: "Penguin Chicks Stand Up To Giant Petrel...With The Help of a Friend!"
+3. Now knowing it's about penguins and petrels, agent can:
+   - Search: "Penguin Chicks Giant Petrel bird species count"
+   - Search: "L1vXCYZAYYM reddit discussion bird species"
+   - Visit webpages with detailed descriptions
+   - Find community discussions that might mention specific bird counts
+## Testing
+Run the test script to verify all fallback tools work:
+```bash
+python test_youtube_fallback.py
+```
+Expected output:
+- ✅ Video ID extraction succeeds
+- ✅ Search strategies are generated
+- ✅ Noembed API returns video title and metadata
+## Key Benefits
+1. **Resilience**: Multiple fallback paths ensure the agent doesn't give up easily
+2. **Alternative Data Sources**: Uses noembed API and web search instead of just yt-dlp
+3. **Better Guidance**: System prompt teaches agent how to handle failures
+4. **Informative Errors**: Clear error messages guide next steps
+5. **More Steps**: Increased step limit allows for comprehensive fallback attempts
+## What to Do If Issues Persist
+### Network-Level Solutions:
+If even the fallback tools fail due to network restrictions:
+1. **Configure DNS Override**:
+   ```python
+   # In youtube_video_analyzer.py, add to ydl_opts:
+   'source_address': '8.8.8.8',  # Use Google DNS
+   ```
+2. **Use Proxy** (if available):
+   ```python
+   ydl_opts = {
+       'proxy': 'http://proxy.example.com:8080',
+       # ... other options
+   }
+   ```
+3. **Pre-download Videos**:
+   - Download videos beforehand and store locally
+   - Modify agent to work with local video files
+### Agent-Level Solutions:
+1. **Increase Search Diversity**:
+   - Add more search query variations
+   - Target specific video analysis websites
+   - Look for video transcripts on dedicated services
+2. **Add Specialized Tools**:
+   - Create tool to search video transcript databases
+   - Add tool to check video wiki pages
+   - Implement tool to query video metadata databases
+## Files Modified
+1. ✅ `libs/youtube/youtube_video_analyzer.py` - Enhanced error handling
+2. ✅ `libs/youtube/youtube_web_fallback.py` - NEW fallback tools
+3. ✅ `app.py` - Added tools, system prompt, increased steps
+4. ✅ `test_youtube_fallback.py` - NEW test script
+## Success Indicators
+Your agent will now:
+- ✅ Get video title even when YouTube is blocked (via noembed API)
+- ✅ Generate search strategies automatically
+- ✅ Use web search to find video information
+- ✅ Be more persistent with multiple fallback attempts
+- ✅ Provide better error messages explaining what went wrong
+## Testing Your Changes
+To test with a real question:
+```python
+from app import BasicAgent
+agent = BasicAgent()
+question = "What is the highest number of bird species on camera at the same time in this video: https://www.youtube.com/watch?v=L1vXCYZAYYM"
+result = agent(question)
+print(result)
+```
+The agent should now:
+1. Try direct access
+2. Fall back to noembed API (get title)
+3. Use web search with video ID
+4. Find discussions/descriptions
+5. Provide an answer or explain why it cannot determine the exact count

app.py CHANGED Viewed

@@ -23,6 +23,11 @@ from libs.questionHelper.file_tools import fetch_task_files
 from libs.chess.chess_tools import analyze_chess_image, analyze_chess_position
 from libs.transcription.transcription_tools import transcribe_audio
 from libs.youtube.youtube_tools import analyze_youtube_video, get_youtube_video_info
 # (Keep Constants as is)
@@ -215,6 +220,38 @@ model = RateLimitedModel(
 class BasicAgent:
     def __init__(self, name: str = "GGSAgent"):
         self.name = name
         self.code_agent = CodeAgent(
             tools=[
                 DuckDuckGoSearchTool(),
@@ -225,11 +262,14 @@ class BasicAgent:
                 transcribe_audio,
                 analyze_youtube_video,
                 get_youtube_video_info,
                 analyze_chess_position,
                 analyze_chess_image,
             ],
             model=model,
-            max_steps=20,
             verbosity_level=1,
             additional_authorized_imports=[
                 "json",
@@ -255,6 +295,7 @@ class BasicAgent:
                 "subprocess",
             ],
             add_base_tools=True,
         )
         print("BasicAgent initialized.")

 from libs.chess.chess_tools import analyze_chess_image, analyze_chess_position
 from libs.transcription.transcription_tools import transcribe_audio
 from libs.youtube.youtube_tools import analyze_youtube_video, get_youtube_video_info
+from libs.youtube.youtube_web_fallback import (
+    search_youtube_video_info,
+    extract_youtube_video_id,
+    get_youtube_noembed_info,
+)
 # (Keep Constants as is)
 class BasicAgent:
     def __init__(self, name: str = "GGSAgent"):
         self.name = name
+        # System prompt to guide the agent on handling failures
+        system_prompt = """You are a helpful AI agent that can answer questions using various tools.
+IMPORTANT: When working with YouTube videos, use this fallback strategy:
+1. FIRST ATTEMPT: Try get_youtube_video_info or analyze_youtube_video
+2. IF NETWORK ERROR OCCURS (DNS, connection failures):
+   - DO NOT retry the same failed tool
+   - Proceed immediately to fallback strategies below
+3. FALLBACK STRATEGY (in order):
+   a) Try get_youtube_noembed_info - uses alternative API
+   b) Use extract_youtube_video_id to get the video ID
+   c) Use search_youtube_video_info to get search strategy suggestions
+   d) Use web_search with queries like:
+      - "youtube [video_id] description"
+      - "youtube [video_id] reddit discussion"
+      - "youtube [video_id] content summary"
+      - "site:reddit.com [video_id]"
+   e) Use visit_webpage on relevant search results
+4. FOR SPECIFIC CONTENT QUESTIONS (counting objects, identifying specific moments):
+   - Search for video transcripts, detailed reviews, or frame-by-frame analyses
+   - Look for Reddit threads, Twitter discussions, or blog posts about the video
+   - Check if there are wikis, databases, or fan sites with detailed information
+5. Be persistent and creative - if one search doesn't work, try different query variations
+6. Always explain your reasoning when tools fail and what alternative approach you're taking."""
         self.code_agent = CodeAgent(
             tools=[
                 DuckDuckGoSearchTool(),
                 transcribe_audio,
                 analyze_youtube_video,
                 get_youtube_video_info,
+                search_youtube_video_info,
+                extract_youtube_video_id,
+                get_youtube_noembed_info,
                 analyze_chess_position,
                 analyze_chess_image,
             ],
             model=model,
+            max_steps=25,
             verbosity_level=1,
             additional_authorized_imports=[
                 "json",
                 "subprocess",
             ],
             add_base_tools=True,
+            system_prompt=system_prompt,
         )
         print("BasicAgent initialized.")

libs/youtube/youtube_video_analyzer.py CHANGED Viewed

@@ -47,6 +47,9 @@ def extract_video_frames(
             "outtmpl": video_path,
             "quiet": True,
             "no_warnings": True,
         }
         try:
@@ -58,7 +61,9 @@ def extract_video_frames(
             # Find the downloaded video file
             video_files = list(Path(temp_dir).glob("video.*"))
             if not video_files:
-                raise Exception("No video file found after download. The download may have failed or the video may be unavailable.")
             actual_video_path = str(video_files[0])
@@ -175,6 +180,10 @@ def get_video_metadata(video_url: str) -> Dict[str, Any]:
         ydl_opts = {
             "quiet": True,
             "no_warnings": True,
         }
         with yt_dlp.YoutubeDL(ydl_opts) as ydl:
@@ -198,7 +207,13 @@ def get_video_metadata(video_url: str) -> Dict[str, Any]:
             return metadata
     except Exception as e:
-        return {"error": f"Failed to get video metadata: {str(e)}"}
 def analyze_youtube_video_frames(

             "outtmpl": video_path,
             "quiet": True,
             "no_warnings": True,
+            "socket_timeout": 30,
+            "retries": 3,
+            "source_address": "0.0.0.0",
         }
         try:
             # Find the downloaded video file
             video_files = list(Path(temp_dir).glob("video.*"))
             if not video_files:
+                raise Exception(
+                    "No video file found after download. The download may have failed or the video may be unavailable."
+                )
             actual_video_path = str(video_files[0])
         ydl_opts = {
             "quiet": True,
             "no_warnings": True,
+            "socket_timeout": 30,
+            "retries": 3,
+            # Add source address to help with DNS issues
+            "source_address": "0.0.0.0",
         }
         with yt_dlp.YoutubeDL(ydl_opts) as ydl:
             return metadata
     except Exception as e:
+        error_msg = str(e)
+        # Provide more helpful error message
+        if "Failed to resolve" in error_msg or "getaddrinfo failed" in error_msg:
+            return {
+                "error": f"Network/DNS resolution error: Cannot reach YouTube. This may be due to network restrictions or DNS issues. Try using web search to find information about the video instead. Original error: {error_msg}"
+            }
+        return {"error": f"Failed to get video metadata: {error_msg}"}
 def analyze_youtube_video_frames(

libs/youtube/youtube_web_fallback.py ADDED Viewed

	@@ -0,0 +1,187 @@

+"""
+YouTube Web Fallback Tools
+These tools provide fallback methods to get YouTube video information
+when direct access via yt-dlp fails due to network restrictions.
+"""
+from smolagents import tool
+import json
+@tool
+def search_youtube_video_info(video_url: str) -> str:
+    """
+    Search for information about a YouTube video using web search when direct access fails.
+    This is a fallback tool that should be used when get_youtube_video_info or
+    analyze_youtube_video fail due to network issues. It constructs a web search
+    query to find information about the video.
+    Args:
+        video_url: The YouTube video URL to search information about
+    Returns:
+        A suggestion for what to search for to find information about this video
+    """
+    try:
+        # Extract video ID from URL
+        video_id = None
+        if "watch?v=" in video_url:
+            video_id = video_url.split("watch?v=")[1].split("&")[0]
+        elif "youtu.be/" in video_url:
+            video_id = video_url.split("youtu.be/")[1].split("?")[0]
+        if not video_id:
+            return json.dumps(
+                {
+                    "status": "error",
+                    "error": "Could not extract video ID from URL",
+                    "suggestion": f"Try web_search with: '{video_url} youtube video information'",
+                },
+                indent=2,
+            )
+        # Provide search suggestions
+        search_suggestions = {
+            "status": "network_fallback",
+            "video_id": video_id,
+            "video_url": video_url,
+            "message": "Direct YouTube access failed. Use these search strategies:",
+            "search_strategies": [
+                f"youtube {video_id} video information",
+                f"youtube {video_id} description transcript",
+                f"site:reddit.com youtube {video_id}",
+                f"site:twitter.com youtube {video_id}",
+            ],
+            "instructions": "Use the web_search or visit_webpage tools with the above queries to find information about this video. Look for: video title, description, content summary, comments, or discussions about the video.",
+        }
+        return json.dumps(search_suggestions, indent=2)
+    except Exception as e:
+        return json.dumps(
+            {"status": "error", "error": str(e), "video_url": video_url}, indent=2
+        )
+@tool
+def extract_youtube_video_id(video_url: str) -> str:
+    """
+    Extract the video ID from a YouTube URL.
+    Useful for constructing alternative search queries or accessing video
+    information through other means when direct access fails.
+    Args:
+        video_url: YouTube video URL
+    Returns:
+        The extracted video ID or an error message
+    """
+    try:
+        video_id = None
+        if "watch?v=" in video_url:
+            video_id = video_url.split("watch?v=")[1].split("&")[0]
+        elif "youtu.be/" in video_url:
+            video_id = video_url.split("youtu.be/")[1].split("?")[0]
+        elif "/embed/" in video_url:
+            video_id = video_url.split("/embed/")[1].split("?")[0]
+        if video_id:
+            result = {
+                "status": "success",
+                "video_id": video_id,
+                "video_url": video_url,
+                "alternative_urls": {
+                    "standard": f"https://www.youtube.com/watch?v={video_id}",
+                    "short": f"https://youtu.be/{video_id}",
+                    "embed": f"https://www.youtube.com/embed/{video_id}",
+                },
+                "search_query_suggestions": [
+                    f"youtube video {video_id} information",
+                    f"youtube {video_id} transcript",
+                    f"what is in youtube video {video_id}",
+                ],
+            }
+            return json.dumps(result, indent=2)
+        else:
+            return json.dumps(
+                {
+                    "status": "error",
+                    "error": "Could not extract video ID from the provided URL",
+                    "video_url": video_url,
+                },
+                indent=2,
+            )
+    except Exception as e:
+        return json.dumps(
+            {"status": "error", "error": str(e), "video_url": video_url}, indent=2
+        )
+@tool
+def get_youtube_noembed_info(video_url: str) -> str:
+    """
+    Try to get YouTube video information using the noembed.com API as a fallback.
+    This service can sometimes retrieve basic YouTube metadata without directly
+    accessing YouTube, which can work around network restrictions.
+    Args:
+        video_url: YouTube video URL
+    Returns:
+        JSON string with video information or error message
+    """
+    try:
+        import requests
+        # Use noembed API
+        api_url = f"https://noembed.com/embed?url={video_url}"
+        response = requests.get(api_url, timeout=10)
+        response.raise_for_status()
+        data = response.json()
+        if "error" in data:
+            return json.dumps(
+                {
+                    "status": "error",
+                    "error": data.get("error"),
+                    "message": "noembed API returned an error. Try using web search instead.",
+                    "video_url": video_url,
+                },
+                indent=2,
+            )
+        result = {
+            "status": "success",
+            "source": "noembed_api",
+            "title": data.get("title", "Unknown"),
+            "author_name": data.get("author_name", "Unknown"),
+            "author_url": data.get("author_url", ""),
+            "thumbnail_url": data.get("thumbnail_url", ""),
+            "html": data.get("html", ""),
+            "video_url": video_url,
+        }
+        return json.dumps(result, indent=2)
+    except requests.exceptions.RequestException as e:
+        return json.dumps(
+            {
+                "status": "error",
+                "error": f"Network request failed: {str(e)}",
+                "message": "Could not reach noembed API. Try using web_search to find information about this video.",
+                "video_url": video_url,
+            },
+            indent=2,
+        )
+    except Exception as e:
+        return json.dumps(
+            {"status": "error", "error": str(e), "video_url": video_url}, indent=2
+        )

test_youtube_fallback.py ADDED Viewed

	@@ -0,0 +1,42 @@

+"""
+Test script for YouTube fallback tools
+"""
+from libs.youtube.youtube_web_fallback import (
+    search_youtube_video_info,
+    extract_youtube_video_id,
+    get_youtube_noembed_info,
+)
+import json
+# Test video URL
+test_url = "https://www.youtube.com/watch?v=L1vXCYZAYYM"
+print("=" * 60)
+print("Testing YouTube Fallback Tools")
+print("=" * 60)
+# Test 1: Extract video ID
+print("\n1. Testing extract_youtube_video_id:")
+print("-" * 60)
+result = extract_youtube_video_id(test_url)
+data = json.loads(result)
+print(json.dumps(data, indent=2))
+# Test 2: Get search info suggestions
+print("\n2. Testing search_youtube_video_info:")
+print("-" * 60)
+result = search_youtube_video_info(test_url)
+data = json.loads(result)
+print(json.dumps(data, indent=2))
+# Test 3: Try noembed API
+print("\n3. Testing get_youtube_noembed_info:")
+print("-" * 60)
+result = get_youtube_noembed_info(test_url)
+data = json.loads(result)
+print(json.dumps(data, indent=2))
+print("\n" + "=" * 60)
+print("Tests completed!")
+print("=" * 60)