Gary Simmons committed on
Commit
37d54d8
·
1 Parent(s): 63c91ce

implement YouTube access fallback strategy with enhanced error handling and new tools

FIX_SUMMARY.md ADDED
@@ -0,0 +1,57 @@
+ # Fix Summary: YouTube Video Access Issues
+
+ ## Problem
+ The agent failed to analyze a YouTube video because of network/DNS resolution errors: "Failed to resolve 'www.youtube.com'"
+
+ ## Solution Implemented
+
+ ### 1. Created Fallback Tools (`libs/youtube/youtube_web_fallback.py`)
+ - **extract_youtube_video_id**: Extract the video ID and generate search queries
+ - **search_youtube_video_info**: Suggest fallback search queries (it does not run the search itself)
+ - **get_youtube_noembed_info**: Fetch basic metadata from the noembed.com API instead of YouTube
+
+ ### 2. Enhanced Error Handling (`libs/youtube/youtube_video_analyzer.py`)
+ - Added retries (3) and a 30-second socket timeout
+ - Improved error messages with actionable suggestions
+ - Detects DNS failures ("Failed to resolve", "getaddrinfo failed") and reports them separately
+
+ ### 3. Updated Agent (`app.py`)
+ - Added 3 new fallback tools to the agent toolset
+ - Increased max_steps from 20 to 25
+ - Added a system prompt that describes the fallback strategy:
+   1. Try direct YouTube access
+   2. Fall back to the noembed API
+   3. Use web search with the video ID
+   4. Visit relevant webpages
+
+ ## Results
+
+ - ✅ **Video metadata now accessible** via the noembed API even when YouTube is blocked (noembed.com itself must be reachable)
+ - ✅ **Agent switches strategies** when a tool fails, as instructed by the system prompt
+ - ✅ **Multiple fallback paths** give the agent more than one way to find an answer
+ - ✅ **Better error handling** prevents immediate failure
+
+ ## Test Results
+ ```bash
+ $ python test_youtube_fallback.py
+ ```
+ - ✅ Video ID extraction: SUCCESS
+ - ✅ Search strategies: Generated successfully
+ - ✅ Noembed API: Retrieved title "Penguin Chicks Stand Up To Giant Petrel...With The Help of a Friend!"
+
+ ## Next Steps for Your Question
+
+ For the bird species counting question, the agent will now:
+ 1. Get the video title via the noembed API ✅
+ 2. Search for "L1vXCYZAYYM bird species count"
+ 3. Search for "Penguin Chicks Giant Petrel reddit"
+ 4. Look for detailed video analyses/discussions
+ 5. Extract bird species information from the search results
+
+ ## Files Changed
+ - ✅ `libs/youtube/youtube_video_analyzer.py` (enhanced)
+ - ✅ `libs/youtube/youtube_web_fallback.py` (new)
+ - ✅ `app.py` (updated)
+ - ✅ `test_youtube_fallback.py` (new test)
+
+ **The agent no longer gives up at the first YouTube access failure; it works through the fallback tools instead.**
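Reviewer note: in this commit the four-step strategy is carried out by the agent itself, guided by the system prompt. Purely as an illustration, the same escalation can be written as plain Python. The `run_with_fallbacks` helper and the two stub tools below are hypothetical and are not part of the commit; the only assumption borrowed from it is that a failing tool returns JSON containing `"status": "error"` or an `"error"` key.

```python
import json
from typing import Callable, List, Optional, Tuple


def run_with_fallbacks(
    video_url: str, tools: List[Tuple[str, Callable[[str], str]]]
) -> Optional[Tuple[str, dict]]:
    """Hypothetical helper: return (tool name, data) for the first tool that succeeds."""
    for name, tool in tools:
        try:
            data = json.loads(tool(video_url))
        except Exception:
            continue  # the tool raised or returned invalid JSON; try the next one
        if not isinstance(data, dict):
            continue
        if data.get("status") == "error" or "error" in data:
            continue  # the tool reported a failure; escalate
        return name, data
    return None


# Stub tools that stand in for the real ones (illustration only).
def direct_access(url: str) -> str:
    return json.dumps({"error": "Failed to resolve 'www.youtube.com'"})


def noembed(url: str) -> str:
    return json.dumps({"status": "success", "title": "Example title"})


outcome = run_with_fallbacks(
    "https://www.youtube.com/watch?v=L1vXCYZAYYM",
    [("direct", direct_access), ("noembed", noembed)],
)
print(outcome)
```

A fixed chain like this is deterministic, while the prompt-driven approach in the commit lets the model skip or reorder steps; which trade-off fits better depends on how reliably the model follows the prompt.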
QUICK_REFERENCE.md ADDED
@@ -0,0 +1,82 @@
+ # Quick Reference: YouTube Fallback Strategy
+
+ ## When YouTube Access Fails
+
+ ### Tools Available (in order of preference):
+
+ 1. **get_youtube_noembed_info(video_url)**
+    - Alternative API that returns metadata without contacting YouTube directly
+    - Returns: title, author, thumbnail URL
+    - Use first when yt-dlp fails
+
+ 2. **extract_youtube_video_id(video_url)**
+    - Extracts the video ID from the URL
+    - Provides search query suggestions
+    - Use to prepare for web search
+
+ 3. **search_youtube_video_info(video_url)**
+    - Generates targeted search queries (it does not run them)
+    - Provides Reddit, Twitter, and general web queries
+    - Use to plan the web search approach
+
+ ### Search Query Templates
+
+ For video ID `L1vXCYZAYYM`:
+ ```
+ General:
+ - "youtube L1vXCYZAYYM video information"
+ - "youtube L1vXCYZAYYM description"
+ - "youtube L1vXCYZAYYM transcript"
+
+ Social Media:
+ - "site:reddit.com L1vXCYZAYYM"
+ - "site:twitter.com L1vXCYZAYYM"
+ - "reddit discussion L1vXCYZAYYM"
+
+ Specific Content:
+ - "L1vXCYZAYYM [specific detail]"
+ - "[video title] detailed analysis"
+ - "[video title] frame by frame"
+ ```
+
+ ### Example Workflow
+
+ ```python
+ # 1. Try direct access
+ result = get_youtube_video_info(video_url)
+ # → Fails with DNS error
+
+ # 2. Try alternative API
+ result = get_youtube_noembed_info(video_url)
+ # → SUCCESS: Get title "Penguin Chicks Stand Up To Giant Petrel..."
+
+ # 3. Extract video ID
+ result = extract_youtube_video_id(video_url)
+ # → Get ID: L1vXCYZAYYM
+
+ # 4. Search with context
+ result = web_search("Penguin Chicks Giant Petrel bird species count")
+ # or
+ result = web_search("L1vXCYZAYYM reddit bird species")
+
+ # 5. Visit promising results
+ result = visit_webpage("https://reddit.com/r/...")
+ ```
+
+ ## For Your Specific Question
+
+ **Q:** "What is the highest number of bird species on camera at the same time?"
+
+ **Strategy:**
+ 1. Get title via noembed → "Penguin Chicks Stand Up To Giant Petrel...With The Help of a Friend!"
+ 2. Identify species: Penguins (chicks + adults) and Giant Petrel
+ 3. Search: "Penguin Chicks Giant Petrel video L1vXCYZAYYM bird species"
+ 4. Search: "L1vXCYZAYYM reddit ornithology"
+ 5. Look for: Comments mentioning species count, bird identification discussions
+ 6. Visit: Nature documentary websites, bird watching forums
+
+ **Expected Answer:** The title alone gives only a lower bound:
+ - Minimum 2 species: Penguins and Giant Petrel
+ - Could be more if other birds appear
+
+ **Next Steps:** Search for detailed frame-by-frame analyses or community discussions that enumerate all bird species visible.
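Reviewer note: the search query templates in this file are plain string patterns, so they can be generated rather than retyped. The sketch below is an illustration under that assumption; `build_search_queries` is a hypothetical name, and the commit's own tools return a different (flat) list of suggestions.

```python
from typing import Dict, List, Optional


def build_search_queries(video_id: str, title: Optional[str] = None) -> Dict[str, List[str]]:
    """Hypothetical helper: fill the quick-reference templates for one video."""
    queries = {
        "general": [
            f"youtube {video_id} video information",
            f"youtube {video_id} description",
            f"youtube {video_id} transcript",
        ],
        "social": [
            f"site:reddit.com {video_id}",
            f"site:twitter.com {video_id}",
            f"reddit discussion {video_id}",
        ],
    }
    # Title-based queries are only possible once a title is known (for example via noembed).
    if title:
        queries["title"] = [
            f"{title} detailed analysis",
            f"{title} frame by frame",
        ]
    return queries


for group, items in build_search_queries("L1vXCYZAYYM", "Penguin Chicks Giant Petrel").items():
    print(group, items)
```

Keeping the title-based group optional mirrors the workflow above: ID-only queries work immediately, and title queries become available after step 2 succeeds.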
YOUTUBE_FIX_DOCUMENTATION.md ADDED
@@ -0,0 +1,169 @@
+ # YouTube Access Fix - Implementation Guide
+
+ ## Problem Summary
+ Your agent was failing to access YouTube videos due to network/DNS resolution errors when using `yt-dlp`. The error "Failed to resolve 'www.youtube.com'" prevented the agent from analyzing video content.
+
+ ## Solution Implemented
+
+ ### 1. Enhanced Error Handling in YouTube Tools
+ **File: `libs/youtube/youtube_video_analyzer.py`**
+
+ Added better error handling and retry logic:
+ - Socket timeout configuration (30 seconds)
+ - Retry attempts (3 retries)
+ - Source address binding to `0.0.0.0`, which forces IPv4 and can sidestep IPv6 resolution problems
+ - Clearer error messages that suggest alternative approaches
+
+ ### 2. Created Fallback Tools
+ **File: `libs/youtube/youtube_web_fallback.py`** (NEW)
+
+ Three new tools to handle YouTube access failures:
+
+ #### a) `extract_youtube_video_id`
+ - Extracts the video ID from `watch?v=`, `youtu.be/`, and `/embed/` URLs
+ - Provides alternative URL formats
+ - Suggests search queries for finding video information
+
+ #### b) `search_youtube_video_info`
+ - Suggests search queries when direct access fails (it does not run the search itself)
+ - Generates multiple search query variations
+ - Guides the agent to use web search as a fallback
+
+ #### c) `get_youtube_noembed_info`
+ - Uses the noembed.com API as an alternative metadata source
+ - Can retrieve basic video info (title, author, thumbnail) without accessing YouTube directly
+ - Works as a lightweight alternative when yt-dlp fails
+
+ ### 3. Updated Agent Configuration
+ **File: `app.py`**
+
+ Changes made:
+ - Added all three fallback tools to the agent's toolset
+ - Increased max_steps from 20 to 25 (more steps for fallback strategies)
+ - Added a system prompt with fallback strategy guidance
+ - Imported the new fallback tools
+
+ ### 4. System Prompt Strategy
+
+ The agent now follows this escalation path:
+
+ ```
+ 1. Try direct access (get_youtube_video_info/analyze_youtube_video)
+    ↓ (if fails)
+ 2. Try alternative API (get_youtube_noembed_info)
+    ↓ (if fails)
+ 3. Extract video ID (extract_youtube_video_id)
+
+ 4. Get search strategies (search_youtube_video_info)
+
+ 5. Use web_search with suggested queries
+
+ 6. Visit relevant webpages for detailed information
+ ```
+
+ ## How It Works Now
+
+ ### For the Bird Video Example:
+
+ **Before:** Agent tried `get_youtube_video_info` → Failed with DNS error → Gave up
+
+ **After:**
+ 1. Agent tries `get_youtube_video_info` → Gets DNS error
+ 2. Agent tries `get_youtube_noembed_info` → SUCCESS! Gets title: "Penguin Chicks Stand Up To Giant Petrel...With The Help of a Friend!"
+ 3. Now knowing it's about penguins and petrels, the agent can:
+    - Search: "Penguin Chicks Giant Petrel bird species count"
+    - Search: "L1vXCYZAYYM reddit discussion bird species"
+    - Visit webpages with detailed descriptions
+    - Find community discussions that might mention specific bird counts
+
+ ## Testing
+
+ Run the test script to verify all fallback tools work:
+
+ ```bash
+ python test_youtube_fallback.py
+ ```
+
+ Expected output:
+ - ✅ Video ID extraction succeeds
+ - ✅ Search strategies are generated
+ - ✅ Noembed API returns video title and metadata
+
+ ## Key Benefits
+
+ 1. **Resilience**: Multiple fallback paths mean one failed tool no longer ends the run
+ 2. **Alternative Data Sources**: Uses the noembed API and web search instead of just yt-dlp
+ 3. **Better Guidance**: The system prompt tells the agent how to handle failures
+ 4. **Informative Errors**: Clear error messages guide next steps
+ 5. **More Steps**: The increased step limit leaves room for the fallback attempts
+
+ ## What to Do If Issues Persist
+
+ ### Network-Level Solutions:
+ If even the fallback tools fail due to network restrictions:
+
+ 1. **Fix DNS Outside yt-dlp**: `source_address` is the local address yt-dlp binds to, not a DNS server, so setting it to `8.8.8.8` will not change name resolution. Change the resolver at the OS or container level instead.
+    ```python
+    # Already set in youtube_video_analyzer.py; forces IPv4, does not choose a DNS server:
+    'source_address': '0.0.0.0',
+    ```
+
+ 2. **Use Proxy** (if available):
+    ```python
+    ydl_opts = {
+        'proxy': 'http://proxy.example.com:8080',
+        # ... other options
+    }
+    ```
+
+ 3. **Pre-download Videos**:
+    - Download videos beforehand and store locally
+    - Modify agent to work with local video files
+
+ ### Agent-Level Solutions:
+
+ 1. **Increase Search Diversity**:
+    - Add more search query variations
+    - Target specific video analysis websites
+    - Look for video transcripts on dedicated services
+
+ 2. **Add Specialized Tools**:
+    - Create tool to search video transcript databases
+    - Add tool to check video wiki pages
+    - Implement tool to query video metadata databases
+
+ ## Files Modified
+
+ 1. ✅ `libs/youtube/youtube_video_analyzer.py` - Enhanced error handling
+ 2. ✅ `libs/youtube/youtube_web_fallback.py` - NEW fallback tools
+ 3. ✅ `app.py` - Added tools, system prompt, increased steps
+ 4. ✅ `test_youtube_fallback.py` - NEW test script
+
+ ## Success Indicators
+
+ Your agent will now:
+ - ✅ Get the video title even when YouTube is blocked (via the noembed API, if noembed.com is reachable)
+ - ✅ Generate search strategies automatically
+ - ✅ Use web search to find video information
+ - ✅ Be more persistent with multiple fallback attempts
+ - ✅ Provide better error messages explaining what went wrong
+
+ ## Testing Your Changes
+
+ To test with a real question:
+
+ ```python
+ from app import BasicAgent
+
+ agent = BasicAgent()
+ question = "What is the highest number of bird species on camera at the same time in this video: https://www.youtube.com/watch?v=L1vXCYZAYYM"
+ result = agent(question)
+ print(result)
+ ```
+
+ The agent should now:
+ 1. Try direct access
+ 2. Fall back to noembed API (get title)
+ 3. Use web search with video ID
+ 4. Find discussions/descriptions
+ 5. Provide an answer or explain why it cannot determine the exact count
app.py CHANGED
@@ -23,6 +23,11 @@ from libs.questionHelper.file_tools import fetch_task_files
  from libs.chess.chess_tools import analyze_chess_image, analyze_chess_position
  from libs.transcription.transcription_tools import transcribe_audio
  from libs.youtube.youtube_tools import analyze_youtube_video, get_youtube_video_info
+ from libs.youtube.youtube_web_fallback import (
+     search_youtube_video_info,
+     extract_youtube_video_id,
+     get_youtube_noembed_info,
+ )


  # (Keep Constants as is)
@@ -215,6 +220,38 @@ model = RateLimitedModel(
  class BasicAgent:
      def __init__(self, name: str = "GGSAgent"):
          self.name = name
+
+         # System prompt to guide the agent on handling failures
+         system_prompt = """You are a helpful AI agent that can answer questions using various tools.
+
+ IMPORTANT: When working with YouTube videos, use this fallback strategy:
+
+ 1. FIRST ATTEMPT: Try get_youtube_video_info or analyze_youtube_video
+
+ 2. IF NETWORK ERROR OCCURS (DNS, connection failures):
+    - DO NOT retry the same failed tool
+    - Proceed immediately to fallback strategies below
+
+ 3. FALLBACK STRATEGY (in order):
+    a) Try get_youtube_noembed_info - uses alternative API
+    b) Use extract_youtube_video_id to get the video ID
+    c) Use search_youtube_video_info to get search strategy suggestions
+    d) Use web_search with queries like:
+       - "youtube [video_id] description"
+       - "youtube [video_id] reddit discussion"
+       - "youtube [video_id] content summary"
+       - "site:reddit.com [video_id]"
+    e) Use visit_webpage on relevant search results
+
+ 4. FOR SPECIFIC CONTENT QUESTIONS (counting objects, identifying specific moments):
+    - Search for video transcripts, detailed reviews, or frame-by-frame analyses
+    - Look for Reddit threads, Twitter discussions, or blog posts about the video
+    - Check if there are wikis, databases, or fan sites with detailed information
+
+ 5. Be persistent and creative - if one search doesn't work, try different query variations
+
+ 6. Always explain your reasoning when tools fail and what alternative approach you're taking."""
+
          self.code_agent = CodeAgent(
              tools=[
                  DuckDuckGoSearchTool(),
@@ -225,11 +262,14 @@ class BasicAgent:
                  transcribe_audio,
                  analyze_youtube_video,
                  get_youtube_video_info,
+                 search_youtube_video_info,
+                 extract_youtube_video_id,
+                 get_youtube_noembed_info,
                  analyze_chess_position,
                  analyze_chess_image,
              ],
              model=model,
-             max_steps=20,
+             max_steps=25,
              verbosity_level=1,
              additional_authorized_imports=[
                  "json",
@@ -255,6 +295,7 @@ class BasicAgent:
                  "subprocess",
              ],
              add_base_tools=True,
+             system_prompt=system_prompt,
          )
          print("BasicAgent initialized.")

 
libs/youtube/youtube_video_analyzer.py CHANGED
@@ -47,6 +47,9 @@ def extract_video_frames(
              "outtmpl": video_path,
              "quiet": True,
              "no_warnings": True,
+             "socket_timeout": 30,
+             "retries": 3,
+             "source_address": "0.0.0.0",
          }

          try:
@@ -58,7 +61,9 @@ def extract_video_frames(
              # Find the downloaded video file
              video_files = list(Path(temp_dir).glob("video.*"))
              if not video_files:
-                 raise Exception("No video file found after download. The download may have failed or the video may be unavailable.")
+                 raise Exception(
+                     "No video file found after download. The download may have failed or the video may be unavailable."
+                 )

              actual_video_path = str(video_files[0])

@@ -175,6 +180,10 @@ def get_video_metadata(video_url: str) -> Dict[str, Any]:
          ydl_opts = {
              "quiet": True,
              "no_warnings": True,
+             "socket_timeout": 30,
+             "retries": 3,
+             # Bind to 0.0.0.0 to force IPv4 (same effect as yt-dlp's --force-ipv4); this does not select a DNS server
+             "source_address": "0.0.0.0",
          }

          with yt_dlp.YoutubeDL(ydl_opts) as ydl:
@@ -198,7 +207,13 @@ def get_video_metadata(video_url: str) -> Dict[str, Any]:
          return metadata

      except Exception as e:
-         return {"error": f"Failed to get video metadata: {str(e)}"}
+         error_msg = str(e)
+         # Provide more helpful error message
+         if "Failed to resolve" in error_msg or "getaddrinfo failed" in error_msg:
+             return {
+                 "error": f"Network/DNS resolution error: Cannot reach YouTube. This may be due to network restrictions or DNS issues. Try using web search to find information about the video instead. Original error: {error_msg}"
+             }
+         return {"error": f"Failed to get video metadata: {error_msg}"}


  def analyze_youtube_video_frames(
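Reviewer note: the new `except` branch recognises DNS failures by two substrings. As a sketch of how that check could be widened without touching the call site, the function below also matches the standard glibc resolver messages and separates timeouts. `classify_error` and `DNS_MARKERS` are hypothetical names for illustration; the commit itself only checks "Failed to resolve" and "getaddrinfo failed".

```python
DNS_MARKERS = (
    "Failed to resolve",                      # urllib3 wording, checked in the commit
    "getaddrinfo failed",                     # Windows wording, checked in the commit
    "Name or service not known",              # glibc EAI_NONAME (assumed addition)
    "Temporary failure in name resolution",   # glibc EAI_AGAIN (assumed addition)
)


def classify_error(message: str) -> str:
    """Hypothetical helper: map an exception message to 'dns', 'timeout', or 'other'."""
    lowered = message.lower()
    if any(marker.lower() in lowered for marker in DNS_MARKERS):
        return "dns"
    if "timed out" in lowered or "timeout" in lowered:
        return "timeout"
    return "other"


print(classify_error("Failed to resolve 'www.youtube.com'"))
print(classify_error("Read timed out."))
```

String matching stays fragile across library versions, so a category like this is best treated as a hint for the agent's next step rather than a definitive diagnosis.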
libs/youtube/youtube_web_fallback.py ADDED
@@ -0,0 +1,187 @@
+ """
+ YouTube Web Fallback Tools
+
+ These tools provide fallback methods to get YouTube video information
+ when direct access via yt-dlp fails due to network restrictions.
+ """
+
+ from smolagents import tool
+ import json
+
+
+ @tool
+ def search_youtube_video_info(video_url: str) -> str:
+     """
+     Suggest web search queries for a YouTube video when direct access fails.
+
+     This is a fallback tool that should be used when get_youtube_video_info or
+     analyze_youtube_video fail due to network issues. It only builds search
+     queries; run them with the web search tool to find information about the video.
+
+     Args:
+         video_url: The YouTube video URL to search information about
+
+     Returns:
+         JSON string with suggested search queries for this video
+     """
+     try:
+         # Extract video ID from URL
+         video_id = None
+         if "watch?v=" in video_url:
+             video_id = video_url.split("watch?v=")[1].split("&")[0]
+         elif "youtu.be/" in video_url:
+             video_id = video_url.split("youtu.be/")[1].split("?")[0]
+
+         if not video_id:
+             return json.dumps(
+                 {
+                     "status": "error",
+                     "error": "Could not extract video ID from URL",
+                     "suggestion": f"Try web_search with: '{video_url} youtube video information'",
+                 },
+                 indent=2,
+             )
+
+         # Provide search suggestions
+         search_suggestions = {
+             "status": "network_fallback",
+             "video_id": video_id,
+             "video_url": video_url,
+             "message": "Direct YouTube access failed. Use these search strategies:",
+             "search_strategies": [
+                 f"youtube {video_id} video information",
+                 f"youtube {video_id} description transcript",
+                 f"site:reddit.com youtube {video_id}",
+                 f"site:twitter.com youtube {video_id}",
+             ],
+             "instructions": "Use the web_search or visit_webpage tools with the above queries to find information about this video. Look for: video title, description, content summary, comments, or discussions about the video.",
+         }
+
+         return json.dumps(search_suggestions, indent=2)
+
+     except Exception as e:
+         return json.dumps(
+             {"status": "error", "error": str(e), "video_url": video_url}, indent=2
+         )
+
+
+ @tool
+ def extract_youtube_video_id(video_url: str) -> str:
+     """
+     Extract the video ID from a YouTube URL.
+
+     Useful for constructing alternative search queries or accessing video
+     information through other means when direct access fails.
+
+     Args:
+         video_url: YouTube video URL
+
+     Returns:
+         JSON string with the extracted video ID or an error message
+     """
+     try:
+         video_id = None
+
+         if "watch?v=" in video_url:
+             video_id = video_url.split("watch?v=")[1].split("&")[0]
+         elif "youtu.be/" in video_url:
+             video_id = video_url.split("youtu.be/")[1].split("?")[0]
+         elif "/embed/" in video_url:
+             video_id = video_url.split("/embed/")[1].split("?")[0]
+
+         if video_id:
+             result = {
+                 "status": "success",
+                 "video_id": video_id,
+                 "video_url": video_url,
+                 "alternative_urls": {
+                     "standard": f"https://www.youtube.com/watch?v={video_id}",
+                     "short": f"https://youtu.be/{video_id}",
+                     "embed": f"https://www.youtube.com/embed/{video_id}",
+                 },
+                 "search_query_suggestions": [
+                     f"youtube video {video_id} information",
+                     f"youtube {video_id} transcript",
+                     f"what is in youtube video {video_id}",
+                 ],
+             }
+             return json.dumps(result, indent=2)
+         else:
+             return json.dumps(
+                 {
+                     "status": "error",
+                     "error": "Could not extract video ID from the provided URL",
+                     "video_url": video_url,
+                 },
+                 indent=2,
+             )
+
+     except Exception as e:
+         return json.dumps(
+             {"status": "error", "error": str(e), "video_url": video_url}, indent=2
+         )
+
+
+ @tool
+ def get_youtube_noembed_info(video_url: str) -> str:
+     """
+     Try to get YouTube video information using the noembed.com API as a fallback.
+
+     This service can sometimes retrieve basic YouTube metadata without directly
+     accessing YouTube, which can work around network restrictions.
+
+     Args:
+         video_url: YouTube video URL
+
+     Returns:
+         JSON string with video information or error message
+     """
+     import requests  # imported before the try so the except clause below can always resolve requests.exceptions
+
+     try:
+         # Use noembed API; passing the URL via params lets requests URL-encode it
+         api_url = "https://noembed.com/embed"
+
+         response = requests.get(api_url, params={"url": video_url}, timeout=10)
+         response.raise_for_status()
+
+         data = response.json()
+
+         if "error" in data:
+             return json.dumps(
+                 {
+                     "status": "error",
+                     "error": data.get("error"),
+                     "message": "noembed API returned an error. Try using web search instead.",
+                     "video_url": video_url,
+                 },
+                 indent=2,
+             )
+
+         result = {
+             "status": "success",
+             "source": "noembed_api",
+             "title": data.get("title", "Unknown"),
+             "author_name": data.get("author_name", "Unknown"),
+             "author_url": data.get("author_url", ""),
+             "thumbnail_url": data.get("thumbnail_url", ""),
+             "html": data.get("html", ""),
+             "video_url": video_url,
+         }
+
+         return json.dumps(result, indent=2)
+
+     except requests.exceptions.RequestException as e:
+         return json.dumps(
+             {
+                 "status": "error",
+                 "error": f"Network request failed: {str(e)}",
+                 "message": "Could not reach noembed API. Try using web_search to find information about this video.",
+                 "video_url": video_url,
+             },
+             indent=2,
+         )
+     except Exception as e:
+         return json.dumps(
+             {"status": "error", "error": str(e), "video_url": video_url}, indent=2
+         )
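Reviewer note: both tools above find the video ID by splitting on `watch?v=`, which misses URLs where `v` is not the first query parameter (for example `watch?feature=share&v=...`). A possible alternative, sketched here with the standard library only, parses the URL properly. `parse_video_id` is a hypothetical name, and the `/shorts/` handling is an assumption that goes beyond what the commit supports.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def parse_video_id(video_url: str) -> Optional[str]:
    """Hypothetical helper: return the video ID, or None if the URL is not recognised."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    path_parts = [part for part in parsed.path.split("/") if part]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return path_parts[0] if path_parts else None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: the ID is the "v" query parameter, in any position.
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embed links (and, as an assumption, shorts): /embed/<id>, /shorts/<id>
        if len(path_parts) >= 2 and path_parts[0] in ("embed", "shorts"):
            return path_parts[1]

    return None


print(parse_video_id("https://www.youtube.com/watch?feature=share&v=L1vXCYZAYYM"))
```

One limitation of this sketch: a URL without a scheme (such as `youtube.com/watch?v=...`) has no hostname after `urlparse` and returns None, so callers would need to prepend `https://` first.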
test_youtube_fallback.py ADDED
@@ -0,0 +1,42 @@
+ """
+ Test script for YouTube fallback tools
+ """
+
+ from libs.youtube.youtube_web_fallback import (
+     search_youtube_video_info,
+     extract_youtube_video_id,
+     get_youtube_noembed_info,
+ )
+ import json
+
+ # Test video URL
+ test_url = "https://www.youtube.com/watch?v=L1vXCYZAYYM"
+
+ print("=" * 60)
+ print("Testing YouTube Fallback Tools")
+ print("=" * 60)
+
+ # Test 1: Extract video ID
+ print("\n1. Testing extract_youtube_video_id:")
+ print("-" * 60)
+ result = extract_youtube_video_id(test_url)
+ data = json.loads(result)
+ print(json.dumps(data, indent=2))
+
+ # Test 2: Get search info suggestions
+ print("\n2. Testing search_youtube_video_info:")
+ print("-" * 60)
+ result = search_youtube_video_info(test_url)
+ data = json.loads(result)
+ print(json.dumps(data, indent=2))
+
+ # Test 3: Try noembed API
+ print("\n3. Testing get_youtube_noembed_info:")
+ print("-" * 60)
+ result = get_youtube_noembed_info(test_url)
+ data = json.loads(result)
+ print(json.dumps(data, indent=2))
+
+ print("\n" + "=" * 60)
+ print("Tests completed!")
+ print("=" * 60)
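Reviewer note: the script above prints each tool's output but never asserts on it, so a malformed result would still end with "Tests completed!". A minimal sketch of a shape check follows; `missing_keys` is a hypothetical helper, and the sample payload is hand-written to mirror the fields `extract_youtube_video_id` returns on success rather than captured from a real run.

```python
import json
from typing import Iterable, List


def missing_keys(raw_json: str, required: Iterable[str]) -> List[str]:
    """Hypothetical helper: list the required keys absent from a tool's JSON output."""
    data = json.loads(raw_json)
    return [key for key in required if key not in data]


# Hand-written sample mirroring a successful extract_youtube_video_id result.
sample = json.dumps(
    {
        "status": "success",
        "video_id": "L1vXCYZAYYM",
        "video_url": "https://www.youtube.com/watch?v=L1vXCYZAYYM",
    }
)

assert missing_keys(sample, ["status", "video_id"]) == []
assert missing_keys(sample, ["status", "title"]) == ["title"]
print("shape checks passed")
```

In the real script the same check could wrap tests 1 and 2, which need no network; test 3 depends on noembed.com being reachable and is better kept as a separate, optional check.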