Revert "SearchGPT: Remove redundant instructions."

This reverts commit ab7a901604d96c3bf84299802a9541ea86dfdb73.
- config.py +40 -33
- src/engine/browser_engine.py +3 -2
- src/processor/message_processor.py +2 -2
- src/processor/response/generator.py +9 -1
- src/processor/response/setup.py +2 -2
- src/processor/tools/interaction.py +1 -1
config.py
CHANGED

@@ -13,7 +13,7 @@ BAIDU_ENDPOINT = "https://www.baidu.com/s"
 READER_ENDPOINT = "https://r.jina.ai/"
 REQUEST_TIMEOUT = 300 # 5 minute
 
-
+INSTRUCTIONS_START = """
 You are ChatGPT, an AI assistant with mandatory real-time web search, URL content extraction, knowledge validation, and professional summarization capabilities.
 
 Your absolute rules:
@@ -21,7 +21,7 @@ Your absolute rules:
 - You are never allowed to answer directly from your internal knowledge, memory, or training data. Outdated or tool-bypassed answers are strictly forbidden.
 
 Core Principles:
-- Mandatory Tool Invocation: Every query or request, no matter how simple, factual, or complex, must trigger at least one `web_search
+- Mandatory Tool Invocation: Every query or request, no matter how simple, factual, or complex, must trigger at least one `web_search`.
 - No Memory Reliance: Do not use prior conversation history, cached context, or built-in knowledge to generate answers. Always re-verify with tools.
 - Up-to-Date Grounding: All responses must be based only on real-time, verifiable data retrieved through tools.
 - Cross-Validation: Always compare findings across at least 3 independent, credible sources before producing a final answer.
@@ -29,7 +29,7 @@ Core Principles:
 
 Execution Workflow:
 1. Initial Web Search
-- Immediately call `web_search`
+- Immediately call `web_search` when a query or request arrives.
 - Use multiple query or request variations and search engines (`google`, `bing`, `baidu`) for broader coverage.
 
 2. Result Selection
@@ -43,7 +43,7 @@ Execution Workflow:
 - Normalize terminology and remove redundancies.
 
 4. Cross-Validation
-- Compare extracted data from at least
+- Compare extracted data from at least 3 sources.
 - Identify agreements, contradictions, and missing pieces.
 - Validate all numerical, temporal, and factual claims.
 
@@ -53,7 +53,7 @@ Execution Workflow:
 - Highlight the latest developments and their implications.
 
 6. Response Construction
-- Always cite sources using [Source Title](
+- Always cite sources using [Source Title](URL).
 - Maintain professional, precise, and neutral tone.
 - Use headings, numbered lists, and bullet points for clarity.
 - Ensure readability for both experts and general readers.
@@ -68,37 +68,44 @@ Execution Workflow:
 - Never bypass tool execution for any query or request.
 
 Critical Instruction:
-- Every new query or request must trigger a `web_search
+- Every new query or request must trigger a `web_search`.
 - You must not generate answers from prior knowledge, conversation history, or cached data.
-- Always use Markdown format for URL sources with [Source Title](
-
+- Always use Markdown format for URL sources with [Source Title](URL).
+- Replace "Source Title" with the original name of the source.
+- Replace "URL" with the original source link.
 - If tools fail, you must state explicitly that no valid data could be retrieved.
+\n\n\n
+"""
+
+CONTENT_EXTRACTION = """
+<system>
+- Analyze the retrieved content in detail
+- Identify all critical facts, arguments, statistics, and relevant data
+- Collect all URLs, hyperlinks, references, and citations mentioned in the content
+- Evaluate credibility of sources, highlight potential biases or conflicts
+- Produce a structured, professional, and comprehensive summary
+- Emphasize clarity, accuracy, and logical flow
+- Include all discovered URLs in the final summary as [Source Title](URL)
+- Mark any uncertainties, contradictions, or missing information clearly
+</system>
+\n\n\n
+"""
+
+SEARCH_SELECTION = """
+<system>
+- For each search result, fetch the full content using read_url
+- Extract key information, main arguments, data points, and statistics
+- Capture every URL present in the content or references
+- Create a professional structured summary.
+- List each source at the end of the summary in the format [Source title](link)
+- Identify ambiguities or gaps in information
+- Ensure clarity, completeness, and high information density
+</system>
+\n\n\n
+"""
 
-
-
-- Extract the complete URL exactly as provided by the tool response.
-- Format every single source reference as a clickable Markdown link using this exact pattern: [Actual Page Title](https://actual.url.here).
-- Never use placeholder text like "Source Title" or "Source_URL" in your citations.
-- Never write URLs as plain text. Always wrap them in Markdown link format.
-- For every fact, claim, or data point you mention, include the source link immediately after it.
-- Example of correct format: According to recent findings [Nature Research Article](https://www.nature.com/articles/example123).
-- Example of incorrect format: According to recent findings (Source: Nature).
-
-Source Detection and Formatting Protocol:
-- When `web_search` returns results, capture both the title and URL from each result.
-- When `read_url` is executed, use the actual page title and the exact URL provided.
-- Each paragraph containing information from a source must end with the citation in Markdown format.
-- If multiple sources support the same fact, list all sources using comma separation: [Source 1](URL1), [Source 2](URL2).
-- Never abbreviate or modify URLs. Copy them exactly as retrieved.
-- Never use generic titles. Extract the actual page title from the content or metadata.
-- Test each link format by ensuring it follows the pattern: square brackets containing visible text, immediately followed by parentheses containing the full URL.
-
-Verification Checklist for Every Response:
-- Have I included at least one clickable Markdown link for every factual claim.
-- Are all URLs complete and starting with http or https.
-- Do all links follow the exact format of [Descriptive Title](Full URL).
-- Have I avoided any placeholder text in my citations.
-- Can each link be clicked to access the original source.
+INSTRUCTIONS_END = """
+You have just executed tools and obtained results. You MUST now provide a comprehensive answer based ONLY on the tool results.
 \n\n\n
 """
 
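Taken together, the restored constants bracket the request lifecycle: INSTRUCTIONS_START seeds the system message, CONTENT_EXTRACTION and SEARCH_SELECTION are appended to tool output, and INSTRUCTIONS_END is injected once tools have run. A minimal sketch of that wiring (only the constant names come from the diff; the abbreviated string values and both helper functions are illustrative assumptions, not code from the repo):

```python
# Hypothetical sketch of how the four restored prompt constants are staged
# around the tool loop. Values abbreviated; helpers are illustrative only.
INSTRUCTIONS_START = "You are ChatGPT, an AI assistant with mandatory real-time web search..."
CONTENT_EXTRACTION = "<system>\n- Analyze the retrieved content in detail\n</system>"
SEARCH_SELECTION = "<system>\n- For each search result, fetch the full content using read_url\n</system>"
INSTRUCTIONS_END = "You have just executed tools and obtained results..."

def build_messages(history, user_input):
    # System prompt first, then prior turns, then the new user message.
    return [{"role": "system", "content": INSTRUCTIONS_START},
            *history,
            {"role": "user", "content": user_input}]

def wrap_tool_output(raw_text, is_search):
    # Each tool result carries its follow-up instruction block inline,
    # mirroring the f-string returns in browser_engine.py below.
    suffix = SEARCH_SELECTION if is_search else CONTENT_EXTRACTION
    return f"{raw_text}\n\n\n{suffix}\n\n\n"
```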
src/engine/browser_engine.py
CHANGED

@@ -4,6 +4,7 @@
 #
 
 import requests
+from config import CONTENT_EXTRACTION, SEARCH_SELECTION
 from src.core.web_loader import web_loader
 
 class BrowserEngine:
@@ -58,7 +59,7 @@ class BrowserEngine:
             )
             request_response.raise_for_status()
             extracted_content = request_response.text
-            return f"{extracted_content}\n\n\n"
+            return f"{extracted_content}\n\n\n{CONTENT_EXTRACTION}\n\n\n"
         except Exception as error:
             return f"Error reading URL: {str(error)}"
 
@@ -82,6 +83,6 @@ class BrowserEngine:
             )
             search_response.raise_for_status()
             search_results = search_response.text
-            return f"{search_results}\n\n\n"
+            return f"{search_results}\n\n\n{SEARCH_SELECTION}\n\n\n"
         except Exception as error:
             return f"Error during search: {str(error)}"
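The two hunks above append the matching instruction block to each tool's raw output. A self-contained sketch of the read_url behavior, with the fetcher injected so the sketch stays offline (the real method calls requests against READER_ENDPOINT; the `fetch` parameter is an assumption of this sketch):

```python
# Sketch of the restored read_url behavior: fetched text comes back with
# CONTENT_EXTRACTION appended, so the model sees its follow-up instructions
# immediately after the page content. Abbreviated constant; injected fetcher.
CONTENT_EXTRACTION = "<system>\n- Analyze the retrieved content in detail\n</system>"

def read_url(url, fetch):
    try:
        extracted_content = fetch(url)  # real code: requests.get(...).text
        return f"{extracted_content}\n\n\n{CONTENT_EXTRACTION}\n\n\n"
    except Exception as error:
        return f"Error reading URL: {str(error)}"
```

The search path is identical in shape, appending SEARCH_SELECTION instead.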
src/processor/message_processor.py
CHANGED

@@ -4,7 +4,7 @@
 #
 
 import traceback
-from config import MODEL,
+from config import MODEL, INSTRUCTIONS_START
 from src.core.web_configuration import WebConfiguration
 from src.engine.browser_engine import BrowserEngine
 from src.tools.tool_manager import construct_tool_definitions
@@ -32,7 +32,7 @@ def process_user_request(user_message, chat_history):
     available_tools = construct_tool_definitions()
 
     conversation_messages = setup_response(
-
+        INSTRUCTIONS_START,
         chat_history,
         user_message
     )
src/processor/response/generator.py
CHANGED

@@ -4,6 +4,7 @@
 #
 
 import traceback
+from config import INSTRUCTIONS_END
 
 def generate_response(
     server,
@@ -13,7 +14,14 @@ def generate_response(
     tools_done=False
 ):
     response_generator = ""
-
+
+    if tools_done:
+        system_reminder = {
+            "role": "system",
+            "content": INSTRUCTIONS_END
+        }
+        conversation_messages.append(system_reminder)
+
     try:
         response = server.chat.completions.create(
             model=model_name,
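The hunk above restores a post-tool "reminder": once tools_done is set, a system message carrying INSTRUCTIONS_END is appended before the final completion call, pinning the answer to the tool results. A runnable sketch of just that step (the function name and abbreviated constant are stand-ins for illustration):

```python
# Sketch of the restored logic: append a system reminder only after tools
# have run. Hypothetical helper name; constant value abbreviated.
INSTRUCTIONS_END = "You have just executed tools and obtained results..."

def add_final_reminder(conversation_messages, tools_done):
    if tools_done:
        system_reminder = {
            "role": "system",
            "content": INSTRUCTIONS_END,
        }
        conversation_messages.append(system_reminder)
    return conversation_messages
```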
src/processor/response/setup.py
CHANGED

@@ -7,8 +7,8 @@ def setup_response(system_instruction, conversation_history, user_input):
     history = []
 
     if system_instruction:
-        history.
-
+        history.append({"role": "system", "content": system_instruction})
+
     if isinstance(conversation_history, list):
         for history_item in conversation_history:
             message_role = history_item.get("role")
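The restored line completes the truncated `history.` call: the system instruction is prepended as a system-role message. A runnable sketch of the surrounding function under stated assumptions (the diff only shows that each history item's "role" is read; the role filter and the trailing user-message append below are guesses from the signature, not repo code):

```python
# Sketch of setup_response after the fix. The system instruction, when
# present, becomes the first message; history handling is simplified.
def setup_response(system_instruction, conversation_history, user_input):
    history = []

    if system_instruction:
        history.append({"role": "system", "content": system_instruction})

    if isinstance(conversation_history, list):
        for history_item in conversation_history:
            message_role = history_item.get("role")
            if message_role in ("user", "assistant"):  # assumed filter
                history.append(history_item)

    history.append({"role": "user", "content": user_input})  # assumed
    return history
```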
src/processor/tools/interaction.py
CHANGED

@@ -13,7 +13,7 @@ from .executor import invoke_tool_function
 from config import REASONING_DELAY
 
 def process_tool_interactions(server, model_name, conversation_messages, tool_definitions, search_engine):
-    maximum_iterations =
+    maximum_iterations = 1
     logs_generator = ""
     tool_results = []
 
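Restoring `maximum_iterations = 1` caps the tool loop at a single round of tool calls before the final answer is generated. A minimal sketch of that cap, with `run_round` standing in for the real chat-completion and invoke_tool_function round trip (the loop body is a hypothetical simplification, not the repo's implementation):

```python
# Sketch: a bounded tool loop. maximum_iterations caps how many rounds of
# tool calls may run; run_round is an injected stand-in for one round.
def process_tool_interactions(run_round):
    maximum_iterations = 1
    tool_results = []
    for _ in range(maximum_iterations):
        round_results, finished = run_round()
        tool_results.extend(round_results)
        if finished:
            break
    return tool_results
```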