ZhuofengLi committed on
Commit 7e45aa7 · verified · 1 Parent(s): 715a509

Update README.md

Files changed (1)
  1. README.md +11 -174
README.md CHANGED
@@ -1,176 +1,13 @@
  ---
- library_name: transformers
- pipeline_tag: text-generation
- license: apache-2.0
- language: en
- base_model: Qwen/Qwen2.5-3B-Instruct
- tags:
- - llm
- - agent
- - tool-use
- - planning
- - qwen2
- - reinforcement-learning
  ---
-
- <p align="center">
- <picture>
- <source media="(prefers-color-scheme: dark)" srcset="https://github.com/lupantech/AgentFlow/raw/main/assets/img/logo.png">
- <img alt="AgentFlow" src="https://github.com/lupantech/AgentFlow/raw/main/assets/img/logo.png" width=31%>
- </picture>
- </p>
-
- <h3 align="center">
- AgentFlow: In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
- </h3>
-
- <!--- BADGES: START --->
- <p align="center">
- <a href="https://arxiv.org/abs/2510.05592"><img src="https://img.shields.io/badge/arXiv-2510.05592-B31B1B.svg?logo=arxiv" alt="Arxiv"></a>
- <a href="https://huggingface.co/spaces/AgentFlow/agentflow"><img src="https://img.shields.io/badge/Gradio-Demo-F97316.svg?logo=gradio" alt="Gradio Demo"></a>
- <a href="https://huggingface.co/papers/2510.05592"><img src="https://img.shields.io/badge/Huggingface-Paper-FFD21E.svg?logo=huggingface" alt="Huggingface Paper"></a>
- <a href="https://huggingface.co/AgentFlow"><img src="https://img.shields.io/badge/Huggingface-Model-FFD21E.svg?logo=huggingface" alt="Huggingface Model"></a>
- <a href="https://agentflow.stanford.edu/"><img src="https://img.shields.io/badge/Website-AgentFlow-E5426E?logo=kashflow" alt="Website"></a>
- </p>
- <!--- BADGES: END --->
-
- ## Model Details
-
- ### Model Description
-
- AgentFlow is a **trainable, in-the-flow agentic framework** that coordinates four specialized modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. It addresses the limitations of prevailing tool-augmented approaches, which often scale poorly with long horizons and diverse tools and generalize weakly to new scenarios.
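In code, that coordination loop reads roughly as the sketch below. This is an illustrative outline only, assuming hypothetical `plan`, `execute`, `verify`, and `generate` callables; it is not the repository's actual API:

```python
from typing import Any, Callable

def agentic_loop(
    query: str,
    plan: Callable[[str, list], dict],        # planner module (hypothetical)
    execute: Callable[[dict], Any],           # executor module: runs tool calls
    verify: Callable[[str, list, Any], bool], # verifier module: checks progress
    generate: Callable[[str, list], str],     # generator module: final answer
    max_turns: int = 5,
) -> str:
    """Illustrative sketch of the four-module loop; not the real AgentFlow API."""
    memory: list[dict] = []  # evolving memory shared by all modules across turns
    for turn in range(max_turns):
        action = plan(query, memory)            # choose a sub-goal and a tool
        result = execute(action)                # run the chosen tool
        solved = verify(query, memory, result)  # decide whether the goal is met
        memory.append({"turn": turn, "action": action, "result": result})
        if solved:
            break
    return generate(query, memory)              # compose the final answer
```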
-
- To enable effective planning and tool use, AgentFlow introduces **Flow-based Group Refined Policy Optimization (Flow-GRPO)**, a novel algorithm that tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success, and it stabilizes learning with group-normalized advantages. The model uses a **Qwen2.5-3B-Instruct backbone**.
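The credit-assignment step might look like the sketch below: one outcome reward per rollout is normalized within a group of rollouts of the same query, then broadcast to every turn. This is a hedged illustration of the described scheme, not the paper's implementation:

```python
import statistics

def flow_grpo_advantages(group_rewards: list[float],
                         turns_per_rollout: list[int]) -> list[list[float]]:
    """Group-normalize one trajectory-level reward per rollout, then broadcast
    the same advantage to every turn (illustrative sketch, not the official code)."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard a zero-variance group
    return [[(r - mean) / std] * n                 # one identical value per turn
            for r, n in zip(group_rewards, turns_per_rollout)]

# Example: 4 rollouts of the same query; only the first two succeed.
advantages = flow_grpo_advantages([1.0, 1.0, 0.0, 0.0], [3, 5, 4, 2])
```

Because every turn in a rollout shares the same normalized outcome, each planner decision can then be updated with an ordinary single-turn policy-gradient step.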
-
- **Developed by:** Zhuofeng Li, Haoxiang Zhang, Pan Lu, and others.
- **Model type:** Large Language Model with Agentic Capabilities
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-3B-Instruct
-
- ### Model Sources
-
- - **Repository:** https://github.com/lupantech/AgentFlow
- - **Paper:** https://huggingface.co/papers/2510.05592
- - **Project Page:** https://agentflow.stanford.edu/
- - **Demo:** https://huggingface.co/spaces/AgentFlow/agentflow
-
- ## Uses
-
- ### Direct Use
-
- AgentFlow is intended for researchers and developers working on advanced AI agents and large language models that require dynamic planning and effective use of external tools. It is particularly suitable for:
- * Complex reasoning tasks that demand multi-turn interaction and robust credit assignment.
- * Developing systems capable of autonomous skill discovery and practice in live environments.
- * Benchmarking and advancing the state of the art in agentic LLM research.
-
- ### Out-of-Scope Use
-
- The model is not intended for:
- * Deployment in high-stakes, safety-critical applications without extensive additional fine-tuning, validation, and human oversight.
- * Generating content that is harmful or unethical or that violates privacy.
- * Tasks outside the scope of text-based reasoning and tool use without further adaptation or integration with other modalities.
-
- ## Bias, Risks, and Limitations
-
- AgentFlow, like other large language models, may exhibit biases present in its training data or the tools it integrates. Potential risks and limitations include:
- * **Hallucination:** The model might generate factually incorrect or nonsensical outputs, especially in complex scenarios or when tool outputs are ambiguous.
- * **Tool Misuse/Over-reliance:** Incorrectly invoking tools, misinterpreting tool outputs, or failing to identify appropriate tools for a given task.
- * **Generalization Gaps:** Although the system is designed to generalize, performance might degrade on tasks significantly different from its training distribution.
- * **Long-horizon Challenges:** Although the system is designed for long horizons, extremely long and complex tasks may still pose challenges for effective planning and execution.
- * **API Key Dependency:** The system's functionality relies heavily on external API keys (e.g., Google, OpenAI, DashScope), which might incur costs or introduce external dependencies.
-
- ### Recommendations
-
- Users of AgentFlow should:
- * Be aware of the potential for biases and hallucinations inherited from the underlying LLM and training data.
- * Carefully validate outputs, especially for critical applications.
- * Thoroughly test the system's behavior in specific deployment contexts.
- * Refer to the [AgentFlow GitHub repository](https://github.com/lupantech/AgentFlow) for detailed setup, configuration, and best practices to mitigate risks.
-
- ## How to Get Started with the Model
-
- AgentFlow provides a modular agentic system with **four specialized modules** (planner, executor, verifier, generator) that coordinate through **evolving memory** and a **toolkit** over **multiple turns** to solve complex reasoning tasks.
-
- To quickly experience the system in action, follow the installation and environment setup instructions in the [AgentFlow GitHub repository](https://github.com/lupantech/AgentFlow). Once your environment variables and API keys are configured, you can run the following Python snippet for inference:
-
- ```python
- # Import the solver
- from agentflow.agentflow.solver import construct_solver
-
- # Set the LLM engine name (e.g., "dashscope" or "together")
- llm_engine_name = "dashscope"
-
- # Construct the solver
- solver = construct_solver(llm_engine_name=llm_engine_name)
-
- # Solve the user query
- output = solver.solve("What is the capital of France?")
- print(output["direct_output"])
- ```
-
- ## Training Details
-
- ### Training Data
-
- AgentFlow is trained on a mixture of datasets spanning diverse reasoning tasks:
- * **NQ (Natural Questions)**: used for agentic search tasks. (Link: [https://huggingface.co/datasets/RUC-NLPIR/FlashRAG_datasets](https://huggingface.co/datasets/RUC-NLPIR/FlashRAG_datasets))
- * **DeepMath-103K**: used for mathematical reasoning tasks. (Link: [https://huggingface.co/datasets/zwhe99/DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K))
-
- Detailed scripts for dataset preparation (`get_train_data.py`, `aime24_data.py`) are available in the [GitHub repository](https://github.com/lupantech/AgentFlow/tree/main/data); a minimal loading sketch is shown below.
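The sketch assumes the Hugging Face `datasets` library; the FlashRAG config name (`"nq"`) and the splits are assumptions, so defer to the dataset cards and `get_train_data.py` for the actual preparation pipeline:

```python
from datasets import load_dataset

# Assumed config/split names -- verify against the dataset cards.
nq = load_dataset("RUC-NLPIR/FlashRAG_datasets", "nq", split="train")
deepmath = load_dataset("zwhe99/DeepMath-103K", split="train")

print(nq[0])        # inspect one search-task example
print(deepmath[0])  # inspect one math example
```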
-
- ### Training Procedure
-
- AgentFlow employs **Flow-based Group Refined Policy Optimization (Flow-GRPO)**, which directly optimizes the planner agent within the multi-turn interaction loop in an online fashion. This method converts multi-turn optimization into a sequence of tractable single-turn policy updates.
-
- #### Training Hyperparameters
-
- All training hyperparameters (model settings, tools, RL parameters, resources) are configurable via `train/config.yaml` in the GitHub repository.
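For example, a launcher script might read that file before training. The key names below are illustrative assumptions, not the file's actual schema:

```python
import yaml  # pip install pyyaml

with open("train/config.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical keys, shown only to illustrate where settings would live.
print(cfg.get("model"))  # e.g., backbone checkpoint and paths
print(cfg.get("rl"))     # e.g., Flow-GRPO group size, batch size, learning rate
print(cfg.get("tools"))  # e.g., which tools the executor may call
```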
-
- ## Evaluation
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- AgentFlow was evaluated across ten benchmarks spanning four domains:
- * Search tasks
- * Agentic reasoning tasks
- * Mathematical tasks
- * Scientific tasks
-
- #### Metrics
-
- The primary metric for evaluation is **accuracy**; a minimal scoring sketch follows.
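Concretely, accuracy here is the fraction of benchmark questions whose final answer matches the reference. A minimal sketch, assuming exact-match scoring (the paper's judging protocol may differ):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy; assumes one final answer string per question."""
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

# e.g., accuracy(["paris", "4"], ["Paris", "5"]) == 0.5
```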
-
- ### Results
-
- AgentFlow, using a 3B-scale backbone (Qwen2.5-3B-Instruct), demonstrates competitive performance across multiple benchmarks. While the 3B version offers improved efficiency and faster inference than larger models, it maintains strong capabilities in:
- - Search tasks
- - Agentic reasoning tasks
- - Mathematical reasoning
- - Scientific problem solving
-
- The 3B model provides an excellent balance between performance and computational efficiency, making it suitable for resource-constrained environments or applications requiring faster response times.
-
- For a more in-depth understanding of the evaluation protocols and detailed results, please refer to the [paper](https://huggingface.co/papers/2510.05592) and the [project page](https://agentflow.stanford.edu/).
-
- ## Acknowledgements
-
- We thank the following open-source projects:
- - [verl](https://github.com/volcengine/verl) for the excellent RL framework design.
- - [vLLM](https://github.com/vllm-project/vllm) for fast LLM inference support.
- - [Verl-Tool](https://github.com/TIGER-AI-Lab/verl-tool) and [agent-lightning](https://github.com/microsoft/agent-lightning) for their early-stage exploration of agentic RL training.
-
- We thank [Lambda](https://lambda.ai/careers) for GPU support!
-
- ## Citation
-
- ```bibtex
- @article{li2025flow,
-   title   = {In-the-Flow Agentic System Optimization for Effective Planning and Tool Use},
-   author  = {Li, Zhuofeng and Zhang, Haoxiang and Han, Seungju and Liu, Sheng and Xie, Jianwen and Zhang, Yu and Choi, Yejin and Zou, James and Lu, Pan},
-   journal = {arXiv preprint arXiv:2510.05592},
-   year    = {2025}
- }
- ```
 
  ---
+ base_model:
+ - Qwen/Qwen2.5-3B-Instruct
+ license: mit
  ---
+ AgentFlow Planner Agent 3B checkpoint (built upon Qwen2.5-3B-Instruct):
+
+ - HF Paper: https://huggingface.co/papers/date/2025-10-08
+ - Code: https://github.com/lupantech/AgentFlow
+ - Demo: https://huggingface.co/spaces/AgentFlow/agentflow
+ - Website: https://agentflow.stanford.edu/
+ - YouTube: https://www.youtube.com/watch?v=kIQbCQIH1SI
+ - X (Twitter): https://x.com/lupantech/status/1976016000345919803