Update README.md
README.md (CHANGED)
@@ -1,176 +1,13 @@

Removed (previous README content):
---
license:
language: en
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- llm
- agent
- tool-use
- planning
- qwen2
- reinforcement-learning
---

<h3 align="center">
AgentFlow: In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
</h3>

<!--- BADGES: START --->
<p align="center">
<a href="https://arxiv.org/abs/2510.05592"><img src="https://img.shields.io/badge/arXiv-2510.05592-B31B1B.svg?logo=arxiv" alt="Arxiv"></a>
<a href="https://huggingface.co/spaces/AgentFlow/agentflow"><img src="https://img.shields.io/badge/Gradio-Demo-F97316.svg?logo=gradio" alt="Gradio Demo"></a>
<a href="https://huggingface.co/papers/2510.05592"><img src="https://img.shields.io/badge/Huggingface-Paper-FFD21E.svg?logo=huggingface" alt="Huggingface Paper"></a>
<a href="https://huggingface.co/AgentFlow"><img src="https://img.shields.io/badge/Huggingface-Model-FFD21E.svg?logo=huggingface" alt="Huggingface Model"></a>
<a href="https://agentflow.stanford.edu/"><img src="https://img.shields.io/badge/Website-AgentFlow-E5426E?logo=kashflow" alt="Website"></a>
</p>
<!--- BADGES: END --->

## Model Details

### Model Description

AgentFlow is a **trainable, in-the-flow agentic framework** that coordinates four specialized modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. The system addresses the limitations of prevailing tool-augmented approaches, which often scale poorly with long horizons and diverse tools and generalize weakly to new scenarios.
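The control flow implied by this description can be pictured with a deliberately simplified sketch. The names below (`Memory`, the duck-typed `planner.propose`, `executor.call_tool`, `verifier.is_solved`, `generator.answer`) are illustrative placeholders, not the repository's actual API; see the GitHub repository for the real implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Memory:
    """Evolving memory shared by the four modules (illustrative only)."""
    entries: List[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.entries.append(entry)

def run_agentflow(query: str, planner, executor, verifier, generator, max_turns: int = 5) -> str:
    memory = Memory()
    for turn in range(max_turns):
        plan = planner.propose(query, memory)           # planner picks a sub-goal and a tool
        result = executor.call_tool(plan)               # executor runs the selected tool
        memory.add(f"turn {turn}: {plan} -> {result}")  # memory records the step
        if verifier.is_solved(query, memory):           # verifier decides whether to stop
            break
    return generator.answer(query, memory)              # generator writes the final answer
```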

To enable effective planning and tool use, AgentFlow introduces **Flow-based Group Refined Policy Optimization (Flow-GRPO)**, a novel algorithm that tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success, and it stabilizes learning with group-normalized advantages. The planner is built on a **Qwen2.5-3B-Instruct** backbone.
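As a minimal numerical illustration of this credit-assignment idea (one verifiable trajectory-level outcome broadcast to every turn, normalized within a group of rollouts), consider the sketch below. The function names are hypothetical and the actual policy-gradient update is omitted; the Flow-GRPO implementation in the repository is the authoritative reference.

```python
from typing import List
import statistics

def broadcast_reward(num_turns: int, trajectory_reward: float) -> List[float]:
    """Assign the single trajectory-level outcome to every turn of that rollout."""
    return [trajectory_reward] * num_turns

def group_normalized_advantages(group_rewards: List[float]) -> List[float]:
    """Mean-center and scale rewards within a group of rollouts for the same query."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in group_rewards]

# Example: 4 rollouts for one query, each judged 1.0 (success) or 0.0 (failure).
group_rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_normalized_advantages(group_rewards)

# Every turn in a rollout receives that rollout's advantage, turning the
# multi-turn optimization into per-turn (single-turn) policy updates.
per_turn_advantages = [broadcast_reward(num_turns=5, trajectory_reward=a) for a in advantages]
print(per_turn_advantages)
```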

- **Developed by:** Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, and Pan Lu
- **Model type:** Large Language Model with Agentic Capabilities
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-3B-Instruct

### Model Sources

- **Repository:** https://github.com/lupantech/AgentFlow
- **Paper:** https://huggingface.co/papers/2510.05592
- **Project Page:** https://agentflow.stanford.edu/
- **Demo:** https://huggingface.co/spaces/AgentFlow/agentflow

## Uses

### Direct Use

AgentFlow is intended for researchers and developers working on advanced AI agents and large language models that require dynamic planning and effective use of external tools. It is particularly suitable for:

* Complex reasoning tasks that demand multi-turn interaction and robust credit assignment.
* Developing systems capable of autonomous skill discovery and practice in live environments.
* Benchmarking and advancing the state of the art in agentic LLM research.

### Out-of-Scope Use

The model is not intended for:

* Deployment in high-stakes, safety-critical applications without extensive additional fine-tuning, validation, and human oversight.
* Generating content that is harmful, unethical, or violates privacy.
* Tasks outside the scope of text-based reasoning and tool use, without further adaptation or integration with other modalities.

## Bias, Risks, and Limitations

AgentFlow, like other large language models, may exhibit biases present in its training data or the tools it integrates. Potential risks and limitations include:

* **Hallucination:** The model might generate factually incorrect or nonsensical outputs, especially in complex scenarios or when tool outputs are ambiguous.
* **Tool Misuse/Over-reliance:** The planner may invoke tools incorrectly, misinterpret tool outputs, or fail to identify the appropriate tool for a given task.
* **Generalization Gaps:** While designed for generalization, performance may degrade on tasks significantly different from the training distribution.
* **Long-horizon Challenges:** Although designed to address long horizons, extremely long and complex tasks may still pose challenges for effective planning and execution.
* **API Key Dependency:** The system's functionality relies heavily on external API keys (e.g., Google, OpenAI, DashScope), which may incur costs and introduce external dependencies.

### Recommendations

Users of AgentFlow should:

* Be aware of potential biases and hallucinations inherited from the underlying LLM and training data.
* Carefully validate outputs, especially for critical applications.
* Thoroughly test the system's behavior in their specific deployment context.
* Refer to the [AgentFlow GitHub repository](https://github.com/lupantech/AgentFlow) for detailed setup, configuration, and best practices to mitigate risks.

## How to Get Started with the Model

AgentFlow provides a modular agentic system with **four specialized modules** (planner, executor, verifier, generator) that coordinate through **evolving memory** and a **toolkit** over **multiple turns** to solve complex reasoning tasks.

To quickly experience the system in action, follow the installation and environment setup instructions in the [AgentFlow GitHub repository](https://github.com/lupantech/AgentFlow). Once your environment variables and API keys are configured, you can run the following Python snippet for inference:

```python
# Import the solver
from agentflow.agentflow.solver import construct_solver

# Set the LLM engine name (e.g., "dashscope" or "together")
llm_engine_name = "dashscope"

# Construct the solver
solver = construct_solver(llm_engine_name=llm_engine_name)

# Solve the user query
output = solver.solve("What is the capital of France?")
print(output["direct_output"])
```
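The snippet assumes the required API keys are already present in your environment. As an illustrative alternative (the variable names below are assumptions; consult the repository's setup instructions for the authoritative names), keys can also be set programmatically before constructing the solver:

```python
import os

# Illustrative only: check the AgentFlow repository's setup instructions for
# the exact environment variable names required by your engine and tools.
os.environ.setdefault("DASHSCOPE_API_KEY", "<your-dashscope-key>")
os.environ.setdefault("OPENAI_API_KEY", "<your-openai-key>")
os.environ.setdefault("GOOGLE_API_KEY", "<your-google-search-key>")
```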

## Training Details

### Training Data

AgentFlow is trained on a mixed dataset covering diverse reasoning tasks:

* **NQ (Natural Questions):** used for agentic search tasks ([RUC-NLPIR/FlashRAG_datasets](https://huggingface.co/datasets/RUC-NLPIR/FlashRAG_datasets)).
* **DeepMath-103K:** used for mathematical reasoning tasks ([zwhe99/DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K)).

Detailed dataset-preparation scripts (`get_train_data.py`, `aime24_data.py`) are available in the [GitHub repository](https://github.com/lupantech/AgentFlow/tree/main/data).
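For a quick look at these sources, the sketch below loads them with the Hugging Face `datasets` library. The configuration and split names (`"nq"`, `"train"`) are assumptions rather than documented facts, and the repository's preparation scripts remain the authoritative pipeline:

```python
from datasets import load_dataset

# DeepMath-103K: mathematical reasoning problems.
deepmath = load_dataset("zwhe99/DeepMath-103K", split="train")
print(deepmath[0])

# FlashRAG collection: the Natural Questions (NQ) subset used for agentic search.
# The configuration and split names are assumptions; see the dataset card for
# the exact names.
nq = load_dataset("RUC-NLPIR/FlashRAG_datasets", "nq", split="train")
print(nq[0])
```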

### Training Procedure

AgentFlow employs **Flow-based Group Refined Policy Optimization (Flow-GRPO)**, which directly optimizes the planner agent within the multi-turn interaction loop in an online fashion. The method converts multi-turn optimization into a sequence of tractable single-turn policy updates.

#### Training Hyperparameters

All training hyperparameters (model settings, tools, RL parameters, resources) are configurable via `train/config.yaml` in the GitHub repository.
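A minimal sketch of inspecting that configuration from Python, assuming a standard YAML file; the actual keys are defined by `train/config.yaml` in the repository and are not reproduced here:

```python
import yaml  # pip install pyyaml

with open("train/config.yaml") as f:
    config = yaml.safe_load(f)

# The available sections (model settings, tools, RL parameters, resources) are
# defined by the repository's config file; list them rather than guessing names.
print(sorted(config.keys()))
```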

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

AgentFlow was evaluated across ten benchmarks spanning several domains:

* Search tasks
* Agentic reasoning tasks
* Mathematical tasks
* Scientific tasks

#### Metrics

The primary evaluation metric is **accuracy**.

### Results

AgentFlow, built on a 3B-scale backbone (Qwen2.5-3B-Instruct), demonstrates competitive performance across multiple benchmarks. While the 3B version offers improved efficiency and faster inference compared to larger models, it maintains strong capabilities in:

- Search tasks
- Agentic reasoning tasks
- Mathematical reasoning
- Scientific problem solving

The 3B model provides a good balance between performance and computational efficiency, making it suitable for resource-constrained environments and applications that require faster response times.

For a more detailed description of the evaluation protocols and results, please refer to the [paper](https://huggingface.co/papers/2510.05592) and the [project page](https://agentflow.stanford.edu/).

## Acknowledgements

We thank the following open-source projects:

- [verl](https://github.com/volcengine/verl) for the excellent RL framework design.
- [vLLM](https://github.com/vllm-project/vllm) for fast LLM inference support.
- [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool) and [agent-lightning](https://github.com/microsoft/agent-lightning) for their early-stage exploration of agentic RL training.

We thank [Lambda](https://lambda.ai/careers) for GPU support!

## Citation

```bibtex
@article{li2025flow,
  title   = {In-the-Flow Agentic System Optimization for Effective Planning and Tool Use},
  author  = {Li, Zhuofeng and Zhang, Haoxiang and Han, Seungju and Liu, Sheng and Xie, Jianwen and Zhang, Yu and Choi, Yejin and Zou, James and Lu, Pan},
  journal = {arXiv preprint arXiv:2510.05592},
  year    = {2025}
}
```

Added (new README content):

---
base_model:
- Qwen/Qwen2.5-3B-Instruct
license: mit
---
AgentFlow Planner Agent 3B checkpoint (built upon Qwen2.5-3B-Instruct):

- HF Paper: https://huggingface.co/papers/2510.05592
- Code: https://github.com/lupantech/AgentFlow
- Demo: https://huggingface.co/spaces/AgentFlow/agentflow
- Website: https://agentflow.stanford.edu/
- YouTube: https://www.youtube.com/watch?v=kIQbCQIH1SI
- X (Twitter): https://x.com/lupantech/status/1976016000345919803