	---
	base_model:
	- Qwen/Qwen2.5-7B-Instruct
	language:
	- en
	- zh
	license: mit
	pipeline_tag: question-answering
	library_name: transformers
	tags:
	- biology
	- finance
	- text-generation-inference
	- retrieval-augmented-generation
	---

	# HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

	## Model Information

	We release the agent model used in HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches.

	<p align="left">
	Useful links: 📝 <a href="https://arxiv.org/abs/2508.08088" target="_blank">Paper (arXiv)</a> • 🤗 <a href="https://huggingface.co/papers/2508.08088" target="_blank">Paper (Hugging Face)</a> • 🧩 <a href="https://github.com/plageon/HierSearch" target="_blank">GitHub Repository</a>
	</p>

	1. We explore deep search in multi-knowledge-source scenarios, propose a hierarchical agentic paradigm, and train it with hierarchical reinforcement learning (HRL);
	2. We identify drawbacks of naive information transmission among deep search agents and develop a knowledge refiner suited to multi-knowledge-source scenarios;
	3. Our approach enables reliable and effective deep search across multiple knowledge sources and outperforms existing baselines and the flat-RL solution in various domains.


	🌹 If you use this model, please ✨star our [GitHub repository](https://github.com/plageon/HierSearch) or upvote our [paper](https://huggingface.co/papers/2508.08088) to support us. Your star means a lot!

	## Usage

	This model is designed as a "planner agent" within the HierSearch framework, coordinating local and web searches to answer complex questions. It is based on `Qwen2.5-7B-Instruct`. You can load and use it with the `transformers` library for general text generation, or refer to the full codebase for the complete deep search functionality.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_name = "plageon/HierSearch-Planner-Agent"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

	messages = [
	    {"role": "system", "content": "You are a helpful and knowledgeable assistant specializing in enterprise search."},
	    {"role": "user", "content": "What are the main findings of the paper 'HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches'?"},
	]

	# Render the chat template to a prompt string, then tokenize it.
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Pass the full encoding so the attention mask reaches generate().
	generated_ids = model.generate(**model_inputs, max_new_tokens=512)

	# Drop the prompt tokens so only the newly generated answer is decoded.
	new_token_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]
	decoded_output = tokenizer.batch_decode(new_token_ids, skip_special_tokens=True)[0]

	print(decoded_output)
	```
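
To make the planner's coordinating role more concrete, here is a toy sketch of the control flow: ask a local and a web search agent, pool what they return, and refine the pooled evidence before answering. This is not the HierSearch implementation. Every name in it (`Evidence`, `SearchAgent`, `refine`, `plan_and_search`, and the two stub agents) is hypothetical, the word-overlap filter is only a stand-in for the knowledge refiner, and in the real framework the planning is done by this model. Refer to the GitHub repository for the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    source: str  # which knowledge source produced it, e.g. "local" or "web"
    text: str


# Hypothetical interface: a search agent maps a question to a list of evidence.
SearchAgent = Callable[[str], List[Evidence]]


def refine(question: str, evidence: List[Evidence]) -> List[Evidence]:
    """Toy stand-in for a knowledge refiner: keep evidence that shares at
    least one word with the question and drop exact duplicate texts."""
    question_words = set(question.lower().split())
    seen = set()
    kept = []
    for item in evidence:
        overlaps = question_words & set(item.text.lower().split())
        if overlaps and item.text not in seen:
            seen.add(item.text)
            kept.append(item)
    return kept


def plan_and_search(question: str, agents: Dict[str, SearchAgent]) -> List[Evidence]:
    """Toy planner: query every agent in order, then refine the pooled evidence."""
    pooled: List[Evidence] = []
    for agent in agents.values():
        pooled.extend(agent(question))
    return refine(question, pooled)


# Stub agents with made-up data, standing in for real local and web search.
def local_agent(question: str) -> List[Evidence]:
    return [
        Evidence("local", "Internal wiki: refund requests close after 30 days"),
        Evidence("local", "Cafeteria menu for Friday"),
    ]


def web_agent(question: str) -> List[Evidence]:
    return [
        Evidence("web", "Internal wiki: refund requests close after 30 days"),
        Evidence("web", "Public FAQ: enterprise plans include priority support"),
    ]


results = plan_and_search(
    "what is the enterprise refund policy",
    {"local": local_agent, "web": web_agent},
)
print([item.source for item in results])  # ['local', 'web']
```

The irrelevant local item and the duplicated web item are filtered out, which illustrates why passing raw agent output straight to the planner is wasteful when several knowledge sources are involved.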

	## Citation

	```bibtex
	@misc{tan2025hiersearchhierarchicalenterprisedeep,
	      title={HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches},
	      author={Jiejun Tan and Zhicheng Dou and Yan Yu and Jiehan Cheng and Qiang Ju and Jian Xie and Ji-Rong Wen},
	      year={2025},
	      eprint={2508.08088},
	      archivePrefix={arXiv},
	      primaryClass={cs.IR},
	      url={https://arxiv.org/abs/2508.08088},
	}
	```