Introducing ToolsGen 🛠️
I built a tool to solve a problem I kept running into: building high-quality datasets for training LLMs to use tools.
ToolsGen takes your JSON tool definitions and automatically generates realistic user requests, corresponding tool calls, and evaluates them using an LLM-as-a-judge pipeline. It outputs datasets ready to use with Hugging Face.
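To make that concrete, here's a minimal example of the kind of tool definition it starts from, written in the standard OpenAI function-calling format:

```python
# A tool definition in the OpenAI function-calling format.
# Minimal example; real definitions can carry more parameters and constraints.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
```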
What makes it useful:
- Generates realistic user requests + tool calls from JSON definitions
- LLM-as-a-judge quality scoring with multi-dimensional rubrics
- Multiple sampling strategies (random, parameter-aware, semantic)
- OpenAI-compatible API support
- Outputs JSONL with train/val splits (sample record below)
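Roughly what a generated record looks like, one JSON object per line (a simplified sketch with illustrative field names):

```python
import json

# One illustrative training record: a generated user request, the matching
# tool call, and a judge score. Field names here are for illustration.
record = {
    "messages": [
        {"role": "user", "content": "What's the weather like in Ankara right now?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        # OpenAI-style arguments are a JSON-encoded string
                        "arguments": json.dumps({"city": "Ankara", "unit": "celsius"}),
                    },
                }
            ],
        },
    ],
    "judge_score": 0.92,
}

# JSONL: append one JSON object per line
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```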
Still early days (API isn't stable yet), but it's already helping me generate tool-calling datasets much faster.
Check it out: https://github.com/atasoglu/toolsgen
Happy to hear feedback or ideas!