Synthetic data generation for multi-turn tool calling

Synthetic data has proven valuable for tuning smaller, cost-effective language models to handle the complexities of multi-turn tool-calling conversations. A new study introduces DiGiT-TC, a data generation method that produces tool-calling conversations with characteristics similar to those generated in stateful environments.

Stateful vs. stateless environments

Many existing frameworks assume that tool-calling interactions take place in an execution environment that maintains state. This assumption makes interactions easy to validate: one checks whether the final state of the environment matches a predefined goal. However, the assumption does not always hold in real-world settings, for example in enterprise environments with stringent data security requirements, or when tool specifications are synthesized from multiple sources.
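
To make state-based validation concrete, here is a minimal Python sketch under stated assumptions. The environment is a toy key-value store, and the tools `set_value` and `delete_key`, the `ToolCall` class, and the `execute` and `validate` functions are hypothetical illustrations, not part of any framework discussed in the study. A conversation's tool calls are replayed against the environment and accepted only if the resulting state equals the goal state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

State = dict[str, Any]  # toy environment state: a key-value store (assumption)


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


# Hypothetical tools; each mutates the state in place.
def set_value(state: State, key: str, value: Any) -> None:
    state[key] = value


def delete_key(state: State, key: str) -> None:
    state.pop(key, None)


TOOLS: dict[str, Callable[..., None]] = {
    "set_value": set_value,
    "delete_key": delete_key,
}


def execute(calls: list[ToolCall], initial: State) -> State:
    state = dict(initial)  # shallow copy so the initial state is left untouched
    for call in calls:
        TOOLS[call.name](state, **call.args)
    return state


def validate(calls: list[ToolCall], initial: State, goal: State) -> bool:
    # A conversation passes if replaying its calls reaches the goal state.
    return execute(calls, initial) == goal


initial = {"ticket_42": "open", "draft_7": "unsent"}
goal = {"ticket_42": "closed"}
calls = [
    ToolCall("set_value", {"key": "ticket_42", "value": "closed"}),
    ToolCall("delete_key", {"key": "draft_7"}),
]
print(validate(calls, initial, goal))      # True
print(validate(calls[:1], initial, goal))  # False: draft_7 still present
```

The check works only because `execute` can actually run every call. In the restricted settings described above, there may be no executable environment to replay the calls against.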

DiGiT-TC: a new approach

DiGiT-TC addresses this problem with a generation pattern that allows certain tool calls to be represented implicitly in the user request. Evaluation on standard tool-calling benchmarks shows that this approach yields significant performance gains, even in stateful settings.
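
The Python sketch below shows one possible reading of "implicitly represented in the user request." It is an illustrative assumption, not the DiGiT-TC algorithm. Each call in a hypothetical planned sequence carries a synthesized output, so no environment executes it. Calls flagged `implicit` have their outputs folded into the user's request as information the user already provides. Only the remaining calls become targets the model must produce. The names `PlannedCall`, `Sample`, `build_sample`, `get_order`, and `track_package` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class PlannedCall:
    # Hypothetical planned tool call; its output is synthesized, not executed.
    name: str
    args: dict[str, Any]
    output: Any
    implicit: bool = False  # assumption: fold this call into the user request


@dataclass
class Sample:
    user_request: str
    target_calls: list[dict[str, Any]]


def build_sample(request: str, plan: list[PlannedCall]) -> Sample:
    # Describe the implicit calls' outputs as context the user already supplies.
    context = [f"{c.name} returned {c.output!r}" for c in plan if c.implicit]
    if context:
        request = f"{request} (Known: {'; '.join(context)}.)"
    # Only explicit calls remain as targets the model must emit.
    targets = [{"name": c.name, "arguments": c.args} for c in plan if not c.implicit]
    return Sample(user_request=request, target_calls=targets)


plan = [
    PlannedCall("get_order", {"order_id": "A17"},
                {"status": "shipped", "carrier": "UPS"}, implicit=True),
    PlannedCall("track_package", {"carrier": "UPS", "order_id": "A17"},
                {"eta_days": 2}),
]
sample = build_sample("Where is my order A17?", plan)
print(sample.user_request)
print(sample.target_calls)
```

A practical generator would more likely have a language model paraphrase the implicit information into natural user phrasing than append a templated note. The template here only keeps the sketch self-contained.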