Author here. Happy to answer questions on methodology. The finding that surprised us most: 69% of behavioral divergence happens at step 2 — the first tool call — not in later reasoning steps. Improving query formulation at that single point has more impact than optimizing any later step.
Author here. Happy to answer questions on methodology. The finding that surprised us most: 69% of behavioral divergence happens at step 2 — the first tool call — not in later reasoning steps. Improving query formulation at that single point has more impact than optimizing any later step.