Tool-Aware Optimization with Entropy Guidance for Efficient Agentic Reinforcement Learning

Agentic reinforcement learning (RL) equips large language models (LLMs) with tool-use capabilities that substantially improve reasoning on complex tasks. However, integrating external tools often destabilizes training: over-reliance on tools can induce input distribution shift, while overly conservative tool use limits effective exploration. To address this issue, we propose TAO-RL, a unified framework that couples tool-aware trajectory filtering with entropy-guided exploration for efficient poli…

Authors: Hongye Cao, Nuo Yan, Haoyuan Deng, Ziwei Wang, Tianpei Yang + 3 more
Published: 2026-06-02
Categories: cs.LG, cs.AI

via arXiv · 2606.03762