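Efficient Reasoning

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Enables a reasoning model to control the length of its chain of thought precisely and adaptively. Optimization objectives:
1. Accuracy of the final output.
2. Generating a reasoning sequence that satisfies the specific length constraint given in the prompt.

Make Long CoT Short | RL-based Methods | Recommended | Reasoning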
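The two objectives in the L1 entry above can be folded into a single scalar reward for RL training. The sketch below is one plausible form, assuming a correctness term minus a penalty proportional to the gap between the requested and the generated token count. The function names, the default `alpha`, and the wording of the prompt suffix are illustrative assumptions and do not come from this note.

```python
def add_length_instruction(question: str, n_target: int) -> str:
    """Append a length constraint to the prompt (wording is hypothetical)."""
    return f"{question} Think for {n_target} tokens."


def length_controlled_reward(
    is_correct: bool,
    n_target: int,
    n_generated: int,
    alpha: float = 0.0003,  # assumed trade-off weight, not a value from the note
) -> float:
    """Reward = correctness - alpha * |target length - generated length|.

    Objective 1 (accuracy) enters through the correctness term.
    Objective 2 (respecting the prompt's length constraint) enters
    through the absolute-deviation penalty.
    """
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(n_target - n_generated)
    return correctness - length_penalty


if __name__ == "__main__":
    prompt = add_length_instruction("What is 17 * 23?", 512)
    print(prompt)
    # Correct answer that overshoots the requested length by 100 tokens.
    print(length_controlled_reward(True, n_target=512, n_generated=612))
```

A larger `alpha` pushes the policy to match the requested length more strictly at some cost to accuracy; a smaller one does the opposite. The penalty is symmetric, so under this sketch thinking too briefly is penalized as much as thinking too long.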