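Efficient Reasoning
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Lets a reasoning model control the length of its chain of thought precisely and adaptively. Optimization objectives: 1. accuracy of the final output; 2. generating a reasoning sequence that satisfies the specific length constraint stated in the prompt.
Tags: Make Long CoT Short, RL-based Methods, Recommended, Reasoning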
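
The two objectives noted for L1 (answer accuracy and adherence to a prompt-specified length) can be combined into a single scalar reward. The following is only a minimal sketch under stated assumptions, not the paper's confirmed implementation: the function name, the additive form, the absolute-deviation penalty, and the weight `alpha` are all illustrative choices.

```python
def length_controlled_reward(
    is_correct: bool,
    target_len: int,
    actual_len: int,
    alpha: float = 0.001,  # hypothetical weight trading accuracy against length adherence
) -> float:
    """Illustrative reward: correctness minus a penalty for missing the target length.

    Assumption: lengths are token counts and the penalty grows linearly with
    the absolute gap between the requested and the generated length.
    """
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_len - actual_len)
    return correctness - length_penalty


if __name__ == "__main__":
    # A correct answer that hits the requested length exactly gets the full reward.
    print(length_controlled_reward(True, target_len=1000, actual_len=1000))
    # The same correct answer is rewarded less the further it drifts from the target.
    print(length_controlled_reward(True, target_len=1000, actual_len=1200))
```

With a reward of this shape, a larger `alpha` pushes the policy toward matching the requested length more strictly, at the cost of weighting correctness less.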
Efficient Reasoning
When More is Less: Understanding Chain-of-Thought Length in LLMs
This study challenges the common view that a longer Chain-of-Thought (CoT) always gives better results.
Tags: Efficient Reasoning, Paper, Recommended, Text, Background
Efficient Reasoning
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Tags: RL-based Methods, Recommended, Efficient Reasoning