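Efficient Reasoning
DAST: Difficulty-Adaptive Slow Thinking for Large Reasoning Models
Proposes a framework called DAST that lets a model automatically adjust the length of its reasoning according to problem difficulty, significantly improving inference efficiency without sacrificing accuracy on complex tasks.
Tags: RL-based Methods, Recommended, Efficient Reasoning, Make Long CoT Short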
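Efficient Reasoning
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Lets a reasoning model control the length of its chain of thought precisely and adaptively. Optimization objectives: (1) accuracy of the final output; (2) generating a reasoning sequence that meets the specific length constraint given in the prompt.
Tags: Make Long CoT Short, RL-based Methods, Recommended, Reasoning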
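Efficient Reasoning
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Proposes a fine-tuning method called O1-Pruner that addresses the inefficiency of long-thought reasoning models caused by overly verbose reasoning, substantially speeding up inference while maintaining or even improving accuracy.
Tags: RL-based Methods, Recommended, Efficient Reasoning, Make Long CoT Short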
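
Note on the L1 entry: its two objectives (final-answer accuracy and adherence to the length constraint stated in the prompt) can be folded into one scalar reward for RL training. The following is only a minimal sketch under stated assumptions: the function name, the linear form of the penalty, and the weight `alpha` are illustrative choices, not details taken from the summary above.

```python
def length_controlled_reward(
    is_correct: bool,
    target_len: int,
    actual_len: int,
    alpha: float = 0.001,  # hypothetical weight trading accuracy against length adherence
) -> float:
    """Illustrative reward: correctness minus a penalty for missing the target length."""
    # Objective 1: reward a correct final answer.
    correctness = 1.0 if is_correct else 0.0
    # Objective 2: penalize deviation from the length requested in the prompt
    # (assumed here to be linear in the token-count difference).
    length_penalty = alpha * abs(target_len - actual_len)
    return correctness - length_penalty


# Example: a correct answer that overshoots a 1000-token target by 200 tokens.
reward = length_controlled_reward(True, target_len=1000, actual_len=1200)
```

With a larger `alpha`, the policy is pushed harder to hit the requested length, at the possible cost of accuracy; with `alpha` near zero, the reward reduces to plain correctness.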