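Enables a reasoning model to control the length of its chain of thought precisely and adaptively, optimizing for:
1. Accuracy of the final output
2. Generation of reasoning sequences that satisfy the specific length constraint stated in the prompt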

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
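To make the two objectives of the L1 entry concrete, here is a minimal sketch of a length-aware reward. It assumes a reward that adds a correctness term and subtracts a penalty proportional to the gap between the requested and the actual number of reasoning tokens. The names `length_aware_reward` and `build_prompt`, the prompt template, and the `alpha` value are illustrative assumptions, not details taken from this post; consult the L1 paper for the exact reward it trains with.

```python
def length_aware_reward(is_correct: bool, target_tokens: int, used_tokens: int, alpha: float) -> float:
    """Hypothetical reward combining both objectives listed above.

    Objective 1 (accuracy): +1 if the final answer is correct, else 0.
    Objective 2 (length compliance): subtract `alpha` for every token the
    reasoning deviates from the budget stated in the prompt.
    """
    accuracy_term = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - used_tokens)
    return accuracy_term - length_penalty


def build_prompt(question: str, target_tokens: int) -> str:
    # Hypothetical template that states the length budget inside the prompt.
    return f"{question}\n\nThink for {target_tokens} tokens."


if __name__ == "__main__":
    print(build_prompt("What is 17 * 23?", 512))
    # Correct answer, but 200 tokens over budget: 1.0 - 0.001 * 200
    print(length_aware_reward(True, 1000, 1200, alpha=0.001))
```

Because the penalty uses the absolute gap, overshooting and undershooting the budget are punished equally, which is what pushes the policy toward the requested length rather than simply toward shorter outputs.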

This work challenges the common belief that a longer Chain of Thought (CoT) always gives better results.

When More is Less: Understanding Chain-of-Thought Length in LLMs

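Proposes O1-Pruner, a fine-tuning method that tackles the inefficiency of long-thought reasoning models caused by verbose reasoning, substantially speeding up inference while maintaining or even improving accuracy.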

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
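The entry above names the idea (length harmonizing) but not the objective. As a hedged sketch, assume a per-sample reward that compares a sampled solution against reference statistics collected from the original, unpruned model on the same problem: being shorter than the reference average is rewarded, and being less accurate than the reference is penalized. The function name, the `lam` weight, and the exact formula are assumptions for illustration; check the O1-Pruner paper for its actual objective.

```python
from statistics import mean
from typing import Sequence


def length_harmonizing_reward(
    sample_len: int,
    sample_correct: bool,
    ref_lens: Sequence[int],
    ref_correct: Sequence[bool],
    lam: float = 2.0,
) -> float:
    """Hypothetical length-harmonizing reward for one sampled solution.

    `ref_lens` / `ref_correct` describe several solutions that the original
    (unpruned) model generated for the same problem.
    """
    if sample_len <= 0:
        raise ValueError("sample_len must be positive")
    # Positive when the sample is shorter than the reference average.
    length_gain = mean(ref_lens) / sample_len - 1.0
    # Negative when the sample is less accurate than the reference.
    ref_acc = mean(1.0 if c else 0.0 for c in ref_correct)
    acc_gain = (1.0 if sample_correct else 0.0) - ref_acc
    return length_gain + lam * acc_gain


if __name__ == "__main__":
    # Half the reference length and correct, where the reference was right half the time.
    print(length_harmonizing_reward(500, True, [900, 1100], [True, False]))
```

Because the length term is a ratio, halving the reference length earns +1 while doubling it costs only 0.5; `lam` sets how much length saving is needed to offset a drop in accuracy.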

FoodSeg

Mapping the NLP Frontier: A Thematic Analysis of ACL 2025 Main Conference Papers

ACL2025

LlamaFactory Eval Walkthrough: Evaluating on the MMLU Task

LlamaFactory

Research Proposal for MATRL

Matplotlib

Sentiment Computing for Online Text

A Holistic Lexicon-Based Approach to Opinion Mining

ACL2025 SLM

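Proposes an efficient method that prunes redundant thinking steps with a binary-cutting algorithm with backtracking, and pioneers letting the small model itself validate the data on-policy. This tailors concise, easy-to-learn reasoning samples to the small model and addresses the difficulty of effectively distilling long-chain reasoning ability into it.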

Efficient Long CoT Reasoning in Small Language Models
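The core of the pruning step in the entry above is a search for the shortest prefix of a long CoT that still works for the small model. Below is a minimal sketch of that search as a plain binary search over prefix length, assuming the reasoning is already split into steps and that acceptance is monotone in prefix length. `binary_cut` and the `accepts` callback are hypothetical names; `accepts` stands in for the on-policy check (for example, letting the small model finish from the prefix and comparing its answer with the reference). The backtracking rule described in the post is not reproduced here.

```python
from typing import Callable, List, Sequence


def binary_cut(steps: Sequence[str], accepts: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return the shortest prefix of `steps` that `accepts` approves.

    Assumes acceptance is monotone: if a prefix is accepted, every longer
    prefix is too. If no prefix is accepted, the full chain is returned.
    """
    lo, hi = 1, len(steps)
    best = len(steps)  # fallback: keep every step
    while lo <= hi:
        mid = (lo + hi) // 2
        if accepts(steps[:mid]):
            best = mid  # this prefix works; try an even shorter one
            hi = mid - 1
        else:
            lo = mid + 1  # too short; search the longer half
    return list(steps[:best])


if __name__ == "__main__":
    chain = [
        "Read the problem.", "Set up 17 * 23.", "17 * 20 = 340.", "17 * 3 = 51.",
        "340 + 51 = 391.", "Double-check: 391 / 17 = 23.", "Answer: 391.",
    ]
    # Toy stand-in for the on-policy check: accept once the key sum step is present.
    kept = binary_cut(chain, lambda prefix: any("= 391" in s for s in prefix))
    print(kept)
```

With n steps this needs about log2(n) calls to `accepts`, which matters when every call is a full generation by the small model.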

Efficient Reasoning 

Tech Sharing

Study Notes

Research Surveys

Course Materials

Efficient Reasoning


News

Papers

Reasoning

Background

RL-based Methods

Make Long CoT Short

Writing

MARL

Thoughts


AI Infra

How to use it to evaluate models on other benchmarks?

Yummytanmo

Courses

Tools

Development

Python

Parameters

data_args

evaluation_args

finetuning_args

generating_args

model_args

training_args

Evaluate

Evaluation process

Load dataset

Formatting benchmark

Predict and test


Wenxuan Wang
