Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
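The breadth-based strategies mentioned above can be sketched concisely. The snippet below is a minimal illustration of majority voting over N sampled reasoning chains; the chain sampling and answer parsing are assumed to happen upstream, and `majority_vote` is a hypothetical helper name, not an API from the paper:

```python
from collections import Counter

def majority_vote(answers):
    """Breadth-based test-time scaling: given the final answers parsed
    from N independently sampled reasoning chains, return the mode."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers parsed from 5 sampled chains:
print(majority_vote(["42", "41", "42", "42", "7"]))  # → 42
```

Fractional Reasoning plugs into this loop by steering each sampled chain toward a different reasoning intensity before voting.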
Our method first extracts a latent steering vector h_steer that captures prompt-induced shifts in the LLM's internal representations. This vector is derived by contrasting representations from positive and negative prompts across multiple layers. A tunable scaling factor α then modulates reasoning strength:

h̃ = Rescale(h_ori + α · h_steer),

where Rescale preserves the norm of the latent state to ensure stability across layers.
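The update above can be sketched in NumPy. This is a minimal illustration under assumed shapes (per-layer mean hidden states of shape `(num_layers, hidden_dim)`); the function names and the contrastive extraction via a simple difference are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def extract_steering_vector(h_pos, h_neg):
    """Contrast mean hidden states from positive and negative prompts.

    h_pos, h_neg: arrays of shape (num_layers, hidden_dim).
    Returns a per-layer steering direction h_steer.
    """
    return h_pos - h_neg

def apply_fractional_steering(h_ori, h_steer, alpha):
    """Compute h_tilde = Rescale(h_ori + alpha * h_steer).

    Rescale restores each latent state's original norm so that
    steering does not destabilize subsequent layers.
    """
    shifted = h_ori + alpha * h_steer
    ori_norm = np.linalg.norm(h_ori, axis=-1, keepdims=True)
    new_norm = np.linalg.norm(shifted, axis=-1, keepdims=True)
    return shifted * (ori_norm / new_norm)

# Toy demo: the steered state keeps the original per-layer norm.
rng = np.random.default_rng(0)
h_ori = rng.normal(size=(4, 8))      # 4 layers, hidden dim 8
h_steer = rng.normal(size=(4, 8))
h_tilde = apply_fractional_steering(h_ori, h_steer, alpha=0.8)
assert np.allclose(np.linalg.norm(h_tilde, axis=-1),
                   np.linalg.norm(h_ori, axis=-1))
```

Varying α continuously interpolates between the model's default behavior (α = 0) and strongly prompted reasoning, which is what enables per-input control of reasoning depth.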
Figure 2. Averaged accuracy across MATH500, GSM8K, and GPQA. Blue bars represent standard test-time scaling methods; purple bars show the same methods enhanced by our Fractional Reasoning framework.
Thinking

Table 1. Performance of our proposed Fractional Reasoning and other common test-time scaling methods on different reasoning benchmarks.
Reflection

Table 2. Performance of our proposed Fractional Reasoning and other common test-time scaling methods on reflection-based test-time scaling.
Figure 3. Sentence-level control dynamically adjusts reflection strength α at each generation step, enabling correction of errors missed by instance-level control.
Figure 4. Accuracy on GSM8K and GPQA as a function of the number of generations.
@article{liu2025fractional,
author = {Liu, Sheng and Chen, Tianlang and Lu, Pan and Ye, Haotian and Chen, Yizheng and Xing, Lei and Zou, James},
title = {Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute},
journal   = {arXiv:2301.07093},
year = {2025},
}
This website is adapted from Nerfies and X-Decoder, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.