Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
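The breadth-based strategies mentioned above can be sketched concisely. The snippet below is a minimal illustration of majority voting over N sampled reasoning chains; the chain sampling and answer parsing are assumed to happen upstream, and `majority_vote` is a hypothetical helper name, not an API from the paper:

```python
from collections import Counter

def majority_vote(answers):
    """Breadth-based test-time scaling: given the final answers parsed
    from N independently sampled reasoning chains, return the mode."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers parsed from 5 sampled chains:
print(majority_vote(["42", "41", "42", "42", "7"]))  # → 42
```

Fractional Reasoning plugs into this loop by steering each sampled chain toward a different reasoning intensity before voting.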
Our method first extracts a latent steering vector h_steer that captures prompt-induced shifts in the LLM's internal representations. This vector is derived by contrasting representations from positive and negative prompts across multiple layers. A tunable scaling factor α then modulates reasoning strength:

h̃ = Rescale(h_ori + α · h_steer),

where Rescale preserves the norm of the latent state to ensure stability across layers.
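The update above can be sketched in NumPy. This is a minimal illustration under assumed shapes (per-layer mean hidden states of shape `(num_layers, hidden_dim)`); the function names and the contrastive extraction via a simple difference are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def extract_steering_vector(h_pos, h_neg):
    """Contrast mean hidden states from positive and negative prompts.

    h_pos, h_neg: arrays of shape (num_layers, hidden_dim).
    Returns a per-layer steering direction h_steer.
    """
    return h_pos - h_neg

def apply_fractional_steering(h_ori, h_steer, alpha):
    """Compute h_tilde = Rescale(h_ori + alpha * h_steer).

    Rescale restores each latent state's original norm so that
    steering does not destabilize subsequent layers.
    """
    shifted = h_ori + alpha * h_steer
    ori_norm = np.linalg.norm(h_ori, axis=-1, keepdims=True)
    new_norm = np.linalg.norm(shifted, axis=-1, keepdims=True)
    return shifted * (ori_norm / new_norm)

# Toy demo: the steered state keeps the original per-layer norm.
rng = np.random.default_rng(0)
h_ori = rng.normal(size=(4, 8))      # 4 layers, hidden dim 8
h_steer = rng.normal(size=(4, 8))
h_tilde = apply_fractional_steering(h_ori, h_steer, alpha=0.8)
assert np.allclose(np.linalg.norm(h_tilde, axis=-1),
                   np.linalg.norm(h_ori, axis=-1))
```

Varying α continuously interpolates between the model's default behavior (α = 0) and strongly prompted reasoning, which is what enables per-input control of reasoning depth.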
Figure 2. Averaged accuracy across MATH500, GSM8K, and GPQA. Blue bars represent standard test-time scaling methods; purple bars show the same methods enhanced by our Fractional Reasoning framework.
Thinking

Table 1. Performance of our proposed Fractional Reasoning and other common test-time scaling methods on different reasoning benchmarks.
Reflection

Table 2. Performance of our proposed Fractional Reasoning and other common test-time scaling methods on reflection-based test-time scaling.
Figure 3. Sentence-level control dynamically adjusts reflection strength α at each generation step, enabling correction of errors missed by instance-level control.
Figure 4. Accuracy on GSM8K and GPQA as a function of the number of generations.
@article{liu2025fractional,
author = {Liu, Sheng and Chen, Tianlang and Lu, Pan and Ye, Haotian and Chen, Yizheng and Xing, Lei and Zou, James},
title = {Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute},
journal   = {arXiv:2301.07093},
year = {2025},
}
This website is adapted from Nerfies and X-Decoder, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.