Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute


* Equal Technical Contribution

Stanford University

Figure 1. Illustration of how model behavior changes as the instructional strength that controls the "fraction" of reasoning is scaled, for both Chain-of-Thought and Reflection prompting.

Abstract

Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.

Fractional Reasoning: Framework

  • Fractional Reasoning is a training-free, model-agnostic framework that enables continuous control over reasoning intensity at inference time.
  • It computes a latent steering vector $h_{\mathrm{steer}}$ that captures prompt-induced shifts in the LLM's internal representations. This vector is derived by contrasting representations from positive and negative prompts across multiple layers.
  • At inference, this vector is applied with a tunable scalar $\alpha$ to modulate reasoning strength:
    $$\tilde{h} = \mathrm{Rescale}\left(h_{\mathrm{ori}} + \alpha \cdot h_{\mathrm{steer}}\right),$$
    where $h_{\mathrm{ori}}$ is the original latent state and Rescale preserves its norm to ensure stability across layers.
  • Fractional Reasoning enables instance-specific steering of reasoning depth and reflection strength, improving output quality without modifying the underlying model.
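The extraction and steering steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the steering vector for a single layer is the mean difference between paired positive-prompt and negative-prompt latent states, and that Rescale scales the steered state so its L2 norm matches that of h_ori. Plain lists stand in for a model's hidden states, and `mean_difference`, `l2_norm`, and `steer_and_rescale` are hypothetical helper names. A real model would repeat this for each steered layer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_difference(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Hypothetical h_steer for one layer: the average of (positive - negative)
    latent states over paired prompts (an assumed contrast, for illustration)."""
    if not pos or len(pos) != len(neg):
        raise ValueError("need the same non-zero number of positive and negative states")
    dim = len(pos[0])
    total = [0.0] * dim
    for p, n in zip(pos, neg):
        for i in range(dim):
            total[i] += p[i] - n[i]
    return [t / len(pos) for t in total]


def l2_norm(v: Vector) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def steer_and_rescale(h_ori: Vector, h_steer: Vector, alpha: float) -> Vector:
    """h_tilde = Rescale(h_ori + alpha * h_steer), where Rescale is assumed
    to restore the L2 norm of the original latent state h_ori."""
    shifted = [o + alpha * s for o, s in zip(h_ori, h_steer)]
    shifted_norm = l2_norm(shifted)
    if shifted_norm == 0.0:
        # Degenerate case: the shift cancelled the state; keep the original.
        return list(h_ori)
    scale = l2_norm(h_ori) / shifted_norm
    return [x * scale for x in shifted]
```

For example, with positive states `[[1.0, 2.0], [3.0, 0.0]]` and negative states `[[0.0, 2.0], [1.0, 0.0]]`, `mean_difference` returns `[1.5, 0.0]`. Steering `h_ori = [0.0, 3.0]` with `alpha = 2.0` yields a vector along the `[1, 1]` direction whose norm is still 3.0. Setting `alpha = 0.0` returns `h_ori` unchanged, so α acts as a continuous dial on direction while the norm-preserving rescale keeps the magnitude fixed.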

Fractional Reasoning: Performance

  • Fractional Reasoning is a training-free and model-agnostic framework for improving test-time compute.
  • Fractional Reasoning can adaptively control the reasoning behavior in LLMs by identifying and reapplying reasoning-induced latent shifts with a tunable scaling factor.
  • Fractional Reasoning enables continuous adjustment of both reasoning depth and reflection strength—tailoring inference-time behavior to the demands of each input.

Figure 2. Averaged accuracy across MATH500, GSM8K, and GPQA. Blue bars represent standard test-time scaling methods; purple bars show the same methods enhanced with Fractional Reasoning.

Thinking

Table 1. Performance of Fractional Reasoning and other common test-time scaling methods on different reasoning benchmarks.

Reflection

Table 2. Performance of Fractional Reasoning and other common test-time scaling methods in the reflection setting.

Fractional Reasoning: Analysis

  • Fractional Reasoning can operate at the sentence level, adjusting α at each generation step.
  • Fractional Reasoning + majority vote outperforms standard majority voting across all numbers of generations.
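One way to pair steering with majority voting is to draw one answer per value of α and vote over the answers. The sketch below assumes exactly that scheme; the source does not specify how α values are sampled, so the α grid and the `generate` callable (which would run the steered model at a given α and return an extracted final answer) are hypothetical, and `fractional_majority_vote` is an illustrative name.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def fractional_majority_vote(
    generate: Callable[[str, float], Hashable],
    prompt: str,
    alphas: Iterable[float],
) -> Hashable:
    """Generate one answer per steering strength alpha and return the most
    common answer. `generate` is a hypothetical stand-in for a steered model."""
    answers = [generate(prompt, alpha) for alpha in alphas]
    if not answers:
        raise ValueError("alphas must contain at least one value")
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the answer produced at the earliest alpha.
    return Counter(answers).most_common(1)[0][0]


def fake_generate(prompt: str, alpha: float) -> str:
    """Toy stub: pretends stronger steering fixes the answer."""
    return "42" if alpha >= 0.5 else "41"


print(fractional_majority_vote(fake_generate, "q", [0.25, 0.5, 0.75, 1.0]))
```

With the toy stub, the four α values produce `"41"`, `"42"`, `"42"`, `"42"`, so the vote returns `"42"`. Varying α gives the vote a spread of reasoning intensities rather than only sampling noise at a single fixed prompt strength.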

Figure 3. Sentence-level control dynamically adjusts reflection strength α at each generation step, enabling correction of errors missed by instance-level control.

Figure 4. Accuracy on GSM8K and GPQA as a function of the number of generations.

BibTeX


@article{liu2025fractional,
  author      = {Liu, Sheng and Chen, Tianlang and Lu, Pan and Ye, Haotian and Chen, Yizheng and Xing, Lei and Zou, James},
  title       = {Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute},
  publisher   = {arXiv:2301.07093},
  year        = {2025},
}

Acknowledgement

This website is adapted from Nerfies and X-Decoder, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
