FoRA: Optimizing Parameter-Efficient Fine-Tuning With Fisher-Orthogonal Rank Adaptation (2605.29317)

[Submitted on 28 May 2026 (v1), last revised 29 May 2026 (this version, v2)]

<p>Paper: <strong>FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning</strong>, by Juneyoung Park and eight other authors</p>

<blockquote class="abstract mathjax">
  <span class="descriptor">Abstract:</span> Parameter-efficient fine-tuning (PEFT) has largely focused on LoRA and its accuracy-oriented variants, while the original goal of reducing the number of trainable parameters has received comparatively little attention. We introduce <strong>FoRA</strong>, which revisits this goal by reducing the number of adapted layers rather than adapter rank. FoRA selects task-informative layers via a single-pass diagonal Fisher score (under 1% of training cost) and trains the LoRA down-projection at selected layers on the Stiefel manifold, preserving column orthonormality and effective rank. FoRA consistently outperforms LoRA and DoRA at half their parameter budget and falls within 0.7-0.8 accuracy points of AdaLoRA at one-quarter its parameter count, across five LLaMA-family backbones. Cross-architecture experiments on twelve backbones from the LLaMA, Qwen3, and Gemma families confirm consistent gains from 270M to 32B parameters. The two components combine super-additively: Fisher selection alone matches rank reduction at the same budget, while the Stiefel constraint provides the decisive additional gain.
</blockquote>

<!--CONTEXT-->

Submission History

From: Juneyoung Park
[v1] Thu, 28 May 2026 03:47:00 UTC (261 KB)
[v2] Fri, 29 May 2026 03:38:01 UTC (257 KB)

Understanding Parameter-Efficient Fine-Tuning (PEFT)

Parameter-efficient fine-tuning (PEFT) adapts large pre-trained models to specific tasks by training a small set of added or selected parameters instead of updating every weight. Low-Rank Adaptation (LoRA) and its variants dominate this space, but most follow-up work has aimed at higher accuracy. According to the FoRA abstract, the original goal of reducing the number of trainable parameters has received comparatively little attention.

Introduction of FoRA

FoRA (Fisher-orthogonal Rank Adaptation) revisits that goal by reducing the number of adapted layers rather than the adapter rank. It has two components. First, it selects task-informative layers using a single-pass diagonal Fisher score, which the authors report costs under 1% of training. Second, it trains the LoRA down-projection at the selected layers on the Stiefel manifold, which preserves column orthonormality and effective rank.

Benefits of Layer Reduction

FoRA’s parameter savings come from layer selection rather than rank reduction. Instead of spreading low-rank updates across every layer, it places adapters only at the layers judged most informative for the task. The abstract reports that Fisher selection alone matches rank reduction at the same parameter budget, so choosing fewer layers costs no accuracy relative to simply lowering the rank. The additional gain comes from the Stiefel constraint.
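To make the budget comparison concrete, the sketch below counts LoRA parameters using the standard formula of `rank * (d_in + d_out)` per adapted weight matrix. The layer count, hidden size, rank, and number of adapted matrices per layer are illustrative assumptions, not values from the paper.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters of one LoRA adapter: a (rank x d_in) down-projection
    plus a (d_out x rank) up-projection."""
    return rank * (d_in + d_out)


def adapter_budget(num_layers, matrices_per_layer, d_in, d_out, rank):
    """Total trainable parameters when every adapted matrix has the same shape."""
    return num_layers * matrices_per_layer * lora_params(d_in, d_out, rank)


# Hypothetical configuration (not from the paper): 32 layers, 2 adapted
# matrices per layer, hidden size 4096, rank 16.
full = adapter_budget(32, 2, 4096, 4096, rank=16)

# Two ways to halve the budget: halve the rank, or halve the adapted layers.
half_rank = adapter_budget(32, 2, 4096, 4096, rank=8)
half_layers = adapter_budget(16, 2, 4096, 4096, rank=16)

print(full, half_rank, half_layers)
```

Both reductions land on the same parameter count, so the choice between them is about where the remaining capacity sits: thinly across all layers, or at full rank in a selected subset.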

Fisher Score Selection

Layer selection relies on a diagonal Fisher score computed in a single pass. The diagonal of the Fisher information is commonly estimated from squared gradients of the loss, which measure how sensitive the task objective is to each parameter. FoRA uses this score to identify task-informative layers, and only those layers receive adapters. Because scoring needs one pass rather than a training run, the authors report its cost at under 1% of training.
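A minimal sketch of this kind of scoring follows, assuming per-sample gradients are already available as flat lists keyed by layer name. The function names, the per-parameter averaging, and the toy gradient values are assumptions made for illustration. The summary does not specify how the paper aggregates the diagonal Fisher into a layer score.

```python
def layer_fisher_scores(grad_stream):
    """Accumulate a diagonal empirical Fisher score per layer in one pass.

    grad_stream yields one dict per sample, mapping a layer name to that
    sample's flattened gradient for the layer.
    """
    sq_sums, sizes, n_samples = {}, {}, 0
    for sample_grads in grad_stream:  # single pass over the data
        n_samples += 1
        for layer, grad in sample_grads.items():
            sq_sums[layer] = sq_sums.get(layer, 0.0) + sum(g * g for g in grad)
            sizes[layer] = len(grad)
    # Assumption: average over samples and parameters so that layers of
    # different sizes are comparable.
    return {name: sq_sums[name] / (n_samples * sizes[name]) for name in sq_sums}


def select_top_layers(scores, k):
    """Return the k layer names with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy gradients for two samples and three layers (made-up numbers).
stream = [
    {"layer0": [0.1, -0.1], "layer1": [1.0, -2.0], "layer2": [0.0, 0.3]},
    {"layer0": [0.2, 0.0], "layer1": [0.5, 1.5], "layer2": [-0.3, 0.1]},
]
scores = layer_fisher_scores(stream)
selected = select_top_layers(scores, k=2)
print(selected)
```

In a real setup the gradients would come from backpropagating the task loss through the frozen pre-trained model. Only the running sums are kept in memory, which is what makes a single pass sufficient.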

Performance Metrics and Comparisons

According to the abstract, FoRA consistently outperforms LoRA and DoRA at half their parameter budget. It also falls within 0.7-0.8 accuracy points of AdaLoRA while using one-quarter of AdaLoRA’s parameter count. Both results are reported across five LLaMA-family backbones.

Versatility Across Architectures

Cross-architecture experiments cover twelve backbones from the LLaMA, Qwen3, and Gemma families, ranging from 270M to 32B parameters. The authors report consistent gains across this range, which suggests the benefit is not tied to one model family or scale.

The Stiefel Manifold Advantage

The second component is a constraint on the adapter itself. At the selected layers, FoRA trains the LoRA down-projection on the Stiefel manifold, the set of matrices with orthonormal columns. Keeping the columns orthonormal throughout training preserves the effective rank of the down-projection, so it keeps as many independent directions as its nominal rank. The abstract describes the two components as combining super-additively: Fisher selection alone matches rank reduction at the same budget, while the Stiefel constraint provides the decisive additional gain.
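One common way to keep a matrix on the Stiefel manifold during gradient descent is to project the gradient onto the tangent space and then retract with a QR decomposition. The sketch below shows that generic recipe with NumPy. It is not taken from the paper, which may use a different optimizer or retraction. The shapes, learning rate, and random stand-in gradients are assumptions, as is storing the down-projection as a `(d_in, rank)` matrix so that its columns are the orthonormal ones.

```python
import numpy as np


def tangent_projection(A, G):
    """Project a Euclidean gradient G onto the tangent space of the Stiefel
    manifold at A, where A has orthonormal columns (A.T @ A = I)."""
    AtG = A.T @ G
    return G - A @ (0.5 * (AtG + AtG.T))


def qr_retraction(M):
    """Map a full-column-rank matrix back to orthonormal columns via QR."""
    Q, R = np.linalg.qr(M)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip column signs so the factorization is unique


def stiefel_sgd_step(A, G, lr):
    """One Riemannian gradient step followed by a QR retraction."""
    return qr_retraction(A - lr * tangent_projection(A, G))


rng = np.random.default_rng(0)
d_in, rank = 64, 8  # hypothetical sizes
A = qr_retraction(rng.standard_normal((d_in, rank)))

for _ in range(100):
    G = rng.standard_normal((d_in, rank))  # stand-in for a real loss gradient
    A = stiefel_sgd_step(A, G, lr=0.1)

# Columns stay orthonormal after every step, so the rank never drops.
orth_error = np.abs(A.T @ A - np.eye(rank)).max()
print(orth_error, np.linalg.matrix_rank(A))
```

An unconstrained update has no such guarantee: its columns can drift toward each other, which lowers the effective rank even though the nominal rank is unchanged.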

Conclusion

FoRA returns PEFT to its original goal of training fewer parameters. It adapts fewer layers, chosen by an inexpensive Fisher score, and constrains the LoRA down-projection at those layers to the Stiefel manifold. The reported results, ahead of LoRA and DoRA at half the budget and close to AdaLoRA at one-quarter, make it a promising option for researchers and practitioners working under tight parameter budgets.

The complete paper is available on arXiv under the identifier 2605.29317.
