<p><strong>FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning</strong>, by Juneyoung Park and eight other authors</p>
<blockquote class="abstract mathjax">
<span class="descriptor">Abstract:</span> Parameter-efficient fine-tuning (PEFT) has largely focused on LoRA and its accuracy-oriented variants, while the original goal of reducing the number of trainable parameters has received comparatively little attention. We introduce <strong>FoRA</strong>, which revisits this goal by reducing the number of adapted layers rather than adapter rank. FoRA selects task-informative layers via a single-pass diagonal Fisher score (under 1% of training cost) and trains the LoRA down-projection at selected layers on the Stiefel manifold, preserving column orthonormality and effective rank. FoRA consistently outperforms LoRA and DoRA at half their parameter budget and falls within 0.7-0.8 accuracy points of AdaLoRA at one-quarter its parameter count, across five LLaMA-family backbones. Cross-architecture experiments on twelve backbones from the LLaMA, Qwen3, and Gemma families confirm consistent gains from 270M to 32B parameters. The two components combine super-additively: Fisher selection alone matches rank reduction at the same budget, while the Stiefel constraint provides the decisive additional gain.
</blockquote>
<!--CONTEXT-->
Submission History
From: Juneyoung Park
[v1] Thu, 28 May 2026 03:47:00 UTC (261 KB)
[v2] Fri, 29 May 2026 03:38:01 UTC (257 KB)
Understanding Parameter-Efficient Fine-Tuning (PEFT)
Parameter-efficient fine-tuning (PEFT) adapts large pre-trained models to specific tasks without the cost of updating every parameter. Low-Rank Adaptation (LoRA) has been the dominant approach, and most follow-up work has concentrated on improving its accuracy. As the paper's abstract points out, the original goal of reducing the number of trainable parameters has received comparatively little attention.
Introduction of FoRA
FoRA (Fisher-orthogonal Rank Adaptation) revisits that goal by reducing the number of adapted layers rather than the adapter rank. It has two components. First, it selects task-informative layers with a single-pass diagonal Fisher score, which the authors report costs under 1% of training. Second, it trains the LoRA down-projection at the selected layers on the Stiefel manifold.
Benefits of Layer Reduction
Instead of shrinking the rank of every adapter, FoRA keeps adapters only at the layers that matter most for the task. The trainable-parameter count of a set of LoRA adapters scales linearly with both the rank and the number of adapted layers, so dropping half of the layers saves as much as halving the rank everywhere. The premise is that a budget concentrated in a few informative layers buys more than the same budget spread thinly across all of them.
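To make the budget comparison concrete, here is a minimal sketch of the standard LoRA parameter count, where an adapter on a `d_out x d_in` weight with rank `r` adds `r * (d_in + d_out)` trainable parameters. The model width, layer count, and rank below are hypothetical values chosen for illustration and are not taken from the paper.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_budget(num_layers: int, d_in: int, d_out: int, rank: int) -> int:
    """Total trainable parameters when every adapted layer has the same shape."""
    return num_layers * lora_params(d_in, d_out, rank)


# Hypothetical configuration, not taken from the paper.
D_MODEL, LAYERS, RANK = 4096, 32, 16

full = adapter_budget(LAYERS, D_MODEL, D_MODEL, RANK)              # every layer, full rank
half_rank = adapter_budget(LAYERS, D_MODEL, D_MODEL, RANK // 2)    # every layer, half rank
half_layers = adapter_budget(LAYERS // 2, D_MODEL, D_MODEL, RANK)  # half the layers, full rank

print(full, half_rank, half_layers)
```

Halving the rank and halving the number of adapted layers land on the same budget, so the two strategies can be compared at equal parameter cost.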
Fisher Score Selection
Layer selection relies on a diagonal Fisher score computed in a single pass, which the abstract puts at under 1% of training cost. The diagonal of the Fisher information is commonly estimated from squared gradients of the log-likelihood, which is why one pass over data can suffice. The score ranks layers by how informative they are for the task, and adapters are trained only at the selected layers.
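The selection step can be sketched roughly as follows, using the common empirical estimate of the diagonal Fisher information as the squared gradient. This is an illustration under assumptions, not the paper's procedure: the function names, the averaging over parameter entries within a layer, and the synthetic gradients are all hypothetical, and the abstract does not specify how FoRA aggregates scores per layer.

```python
import numpy as np


def fisher_layer_scores(grad_batches):
    """Accumulate a per-layer diagonal Fisher score in one pass.

    grad_batches: iterable of dicts mapping layer name -> gradient array
    of the log-likelihood for one sample or mini-batch.
    Returns a dict mapping layer name -> mean squared gradient entry.
    """
    sums, count = {}, 0
    for grads in grad_batches:
        count += 1
        for name, g in grads.items():
            # Squared gradient as the diagonal Fisher estimate. Averaging over
            # entries (an assumption) keeps large layers from winning on size alone.
            sums[name] = sums.get(name, 0.0) + float(np.mean(np.square(g)))
    if count == 0:
        raise ValueError("grad_batches is empty")
    return {name: total / count for name, total in sums.items()}


def select_layers(scores, k):
    """Return the k layer names with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic gradients standing in for a real backward pass.
rng = np.random.default_rng(0)
scales = {"layer0": 0.1, "layer1": 1.0, "layer2": 0.5, "layer3": 2.0}
batches = [
    {name: s * rng.standard_normal((8, 8)) for name, s in scales.items()}
    for _ in range(50)
]

scores = fisher_layer_scores(batches)
selected = select_layers(scores, k=2)
print(selected)
```

In a real setup the gradient dictionaries would come from backpropagating the task loss through the frozen model once, and only the selected layers would then receive adapters.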
Performance Metrics and Comparisons
According to the abstract, FoRA consistently outperforms LoRA and DoRA at half their parameter budget. It also falls within 0.7-0.8 accuracy points of AdaLoRA while using one-quarter of AdaLoRA's parameter count. Both results are reported across five LLaMA-family backbones.
Versatility Across Architectures
Cross-architecture experiments cover twelve backbones from the LLaMA, Qwen3, and Gemma families, ranging from 270M to 32B parameters. The authors report consistent gains across that range, which suggests the benefit is not tied to a single model family or scale.
The Stiefel Manifold Advantage
At the selected layers, FoRA trains the LoRA down-projection on the Stiefel manifold, the set of matrices with orthonormal columns. The constraint preserves column orthonormality and effective rank throughout training. The abstract reports that the two components combine super-additively: Fisher selection alone matches rank reduction at the same budget, while the Stiefel constraint provides the decisive additional gain.
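One common way to keep a matrix on the Stiefel manifold during gradient-based training is to project the gradient onto the tangent space and then retract with a QR decomposition. The sketch below shows that generic technique in NumPy. It assumes the down-projection is stored as a `d x r` matrix with orthonormal columns. The abstract does not say which retraction or optimizer FoRA uses, so the function names and the update rule here are illustrative assumptions, and the random gradient is only a stand-in.

```python
import numpy as np


def random_stiefel(d, r, rng):
    """Random d x r matrix with orthonormal columns (a point on the Stiefel manifold)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q


def stiefel_step(x, grad, lr):
    """One Riemannian gradient step followed by a QR retraction.

    x:    d x r matrix with orthonormal columns
    grad: Euclidean gradient of the loss with respect to x, same shape
    """
    # Project the Euclidean gradient onto the tangent space at x.
    xtg = x.T @ grad
    tangent = grad - x @ (0.5 * (xtg + xtg.T))
    # Step along the tangent direction, then map back onto the manifold via QR.
    q, r = np.linalg.qr(x - lr * tangent)
    # Fix column signs so the retraction is unique (positive diagonal of R).
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs


rng = np.random.default_rng(0)
x = random_stiefel(d=64, r=8, rng=rng)
for _ in range(100):
    fake_grad = rng.standard_normal(x.shape)  # stand-in for a real gradient
    x = stiefel_step(x, fake_grad, lr=0.1)

orthonormality_error = np.linalg.norm(x.T @ x - np.eye(8))
numerical_rank = np.linalg.matrix_rank(x)
print(orthonormality_error, numerical_rank)
```

After every step the columns remain orthonormal up to floating-point error, so the matrix cannot lose rank during training, which is the property the Stiefel constraint is meant to guarantee.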
Conclusion
FoRA reframes parameter efficiency in PEFT as a question of which layers to adapt rather than how low the rank can go. Its two components, Fisher-based layer selection and Stiefel-constrained training of the down-projection, let it outperform LoRA and DoRA at half their parameter budget in the reported experiments, with gains that hold across model families and scales from 270M to 32B parameters.

