Fisher Decorator: Refining Flow Policy via a Local Transport Map
by Xiaoyuan Cheng and six co-authors
Abstract: Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: this https URL.
Submission History
From: Xiaoyuan Cheng
[v1] Mon, 20 Apr 2026 07:54:36 UTC (4,017 KB)
[v2] Tue, 5 May 2026 15:00:45 UTC (3,523 KB)
### Overview of Reinforcement Learning and Flow Policies
Reinforcement learning (RL) enables agents to learn complex tasks from interaction with an environment, or, in the offline setting, from a fixed dataset alone. A recent line of work parameterizes policies with flow matching, yielding expressive flow-based policies for offline RL. The paper discussed here builds on that line while confronting the trade-offs among expressiveness, optimality, and efficiency that earlier flow-policy methods left unresolved.
### Understanding Flow Policies in Offline RL
Offline RL must generalize from a fixed dataset without further environment interaction, so the learned policy has to stay close to the data distribution while still improving on it. Flow policies address the expressiveness side of this problem: they parameterize the action distribution as a learned transport that carries simple noise to actions by integrating a velocity field, so the policy can represent rich, multimodal behavior. That expressiveness, however, comes with trade-offs in optimality and efficiency, which is exactly the tension the paper targets. A minimal sketch of such a policy follows.
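To make the idea concrete, here is a minimal conditional flow-matching policy in PyTorch. This is an illustrative sketch, not the paper's implementation: the network `VelocityNet`, the linear interpolation path, and the Euler sampler are standard flow-matching choices assumed here for brevity.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Velocity field v_theta(a_t, t | s) for a conditional flow policy."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, a_t, t):
        # Concatenate state, interpolated action, and time into one input.
        return self.net(torch.cat([s, a_t, t], dim=-1))


def flow_matching_loss(model, s, a1):
    """Conditional flow-matching loss along a straight-line path."""
    a0 = torch.randn_like(a1)            # noise endpoint a_0 ~ N(0, I)
    t = torch.rand(a1.shape[0], 1)       # t ~ Uniform[0, 1]
    a_t = (1 - t) * a0 + t * a1          # point on the linear path
    target_v = a1 - a0                   # constant velocity of that path
    return ((model(s, a_t, t) - target_v) ** 2).mean()


@torch.no_grad()
def sample_action(model, s, action_dim, steps=10):
    """Draw an action by Euler-integrating the learned ODE from noise."""
    a = torch.randn(s.shape[0], action_dim)
    for k in range(steps):
        t = torch.full((s.shape[0], 1), k / steps)
        a = a + model(s, a, t) / steps
    return a
```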
### The Geometric Perspective
The heart of the paper is a geometric reading of policy refinement. The behavioral policy manifold is anisotropic: the data density varies sharply across directions in action space. An $L_2$ penalty (or the upper bound of $W_2$ it induces) is isotropic and density-insensitive: it charges the same cost for a displacement in every direction, whether or not that direction stays on the data manifold. The two geometries are systematically misaligned, so regularized updates can point in directions that are cheap under the penalty but costly under the true data distribution. This is the mismatch the authors set out to correct.
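The mismatch is easiest to see by comparing the two penalties to second order. The expansion below is schematic, stated for a small displacement $\delta$ applied to the behavior policy $\pi_\beta$ under standard regularity assumptions; the paper's exact definitions may differ.

```latex
% KL cost of a small displacement \delta, expanded to second order:
\mathrm{KL}\!\left(\pi_\beta \,\|\, \pi_\beta^{+\delta}\right)
  \approx \tfrac{1}{2}\,\delta^\top F\,\delta,
\qquad
F = \mathbb{E}_{a\sim\pi_\beta}\!\left[
      \nabla_a \log \pi_\beta(a)\,\nabla_a \log \pi_\beta(a)^\top
    \right],
% versus the isotropic L2 penalty, which uses the identity metric:
\|\delta\|_2^2 = \delta^\top I\,\delta .
```

The Fisher matrix $F$ weights each direction by how fast the log-density changes along it, so two steps of equal $L_2$ length can have very different KL cost; the identity metric cannot see this difference.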
### Introducing the Fisher Decorator Framework
The authors reinterpret policy refinement as a local transport map: a refined action is produced by taking an action from an initial flow policy and adding a residual displacement. By analyzing how this map transforms the density, they derive a local quadratic approximation of the KL-constrained objective, governed by the Fisher information matrix. The resulting optimization is anisotropic by construction, matching the geometry of the behavioral manifold while remaining tractable.
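A highly simplified training step might look like the following. Everything here is an assumption made for illustration: the names `q_net` and `delta_net` are hypothetical, the per-sample rank-one empirical Fisher stands in for whatever estimator the paper uses, and the penalty weight `lam` replaces the paper's constrained formulation with a penalized one.

```python
import torch

def refinement_loss(q_net, delta_net, s, a0, score_a0, lam=1.0):
    """One penalized training step for the residual refinement.

    a0       : actions sampled from the frozen base flow policy
    score_a0 : estimate of grad_a log pi_beta(a0 | s); the paper recovers
               the score from the flow velocity (assumed available here)
    """
    delta = delta_net(s, a0)          # residual displacement
    a = a0 + delta                    # local transport map: a = a0 + delta

    # Anisotropic quadratic penalty 1/2 * delta^T F delta, with the
    # per-sample rank-one empirical Fisher F ~ score score^T, so that
    # delta^T F delta = (score^T delta)^2.
    fisher_quad = 0.5 * (delta * score_a0).sum(-1).pow(2)

    # Improve the Q-value while penalizing Fisher-weighted displacement.
    return (-q_net(s, a).squeeze(-1) + lam * fisher_quad).mean()
```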
### Exploring the Benefits of the Local Transport Map
What distinguishes the framework is that the anisotropic constraint comes essentially for free. The flow velocity already encodes the score function of the behavior distribution, so the Fisher-weighted quadratic constraint can be evaluated without training a separate density model. Solving the resulting quadratically constrained problem keeps the approximation error controllable: the authors show the refined policy stays within a provable neighborhood of the optimal solution, whereas the optimality gap of prior methods is traced back to their isotropic approximation.
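To see why the quadratic constraint is convenient, note that maximizing a linear objective $g^\top\delta$ subject to $\tfrac{1}{2}\delta^\top F \delta \le \varepsilon$ has a closed-form solution familiar from natural-gradient and trust-region methods. The sketch below works through that algebra numerically; it is generic trust-region math, not the paper's exact update.

```python
import numpy as np

def trust_region_step(g, F, eps=0.01):
    """Maximize g^T delta subject to 1/2 * delta^T F delta <= eps.

    Closed form: delta* = sqrt(2 eps / (g^T F^{-1} g)) * F^{-1} g.
    """
    Fg = np.linalg.solve(F, g)               # F^{-1} g (natural direction)
    scale = np.sqrt(2.0 * eps / (g @ Fg))    # step size hitting the boundary
    return scale * Fg

# Toy 2-D example: an anisotropic Fisher metric reshapes the step.
g = np.array([1.0, 1.0])
F = np.diag([100.0, 1.0])                    # tight density along axis 0
print(trust_region_step(g, F, eps=0.01))     # step goes mostly along axis 1
```

With the identity metric the step would follow the raw gradient equally in both directions; under the anisotropic Fisher metric it is redirected toward the direction where the density permits larger moves.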
### Empirical Validation
Beyond the theory, extensive experiments across diverse offline RL benchmarks demonstrate state-of-the-art performance, supporting the claim that matching the regularizer's geometry to the behavioral manifold, rather than defaulting to an isotropic penalty, is what closes the gap.
### Conclusion
By diagnosing the geometric mismatch behind the $L_2$-regularized flow policies that preceded it, this work offers a principled and practical alternative: an anisotropic, Fisher-governed refinement with controllable approximation error. For offline RL, where staying faithful to the data distribution is the central constraint, that combination of theoretical grounding and empirical strength makes the Fisher Decorator framework a notable step forward.

