Fisher Decorator: Refining Flow Policy via a Local Transport Map
by Xiaoyuan Cheng and six co-authors
Abstract: Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: this https URL.
Submission History
From: Xiaoyuan Cheng
[v1] Mon, 20 Apr 2026 07:54:36 UTC (4,017 KB)
[v2] Tue, 5 May 2026 15:00:45 UTC (3,523 KB)
### Overview of Reinforcement Learning and Flow Policies
Reinforcement learning (RL) enables agents to learn complex tasks from interaction with an environment, or, in the offline setting, from a fixed dataset alone. A recent line of work parameterizes policies with flow matching, yielding expressive flow-based policies for offline RL. The paper discussed here builds on that line while confronting the trade-offs among expressiveness, optimality, and efficiency that earlier flow-policy methods left unresolved.
### Understanding Flow Policies in Offline RL
Offline RL must generalize from a fixed dataset without further environment interaction, so the learned policy has to stay close to the data distribution while still improving on it. Flow policies address the expressiveness side of this problem: they parameterize the action distribution as a learned transport that carries simple noise to actions by integrating a velocity field, so the policy can represent rich, multimodal behavior. That expressiveness, however, comes with trade-offs in optimality and efficiency, which is exactly the tension the paper targets. A minimal sketch of such a policy follows.
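To make the idea concrete, here is a minimal conditional flow-matching policy in PyTorch. This is an illustrative sketch, not the paper's implementation: the network `VelocityNet`, the linear interpolation path, and the Euler sampler are standard flow-matching choices assumed here for brevity.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Velocity field v_theta(a_t, t | s) for a conditional flow policy."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, a_t, t):
        # Concatenate state, interpolated action, and time into one input.
        return self.net(torch.cat([s, a_t, t], dim=-1))


def flow_matching_loss(model, s, a1):
    """Conditional flow-matching loss along a straight-line path."""
    a0 = torch.randn_like(a1)            # noise endpoint a_0 ~ N(0, I)
    t = torch.rand(a1.shape[0], 1)       # t ~ Uniform[0, 1]
    a_t = (1 - t) * a0 + t * a1          # point on the linear path
    target_v = a1 - a0                   # constant velocity of that path
    return ((model(s, a_t, t) - target_v) ** 2).mean()


@torch.no_grad()
def sample_action(model, s, action_dim, steps=10):
    """Draw an action by Euler-integrating the learned ODE from noise."""
    a = torch.randn(s.shape[0], action_dim)
    for k in range(steps):
        t = torch.full((s.shape[0], 1), k / steps)
        a = a + model(s, a, t) / steps
    return a
```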
### The Geometric Perspective
The heart of the paper is a geometric reading of policy refinement. The behavioral policy manifold is anisotropic: the data density varies sharply across directions in action space. An $L_2$ penalty (or the upper bound of $W_2$ it induces) is isotropic and density-insensitive: it charges the same cost for a displacement in every direction, whether or not that direction stays on the data manifold. The two geometries are systematically misaligned, so regularized updates can point in directions that are cheap under the penalty but costly under the true data distribution. This is the mismatch the authors set out to correct.
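The mismatch is easiest to see by comparing the two penalties to second order. The expansion below is schematic, stated for a small displacement $\delta$ applied to the behavior policy $\pi_\beta$ under standard regularity assumptions; the paper's exact definitions may differ.

```latex
% KL cost of a small displacement \delta, expanded to second order:
\mathrm{KL}\!\left(\pi_\beta \,\|\, \pi_\beta^{+\delta}\right)
  \approx \tfrac{1}{2}\,\delta^\top F\,\delta,
\qquad
F = \mathbb{E}_{a\sim\pi_\beta}\!\left[
      \nabla_a \log \pi_\beta(a)\,\nabla_a \log \pi_\beta(a)^\top
    \right],
% versus the isotropic L2 penalty, which uses the identity metric:
\|\delta\|_2^2 = \delta^\top I\,\delta .
```

The Fisher matrix $F$ weights each direction by how fast the log-density changes along it, so two steps of equal $L_2$ length can have very different KL cost; the identity metric cannot see this difference.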
### Introducing the Fisher Decorator Framework
The authors reinterpret policy refinement as a local transport map: a refined action is produced by taking an action from an initial flow policy and adding a residual displacement. By analyzing how this map transforms the density, they derive a local quadratic approximation of the KL-constrained objective, governed by the Fisher information matrix. The resulting optimization is anisotropic by construction, matching the geometry of the behavioral manifold while remaining tractable.
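A highly simplified training step might look like the following. Everything here is an assumption made for illustration: the names `q_net` and `delta_net` are hypothetical, the per-sample rank-one empirical Fisher stands in for whatever estimator the paper uses, and the penalty weight `lam` replaces the paper's constrained formulation with a penalized one.

```python
import torch

def refinement_loss(q_net, delta_net, s, a0, score_a0, lam=1.0):
    """One penalized training step for the residual refinement.

    a0       : actions sampled from the frozen base flow policy
    score_a0 : estimate of grad_a log pi_beta(a0 | s); the paper recovers
               the score from the flow velocity (assumed available here)
    """
    delta = delta_net(s, a0)          # residual displacement
    a = a0 + delta                    # local transport map: a = a0 + delta

    # Anisotropic quadratic penalty 1/2 * delta^T F delta, with the
    # per-sample rank-one empirical Fisher F ~ score score^T, so that
    # delta^T F delta = (score^T delta)^2.
    fisher_quad = 0.5 * (delta * score_a0).sum(-1).pow(2)

    # Improve the Q-value while penalizing Fisher-weighted displacement.
    return (-q_net(s, a).squeeze(-1) + lam * fisher_quad).mean()
```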
### Exploring the Benefits of the Local Transport Map
What distinguishes the framework is that the anisotropic constraint comes essentially for free. The flow velocity already encodes the score function of the behavior distribution, so the Fisher-weighted quadratic constraint can be evaluated without training a separate density model. Solving the resulting quadratically constrained problem keeps the approximation error controllable: the authors show the refined policy stays within a provable neighborhood of the optimal solution, whereas the optimality gap of prior methods is traced back to their isotropic approximation.
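To see why the quadratic constraint is convenient, note that maximizing a linear objective $g^\top\delta$ subject to $\tfrac{1}{2}\delta^\top F \delta \le \varepsilon$ has a closed-form solution familiar from natural-gradient and trust-region methods. The sketch below works through that algebra numerically; it is generic trust-region math, not the paper's exact update.

```python
import numpy as np

def trust_region_step(g, F, eps=0.01):
    """Maximize g^T delta subject to 1/2 * delta^T F delta <= eps.

    Closed form: delta* = sqrt(2 eps / (g^T F^{-1} g)) * F^{-1} g.
    """
    Fg = np.linalg.solve(F, g)               # F^{-1} g (natural direction)
    scale = np.sqrt(2.0 * eps / (g @ Fg))    # step size hitting the boundary
    return scale * Fg

# Toy 2-D example: an anisotropic Fisher metric reshapes the step.
g = np.array([1.0, 1.0])
F = np.diag([100.0, 1.0])                    # tight density along axis 0
print(trust_region_step(g, F, eps=0.01))     # step goes mostly along axis 1
```

With the identity metric the step would follow the raw gradient equally in both directions; under the anisotropic Fisher metric it is redirected toward the direction where the density permits larger moves.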
### Empirical Validation
Beyond the theory, extensive experiments across diverse offline RL benchmarks demonstrate state-of-the-art performance, supporting the claim that matching the regularizer's geometry to the behavioral manifold, rather than defaulting to an isotropic penalty, is what closes the gap.
### Conclusion
By diagnosing the geometric mismatch behind the $L_2$-regularized flow policies that preceded it, this work offers a principled and practical alternative: an anisotropic, Fisher-governed refinement with controllable approximation error. For offline RL, where staying faithful to the data distribution is the central constraint, that combination of theoretical grounding and empirical strength makes the Fisher Decorator framework a notable step forward.

