Streamlining Financial Data: Agoda’s Centralized Approach Using Apache Spark
In the fast-paced world of online travel bookings, maintaining clean and consistent financial data is crucial. Agoda recently transformed its data management processes by consolidating multiple independent financial data pipelines into a centralized, efficient platform powered by Apache Spark. This strategic move was aimed at eliminating inconsistencies in financial data, thereby enhancing the reliability of financial metrics used in strategic planning and reporting.
The Challenge: Disjointed Data Pipelines
The troubles arose from a common scenario in large enterprises. Agoda’s Data Engineering, Business Intelligence, and Data Analysis teams each established their own independent financial data pipelines. While this separation allowed for clear ownership, it inevitably resulted in duplicate processing and conflicting metrics within the organization. Warot Jongboondee, a member of Agoda’s engineering team, highlighted that these discrepancies “could potentially impact Agoda’s financial statements.”
Separate financial data pipelines (source)
A Robust Solution: The Financial Unified Data Pipeline (FINUDP)
In response to these challenges, Agoda introduced the Financial Unified Data Pipeline (FINUDP). This groundbreaking initiative aims to establish a singular, trusted source of truth for all financial data, encompassing metrics such as sales, costs, revenues, and margins. Built on the flexible and powerful Apache Spark framework, the new system enables hourly updates for downstream teams, crucial for accurate reconciliation and financial forecasting.
Creating this centralized approach required a significant commitment of time and resources. Aligning stakeholders across departments, including product, finance, and engineering, on shared definitions and metrics was an exhaustive process. Initially, query runtimes stretched to five hours; query tuning and infrastructure adjustments eventually brought processing times down to around 30 minutes.
Unified Financial Data Pipeline (FINUDP) architecture (source)
Implementing a Multi-Layered Quality Framework
Agoda’s commitment to data integrity doesn’t end with a centralized platform; it extends into a sophisticated quality framework. Their multi-layered approach leverages automated validations to ensure data tables are free from null values and adhere to specified range constraints and integrity rules. Crucially, if any business-critical validation fails, the pipeline halts automatically, preventing the processing of potentially erroneous data.
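The halting behavior described above can be illustrated with a minimal sketch. This is plain Python rather than Spark, and it is not Agoda's implementation: the `Check` class, the `run_checks` function, and the column names `booking_id` and `margin` are all hypothetical, chosen only to show how a null check and a range check might stop a run when a business-critical rule fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


class CriticalValidationError(Exception):
    """Raised to halt the pipeline when a business-critical check fails."""


@dataclass
class Check:
    name: str
    predicate: Callable[[Row], bool]  # returns True when a row passes
    critical: bool                    # critical failures stop the run


def not_null(column: str) -> Callable[[Row], bool]:
    """Build a predicate that passes when the column has a value."""
    return lambda row: row.get(column) is not None


def in_range(column: str, low: float, high: float) -> Callable[[Row], bool]:
    """Build a predicate that passes when low <= value <= high."""
    def predicate(row: Row) -> bool:
        value = row.get(column)
        return value is not None and low <= value <= high
    return predicate


def run_checks(rows: List[Row], checks: List[Check]) -> List[str]:
    """Return warnings for non-critical failures; raise on a critical one."""
    warnings = []
    for check in checks:
        failed = sum(1 for row in rows if not check.predicate(row))
        if failed == 0:
            continue
        message = f"{check.name}: {failed} row(s) failed"
        if check.critical:
            # Stop here so downstream steps never see the suspect data.
            raise CriticalValidationError(message)
        warnings.append(message)
    return warnings


if __name__ == "__main__":
    # Hypothetical sample rows; real tables would come from the pipeline.
    sample = [
        {"booking_id": 1, "margin": 0.12},
        {"booking_id": None, "margin": 1.7},
    ]
    checks = [
        Check("margin in range", in_range("margin", -1.0, 1.0), critical=False),
        Check("booking_id not null", not_null("booking_id"), critical=True),
    ]
    try:
        run_checks(sample, checks)
    except CriticalValidationError as error:
        print(f"Pipeline halted: {error}")
```

Separating critical from non-critical checks is one possible design: it lets minor issues surface as warnings while reserving a hard stop for rules that would otherwise corrupt financial figures.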
The quality assurance system also uses Quilliup to compare target tables against their source tables. Data contracts with upstream teams define the rules that incoming data is expected to follow, and any violation triggers an immediate alert. Machine learning models add anomaly detection by continuously monitoring data patterns so that irregularities can be addressed quickly. A three-tier alert system responds via email, Slack notifications, or escalation to Agoda’s 24/7 Network Operations Center when latency issues arise.
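As a rough illustration of what a source-to-target comparison involves, the sketch below reconciles per-day totals between two sets of rows. It does not use or imitate Quilliup's API; the function names, the `booking_date` and `revenue` columns, and the tolerance value are assumptions made purely for the example.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple


def totals_by_key(rows: List[dict], key: str, amount: str) -> Dict[str, Decimal]:
    """Sum the amount column per key, using Decimal to avoid float drift."""
    totals: Dict[str, Decimal] = defaultdict(lambda: Decimal("0"))
    for row in rows:
        totals[row[key]] += Decimal(str(row[amount]))
    return dict(totals)


def reconcile(
    source_rows: List[dict],
    target_rows: List[dict],
    key: str = "booking_date",      # hypothetical grouping column
    amount: str = "revenue",        # hypothetical amount column
    tolerance: Decimal = Decimal("0.01"),
) -> Dict[str, Tuple[Decimal, Decimal]]:
    """Return {key: (source_total, target_total)} where totals disagree."""
    source = totals_by_key(source_rows, key, amount)
    target = totals_by_key(target_rows, key, amount)
    mismatches = {}
    # The union of keys also catches groups missing entirely from one side.
    for group in sorted(set(source) | set(target)):
        source_total = source.get(group, Decimal("0"))
        target_total = target.get(group, Decimal("0"))
        if abs(source_total - target_total) > tolerance:
            mismatches[group] = (source_total, target_total)
    return mismatches
```

An empty result means the two tables agree within the tolerance; any entry identifies the group to investigate along with both totals.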
Industry Trends and Quality Challenges
Agoda’s innovations align with broader trends in the data management landscape. Research indicates that 64% of organizations view poor data quality as their most pressing challenge. This is where data contracts become increasingly relevant; Gartner notes that they are emerging as a strategic method to manage, deliver, and govern data products, formalizing expectations between producers and consumers of data.
Balancing Trade-Offs: Coordination and Management
While the benefits of centralization are clear, the transition was not without sacrifices. Development velocity slowed, because any change now requires comprehensive testing across the entire data pipeline. Centralization also introduced dependencies: the pipeline must wait for all upstream datasets to arrive before it can proceed. The thorough documentation and consensus-building efforts, while time-consuming, fostered greater trust among stakeholders. Jongboondee emphasized that centralization demands tighter coordination, describing it as “careful change management at every step.”
Achieving Reliability and Uptime
Despite the complexities involved, FINUDP has made measurable progress on operational reliability: it currently achieves 95.6% uptime, with a target of 99.5% availability. All changes go through shadow testing, in which queries are run against both the existing and the proposed versions and the results are compared before release. A dedicated staging environment that mirrors production also lets teams test thoroughly before any significant release.
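The idea behind shadow testing can be sketched in a few lines: run the existing and the proposed logic on the same input and report any differences. This is an illustrative outline in plain Python, not Agoda's tooling; `shadow_compare`, the `booking_id` key, and the report fields are hypothetical names.

```python
from typing import Callable, Dict, List

Rows = List[dict]


def shadow_compare(
    current_transform: Callable[[Rows], Rows],
    candidate_transform: Callable[[Rows], Rows],
    sample_rows: Rows,
    key: str = "booking_id",  # hypothetical primary key column
) -> Dict[str, object]:
    """Run both versions on the same input and summarize the differences."""
    current = {row[key]: row for row in current_transform(sample_rows)}
    candidate = {row[key]: row for row in candidate_transform(sample_rows)}

    report: Dict[str, object] = {
        # Keys the existing version produces but the proposal drops.
        "missing_in_candidate": sorted(set(current) - set(candidate)),
        # Keys that only the proposal produces.
        "extra_in_candidate": sorted(set(candidate) - set(current)),
        # Keys present in both whose row contents differ.
        "changed": sorted(
            k for k in set(current) & set(candidate) if current[k] != candidate[k]
        ),
    }
    report["safe_to_promote"] = not (
        report["missing_in_candidate"]
        or report["extra_in_candidate"]
        or report["changed"]
    )
    return report
```

In practice an intended change will show up under `changed`, so the report is better read as a list of differences to review than as an automatic pass or fail.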
Toward Comprehensive Reliability Systems
The FINUDP initiative clearly illustrates how organizations dealing with extensive business data are transitioning away from random, ad-hoc quality checks toward robust, architecturally enforced reliability systems. By prioritizing consistency and auditability, Agoda is not only enhancing its financial reporting and compliance measures but is also establishing a model of resilience that is essential for today’s data-driven landscape.



