Case Study

Financial Data Aggregation Platform

A sophisticated financial data aggregation platform for the precious metals industry, intelligently combining data from 10 external providers to deliver stable, accurate pricing across 10 commodities and 14 currencies whilst gracefully handling provider failures.

At a glance

  • 10 providers
  • 10 commodities
  • 14 currencies
  • 27 API endpoints

The Business Challenge

E-commerce businesses selling precious metals require accurate pricing data to operate, but relying on a single data provider creates a critical single point of failure. Individual providers can experience downtime, provide stale data, or return incorrect values. Rate limits, authentication failures, and inconsistent data formats compound the problem.

The platform needed to aggregate data from 6 commodity providers and 4 currency providers, supporting 10 precious metals including gold, silver, platinum, palladium, and specialty metals like rhodium, tellurium, and gallium.

Core Requirements

  • Reliability through redundancy: Continue operating when individual providers fail
  • Data quality over availability: Use trust factors to prefer accurate data from reliable sources
  • Flexible pricing rules: Support premiums, day-based adjustments, time-limited promotions, and emergency overrides
  • Historical tracking: Maintain detailed historical data with 15-minute granularity
  • Gap filling: Automatically fill missing data points during provider outages or market closures

Individual data providers experience downtime, rate limits, and data quality issues. By aggregating multiple providers with trust factors (1-5 scale), the system can prefer data from the most reliable available source whilst maintaining availability even when high-trust providers fail.

This approach is fundamentally different from simple averaging: rather than letting low-quality data pollute the final price, the system uses a priority queue based on provider reliability.

Technical Architecture

The system implements a layered architecture with clear separation between command layer (cron jobs), service layer (business logic), repository layer (data access), and entity layer (domain models). At its core is a factory-based provider architecture with common interface contracts enabling parallel HTTP requests and graceful error handling.

Multi-Provider Aggregation

Each data provider has a corresponding factory class that creates configured provider instances. All providers implement a common interface with clear separation between request and response handling, enabling all providers to initiate requests before any begin processing responses.

The system handles provider failures through isolated error handling: each provider's request/response cycle is wrapped in its own try-catch block around handleResponse(). When a provider throws an exception, the system logs a critical error and continues processing the remaining providers, so one failing provider never brings down the entire data collection process.

Each provider's response is stored with a trust factor (1-5, where 1 is most trusted), enabling the trust factor filter to select the best available data.
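The isolation described above can be sketched as follows. The platform itself is PHP; this Python sketch uses hypothetical provider objects with a `handle_response()` method and a `trust_factor` attribute, so names and shapes are illustrative only:

```python
import logging

def collect_prices(providers):
    """Gather (trust_factor, price) pairs, isolating per-provider failures.

    Each provider's response handling is wrapped in its own try/except so
    one failure cannot abort the whole collection run.
    """
    results = []
    for provider in providers:
        try:
            price = provider.handle_response()
            results.append((provider.trust_factor, price))
        except Exception:
            # A bad provider is logged as critical and skipped, not fatal.
            logging.critical("Provider %s failed", provider.name, exc_info=True)
    return results
```

A run over one healthy and one failing provider yields only the healthy provider's result, while the failure is recorded in the logs.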

Trust Factor Priority System

Rather than averaging all provider responses, the system uses values exclusively from the highest-trust provider that returned valid data. This prevents low-quality data from polluting the final price calculation. The system falls back to lower-trust providers only when higher-trust sources fail.

The trust factor priority system differs fundamentally from standard averaging approaches used by other aggregation systems. Understanding this difference is key to appreciating the system's data quality guarantees.

Averaging all provider responses seems intuitive but introduces a critical flaw: a provider returning stale or incorrect data pulls the final price toward that bad value. For example, if four providers return £1,800/oz and one provider returns £1,600/oz (stale data from yesterday), the average becomes £1,760/oz, inaccurate despite most providers being correct.

The trust factor system solves this by treating provider reliability as a priority queue rather than a weighting system. If all high-trust providers fail, falling back to a medium-trust provider is better than blending bad data into the result.
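Under these assumptions, trust-factor selection reduces to taking the result with the lowest trust number rather than computing a mean. A minimal Python sketch (the real system is PHP; the prices mirror the £1,800/£1,600 example above):

```python
def select_price(results):
    """Pick the price from the most trusted provider that responded.

    results: list of (trust_factor, price) pairs, where trust factor 1
    is the most trusted. Lower-trust data is ignored entirely unless
    every higher-trust provider failed to return valid data.
    """
    if not results:
        raise ValueError("no provider returned valid data")
    return min(results, key=lambda r: r[0])[1]

# Four providers agree at 1800; one stale provider reports 1600.
responses = [(1, 1800), (2, 1800), (3, 1800), (4, 1800), (5, 1600)]
select_price(responses)  # -> 1800; a plain average would give 1760
```

If the trust-1 provider fails, its pair simply never enters `results`, and selection falls through to the next-best source rather than blending values.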

Flexible Premium Rules Engine

E-commerce businesses need to apply various adjustments to base commodity prices: standard premiums (markup percentages or fixed amounts), day-specific adjustments (weekend pricing, reduced staff days), time-limited promotions or seasonal adjustments, and emergency price overrides (market volatility, supply issues).

The system implements a chain-of-responsibility pattern for rule processing with four distinct rule types: basic premium rules (always-active adjustments), day-based rules (specific days of the week), time-based rules (date ranges), and override rules (complete replacement). Rules stack appropriately with clear precedence.
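The chain can be sketched as follows. The real engine is PHP; these Python classes and field names are hypothetical, and the time-based rule type is omitted for brevity:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BasicPremiumRule:
    percent: float                       # always-active markup
    def apply(self, price, today):
        return price * (1 + self.percent / 100)

@dataclass
class DayBasedRule:
    day_name: str                        # e.g. "Saturday"
    percent: float
    def apply(self, price, today):
        if today.strftime("%A") == self.day_name:
            return price * (1 + self.percent / 100)
        return price

@dataclass
class OverrideRule:
    fixed_price: float                   # emergency complete replacement
    def apply(self, price, today):
        return self.fixed_price

def apply_rules(base_price, rules, today):
    """Pass the price along the chain; each rule adjusts it or passes it on."""
    price = base_price
    for rule in rules:
        price = rule.apply(price, today)
    return price
```

Because each rule only sees the price handed to it by the previous rule, precedence is simply the order of the chain, and an override placed last always wins.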

Precision Financial Arithmetic

Floating-point arithmetic introduces rounding errors that compound over thousands of transactions. The system uses Money PHP for money objects and bcmath functions for arbitrary-precision decimal calculations, ensuring accuracy to the cent.
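Python's `Decimal` illustrates the same principle that Money PHP and bcmath provide in the actual codebase: exact decimal arithmetic with explicit rounding, instead of binary floating point. The figures below are illustrative:

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats drift: 0.1 + 0.2 != 0.3. Decimal values do not.
price = Decimal("1824.37")
premium = price * Decimal("0.035")   # exact: 63.85295
total = (price + premium).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
# total is exactly 1888.22, with the rounding mode stated explicitly
```

Keeping every intermediate value exact and rounding once, at a declared precision, is what prevents errors compounding over thousands of transactions.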

The rules engine also validates configuration to prevent errors. Custom validators ensure business rules remain consistent and conflict-free.

Time-based rules use custom Symfony validators to prevent conflicting rules. The TimeBasedRulesCanNotOverlap constraint ensures no two time-based rules for the same commodity/currency overlap in their date ranges.

Doctrine ORM inheritance mapping models the rule hierarchy using JOINED table inheritance, enabling shared base functionality whilst allowing each rule type to have specific fields (day name for day-based rules, start/end dates for time-based rules).
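The overlap check behind such a constraint can be sketched as follows (illustrative Python, not the actual Symfony validator; the rule tuples are hypothetical stand-ins for the entity fields described above):

```python
from datetime import date

def ranges_overlap(a_start, a_end, b_start, b_end):
    """Two closed date ranges overlap unless one ends before the other starts."""
    return a_start <= b_end and b_start <= a_end

def validate_time_rules(rules):
    """rules: list of (commodity, currency, start_date, end_date).

    Raise if two rules for the same commodity/currency pair overlap,
    mirroring the TimeBasedRulesCanNotOverlap constraint.
    """
    for i, (c1, cur1, s1, e1) in enumerate(rules):
        for c2, cur2, s2, e2 in rules[i + 1:]:
            if (c1, cur1) == (c2, cur2) and ranges_overlap(s1, e1, s2, e2):
                raise ValueError(f"overlapping time-based rules for {c1}/{cur1}")
```

Running the check at save time means a conflicting promotion is rejected before it can ever produce an ambiguous price.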

Historic Data Management

The system maintains detailed historical price data with 15-minute granularity. Provider failures or market closures create gaps that need filling. Different commodities may need different timestamp normalisation. Historical queries must be efficient for charting and analysis.

Dual Table Architecture

The system maintains two distinct data tables: LatestCommodityPrice tracks current values per provider (one record per provider-commodity pair, updated in place), whilst HistoricCommodityPrice stores aggregated values with timestamp uniqueness (one record per commodity-timestamp pair, never updated).

A single table approach would require date filtering for “latest” queries and create index conflicts between provider+commodity (for latest) and commodity+timestamp (for historic). Separating concerns prevents these conflicts and simplifies queries.

The latest table remains small (providers × commodities, typically <100 records) whilst the historic table grows indefinitely (96 records per commodity per day). Different access patterns benefit from different indexing strategies.

Timestamp Normalisation

Timestamps are normalised to 15-minute boundaries before storage. A price arriving at 14:23 becomes 14:15. This enables database unique constraints on commodity+timestamp, direct chart queries without transformation, and straightforward gap detection.
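The normalisation step is a simple floor to the enclosing quarter-hour; a minimal Python sketch (the production code is PHP):

```python
from datetime import datetime

def normalise_to_quarter_hour(ts):
    """Floor a timestamp to its 15-minute boundary (14:23 -> 14:15)."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

normalise_to_quarter_hour(datetime(2024, 3, 1, 14, 23, 41))
# -> 2024-03-01 14:15:00
```

Because every stored timestamp lands on one of 96 fixed slots per day, a unique constraint on commodity+timestamp is enough to deduplicate, and gap detection is a set difference against those slots.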

Gap Filling Service

When providers fail or markets close, the system fills gaps using the last known good value. For each missing timestamp, the service looks up the most recent historic price and replicates it forward to maintain continuous data coverage.

The gap filling service generates 96 records per day (24 hours × 4 quarter-hour intervals). For each 15-minute interval, it checks whether a historic record exists. If not, it uses the last known price from earlier in the day (or the previous day if necessary).

This approach ensures consuming applications always have data for every 15-minute interval, even if some providers were unavailable during collection. Charts and analysis tools do not need special handling for missing data points.
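A simplified Python sketch of the carry-forward logic (the real service is PHP; function name and data shapes are hypothetical, and seeding from the previous day is reduced here to assuming the map already contains an early value):

```python
from datetime import datetime, timedelta

def fill_gaps(day, existing):
    """Produce 96 quarter-hour records for `day`, carrying the last
    known price forward into any missing slot.

    existing: dict mapping normalised datetime -> price.
    """
    filled = {}
    last_price = None
    start = datetime(day.year, day.month, day.day)
    for i in range(96):  # 24 hours x 4 quarter-hour intervals
        ts = start + timedelta(minutes=15 * i)
        if ts in existing:
            last_price = existing[ts]  # fresh data wins
        filled[ts] = last_price       # otherwise replicate forward
    return filled
```

The result is a dense series: every slot holds either a collected price or the most recent one before it.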

Graceful Degradation & Reliability

The system implements several patterns to handle real-world operational challenges: weekend and market closure handling, timezone conversion, stale data detection, and extensive logging for observability.

Weekend and Market Closure Handling

Commodity markets close on weekends (typically from Friday evening until Sunday evening). The system must handle this gracefully without flooding logs with errors. Day-aware error handling logs informational messages during expected downtime periods rather than critical errors.

Another operational challenge involves handling timezone differences across providers. Not all providers report timestamps in UTC, requiring explicit timezone conversion before storage.

One major provider reports timestamps in British Summer Time (BST) during summer months. All internal storage uses UTC. The system explicitly parses timestamps in the provider's local timezone (Europe/London), then converts to UTC before storage.

This ensures consistent time-series data regardless of provider timezone conventions or daylight saving time transitions. Without this conversion, data would be recorded at incorrect times during BST periods, breaking time-series queries and gap filling.
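The conversion can be sketched with Python's `zoneinfo` as an analogue of the PHP timezone handling (illustrative only; the timestamp format is assumed):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def provider_ts_to_utc(raw):
    """Parse a provider timestamp reported in Europe/London local time
    (BST in summer, GMT in winter) and convert it to UTC for storage."""
    local = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=ZoneInfo("Europe/London"))
    return local.astimezone(ZoneInfo("UTC"))

provider_ts_to_utc("2024-07-01 14:30:00")  # BST: stored as 13:30 UTC
provider_ts_to_utc("2024-01-15 14:30:00")  # GMT: stored as 14:30 UTC
```

Letting the timezone database decide whether the offset is +01:00 or +00:00 means daylight saving transitions need no special-casing in the collection code.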

Configuration Management

Provider credentials and endpoints are managed through environment variables, with an abstract configuration base class that validates required settings. This ensures missing configuration is caught immediately at application startup rather than failing during scheduled data collection.
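A minimal sketch of fail-fast configuration validation (illustrative Python; the real base class is PHP and these variable names are hypothetical):

```python
import os

class ProviderConfig:
    """Validate required settings at construction time, so a missing
    environment variable fails at startup rather than mid-collection."""

    REQUIRED = ("PROVIDER_API_KEY", "PROVIDER_BASE_URL")

    def __init__(self, env=os.environ):
        missing = [key for key in self.REQUIRED if not env.get(key)]
        if missing:
            raise RuntimeError(f"missing configuration: {', '.join(missing)}")
        self.api_key = env["PROVIDER_API_KEY"]
        self.base_url = env["PROVIDER_BASE_URL"]
```

Constructing every provider's configuration during application boot turns a silent 3 a.m. cron failure into an immediate, explicit startup error.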

The system implements extensive logging at every layer: provider request/response cycles, trust factor selection, rule application (with before/after prices), gap filling operations, and duplicate detection. This creates a complete audit trail for financial calculations.

Structured log messages enable operations teams to diagnose issues quickly. For example, seeing “Using feed values with trust factor: 2” immediately indicates the primary provider failed and the system fell back to a secondary source.

Measurable Outcomes

The platform delivers stable, accurate pricing data whilst gracefully handling provider failures. The architecture choices result in a maintainable, testable codebase with clear separation of concerns.

Code Quality Metrics

  • 296 source files - Well-organised codebase with clear layering
  • 299 test files - Near 1:1 test-to-code ratio providing confidence in financial calculations
  • 11,175 lines of code - Application code (excluding tests and vendor dependencies)
  • 20 database migrations - Schema evolution spanning 5 years of development
  • 27 API endpoints - RESTful API for consuming applications

The system aggregates data from 6 commodity providers and 4 currency providers, supporting 10 precious metals (gold, silver, platinum, palladium, rhodium, tellurium, ruthenium, rhenium, indium, gallium) across 14 international currencies (USD, EUR, GBP, AUD, JPY, CAD, SGD, MYR, AED, CHF, HKD, CLP, BRL, MXN).

With 15-minute granularity and 4 rule types (basic, day-based, time-based, override), the platform provides flexible pricing control whilst maintaining data continuity through automatic gap filling.

Architectural Benefits

  • Resilience: System continues operating when individual providers fail, with trust factor priority ensuring data quality
  • Extensibility: New providers can be added by implementing factory and feed interfaces, no changes to core logic required
  • Correctness: Precision arithmetic with bcmath functions and thorough testing ensure financial-grade accuracy
  • Observability: Detailed logging tracks which providers succeed/fail and which trust level was used for each calculation
  • Maintainability: Clear separation of concerns between layers with well-defined interfaces and near 1:1 test coverage

The trust factor priority system is fundamentally different from standard averaging approaches. Rather than treating all providers equally, it recognises that data quality varies and explicitly prioritises reliable sources over mere availability.

The dual storage model (latest + historic) prevents index conflicts and optimises for different access patterns. Timestamp normalisation at storage time simplifies querying and enables database-level duplicate prevention.

The chain-of-responsibility pattern for rules creates predictable behaviour with clear audit trails, whilst custom validators prevent conflicting rule configurations. These design decisions compound to create a system that is both reliable in operation and straightforward to reason about.
