Replica Configuration, Scheduling, and Repeatability (Part 5 of 6)

Throughput claims only matter if they are repeatable. And repeatability is not just an NFR concern: stable performance under sustained load often translates directly into better functional outcomes — more consistent transaction finality, fewer retries, and fewer edge-case latency spikes during peak-volume periods.

In Kubernetes, repeatability often comes down to disciplined deployment topology.

The engineering problem

The traps are common:

  • Replicas are not a magic multiplier. Horizontal scaling is a core strength of the platform, but increased replica counts only translate into near-linear throughput gains when resource allocation is coordinated; adding pods without addressing shared bottlenecks just moves the queue.
  • Missing resource isolation creates noisy neighbours. Without strict boundaries on how much CPU and memory each process can use, workloads compete for the same resources under load and per-transaction latency becomes unpredictable. Isolation keeps the experience consistent even when the system runs at high capacity.
  • Uncontrolled placement creates variance. If pods land differently from run to run, your results swing and tuning becomes guesswork.
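
The isolation point above can be expressed as a pod `resources` fragment. The figures here are illustrative placeholders, not tuned Mojaloop values — each service needs its own numbers derived from sustained-load testing:

```yaml
# Illustrative values only; tune per service under sustained load.
resources:
  requests:
    cpu: "500m"      # share the scheduler reserves on the node
    memory: "512Mi"
  limits:
    cpu: "1"         # hard ceiling; the container is throttled above this
    memory: "512Mi"  # request == limit removes memory overcommit as a variance source
```

Setting the memory request equal to its limit avoids OOM surprises from node overcommit, and keeping the CPU limit close to the request reduces throttling-induced tail latency.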

What we changed in Mojaloop v17 

This work focused on repeatability as a first-class outcome. Practical patterns for topology-aware spreading (via `topologySpreadConstraints`) and resource isolation reduced run-to-run variance and stabilised behaviour under sustained load.

Two rules of thumb (safe and transferable)

  1. Scale stateless work first. Scale out stateless services and use specialised high-availability configurations for stateful components. The core ledger and shared dependencies keep their data integrity while the processing layer expands to meet demand.
  2. Use topologySpreadConstraints to spread replicas across failure domains. Spreading replicas reduces correlated failure and the risk that a single host (or zone) becomes the performance limiter.
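
Rule 2 can be sketched as a Deployment fragment. The service name, replica count, and skew values below are illustrative, not the Mojaloop defaults:

```yaml
# Illustrative only: "example-handler" and the counts are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-handler
spec:
  replicas: 4
  selector:
    matchLabels:
      app: example-handler
  template:
    metadata:
      labels:
        app: example-handler
    spec:
      topologySpreadConstraints:
        # Keep per-node replica counts within 1 of each other.
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: example-handler
        # Also spread across zones, but allow scheduling if a zone is full.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example-handler
```

`DoNotSchedule` enforces the spread strictly (pods stay Pending rather than co-locate), while `ScheduleAnyway` treats it as a soft preference — often the right trade-off at zone level.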

Practical guidance for adopters

If you’re operating Mojaloop-based infrastructure, treat replica counts as a hypothesis:

  • Pick a baseline.
  • Run sustained tests with security enabled.
  • Change one thing at a time.
  • Look at tail latency, timeouts, and retry rates, not only averages.

The goal is not “maximum TPS.” It’s predictable operation over time.

A handover to help the community continue

After the core contribution work, INFITX also handed the Mojaloop Foundation’s community performance workstream an evidence-driven baseline replica configuration (as Helm values). It is a robust starting point for production-grade deployments, and we encourage adopters to use the community performance framework to fine-tune the platform for their specific regional volume profiles.
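
For context, a starting replica configuration supplied as Helm values typically takes this shape. The service keys and counts below are placeholders, not the actual baseline that was handed over:

```yaml
# Placeholder names and counts; the real baseline lives with the
# community performance workstream.
some-stateless-service:
  replicaCount: 4
another-handler:
  replicaCount: 2
```

Treating these values as a hypothesis — rather than a fixed answer — is what makes the sustained-test loop above meaningful.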

What’s next

Part 6 closes the series with the performance workflow itself: observability, profiling under realistic load, and how to turn bottlenecks into safe engineering changes.

Contact the INFITX Team