
Kafka Design for Real Throughput: Partitioning, Ordering, and Concurrency (Part 4 of 6)

Kafka provides a high-throughput eventing backbone for Mojaloop. It performs best at scale when partitioning, ordering assumptions, and handler concurrency are deliberately aligned with the expected workload.

In Mojaloop v17, a key performance theme was revisiting message flow design so the switch can process work in parallel without breaking correctness.

The Engineering Problem

Mojaloop is subject to the same scaling dynamics as any high-throughput messaging system:

  • Hot partitions + ordering constraints silently serialise work (one partition becomes the limiter). Some operations must be processed sequentially per key, so choosing the right key is a correctness and performance decision.
  • Producer/consumer instability under load can trigger consumer disconnects and frequent consumer-group rebalances, which pauses processing and creates disruptive latency spikes.
  • Handler concurrency must be tuned deliberately: too little creates backlogs, too much causes resource contention. The goal is efficient resource utilisation and predictable end-to-end latency distributions, even under peak load.

Achieving stable performance requires rigorous optimisation. The changes in v17 move the system beyond acceptable performance at modest loads, so that it maintains high throughput under sustained national-scale traffic.

What we changed in Mojaloop v17 (high-level)

1) Partitioning that aligns with the domain

Partitioning is not only a Kafka setting; it’s a correctness decision.

One concrete example in the v17 performance work was the position batch topic: it was updated so its partitioning aligns with participant accounts. That alignment matters because it:

  • preserves the ordering that must be preserved,
  • and enables parallelism everywhere ordering is not required.
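
To make the idea concrete, here is a minimal sketch of account-aligned partitioning. The field name `accountId` and the FNV-1a hash are illustrative assumptions (Kafka's default partitioner uses murmur2, and this is not the Mojaloop implementation); any stable hash of the key gives the same property: strict ordering per account, parallelism across accounts.

```typescript
// Sketch: key position-batch messages by participant account so that all
// events for one account land on the same partition.
function partitionForKey(key: string, numPartitions: number): number {
  // FNV-1a: a simple, stable string hash (illustrative stand-in for murmur2).
  let hash = 2166136261;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return Math.abs(hash) % numPartitions;
}

interface PositionMessage {
  accountId: string; // hypothetical field: the participant account
  payload: string;
}

// All messages for the same account deterministically map to one partition,
// so per-account ordering holds while other accounts proceed in parallel.
function assignPartition(msg: PositionMessage, numPartitions: number): number {
  return partitionForKey(msg.accountId, numPartitions);
}
```

A wrong key here is the classic failure mode: keying by something too coarse (e.g. one key for all traffic) serialises everything onto a hot partition, while keying by something too fine breaks the per-account ordering guarantee.
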

2) Handler concurrency tuned to partitioning

Once partitioning matches the domain constraints, handler concurrency can be tuned so consumers keep up efficiently without resource contention. The goal is stable throughput with predictable latency, not short-lived peaks.
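
The concurrency model this implies can be sketched as one sequential lane per partition: messages within a partition run strictly in order, while different partitions proceed in parallel. Real consumers get this from the client library (for example, kafkajs exposes a `partitionsConsumedConcurrently` option); the class below is a simplified model of the same behaviour, not Mojaloop's handler code.

```typescript
type Handler = (msg: string) => Promise<void>;

// One promise chain per partition: sequential within a partition,
// concurrent across partitions.
class PartitionedExecutor {
  private chains = new Map<number, Promise<void>>();

  // Append the message to its partition's chain; chains for different
  // partitions interleave freely.
  submit(partition: number, msg: string, handler: Handler): Promise<void> {
    const prev = this.chains.get(partition) ?? Promise.resolve();
    const next = prev.then(() => handler(msg));
    this.chains.set(partition, next.catch(() => {})); // keep chain alive on error
    return next;
  }

  // Wait for every partition's chain to drain.
  async drain(): Promise<void> {
    await Promise.all(this.chains.values());
  }
}
```

Raising concurrency beyond the partition count buys nothing for a keyed workload: the per-partition lanes are the real parallelism limit, which is why partition count and handler concurrency have to be tuned together.
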

3) Producer/consumer configuration tuned for stability under load

Maximum performance is not just about choosing the right technology (Kafka); it requires fine-tuning that technology to sustain high throughput under pressure.

A key optimisation addressed a concrete scenario where a specific consumer configuration could trigger disconnects and frequent consumer group rebalances. During a rebalance, message processing effectively pauses as the group stabilises, in-flight work completes, partitions are reassigned, and consumers resume. This disruptive stop-start cycle severely compromises throughput and latency.

The fix was to tune the producer/consumer configuration, ensuring consumers remain stable under load and avoid unnecessary rebalances while maintaining strict correctness requirements.
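
The settings involved are the standard Kafka consumer liveness properties. The values below are illustrative assumptions, not the ones shipped in Mojaloop v17; what matters are the invariants between them, because misordered values are a classic cause of the disconnect/rebalance loop described above.

```typescript
// Sketch: rebalance-stability settings, using standard Kafka consumer
// property names (values are illustrative).
const consumerConfig: Record<string, number> = {
  // Broker declares the consumer dead if no heartbeat arrives in this window.
  "session.timeout.ms": 45000,
  // Heartbeat often enough that transient stalls don't trigger eviction;
  // a common rule of thumb is ~1/3 of the session timeout.
  "heartbeat.interval.ms": 15000,
  // Upper bound on time between polls before the consumer is kicked from
  // the group; must cover the worst-case handler/batch processing time.
  "max.poll.interval.ms": 300000,
};

// Invariant checks: heartbeats comfortably inside the session timeout, and
// the poll interval longer than the session timeout.
function validate(cfg: Record<string, number>): boolean {
  return (
    cfg["heartbeat.interval.ms"] * 3 <= cfg["session.timeout.ms"] &&
    cfg["session.timeout.ms"] < cfg["max.poll.interval.ms"]
  );
}
```
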

4) Profiling and queue behaviour as the guide

Optimisation here was data-driven: profiling that exposes queue and lag behaviour, rather than intuition. In message-driven systems, the most reliable early signals are consumer lag, processing-time distributions, and retry/error patterns.
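
The headline signal, consumer lag, is just the gap between each partition's latest offset (high watermark) and the consumer group's committed offset. The sketch below shows the arithmetic with illustrative names; real deployments read these offsets from Kafka admin APIs or a metrics exporter.

```typescript
interface PartitionOffsets {
  partition: number;
  highWatermark: number; // next offset producers will write
  committed: number;     // next offset the consumer group will read
}

// Lag per partition: how far the consumer trails the producers.
function lagPerPartition(offsets: PartitionOffsets[]): Map<number, number> {
  const lag = new Map<number, number>();
  for (const o of offsets) {
    lag.set(o.partition, Math.max(0, o.highWatermark - o.committed));
  }
  return lag;
}

// Total lag is the headline number, but the per-partition view is what
// diagnoses bottlenecks: one partition lagging while the rest are caught up
// points at key skew or an ordering constraint, not a capacity problem.
function totalLag(offsets: PartitionOffsets[]): number {
  return offsets.reduce(
    (sum, o) => sum + Math.max(0, o.highWatermark - o.committed),
    0
  );
}
```
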

Practical guidance for adopters

If you’re operating Mojaloop-based infrastructure (or any message-driven financial system), the transferable lessons are:

  • Decide what must be ordered. Make it explicit: “ordered per participant”, “ordered per account”. 
  • Align partitions to that decision. Wrong keys create hot partitions or correctness issues.
  • Tune concurrency to match partitions. More concurrency is not always better; watch lag and end-to-end latency.
  • Monitor the right signals. Consumer lag, handler processing time, and retries will tell you where the bottleneck moved.

What’s next

Part 5 moves from messaging to repeatability: replica configuration, scheduling discipline, and why stable performance depends on deployment topology.

Contact the INFITX Team