
ADR-006: Single-threaded processing per partition

  • Status: Accepted
  • Date: 2026-05-14
  • Tags: core, architecture, concurrency

Context and Problem Statement

The engine processes commands from the log. Should it process them concurrently, using locks to gain throughput, or adopt a single-threaded actor model for simplicity?

Decision Drivers

  • Determinism (no race conditions, so replay works)
  • Simplicity (no locks, no concurrency bugs)
  • Predictable latency
  • The Intuit benchmark found that "one partition per broker is optimal" (empirical validation)

Considered Options

  1. Single-threaded per partition (actor model, as in Camunda)
  2. Multi-threaded with locks (traditional CRUD)
  3. Optimistic concurrency control (CAS, retry on conflict)
  4. Lock-free data structures (CAS extensivo)

Decision Outcome

Chosen option: Single-threaded per partition, because it:

  • Eliminates an entire class of bugs (race conditions, deadlocks, optimistic locking failures)
  • Enables replay determinism (ADR-019)
  • Gives predictable latency (no contention)
  • Is validated by Camunda in production at millions of instances
  • Is enough for the single-node MVP: one main thread covers the target TPS

Positive Consequences

  • Zero race conditions in processing
  • Zero deadlocks
  • No locks (a significant performance and simplicity win)
  • Replay determinism is trivially achievable
  • Predictable latency
  • Straightforward debugging (sequential)
  • Correctness is simpler to reason about
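
To make the replay claim concrete, here is a minimal sketch. It assumes the state transition is a pure function and that commands are applied strictly in log order. The `Command` fields and the `apply` and `replay` names are hypothetical and are not taken from the engine.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Command:
    position: int  # offset in the log (hypothetical field)
    account: str   # hypothetical scope key
    delta: int     # hypothetical payload


def apply(state: dict, command: Command) -> dict:
    """Pure transition: the same (state, command) pair always gives the same result."""
    new_state = dict(state)
    new_state[command.account] = new_state.get(command.account, 0) + command.delta
    return new_state


def replay(log: list) -> dict:
    """Fold the log in position order, one command at a time."""
    ordered = sorted(log, key=lambda c: c.position)
    return reduce(apply, ordered, {})


log = [
    Command(1, "a", 10),
    Command(2, "b", 5),
    Command(3, "a", -3),
]

first = replay(log)
second = replay(log)
print(first == second)  # both runs end in the same state
```

Because there is one consumer and a fixed order, nothing in the result depends on scheduling, which is what makes the replayed state reproducible.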

Negative Consequences

  • Peak throughput of a single partition is limited to one core
  • Scaling requires partitioning (multi-partition or multi-tenant sharding)
  • Once the single thread saturates, scaling must be horizontal (not vertical)

Pros and Cons of the Options

Single-threaded per partition (actor model)

Pros:

  • No race conditions
  • No locks
  • Replay determinism
  • Consistent latency
  • Easy debugging
  • Validated in production

Cons:

  • Single-core throughput ceiling
  • Scaling requires partitioning

Multi-threaded with locks

Pros:

  • Uses multiple cores
  • Higher peak throughput

Cons:

  • Race conditions are inevitable
  • Deadlocks are possible
  • Locks reduce practical throughput (contention)
  • High latency variance (waiting on locks)
  • Replay determinism is impossible (lock acquisition order is non-deterministic)
  • Debugging is a nightmare

Optimistic concurrency (CAS)

Pros:

  • High throughput under low contention
  • No blocking locks

Cons:

  • Retry storms under high contention
  • Replay determinism is impossible (which retry succeeded first?)
  • Aborted transactions waste work
  • Code complexity (every write becomes a try/catch retry loop)
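
For reference, the retry loop that this option forces onto every write looks roughly like the sketch below. It is an in-memory illustration, not engine code: `VersionedCell`, `update_with_retry` and the `between_read_and_write` hook are hypothetical names, and a real store would perform the version check and the write atomically.

```python
from dataclasses import dataclass


@dataclass
class VersionedCell:
    """Hypothetical stand-in for a row guarded by a version column."""
    value: int = 0
    version: int = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        # A real store would make this check-and-write a single atomic step.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def update_with_retry(cell, fn, max_attempts=5, between_read_and_write=None):
    """Apply fn to the cell's value, retrying on version conflicts.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        value, version = cell.read()
        if between_read_and_write is not None:
            between_read_and_write(attempt)  # hook that simulates a competing writer
        if cell.compare_and_set(version, fn(value)):
            return attempt
    raise RuntimeError(f"gave up after {max_attempts} conflicting attempts")


cell = VersionedCell()


def competing_writer(attempt):
    # Sneaks in a write during the first two attempts only.
    if attempt <= 2:
        value, version = cell.read()
        cell.compare_and_set(version, value + 100)


attempts = update_with_retry(
    cell, lambda v: v + 1, between_read_and_write=competing_writer
)
print(attempts, cell.value, cell.version)
```

The first two attempts are wasted work, and the final value depends on how the competing writes happened to interleave, which is the replay problem listed in the cons.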

Lock-free data structures

Pros:

  • Maximum theoretical performance

Cons:

  • Correct implementations are extremely difficult
  • Subtle bugs (memory model assumptions)
  • Replay determinism is almost impossible
  • High maintenance burden

Implementation: actor pattern

# Pseudo-code: db_pool and processor are injected collaborators whose APIs are assumed
import asyncio


class PartitionProcessor:
    def __init__(self, partition_id, db_pool, processor):
        self.partition_id = partition_id
        self.db_pool = db_pool
        self.processor = processor  # command handler: (command, state) -> events
        self.command_queue = asyncio.Queue()  # FIFO, preserves enqueue order
        self.running = False

    async def start(self):
        self.running = True
        await self.process_loop()

    async def process_loop(self):
        # SINGLE coroutine - no concurrent processing
        while self.running:
            command = await self.command_queue.get()
            await self.process_command(command)  # serial: next get() waits for this

    async def process_command(self, command):
        # One transaction per command: everything below commits or rolls back together
        async with self.db_pool.transaction() as tx:
            # Read state
            state = await tx.fetch_state(command.scope)

            # Process
            events = self.processor.process(command, state)

            # Write events + state (atomic)
            await tx.append_events(events)
            await tx.update_state(events)

            # Mark command processed
            await tx.mark_processed(command.position)

Key point: a single coroutine processes commands. New commands wait in the queue. No concurrency means no race conditions.
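
The difference is easy to reproduce in isolation. The sketch below is a self-contained illustration with hypothetical names (`Counter`, `run_serial`, `run_concurrent`) and in-memory state instead of a database. It relies on the fact that an asyncio coroutine can be suspended at every `await`, so spawning one task per command lets read-modify-write sequences interleave even on a single thread.

```python
import asyncio


class Counter:
    """Hypothetical in-memory state whose update yields between read and write."""

    def __init__(self):
        self.value = 0

    async def increment(self):
        current = self.value      # read
        await asyncio.sleep(0)    # yield to the event loop, as a DB round trip would
        self.value = current + 1  # write based on the (possibly stale) read


async def run_serial(n):
    """Actor style: one consumer drains the queue one command at a time."""
    counter = Counter()
    queue = asyncio.Queue()
    for _ in range(n):
        queue.put_nowait("increment")
    queue.put_nowait(None)  # sentinel that stops the loop

    while (await queue.get()) is not None:
        await counter.increment()  # finishes before the next get()
    return counter.value


async def run_concurrent(n):
    """Counter-example: one task per command, so the reads interleave."""
    counter = Counter()
    await asyncio.gather(*(counter.increment() for _ in range(n)))
    return counter.value


serial_total = asyncio.run(run_serial(100))
concurrent_total = asyncio.run(run_concurrent(100))
print(serial_total, concurrent_total)
```

The serial run applies every increment, while the concurrent run loses updates because several tasks read the same stale value before any of them writes.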

Throughput considerations

If the single thread becomes the bottleneck:

MVP Phase 0-2: 1 partition

Sufficient for 200-1000 TPS (Intuit measured 375 TPS per partition on Camunda 8 with Java; Postgres stays in a similar range).

Phase 3+: multiple partitions

When a single thread is no longer enough:

  • By tenant: partition lookup by tenant_id (ADR-021)
  • By hash of process_instance_key
  • Each partition has its own dedicated thread

Each partition stays single-threaded internally. Multiple partitions run in parallel.
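
A minimal sketch of hash-based routing follows, assuming a fixed partition count. The function name `partition_for` and the sample keys are hypothetical; the actual lookup is defined in ADR-021. It uses `hashlib` rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process by default and would route the same key differently after a restart.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index with a hash that is stable across processes."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # The first 8 bytes are plenty for an even spread over a few partitions.
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical keys: the same tenant_id always lands on the same partition.
keys = ["tenant-a", "tenant-b", "tenant-c", "tenant-a"]
assignments = [partition_for(key, 4) for key in keys]
print(assignments)
```

Every command for a given key reaches the same single-threaded processor, so ordering per key is preserved. Note that changing `partition_count` remaps keys, so under this scheme the count would have to be chosen up front or changed through a migration.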

Empirical validation (Intuit benchmark)

Intuit reported (see analysis/intuit-production-benchmarks):

"One partition per broker is optimal to get the best results"

More partitions per broker do NOT improve throughput because they compete for cores. The rule: 1 partition = 1 thread = 1 core.
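
As a rough sizing aid, the rule can be turned into arithmetic. The sketch below assumes throughput scales linearly with the number of partitions and that each partition gets its own core, neither of which is guaranteed in practice. The function name `partitions_needed` and the 1500 TPS target are hypothetical; 375 TPS is the per-partition figure cited above.

```python
import math


def partitions_needed(target_tps: float, tps_per_partition: float) -> int:
    """Estimate the partition (and core) count, assuming linear scaling."""
    if target_tps < 0 or tps_per_partition <= 0:
        raise ValueError("target_tps must be >= 0 and tps_per_partition > 0")
    # At least one partition is always needed, even for a tiny target.
    return max(1, math.ceil(target_tps / tps_per_partition))


small = partitions_needed(200, 375)   # fits in a single partition
large = partitions_needed(1500, 375)  # hypothetical Phase 3+ target
print(small, large)
```

Under the 1 partition = 1 thread = 1 core rule, the result is also the number of cores to provision.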