ADR-006: Single-threaded processing por partition¶
- Status: Accepted
- Date: 2026-05-14
- Tags: core, architecture, concurrency
Context and Problem Statement¶
El engine procesa commands del log. ¿Concurrent processing con locks para throughput, o single-threaded actor model para simplicity?
Decision Drivers¶
- Determinism (no race conditions = replay funciona)
- Simplicity (no locks, no concurrency bugs)
- Predictable latency
- Intuit benchmark mostró "1 partition por broker es óptimo" (validación empírica)
Considered Options¶
- Single-threaded per partition (actor model, como Camunda)
- Multi-threaded con locks (CRUD tradicional)
- Optimistic concurrency control (CAS, retry on conflict)
- Lock-free data structures (CAS extensivo)
Decision Outcome¶
Chosen option: Single-threaded per partition porque: - Eliminates entire class of bugs (race conditions, deadlocks, optimistic locking failures) - Habilita replay determinism (ADR-019) - Predictable latency (no contention) - Validated por Camunda en production a millones de instances - Para single-node MVP: 1 thread principal es suficiente para target TPS
Positive Consequences¶
- Cero race conditions en processing
- Cero deadlocks
- No locks (significant performance + simplicity win)
- Replay determinism trivially achievable
- Latency predictable
- Debugging straightforward (sequential)
- Más simple de razonar sobre correctness
Negative Consequences¶
- Throughput pico de UNA partition limitado a un core
- Para escalar, requires partitioning (multi-partition o multi-tenant sharding)
- Single-threaded saturation requires scaling horizontal (no vertical)
Pros and Cons of the Options¶
Single-threaded per partition (actor model)¶
Pros: - Sin race conditions - Sin locks - Replay determinism - Latency consistente - Debugging fácil - Validated en production
Cons: - Single-core throughput ceiling - Scaling requires partitioning
Multi-threaded con locks¶
Pros: - Utiliza múltiples cores - Throughput pico mayor
Cons: - Race conditions inevitables - Deadlocks posibles - Locks reducen throughput práctico (contention) - Latency variance alta (waiting on locks) - Replay determinism IMPOSIBLE (orden de lock acquisition no-determinístico) - Debugging es nightmare
Optimistic concurrency (CAS)¶
Pros: - Throughput alto con bajo contention - No locks bloqueantes
Cons: - Retry storms bajo alto contention - Replay determinism imposible (qué retry succeeded primero?) - Aborted transactions waste work - Complexity en code (todo write es try-catch retry)
Lock-free data structures¶
Pros: - Performance teórico máximo
Cons: - Implementations correctas son extremadamente difíciles - Bugs sutiles (memory model assumptions) - Replay determinism casi imposible - Maintenance burden alto
Implementación: actor pattern¶
# Pseudo-code
class PartitionProcessor:
def __init__(self, partition_id, db_pool):
self.partition_id = partition_id
self.db_pool = db_pool
self.command_queue = asyncio.Queue() # ordered
self.running = False
async def start(self):
self.running = True
await self.process_loop()
async def process_loop(self):
# SINGLE coroutine - no concurrent processing
while self.running:
command = await self.command_queue.get()
await self.process_command(command) # serial
async def process_command(self, command):
async with self.db_pool.transaction() as tx:
# Read state
state = await tx.fetch_state(command.scope)
# Process
events = self.processor.process(command, state)
# Write events + state (atomic)
await tx.append_events(events)
await tx.update_state(events)
# Mark command processed
await tx.mark_processed(command.position)
Key: una sola coroutine procesa commands. Nuevos commands esperan en queue. Sin concurrency = sin race conditions.
Throughput considerations¶
Si single-thread se vuelve bottleneck:
MVP Phase 0-2: 1 partition¶
Suficiente para 200-1000 TPS (probado por Intuit en Camunda 8: 375 TPS por partition con Java; Postgres mantiene similar range).
Phase 3+: múltiples partitions¶
Cuando un solo thread no alcance: - Por tenant: partition lookup por tenant_id (ADR-021) - Por process_instance_key hash - Each partition has its own dedicated thread
Cada partition mantiene single-thread internally. Múltiples partitions corren en paralelo.
Validación empírica (Intuit benchmark)¶
Intuit reportó (ver analysis/intuit-production-benchmarks):
"One partition per broker is optimal to get the best results"
Más partitions por broker NO mejora throughput porque compiten por cores. La regla: 1 partition = 1 thread = 1 core.
Links¶
- concepts/stream-processing — Pattern detallado
- analysis/intuit-production-benchmarks — Validación empírica
- adrs/adr-005-stream-processing-command-sourcing — Habilita esto
- adrs/adr-019-replay-determinism-invariant — Beneficio derivado
- concepts/microbenchmark-methodology — Optimizations sin sacrificar single-thread