Blueprint Plataforma Simplificada
Especificación técnica para construir un workflow engine simplificado que actúe como drop-in replacement de Camunda 8 para casos de uso comunes. Diseñado para ser implementado con Claude Code.
Principios de Diseño¶
- Single-node first: sin clustering ni Raft hasta que se necesite. PostgreSQL como state store.
- Command sourcing: mantener el patrón probado de Camunda — commands al log, procesamiento determinístico.
- REST-first API: compatible con la API de Camunda
/v2/para facilitar migración. - BPMN subset: soportar los elementos que cubren el 95% de los casos de uso.
- Composition over inheritance: processors con behaviors inyectados.
- SQL-queryable state: sin Elasticsearch — PostgreSQL para todo (state + search).
Arquitectura General¶
flowchart LR
subgraph Engine["Workflow Engine"]
API[REST API OpenAPI] --> Processor[Command Processor single-threaded]
Processor --> State[(State Store Postgres/SQLite)]
Processor --> JobPoller[Job Poller REST]
Processor --> EventLog[(Event Log append-only)]
TimerSched[Timer Scheduler] --> Processor
BpmnParser[BPMN Parser & Transformer] --> Processor
end
Módulo 1: Command Log (Event Store)¶
Lección de Camunda¶
Camunda usa un LogStream append-only replicado via Raft, con ApplicationEntry batches y FlowControl backpressure.
Implementación Simplificada¶
Table: command_log
CREATE TABLE command_log (
position BIGSERIAL PRIMARY KEY,
record_type TEXT NOT NULL, -- 'COMMAND', 'EVENT', 'REJECTION'
value_type TEXT NOT NULL, -- 'PROCESS_INSTANCE', 'JOB', 'MESSAGE', etc.
intent TEXT NOT NULL, -- 'ACTIVATE_ELEMENT', 'COMPLETE', etc.
key BIGINT NOT NULL,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
value JSONB NOT NULL,
source_position BIGINT,
partition_id INT NOT NULL DEFAULT 1
);
CREATE INDEX idx_log_unprocessed ON command_log(position) WHERE record_type = 'COMMAND';
Procesamiento: un goroutine/thread lee commands no procesados secuencialmente, genera events, los inserta en la misma tabla dentro de una transacción.
Por qué PostgreSQL: WAL nativo provee durabilidad equivalente al Raft log para single-node. LISTEN/NOTIFY puede reemplazar LogRecordAwaiter. Backup via pg_dump o streaming replication para HA.
Diferencias clave con Camunda¶
- Sin SBE — JSON (JSONB en PostgreSQL)
- Sin partitioning — single partition en MVP
- Sin Raft — PostgreSQL streaming replication para HA
- Backpressure: rate limiting simple en el API layer
Módulo 2: State Store¶
Lección de Camunda¶
142+ column families en RocksDB con type-safe access via ColumnFamily<K, V>.
Implementación Simplificada¶
Tables principales (PostgreSQL):
-- Process definitions
CREATE TABLE process_definitions (
key BIGINT PRIMARY KEY,
bpmn_process_id TEXT NOT NULL,
version INT NOT NULL,
name TEXT,
bpmn_xml TEXT NOT NULL,
checksum TEXT NOT NULL,
tenant_id TEXT NOT NULL DEFAULT '<default>',
deployed_at TIMESTAMPTZ NOT NULL,
UNIQUE(bpmn_process_id, version, tenant_id)
);
-- Process instances
CREATE TABLE process_instances (
key BIGINT PRIMARY KEY,
process_definition_key BIGINT NOT NULL REFERENCES process_definitions(key),
bpmn_process_id TEXT NOT NULL,
version INT NOT NULL,
state TEXT NOT NULL DEFAULT 'ACTIVE', -- ACTIVE, COMPLETED, CANCELED
start_date TIMESTAMPTZ NOT NULL,
end_date TIMESTAMPTZ,
parent_instance_key BIGINT,
tenant_id TEXT NOT NULL DEFAULT '<default>',
business_id TEXT
);
-- Element instances (flow node instances)
CREATE TABLE element_instances (
key BIGINT PRIMARY KEY,
process_instance_key BIGINT NOT NULL,
process_definition_key BIGINT NOT NULL,
element_id TEXT NOT NULL, -- BPMN element ID
element_type TEXT NOT NULL, -- SERVICE_TASK, EXCLUSIVE_GATEWAY, etc.
state TEXT NOT NULL, -- ACTIVATING, ACTIVATED, COMPLETING, COMPLETED, TERMINATING, TERMINATED
flow_scope_key BIGINT, -- parent element instance
created_at TIMESTAMPTZ NOT NULL,
completed_at TIMESTAMPTZ,
tenant_id TEXT NOT NULL DEFAULT '<default>'
);
CREATE INDEX idx_ei_process ON element_instances(process_instance_key);
CREATE INDEX idx_ei_scope ON element_instances(flow_scope_key) WHERE state NOT IN ('COMPLETED', 'TERMINATED');
-- Variables
CREATE TABLE variables (
key BIGINT PRIMARY KEY,
name TEXT NOT NULL,
value JSONB NOT NULL,
process_instance_key BIGINT NOT NULL,
scope_key BIGINT NOT NULL, -- element instance key
tenant_id TEXT NOT NULL DEFAULT '<default>',
UNIQUE(name, scope_key)
);
CREATE INDEX idx_var_scope ON variables(scope_key);
-- Jobs
CREATE TABLE jobs (
key BIGINT PRIMARY KEY,
type TEXT NOT NULL,
process_instance_key BIGINT NOT NULL,
element_instance_key BIGINT NOT NULL,
element_id TEXT NOT NULL,
state TEXT NOT NULL DEFAULT 'CREATED', -- CREATED, ACTIVATED, COMPLETED, FAILED, ERROR_THROWN, TIMED_OUT
retries INT NOT NULL DEFAULT 3,
worker TEXT,
deadline TIMESTAMPTZ,
backoff_until TIMESTAMPTZ,
custom_headers JSONB,
variables JSONB,
error_message TEXT,
error_code TEXT,
tenant_id TEXT NOT NULL DEFAULT '<default>',
created_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_jobs_activatable ON jobs(type, tenant_id) WHERE state = 'CREATED';
CREATE INDEX idx_jobs_deadline ON jobs(deadline) WHERE state = 'ACTIVATED';
CREATE INDEX idx_jobs_backoff ON jobs(backoff_until) WHERE state = 'FAILED' AND backoff_until IS NOT NULL;
-- Timers
CREATE TABLE timers (
key BIGINT PRIMARY KEY,
process_instance_key BIGINT,
element_instance_key BIGINT,
process_definition_key BIGINT,
element_id TEXT NOT NULL,
due_date TIMESTAMPTZ NOT NULL,
repetitions INT DEFAULT 0,
state TEXT NOT NULL DEFAULT 'ACTIVE',
tenant_id TEXT NOT NULL DEFAULT '<default>'
);
CREATE INDEX idx_timers_due ON timers(due_date) WHERE state = 'ACTIVE';
-- Messages (for correlation)
CREATE TABLE messages (
key BIGINT PRIMARY KEY,
name TEXT NOT NULL,
correlation_key TEXT NOT NULL,
message_id TEXT,
variables JSONB,
time_to_live INTERVAL,
deadline TIMESTAMPTZ,
state TEXT NOT NULL DEFAULT 'PUBLISHED',
tenant_id TEXT NOT NULL DEFAULT '<default>',
UNIQUE(name, correlation_key, message_id) -- dedup
);
CREATE INDEX idx_msg_correlate ON messages(name, correlation_key, tenant_id) WHERE state = 'PUBLISHED';
-- Message subscriptions
CREATE TABLE message_subscriptions (
key BIGINT PRIMARY KEY,
message_name TEXT NOT NULL,
correlation_key TEXT NOT NULL,
process_instance_key BIGINT NOT NULL,
element_instance_key BIGINT NOT NULL,
element_id TEXT NOT NULL,
state TEXT NOT NULL DEFAULT 'ACTIVE',
tenant_id TEXT NOT NULL DEFAULT '<default>'
);
CREATE INDEX idx_msub_correlate ON message_subscriptions(message_name, correlation_key, tenant_id) WHERE state = 'ACTIVE';
-- Incidents
CREATE TABLE incidents (
key BIGINT PRIMARY KEY,
process_instance_key BIGINT NOT NULL,
element_instance_key BIGINT NOT NULL,
job_key BIGINT,
error_type TEXT NOT NULL, -- JOB_NO_RETRIES, UNHANDLED_ERROR_EVENT, EXTRACT_VALUE_ERROR
error_message TEXT,
state TEXT NOT NULL DEFAULT 'ACTIVE',
created_at TIMESTAMPTZ NOT NULL,
resolved_at TIMESTAMPTZ,
tenant_id TEXT NOT NULL DEFAULT '<default>'
);
-- Key generator
CREATE SEQUENCE key_generator START 1;
Diferencias con Camunda¶
- SQL queries directas en vez de exporter + ES
- Sin column families — tablas SQL con índices
- Sin DbCompositeKey — claves naturales o foreign keys
- Sin TransactionContext wrapper — transacciones PostgreSQL nativas
Módulo 3: BPMN Parser & Transformer¶
Lección de Camunda¶
5-step visitor pipeline: instantiation → attributes → context-dependent → container-aware → multi-instance wrapping.
Implementación Simplificada¶
Parsing: usar una library XML existente para parsear BPMN 2.0 XML. Producir un modelo in-memory.
Supported Elements (MVP):
- process (with startEvent, endEvent)
- serviceTask → creates jobs
- userTask → creates user tasks
- exclusiveGateway → condition evaluation
- parallelGateway → fork/join
- subProcess (embedded)
- callActivity
- intermediateCatchEvent (timer, message)
- boundaryEvent (timer, error)
- endEvent (none, error, terminate)
- sequenceFlow with conditions
Transformation (2 pasos en vez de 5):
1. Parse: XML → in-memory model con todos los elementos y sus atributos
2. Link: resolver references (outgoing flows, default flows, error refs), validar, producir ExecutableProcess
Expression Language: subset de FEEL o usar CEL (Common Expression Language): - Literals, comparisons, arithmetic - String interpolation - Context access (variable lookup) - Basic functions (contains, matches, now, date)
Módulo 4: Processing Engine¶
Lección de Camunda¶
BpmnStreamProcessor → BpmnElementProcessors (EnumMap) → 24 processors con 12 behaviors.
Implementación Simplificada¶
Element Processors (12 en vez de 24):
interface ElementProcessor {
onActivate(context: ProcessingContext): Result
onComplete(context: ProcessingContext): Result
onTerminate(context: ProcessingContext): Result
}
interface ContainerProcessor extends ElementProcessor {
onChildCompleted(context, childContext): Result
onChildTerminated(context, childContext): Result
}
| Element | Processor | Notas |
|---|---|---|
| PROCESS | ProcessProcessor | Container root |
| SERVICE_TASK | ServiceTaskProcessor | Creates job on activate |
| USER_TASK | UserTaskProcessor | Creates user task record |
| EXCLUSIVE_GATEWAY | ExclusiveGatewayProcessor | Evaluate conditions, take first true |
| PARALLEL_GATEWAY | ParallelGatewayProcessor | Fork: activate all outgoing. Join: count incoming. |
| SUB_PROCESS | SubProcessProcessor | Container with scoped variables |
| CALL_ACTIVITY | CallActivityProcessor | Creates child process instance |
| START_EVENT | StartEventProcessor | Activates first flow node |
| END_EVENT | EndEventProcessor | Completes flow scope |
| BOUNDARY_EVENT | BoundaryEventProcessor | Timer + error only |
| INTERMEDIATE_CATCH | IntermediateCatchProcessor | Timer + message |
| SEQUENCE_FLOW | SequenceFlowProcessor | Activates target element |
Behaviors (6 en vez de 12):
- VariableBehavior — input/output mapping, scoping
- JobBehavior — create/complete/fail jobs
- IncidentBehavior — create/resolve incidents
- TimerBehavior — schedule/cancel timers
- MessageBehavior — create/correlate message subscriptions
- StateBehavior — state transitions, child tracking
Lifecycle State Machine (mantener el de Camunda):
stateDiagram-v2
[*] --> ACTIVATING
ACTIVATING --> ACTIVATED
ACTIVATED --> COMPLETING
COMPLETING --> COMPLETED
ACTIVATING --> TERMINATING
ACTIVATED --> TERMINATING
COMPLETING --> TERMINATING
TERMINATING --> TERMINATED
COMPLETED --> [*]
TERMINATED --> [*]
Módulo 5: Job Worker System¶
Lección de Camunda¶
gRPC streaming + long polling, timeout, retries, backoff, whitelisted commands.
Implementación Simplificada¶
REST API (compatible con Camunda v2):
POST /v2/jobs/activation
Body: { type, worker, timeout, maxJobsToActivate, fetchVariables, requestTimeout }
Response: { jobs: [...] }
Long polling: if no jobs available, hold connection up to requestTimeout
POST /v2/jobs/{key}/completion
Body: { variables }
POST /v2/jobs/{key}/failure
Body: { retries, retryBackOff, errorMessage }
POST /v2/jobs/{key}/error
Body: { errorCode, errorMessage, variables }
Implementación:
- Job activation: SELECT ... FROM jobs WHERE type = ? AND state = 'CREATED' AND tenant_id IN (?) LIMIT ? FOR UPDATE SKIP LOCKED
- Long polling: PostgreSQL LISTEN/NOTIFY en channel jobs_{type}
- Timeout checker: goroutine/thread periódico que marca jobs expirados como TIMED_OUT
- Backoff: WHERE backoff_until <= now() en el activation query
Módulo 6: Timer System¶
Lección de Camunda¶
DueDateCheckScheduler demand-driven con yielding. 100ms resolución.
Implementación Simplificada¶
Timer Scheduler: un goroutine/thread que:
1. SELECT MIN(due_date) FROM timers WHERE state = 'ACTIVE'
2. Sleep hasta ese timestamp (o max 1 segundo)
3. SELECT ... FROM timers WHERE due_date <= now() AND state = 'ACTIVE' ORDER BY due_date LIMIT 100
4. Para cada timer: generar command (TRIGGER), actualizar timer (si repetitivo, calcular next due date)
5. Yield después de 50 timers para no bloquear el event loop
Cron support: library cron del lenguaje elegido para parsear expresiones cron.
Módulo 7: Message Correlation¶
Lección de Camunda¶
Cross-partition routing, dedup por messageId, TTL, subscription matching.
Implementación Simplificada (single-partition)¶
-- Publish message
INSERT INTO messages (key, name, correlation_key, ...) ...
ON CONFLICT (name, correlation_key, message_id) DO NOTHING; -- dedup
-- Try correlate immediately
SELECT ms.* FROM message_subscriptions ms
WHERE ms.message_name = ? AND ms.correlation_key = ? AND ms.state = 'ACTIVE'
LIMIT 1 FOR UPDATE;
-- If match: activate the waiting element, mark subscription as correlated
-- If no match: message waits in buffer until TTL expires or subscription arrives
-- TTL cleanup
DELETE FROM messages WHERE deadline < now();
Módulo 8: REST API (Compatible con Camunda v2)¶
Endpoints Principales¶
# Process definitions
POST /v2/process-definitions/search
GET /v2/process-definitions/{key}
GET /v2/process-definitions/{key}/xml
DELETE /v2/process-definitions/{key}
# Deployments
POST /v2/deployments (multipart: BPMN files)
# Process instances
POST /v2/process-instances/creation
POST /v2/process-instances/search
GET /v2/process-instances/{key}
DELETE /v2/process-instances/{key} (cancel)
# Jobs
POST /v2/jobs/activation
POST /v2/jobs/{key}/completion
POST /v2/jobs/{key}/failure
POST /v2/jobs/{key}/error
# User tasks
POST /v2/user-tasks/search
GET /v2/user-tasks/{key}
POST /v2/user-tasks/{key}/assignment
DELETE /v2/user-tasks/{key}/assignee
POST /v2/user-tasks/{key}/completion
PATCH /v2/user-tasks/{key}
# Messages
POST /v2/messages/publication
# Incidents
POST /v2/incidents/search
POST /v2/incidents/{key}/resolution
# Variables
POST /v2/process-instances/{key}/variables
POST /v2/variables/search
# Topology (simplified)
GET /v2/topology
Módulo 9: Monitoring (Simplificado)¶
En vez de Operate + Tasklist¶
Opción A — API-only: todas las queries vía REST API. UI como proyecto separado.
Opción B — Dashboard simple: - Single-page app con tabla de process instances (filtro por state, process ID) - Drill-down: ver flow nodes, variables, incidents de una instancia - BPMN diagram viewer con highlighting del estado actual (usar bpmn-js) - Task list integrada
Queries SQL directas reemplazan a Elasticsearch:
-- Process instances with incidents
SELECT pi.*, COUNT(i.key) as incident_count
FROM process_instances pi
LEFT JOIN incidents i ON i.process_instance_key = pi.key AND i.state = 'ACTIVE'
WHERE pi.state = 'ACTIVE'
GROUP BY pi.key;
-- Active tasks
SELECT j.* FROM jobs j
WHERE j.state IN ('CREATED', 'ACTIVATED')
AND j.type = 'user-task'
ORDER BY j.created_at DESC;
Plan de Implementación (Fases)¶
Fase 1 — Core Engine (2-3 semanas con Claude Code)¶
- Schema PostgreSQL + migraciones
- BPMN parser (XML → modelo in-memory)
- Key generator
- Command processor (single-threaded loop)
- Element processors: Process, StartEvent, EndEvent, ServiceTask, SequenceFlow, ExclusiveGateway
- Variable store + scoping
- Job system: create, activate, complete, fail
- REST API: deploy, create instance, jobs
Fase 2 — Resilience (1-2 semanas)¶
- Timer system (boundary + intermediate catch)
- Incident management
- Job retries + backoff
- Message correlation
- Parallel gateway (fork + join)
- Sub-process + call activity
Fase 3 — Completeness (1-2 semanas)¶
- User tasks (assignment, completion, forms API)
- Error events (throw/catch)
- Expression language (CEL o FEEL subset)
- Multi-instance (parallel)
- Boundary events (timer, error)
Fase 4 — Production Readiness (1-2 semanas)¶
- Authentication (API keys + OIDC)
- Basic authorization (roles)
- Multi-tenancy (tenant_id filtering)
- Monitoring dashboard
- Metrics (OpenTelemetry)
- Backup/restore
Fase 5 — Scale (cuando se necesite)¶
- Clustering (PostgreSQL streaming replication o Raft)
- Partitioning
- Exporter system (para analytics)
- Advanced BPMN elements
Problemas No-Obvios que Camunda Ya Resolvió¶
Estos son los problemas que encontrarás durante la implementación y que Camunda ya tiene solución:
-
Parallel gateway join con terminación parcial: cuando una rama se termina (por boundary event o error) antes de llegar al join. Camunda maneja esto con counters en el event scope — necesitas decrementar el expected count.
-
Variable scope shadowing en sub-processes: variables con el mismo nombre en scopes anidados. El scope más interno gana para lectura, pero ¿dónde escribir en job completion? Camunda: al primer scope donde existe, o al root si es nueva.
-
Timer rescheduling drift: si un timer repetitivo se ancla al momento de disparo, acumula drift. Camunda lo ancla a la due date anterior.
-
Message dedup con TTL: un mensaje puede expirar entre el publish y la correlación. Necesitas un check de deadline en el momento de correlación, no solo en el cleanup.
-
Incident resolution re-processing: al resolver un incident, necesitas re-ejecutar EXACTAMENTE el comando que falló. Camunda reconstruye el comando desde el estado del element instance.
-
Process deployment duplicate detection: si el mismo BPMN se deploya dos veces, no crear versión nueva. Camunda usa checksum del contenido.
-
Job timeout race condition: un worker completa un job justo cuando el timeout lo marca como TIMED_OUT. Necesitas
FOR UPDATE SKIP LOCKEDo similar. -
Cross-scope compensation: compensation handlers en sub-processes necesitan acceso a variables del scope donde se creó la subscription, no del scope actual.
-
Event sub-process interruption: un interrupting event sub-process debe terminar TODOS los elementos activos en el parent scope antes de activarse. Orden de terminación importa.
-
Call activity variable isolation: variables del parent NO se propagan automáticamente al child. Solo las que están en input mappings. Esto es intencional para encapsulación.