ADR-017: JSON serialization

Status: Accepted
Date: 2026-05-14
Tags: data, serialization, performance

Context and Problem Statement

Camunda uses SBE (10-50 ns encode/decode) for its internal protocol and MessagePack (50-200 ns) for variables. That gives extreme performance, but SBE requires custom schemas and neither format is human-readable. What should the MVP use?

Decision Drivers

The MVP performance target is single-digit milliseconds, not nanoseconds
The Postgres round trip dominates serialization cost
JSON is debuggable (readable in logs, SQL queries, etc.)
Postgres JSONB is an optimized storage format
The JSON ecosystem is massive (tools, libraries, IDE support)

Considered Options

JSON only (text on the wire, JSONB in Postgres)
MessagePack (compact, binary, self-describing)
Protobuf (binary, schema-based)
SBE (ultra-fast but complex)
Mixed: JSON externally, binary internally

Decision Outcome

Chosen option: JSON only, because:

Performance is sufficient for the MVP target
Debuggability is first-class
Postgres JSONB is efficient
The ecosystem is complete
Reading event_log with SQL is trivial

Positive Consequences

Logs are readable
SQL queries run directly against payloads
Standard tools work (jq, IDE support)
Onboarding is trivial
Migration from Camunda is traceable (Camunda also uses JSON externally)

Negative Consequences

Roughly 20-1000x slower than SBE per encode/decode (1-10 μs vs 10-50 ns)
Larger storage size (~2-5x vs SBE)
JSON parsing overhead on every request
If performance becomes critical, the format has to change

Performance reality check

Benchmarks (from sbe serialization):

Format	Encode	Decode	Self-describing
SBE	10-50 ns	10-50 ns	NO
Protobuf	100-500 ns	100-500 ns	NO
MessagePack	50-200 ns	50-200 ns	YES
JSON	1-10 μs	1-10 μs	YES
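The JSON row can be sanity-checked locally. The sketch below is a minimal micro-benchmark using only the Python standard library. The sample payload and the helper name `time_json` are illustrative assumptions, and absolute numbers depend heavily on payload size, hardware, and the JSON library in use, so treat the result as an order-of-magnitude check rather than a reproduction of the table.

```python
import json
import timeit

# Hypothetical small event payload, similar in shape to an event_log row.
SAMPLE = {"processInstanceKey": "12345", "elementId": "task-1", "intent": "ACTIVATED"}


def time_json(payload, number=10_000):
    """Return (encode_us, decode_us): mean microseconds per call."""
    encoded = json.dumps(payload)
    encode_s = timeit.timeit(lambda: json.dumps(payload), number=number)
    decode_s = timeit.timeit(lambda: json.loads(encoded), number=number)
    return encode_s / number * 1e6, decode_s / number * 1e6


if __name__ == "__main__":
    enc, dec = time_json(SAMPLE)
    print(f"encode: {enc:.2f} us/op, decode: {dec:.2f} us/op")
```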

By these figures JSON is roughly 20-1000x slower than SBE. But:

Total request time = Network + Parse + Process + DB + Response

For the MVP:
  Network: 1-5 ms
  Parse (JSON): 0.1-1 ms     ← optimizing here affects only ~1-5%
  Process: 1-10 ms
  DB: 5-50 ms                ← dominant
  Response: 1-5 ms

Total: ~10-70 ms

Optimizing JSON parsing from 1 ms to 0.01 ms (100x) saves about 1 ms, or ~1% of a ~70 ms request. That is not worth the complexity of SBE.
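The arithmetic behind that conclusion can be sketched as follows. The numbers are the budget ranges listed above; the function names are illustrative, and pairing the low ends together and the high ends together is a simplifying assumption.

```python
# Latency budget per request, in milliseconds (low, high), from the breakdown above.
BUDGET_MS = {
    "network": (1, 5),
    "parse_json": (0.1, 1),
    "process": (1, 10),
    "db": (5, 50),
    "response": (1, 5),
}


def share(component, budget=BUDGET_MS):
    """Return the component's share of the total at the low and high ends."""
    low_total = sum(lo for lo, _ in budget.values())
    high_total = sum(hi for _, hi in budget.values())
    lo, hi = budget[component]
    return lo / low_total, hi / high_total


def saving_from_speedup(component, factor, budget=BUDGET_MS):
    """Fraction of the high-end total saved if `component` gets `factor`x faster."""
    high_total = sum(hi for _, hi in budget.values())
    hi = budget[component][1]
    return (hi - hi / factor) / high_total


if __name__ == "__main__":
    print("parse share:", share("parse_json"))
    print("db share:", share("db"))
    print("saved by 100x faster parsing:", saving_from_speedup("parse_json", 100))
```

With these figures, parsing comes out at a little over 1% of the total at both ends of the range, while the database accounts for roughly 60-70%.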

Storage in Postgres

-- JSONB is efficient
CREATE TABLE event_log (
    position BIGSERIAL PRIMARY KEY,
    intent TEXT NOT NULL,
    payload JSONB NOT NULL,
    timestamp TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes on specific fields of the JSON
CREATE INDEX idx_event_pi ON event_log ((payload->>'processInstanceKey'));

-- Direct queries
SELECT payload->>'processInstanceKey' AS pid, 
       payload->>'elementId' AS element,
       intent
FROM event_log
WHERE payload->>'processInstanceKey' = '12345'
ORDER BY position;

JSONB is stored in a decomposed binary format, not as raw text.
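On the write side, the application only has to hand Postgres JSON text. The following is a minimal sketch of how an event might be prepared for the event_log table. The `%s` placeholder style assumes a driver such as psycopg, and `INSERT_EVENT`, `event_params`, and the intent name are hypothetical names chosen for illustration.

```python
import json

# Parameterized INSERT for the event_log table defined above. The %s
# placeholder style is an assumption (it is what psycopg uses).
INSERT_EVENT = "INSERT INTO event_log (intent, payload) VALUES (%s, %s::jsonb)"


def event_params(intent, payload):
    """Build the parameter tuple for INSERT_EVENT.

    The payload is serialized to compact JSON text; Postgres converts it to
    its binary JSONB representation on insert.
    """
    return (intent, json.dumps(payload, separators=(",", ":")))


if __name__ == "__main__":
    params = event_params(
        "ELEMENT_ACTIVATED",  # hypothetical intent name
        {"processInstanceKey": "12345", "elementId": "task-1"},
    )
    print(INSERT_EVENT, params)
```

With a driver such as psycopg, this would typically be executed as `cursor.execute(INSERT_EVENT, params)`. Passing the payload as a parameter rather than formatting it into the SQL string avoids injection issues.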

Process variables

// serialized process variables
{
  "customerId": "CUST-001",
  "orderItems": [
    { "sku": "WIDGET-A", "qty": 2, "price": 19.99 },
    { "sku": "GADGET-B", "qty": 1, "price": 49.99 }
  ],
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Springfield"
  }
}

vs. the MessagePack encoding of the same data (smaller but illegible):

83 aa c u s t o m e r I d ... <binary> ...

JSON wins on developer experience.
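To make the readability point concrete, here is a minimal round-trip sketch using only the Python standard library. The helper names `serialize_variables` and `deserialize_variables` are assumptions for illustration, not part of any existing engine API.

```python
import json

variables = {
    "customerId": "CUST-001",
    "orderItems": [
        {"sku": "WIDGET-A", "qty": 2, "price": 19.99},
        {"sku": "GADGET-B", "qty": 1, "price": 49.99},
    ],
    "shippingAddress": {"street": "123 Main St", "city": "Springfield"},
}


def serialize_variables(value):
    """Encode variables as compact UTF-8 JSON bytes."""
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def deserialize_variables(raw):
    """Decode UTF-8 JSON bytes back into Python objects."""
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    raw = serialize_variables(variables)
    print(len(raw), "bytes")
    print(raw[:40])  # still readable text, unlike a binary encoding
    assert deserialize_variables(raw) == variables
```

The serialized bytes can be pasted into a log line or piped through jq without any decoding step.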

Variable size limit

Recall the intuit production benchmarks: variables larger than 100-150 KB slow down exports. Enforce a limit:

ALTER TABLE variables
ADD CONSTRAINT max_variable_size 
CHECK (octet_length(value::text) <= 102400);  -- 100 KB

If a user needs to store more than 100 KB, externalize it:

{
  "documentRef": "s3://bucket/large-doc.pdf"
}

Do NOT store the PDF in variables.
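The same limit can also be checked in the application before the write reaches the database, which gives a clearer error than a constraint violation. The sketch below is one possible shape. `prepare_variable` and the `upload` callable are hypothetical, and the byte count is only an approximation of what `octet_length(value::text)` reports, since Postgres renders JSONB text in its own format.

```python
import json

MAX_VARIABLE_BYTES = 102_400  # 100 KB, same limit as the CHECK constraint


def variable_size(value):
    """Approximate size in bytes of a variable's JSON text."""
    return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))


def prepare_variable(value, upload):
    """Return the value to store: inline if small, a reference if too large.

    `upload` is a hypothetical callable that stores the serialized value
    externally and returns its URI (for example an s3:// URL).
    """
    if variable_size(value) <= MAX_VARIABLE_BYTES:
        return value
    raw = json.dumps(value, ensure_ascii=False).encode("utf-8")
    return {"documentRef": upload(raw)}


if __name__ == "__main__":
    fake_upload = lambda raw: "s3://bucket/large-value.json"  # stand-in for real storage
    print(prepare_variable({"customerId": "CUST-001"}, fake_upload))
    print(prepare_variable({"blob": "x" * 200_000}, fake_upload))
```

Keeping the database constraint in place as well means the limit still holds if some code path skips the application check.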

When to reconsider

Switch to a binary format IF:

Required throughput exceeds 50K TPS
The profiler shows JSON parsing taking more than 20% of request time
Storage cost becomes an issue ($$$ in the database)
Network bandwidth becomes a bottleneck

In the vast majority of cases, JSON wins. Premature optimization is the root of all evil.
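The profiling trigger can be approximated without a full profiler. The following is a rough sketch that measures how much wall-clock time goes to `json.loads` relative to the rest of request handling. The `handle` callable and the function name `json_parse_share` are assumptions for illustration, and a real profiler would give a more reliable picture.

```python
import json
import time

PARSE_SHARE_THRESHOLD = 0.20  # reconsider trigger from the list above


def json_parse_share(raw_payloads, handle):
    """Return the fraction of wall-clock time spent in json.loads.

    `handle` is a hypothetical callable standing in for the rest of the
    request processing (business logic, DB access, response).
    """
    parse_s = 0.0
    total_s = 0.0
    for raw in raw_payloads:
        start = time.perf_counter()
        payload = json.loads(raw)
        parsed = time.perf_counter()
        handle(payload)
        end = time.perf_counter()
        parse_s += parsed - start
        total_s += end - start
    return parse_s / total_s if total_s > 0 else 0.0


if __name__ == "__main__":
    payloads = ['{"processInstanceKey": "12345", "elementId": "task-1"}'] * 50
    # time.sleep stands in for DB and processing latency.
    share = json_parse_share(payloads, handle=lambda payload: time.sleep(0.001))
    print(f"JSON parsing share: {share:.1%}")
    print("reconsider format:", share > PARSE_SHARE_THRESHOLD)
```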

Migration path (if needed)

If a switch becomes necessary:

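import json
from dataclasses import dataclass

import msgpack  # third-party: pip install msgpack

# Assumed record shape: a format header plus the raw payload
@dataclass
class Record:
    format: str   # header naming the payload encoding
    bytes: bytes  # serialized payload

def decode(record: Record):
    # The engine handles both formats, dispatching on the header
    if record.format == 'json':
        return json.loads(record.bytes)
    elif record.format == 'msgpack':
        return msgpack.unpackb(record.bytes)
    raise ValueError(f"unknown record format: {record.format!r}")

# Newly written records use the current format
data = {"customerId": "CUST-001"}  # example payload
new_record = Record(format='msgpack', bytes=msgpack.packb(data))  # eventually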

Lazy migration: old records stay JSON, new records are written as MessagePack. The engine handles both.

Links

sbe serialization: SBE details (for contrast)
intuit production benchmarks: variable size limits
JSONB Postgres docs