
ADR-017: JSON serialization

  • Status: Accepted
  • Date: 2026-05-14
  • Tags: data, serialization, performance

Context and Problem Statement

Camunda uses SBE (10-50 ns encode/decode) for its internal protocol and MessagePack for variables (50-200 ns). That gives extreme performance, but it requires custom schemas and is NOT human-readable. What should the MVP use?

Decision Drivers

  • The MVP performance target is single-digit ms (not nanoseconds)
  • The Postgres round trip dominates serialization cost
  • JSON is debuggable (it can be read in logs, SQL queries, etc.)
  • Postgres JSONB is optimized
  • The JSON ecosystem is massive (tools, libraries, IDE support)

Considered Options

  1. JSON only (text + JSONB in Postgres)
  2. MessagePack (compact, binary, self-describing)
  3. Protobuf (binary con schema)
  4. SBE (ultra-fast but complex)
  5. Mixed: JSON externally, binary internally

Decision Outcome

Chosen option: JSON only, because:

  • Performance is sufficient for the MVP target
  • Tier-1 debuggability
  • Postgres JSONB is efficient
  • The ecosystem is complete
  • Reading event_log with SQL is trivial

Positive Consequences

  • Logs are readable
  • Direct SQL queries over payloads
  • Standard tools (jq, IDE support)
  • Trivial onboarding
  • Migration from Camunda is traceable (Camunda also uses JSON externally)

Negative Consequences

  • ~20-1000x slower than SBE on encode/decode (1-10 μs vs 10-50 ns)
  • Larger storage size (~2-5x vs SBE)
  • JSON parsing overhead on every request
  • If performance becomes critical, the format will have to change

Performance reality check

Benchmarks (from concepts/sbe-serialization):

Format       Encode      Decode      Self-describing
SBE          10-50 ns    10-50 ns    NO
Protobuf     100-500 ns  100-500 ns  NO
MessagePack  50-200 ns   50-200 ns   YES
JSON         1-10 μs     1-10 μs     YES

By these figures JSON is roughly 20-1000x slower than SBE. But consider where request time actually goes:

Total request time = Network + Parse + Process + DB + Response

For the MVP:
  Network: 1-5 ms
  Parse (JSON): 0.1-1 ms     ← optimizing here gains ~1-5%
  Process: 1-10 ms
  DB: 5-50 ms                ← dominant
  Response: 1-5 ms

Total: ~10-70 ms

Optimizing JSON parsing from 1 ms to 0.01 ms (100x) saves under 1 ms per request, which is only a few percent of the total at best. That is not worth the complexity of SBE.
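The arithmetic behind this estimate can be checked with a short sketch. The figures are the estimated ranges from the breakdown above; the names `stages`, `total_ms` and `parse_share` are illustrative only and do not come from any codebase.

```python
# Latency budget from the breakdown above, as (low_ms, high_ms) estimates.
stages = {
    "network": (1.0, 5.0),
    "parse_json": (0.1, 1.0),
    "process": (1.0, 10.0),
    "db": (5.0, 50.0),
    "response": (1.0, 5.0),
}


def total_ms(budget, bound):
    """Sum one bound (0 = low, 1 = high) across all stages."""
    return sum(rng[bound] for rng in budget.values())


def parse_share(budget, bound):
    """Fraction of the request spent parsing JSON at the given bound."""
    return budget["parse_json"][bound] / total_ms(budget, bound)


low_total = total_ms(stages, 0)
high_total = total_ms(stages, 1)
print(f"total: {low_total:.1f}-{high_total:.1f} ms")
print(f"parse share: {parse_share(stages, 0):.1%} (low), {parse_share(stages, 1):.1%} (high)")
```

Under these estimates, parsing stays below 2% of the request at both ends of the range, so even removing it entirely would not change the overall picture.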

Storage in Postgres

-- JSONB is stored efficiently
CREATE TABLE event_log (
    position BIGSERIAL PRIMARY KEY,
    intent TEXT NOT NULL,
    payload JSONB NOT NULL,
    timestamp TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes on specific fields of the JSON
CREATE INDEX idx_event_pi ON event_log ((payload->>'processInstanceKey'));

-- Direct queries
SELECT payload->>'processInstanceKey' AS pid,
       payload->>'elementId' AS element,
       intent
FROM event_log
WHERE payload->>'processInstanceKey' = '12345'
ORDER BY position;

JSONB uses a decomposed binary representation internally; it is not stored as raw text.
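On the application side, writing to this table only requires producing JSON text. The following is a minimal sketch, assuming a driver that uses `%s` placeholders (as psycopg does); `event_row`, `INSERT_SQL`, and the sample intent and element names are hypothetical.

```python
import json


def event_row(intent: str, payload: dict) -> tuple[str, str]:
    """Return (intent, payload_json) ready to bind to an INSERT into event_log."""
    # Compact separators keep the text small; Postgres re-encodes it as JSONB anyway.
    return intent, json.dumps(payload, separators=(",", ":"))


# Assumed parameter style; adjust to the driver actually in use.
INSERT_SQL = "INSERT INTO event_log (intent, payload) VALUES (%s, %s::jsonb)"

row = event_row(
    "ELEMENT_ACTIVATED",  # hypothetical intent name
    {"processInstanceKey": "12345", "elementId": "task_1"},
)
print(INSERT_SQL)
print(row)
```

The same payload text that goes into the table is what shows up in logs, which is the debuggability argument in practice.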

Process variables

// Serialized process variables
{
  "customerId": "CUST-001",
  "orderItems": [
    { "sku": "WIDGET-A", "qty": 2, "price": 19.99 },
    { "sku": "GADGET-B", "qty": 1, "price": 49.99 }
  ],
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Springfield"
  }
}

Compare the MessagePack encoding of the same data (smaller but unreadable):

83 aa c u s t o m e r I d ... <binary> ...

JSON wins en developer experience.
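To make the size trade-off concrete, here is a sketch that encodes the variables above both ways. The `pack` function is a hand-rolled encoder covering only the MessagePack types this example needs (fixmap, fixarray, fixstr, positive fixint, float 64); it is an illustration under that assumption, and a real system would use a MessagePack library instead.

```python
import json
import struct


def pack(obj) -> bytes:
    """Encode a small MessagePack subset: fixmap, fixarray, fixstr,
    positive fixint and float 64. Anything else raises TypeError."""
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw
    if isinstance(obj, int) and not isinstance(obj, bool) and 0 <= obj <= 127:
        return bytes([obj])
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)
    raise TypeError(f"unsupported value for this sketch: {obj!r}")


variables = {
    "customerId": "CUST-001",
    "orderItems": [
        {"sku": "WIDGET-A", "qty": 2, "price": 19.99},
        {"sku": "GADGET-B", "qty": 1, "price": 49.99},
    ],
    "shippingAddress": {"street": "123 Main St", "city": "Springfield"},
}

as_json = json.dumps(variables, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(variables)

print(len(as_json), "bytes as compact JSON")
print(len(as_msgpack), "bytes as MessagePack")
print(as_msgpack[:12].hex(" "))  # map header, string header, then the first key
```

The binary form is smaller, but reading it requires a decoder, whereas the JSON form can be inspected directly in a log line or a SQL result.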

Variable size limit

Recall from analysis/intuit-production-benchmarks: variables larger than 100-150 KB slow down exports. Enforce a limit:

ALTER TABLE variables
ADD CONSTRAINT max_variable_size 
CHECK (octet_length(value::text) <= 102400);  -- 100 KB

If a user needs to store more than 100 KB, externalize the data and keep only a reference:

{
  "documentRef": "s3://bucket/large-doc.pdf"
}

Do NOT store the PDF itself in variables.
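The same rule can be applied in application code before the row ever reaches the database. This is a minimal sketch: `store_or_externalize`, `encoded_size` and the `fake_upload` stand-in are hypothetical names, and the size computed here only approximates `octet_length(value::text)` because Postgres may format the JSON text slightly differently.

```python
import json

MAX_VARIABLE_BYTES = 102_400  # 100 KB, mirroring the CHECK constraint above


def encoded_size(value) -> int:
    """Size in bytes of the value's JSON text, encoded as UTF-8."""
    return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))


def store_or_externalize(name, value, upload):
    """Return what should be persisted: the value itself when it is small
    enough, otherwise a {"documentRef": ...} pointer produced by `upload`."""
    if encoded_size(value) <= MAX_VARIABLE_BYTES:
        return value
    return {"documentRef": upload(name, value)}


def fake_upload(name, value):
    # Stand-in uploader; a real one would write to object storage and return its URI.
    return f"s3://bucket/{name}.json"


small = store_or_externalize("customerId", "CUST-001", fake_upload)
large = store_or_externalize("scan", "x" * 200_000, fake_upload)
print(small)
print(large)
```

Checking in the application gives a clearer error path than a constraint violation, while the CHECK constraint remains the backstop.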

When to reconsider

Switch to a binary format IF:

  1. Required throughput exceeds 50K TPS
  2. The profiler shows JSON parsing taking more than 20% of request time
  3. Storage cost becomes an issue ($$$ in the database)
  4. Network bandwidth becomes a bottleneck

For 99% of cases, JSON wins. Premature optimization is the root of all evil.
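The second trigger is the easiest to check empirically. The sketch below times a JSON round trip on a sample payload and compares it with a request time; the function names, the sample payload and the 20 ms request time are all assumptions for illustration, and a real check should use profiler data from production traffic.

```python
import json
import time


def mean_roundtrip_seconds(payload, iterations=1000):
    """Average wall-clock time of one json.dumps + json.loads pair."""
    start = time.perf_counter()
    for _ in range(iterations):
        json.loads(json.dumps(payload))
    return (time.perf_counter() - start) / iterations


def should_reconsider(json_seconds, request_seconds, threshold=0.20):
    """True when JSON work exceeds `threshold` of the total request time."""
    return json_seconds / request_seconds > threshold


sample = {"processInstanceKey": "12345", "elementId": "task_1", "items": list(range(50))}
per_call = mean_roundtrip_seconds(sample)
print(f"JSON round trip: {per_call * 1e6:.1f} µs")
print("reconsider:", should_reconsider(per_call, request_seconds=0.020))
```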

Migration path (if needed)

If a switch does become necessary:

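import json
from dataclasses import dataclass
import msgpack  # third-party package: pip install msgpack

@dataclass
class Record:
    format: str   # header naming the payload encoding: 'json' or 'msgpack'
    bytes: bytes  # encoded payload

def decode(record: Record):
    # The engine handles both formats by dispatching on the header
    if record.format == 'json':
        return json.loads(record.bytes)
    if record.format == 'msgpack':
        return msgpack.unpackb(record.bytes)
    raise ValueError(f"unknown record format: {record.format!r}")

# Newly written records use the current format
data = {"customerId": "CUST-001"}
new_record = Record(
    format='msgpack',  # eventually
    bytes=msgpack.packb(data),
)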

Lazy migration: old records stay in JSON, new records are written as MessagePack, and the engine reads both.