Atomix Fork Lessons

Camunda has maintained a hard fork of Atomix (Raft + SWIM + transport) since before 2020. Upstream pivoted to Go and Camunda's PRs were not being merged, so Camunda cut the codebase by roughly 50% and eventually merged the fork into its main monorepo. The cost: build complexity, snapshot version issues, and a heavier maintenance responsibility. Key lesson for the MVP: do NOT fork consensus protocols. Use an existing library (hashicorp/raft, etcd/raft) or avoid consensus entirely.

The history (from the official README)

Start: Atomix as a dependency

Camunda started out using Atomix as a normal Maven dependency. Atomix's promise was a Java framework for distributed systems with Raft, SWIM, and distributed primitives.

Initial problems

"Very quick we found some issues which we needed to fix, so we created PR's against the base repository. In the corresponding base repository was not much activity, which means it took a while to merge the first PR's."

Upstream had little activity, so PRs were slow to merge.

Decision 1: Own fork

"We started to merge our PR's in our fork and released our own versions, such that we can make still progress."

When upstream did not respond quickly enough, Camunda created its own fork with custom releases.

The upstream pivot

"We saw that during working with atomix and fixing more and more bugs, that our PR's haven't been merged upstream. Also we found out that they moved away from the original code base and rewrote everything in GO."

Upstream pivoted completely to Go, and the Java code was left as legacy. The fork became permanent out of necessity.

Operational pain points

"We always had the problem that when we fixed or changed something in atomix we needed to release a new version to use it in Zeebe. Sometimes it happens that we just released the newest version of atomix on the day we wanted to released Zeebe, which sometimes broke the build."

Workflow:

1. Fix a bug in Atomix
2. Release Atomix version X
3. Update the dependency in Zeebe
4. Run integration tests
5. If they fail, go back to step 1
6. Eventually release Zeebe

Each step adds friction, and same-day Atomix releases sometimes broke Zeebe builds.

Snapshot versions attempted

"We switched to using snapshot versions, which improved this a bit. But if we then changed something it could happen that we broke develop and other branches in the Zeebe Repo."

Snapshot versions reduce friction but introduce instability: a change to the Atomix snapshot can break Zeebe's develop branch without warning.

Final decision: merge into the monorepo

"In order to avoid broken builds (develop etc.) and improve the development cycle we decided to merge the Atomix repo into ours."

All of the Atomix code was merged into the Zeebe repo.

Documented pros and cons

Pros (from the README)

Pro	Meaning
Single build	No breaking other branches
Shorter dev cycle	Test changes directly
Easier load tests	Single repo, single build
More test runs	Atomix tests run more often
Tools consistency	LGTM, SonarCloud, etc. applied uniformly
Easier releases	Single artifact stream

Cons (from the README)

Con	Meaning
Flaky tests initially	Inherited tests were not reliable
Longer build time	More code to compile and test

(Implicitly:)

- No upstream community for support
- All bug-fixing responsibility is internal
- No upstream fixes to pull

What they removed

"We removed half of their code base, because we don't need it."

The original codebase shrank by roughly 50%. This reflects:

- Atomix had many features Camunda does not need (generic distributed primitives)
- Camunda only needed Raft + SWIM + transport
- Less code means less surface area, which means fewer bugs

Final structure

zeebe/atomix/
├── README.md
├── cluster/
│   └── src/main/java/io/atomix/
│       ├── cluster/       ← Cluster membership
│       ├── primitive/     ← Distributed primitives (minimal)
│       └── raft/          ← Raft implementation
└── utils/

Compared with the original Atomix, which is much broader, Camunda's fork is focused and minimal.

Valuable patterns from the fork

Listeners pattern

RaftRoleChangeListener
RaftCommitListener
RaftApplicationEntryCommittedPositionListener
SnapshotReplicationListener

These are plug points for integrating with application logic. When the role changes, when commits advance, or when snapshots are received, listeners notify the application without tight coupling.

This is excellent design and can be replicated in the MVP.
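
A minimal sketch of the listener idea, written in Python for brevity rather than the fork's Java. The `RaftNode` class, its method names, the callback signatures, and the role strings are hypothetical stand-ins, not the Atomix API. Only the concept (register callbacks for role changes and commit progress) comes from the text above.

```python
from typing import Callable, List

# Hypothetical callback signatures; the real Atomix listener interfaces differ.
RoleChangeListener = Callable[[str, int], None]  # (new_role, term)
CommitListener = Callable[[int], None]           # (committed_index)


class RaftNode:
    """Toy stand-in for a Raft node that only knows how to notify listeners."""

    def __init__(self) -> None:
        self.role = "INACTIVE"
        self.term = 0
        self.commit_index = 0
        self._role_listeners: List[RoleChangeListener] = []
        self._commit_listeners: List[CommitListener] = []

    def add_role_change_listener(self, listener: RoleChangeListener) -> None:
        self._role_listeners.append(listener)

    def add_commit_listener(self, listener: CommitListener) -> None:
        self._commit_listeners.append(listener)

    def transition(self, new_role: str, term: int) -> None:
        """Change role and notify every registered role listener."""
        self.role, self.term = new_role, term
        for listener in self._role_listeners:
            listener(new_role, term)

    def advance_commit(self, index: int) -> None:
        """Move the commit index forward and notify commit listeners."""
        if index <= self.commit_index:
            return  # the commit index never moves backwards
        self.commit_index = index
        for listener in self._commit_listeners:
            listener(index)


# Application code subscribes without the node knowing anything about it.
events: List[str] = []
node = RaftNode()
node.add_role_change_listener(lambda role, term: events.append(f"role={role} term={term}"))
node.add_commit_listener(lambda index: events.append(f"commit={index}"))

node.transition("FOLLOWER", 1)
node.transition("LEADER", 2)
node.advance_commit(5)
node.advance_commit(3)  # ignored: lower than the current commit index
```

The node holds only lists of callables, so application logic can be attached or replaced without touching the consensus code. That is the loose coupling the pattern is meant to provide.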

Roles state machine

stateDiagram-v2
    [*] --> Inactive
    Inactive --> Follower
    Follower --> Candidate
    Candidate --> Leader
    Leader --> Follower

States are separate classes:

- roles/Inactive.java
- roles/Follower.java
- roles/Candidate.java
- roles/Leader.java

Each role has its own set of handlers. The state machine is clear, and the approach is replicable.
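
The one-class-per-role idea could be sketched as follows, again in Python rather than Java. The class bodies, the `allowed_next` attribute, the `on_timeout` handler, and `RoleMachine` are all illustrative assumptions. The transition table encodes only the four edges drawn in the diagram above; a full Raft implementation has more (for example, a candidate stepping back down to follower).

```python
class Role:
    """Base class: each role declares which roles it may move to."""
    name = "role"
    allowed_next: tuple = ()

    def on_timeout(self) -> str:
        """Role-specific handler for an election timeout; default is to stay put."""
        return self.name


class Inactive(Role):
    name = "inactive"
    allowed_next = ("follower",)


class Follower(Role):
    name = "follower"
    allowed_next = ("candidate",)

    def on_timeout(self) -> str:
        return "candidate"  # no heartbeat from a leader: start an election


class Candidate(Role):
    name = "candidate"
    allowed_next = ("leader",)


class Leader(Role):
    name = "leader"
    allowed_next = ("follower",)


ROLES = {cls.name: cls for cls in (Inactive, Follower, Candidate, Leader)}


class RoleMachine:
    """Holds the current role object and rejects edges not in the diagram."""

    def __init__(self) -> None:
        self.current: Role = Inactive()

    def transition(self, target: str) -> None:
        if target not in self.current.allowed_next:
            raise ValueError(f"illegal transition {self.current.name} -> {target}")
        self.current = ROLES[target]()


machine = RoleMachine()
machine.transition("follower")
machine.transition(machine.current.on_timeout())  # follower -> candidate
machine.transition("leader")
```

Because each role is its own class, a handler such as `on_timeout` lives next to the state it belongs to instead of in one large conditional.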

Copyright preservation

"Please do not touch the copyright headers, we need to keep their copyright on their files. On new files we create, we will have our License headers with out copyright."

This respects the original authorship while the fork is maintained. It is good legal and ethical practice.

Recommendations for the MVP

Option 1: SKIP consensus (RECOMMENDED)

Single-node MVP. Postgres handles durability. No Raft, no SWIM, no fork needed.

Pros:

- Zero consensus code
- Operationally trivial
- Sufficient for the large majority of cases

Cons:

- No automatic HA
- Failover requires manual intervention

Option 2: Use an existing library

If you eventually need HA:

Library	Language	Maturity
`hashicorp/raft`	Go	⭐⭐⭐⭐⭐ (Consul, Vault)
`etcd-io/raft`	Go	⭐⭐⭐⭐⭐ (Kubernetes, etcd)
`tikv/raft-rs`	Rust	⭐⭐⭐⭐ (TiKV production)
`MicroRaft`	Java	⭐⭐⭐ (less battle-tested)
`scalecube-cluster`	Java	⭐⭐ (smaller community)

hashicorp/raft or etcd/raft are the obvious choices: battle-tested at scale, with an active upstream and regular bug fixes.

Option 3: External coordination

Delegate to Consul, etcd, or ZooKeeper. The MVP engine does not implement consensus; it only consumes it.

flowchart LR
    Engine[Engine MVP node]
    Engine -->|Reads/writes state| PG[(Postgres)]
    Engine -->|Acquires leader lock| Coord[Consul/etcd session]

Trade-off: an additional operational dependency, but zero consensus code in the engine.
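
To make the "acquires leader lock" arrow concrete, here is a rough sketch of lease-based leader election. `InMemoryLeaseStore` is a hypothetical in-process stand-in for a Consul session or etcd lease, not a real client, and the key name, TTL, and `EngineNode` class are assumptions chosen for illustration. The clock is injected so the example runs deterministically.

```python
from typing import Callable, Dict, Optional, Tuple


class InMemoryLeaseStore:
    """Hypothetical stand-in for a Consul session or etcd lease; not a real client."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def try_acquire(self, key: str, owner: str, ttl: float) -> bool:
        """Take or renew the lock if it is free, expired, or already ours."""
        now = self._clock()
        holder = self._locks.get(key)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # someone else holds a live lease
        self._locks[key] = (owner, now + ttl)
        return True

    def holder(self, key: str) -> Optional[str]:
        """Return the current owner, or None if the lease is absent or expired."""
        entry = self._locks.get(key)
        return entry[0] if entry and entry[1] > self._clock() else None


class EngineNode:
    """An engine node that consumes coordination instead of implementing consensus."""

    def __init__(self, node_id: str, store: InMemoryLeaseStore, ttl: float = 10.0) -> None:
        self.node_id = node_id
        self.store = store
        self.ttl = ttl
        self.is_leader = False

    def tick(self) -> bool:
        """Call periodically; returns True while this node holds the leader lease."""
        self.is_leader = self.store.try_acquire("engine/leader", self.node_id, self.ttl)
        return self.is_leader


now = [0.0]  # fake clock so the example does not depend on wall time
store = InMemoryLeaseStore(clock=lambda: now[0])
node_a = EngineNode("node-a", store)
node_b = EngineNode("node-b", store)

a_first = node_a.tick()   # node-a takes the lock
b_first = node_b.tick()   # node-b is rejected while the lease is live
now[0] = 11.0             # node-a stops renewing and its lease expires
b_second = node_b.tick()  # node-b takes over
```

With a real coordinator, the lease TTL and renewal interval need tuning, and writes to Postgres should be guarded (for example with a fencing token), because a paused node can outlive its lease without noticing.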

Option 4: Fork (NOT RECOMMENDED)

This is what Camunda did. The costs are documented in this analysis:

- Build complexity
- Maintenance burden
- No upstream support
- Test flakiness

Only consider it if:

- You have 2+ engineers working full-time on consensus
- Your requirements are very custom
- The investment will be sustained for years

General lessons (not only for the MVP)

- Forks of critical infrastructure are tempting but costly
- If a fork is inevitable, merging it into the monorepo eliminates build complexity
- Mature libraries beat custom implementations almost always
- Document the decision: Camunda did well to publish a README explaining the "why"
- Respect upstream authorship even in hard forks
- The listeners pattern is excellent design for integration
- Reduce surface area: remove what you do not use

Conclusion

Camunda's Atomix fork is a story of necessity and cost. Necessity, because upstream was unresponsive and pivoted to Go. Cost, because maintaining a Raft fork requires sustained expertise.

For the MVP, avoid this path. Single-node plus Postgres replication covers most cases. If real HA is needed, use an existing library. Forking consensus protocols is a decision for a mature company with dedicated engineering investment, not for MVPs.

This story is one of the most valuable lessons from the architectural analysis of Camunda: the real cost of owning low-level protocols. Inherit them when you can; fork only when absolutely necessary.