Saltar a contenido

ADR-020: Patroni para Postgres HA

  • Status: Accepted
  • Date: 2026-05-14
  • Tags: infrastructure, ha, postgres, phase-1

Context and Problem Statement

Phase 0 del MVP es single-node Postgres. Phase 1 requiere HA para SLA > 99%. ¿Qué tool de HA usamos?

Decision Drivers

  • Patroni es maduro (battle-tested en Consul, Vault, Stack Overflow, etc.)
  • Failover automático en ~30s
  • Streaming replication built-in de Postgres
  • etcd/Consul backend para coordination

Considered Options

  1. Patroni — orchestration layer encima de Postgres
  2. Stolon — alternative orchestrator
  3. repmgr — replication manager (older)
  4. pg_auto_failover — Microsoft-developed
  5. Cloud managed (RDS Multi-AZ, Cloud SQL HA)
  6. Custom failover scripting

Decision Outcome

Chosen option: Patroni para self-hosted, cloud managed HA para SaaS deployments.

Positive Consequences

  • Failover automático (no human intervention)
  • Battle-tested at scale
  • etcd como DCS (distributed config store) standard
  • Excellent docs
  • Active community
  • CLI tooling maduro (patronictl)

Negative Consequences

  • Operational complexity (3+ Postgres nodes + etcd cluster)
  • etcd requires its own ops
  • Failover ~30s (slower than Raft theoretical <1s)
  • Split-brain risks if etcd partitioned

Architecture

flowchart TD
    PGB["PgBouncer<br/>(connection pool)"]
    subgraph PGCluster[Postgres cluster]
        PRIM["PG Primary<br/>(writes)"]
        REP1["PG Replica 1<br/>(hot standby)"]
        REP2["PG Replica 2<br/>(hot standby)"]
    end
    subgraph ETCD["etcd cluster (3 nodes)"]
        E["• Leader election state<br/>• Cluster configuration"]
    end
    PGB -->|"writes → primary"| PRIM
    PGB -->|"reads → any"| REP1
    PGB -->|"reads → any"| REP2
    PRIM <-->|repl| REP1
    PRIM -->|repl| REP2
    PRIM -.->|Patroni agent| ETCD
    REP1 -.->|Patroni agent| ETCD
    REP2 -.->|Patroni agent| ETCD

Configuration ejemplo

# /etc/patroni/patroni.yml
scope: workflow-engine
namespace: /service/
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: pg-node-1.internal:8008

etcd3:
  hosts:
    - etcd-1.internal:2379
    - etcd-2.internal:2379
    - etcd-3.internal:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1MB max lag for failover candidate
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 200
        max_wal_size: 4GB
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_keep_size: 1GB
        archive_mode: "on"
        archive_command: "pgbackrest --stanza=main archive-push %p"

  initdb:
    - encoding: UTF8
    - data-checksums  # critical for integrity detection

postgresql:
  listen: 0.0.0.0:5432
  connect_address: pg-node-1.internal:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  authentication:
    replication:
      username: replicator
      password: ${REPLICATOR_PASSWORD}
    superuser:
      username: postgres
      password: ${POSTGRES_PASSWORD}

Operational procedures

Verificar cluster

patronictl -c /etc/patroni/patroni.yml list

# Output:
# + Cluster: workflow-engine -+---------+--------+
# | Member    | Host             | Role    | State  |
# +-----------+------------------+---------+--------+
# | pg-node-1 | pg-node-1.internal | Leader  | running|
# | pg-node-2 | pg-node-2.internal | Replica | running|
# | pg-node-3 | pg-node-3.internal | Replica | running|
# +-----------+------------------+---------+--------+

Manual failover (planned maintenance)

patronictl -c /etc/patroni/patroni.yml failover --candidate pg-node-2

Automatic failover (unplanned)

Si primary dies: 1. Patroni en replicas detecta (TTL 30s) 2. Leader election via etcd 3. Highest-LSN replica promoted 4. PgBouncer reconnects to new primary 5. Application transparent (con retry logic)

Total time: ~30s.

App-side considerations

Connection string usa PgBouncer (no direct PG):

postgresql://app:password@pgbouncer.internal:6432/workflow

PgBouncer config:

[databases]
workflow = host=pg-primary.internal port=5432

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

pg-primary.internal es DNS que Patroni mantiene apuntando al primary actual.

Retry logic en app

async def execute_with_retry(query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await pool.execute(query)
        except (ConnectionError, OperationalError):
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff

Durante failover (~30s), retries cubren la transition.

Backup integration

Patroni + pgBackRest:

postgresql:
  parameters:
    archive_mode: "on"
    archive_command: "pgbackrest --stanza=main archive-push %p"

pgBackRest config:

# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=7
repo1-type=s3
repo1-s3-bucket=my-pg-backups
repo1-s3-region=us-east-1
repo1-cipher-type=aes-256-cbc

[main]
pg1-path=/var/lib/postgresql/15/main

Full backup daily + WAL streaming continuous = RPO ~5 minutes.

Failure modes

Failure Behavior RTO RPO
Primary dies Patroni promotes replica ~30s ~5s (WAL lag)
Replica dies No impact (still 1 replica + primary) 0s 0
etcd quorum lost Patroni read-only mode (safety) Until restored 0
Network partition primary Self-fence, replica promotes ~30s ~5s
Disk full primary Patroni alerts, failover manual manual depends
Disk corruption Detected by data-checksums Restore from backup depends

Cuándo usar cloud managed en su lugar

Para SaaS deployments:

Cloud Service Trade-offs
AWS RDS Multi-AZ Easy, expensive
GCP Cloud SQL HA Easy, expensive
Azure Database for PostgreSQL Flexible Server Easy, expensive

Trade-offs: - Pro: zero ops, managed - Con: 2-3x cost vs self-hosted Patroni - Con: vendor lock-in - Con: less control (no custom Postgres extensions sometimes)

Recomendación: - MVP early stage: cloud managed (focus on product) - Scale up: self-hosted Patroni (cost savings significant)