GoRules separates rule management (BRMS) from rule execution (Agent/SDK). This decoupled architecture simplifies disaster recovery — your production workloads continue even if the BRMS is temporarily unavailable.

Recovery priorities

| Component | Impact if unavailable | Priority |
| --- | --- | --- |
| Execution layer (Agent or SDK) | Applications can’t evaluate rules | High |
| Object Storage | Agents can’t load new rules (existing rules remain in memory) | High |
| BRMS | Can’t author or publish new rules | Medium |
| PostgreSQL | BRMS unavailable | Medium |

Management layer (BRMS)

Because the management layer is decoupled from execution, the BRMS does not require maximum availability. A straightforward setup is sufficient:
Horizontal scaling: Run at least 2 BRMS replicas behind a load balancer for failover.
Database HA: Use managed PostgreSQL with high availability enabled:
  • AWS Aurora PostgreSQL (Multi-AZ)
  • Azure Database for PostgreSQL Flexible Server (HA mode)
  • Google Cloud SQL (High availability)
Backups: Configure automated backups with point-in-time recovery. This is standard in modern managed database services and protects against data corruption or accidental deletion.
If BRMS becomes unavailable, rule execution continues uninterrupted. Users cannot author or publish new rules until BRMS is restored, but all existing rules remain operational.

Execution layer

The execution layer requires high availability since it handles live traffic.

Agent deployment

When using the Agent, the only external dependency is object storage, which cloud providers design for high availability across multiple availability zones.
  • Run at least 2 Agent replicas in production with health checks.
  • Spread replicas across availability zones when possible.
Agent resilience: Once rules are loaded into memory, the Agent continues serving requests even if object storage becomes temporarily unavailable. Rules are not ejected automatically on storage failure.
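The resilience behaviour described here can be pictured as a "last known good" cache. The following is a minimal sketch of that pattern, not the Agent's actual implementation: `RuleCache` and the `fetch` callable are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Optional


class RuleCache:
    """Hypothetical helper: keeps the last successfully loaded rules in memory."""

    def __init__(self, fetch: Callable[[], Dict[str, str]]) -> None:
        # `fetch` is an assumed function that reads rules from object storage.
        self._fetch = fetch
        self._rules: Optional[Dict[str, str]] = None
        self.last_error: Optional[Exception] = None

    def refresh(self) -> bool:
        """Try to load rules; on failure keep whatever is already in memory."""
        try:
            rules = self._fetch()
        except Exception as exc:  # e.g. storage outage or timeout
            self.last_error = exc
            return False
        self._rules = rules
        self.last_error = None
        return True

    def rules(self) -> Dict[str, str]:
        """Return the rules currently held in memory."""
        if self._rules is None:
            raise RuntimeError("no rules loaded yet")
        return self._rules
```

The key design choice is that a failed refresh never clears the in-memory copy, so requests keep being served from the last release that loaded successfully.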

SDK deployment

When using the SDK with bundled rules, there are no external dependencies — your application is self-contained. High availability depends entirely on how you deploy your service. Run with horizontal scaling and standard HA patterns for your platform.

Cross-region availability

For extreme availability requirements, you can deploy across multiple regions without running BRMS in every region.
How it works:
  1. BRMS runs in a single region and publishes releases to object storage
  2. Object storage replicates to a secondary region (using native cloud replication)
  3. Agents in each region poll their local storage bucket
  4. When BRMS publishes a release, both regions receive the same rules automatically
This approach provides regional failover for rule execution while keeping the management layer simple.
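The four steps above can be modelled with a small in-memory simulation. This is only a sketch under stated assumptions: `Bucket`, `Agent`, `publish`, and `replicate` are hypothetical stand-ins for the regional buckets, the Agents, the BRMS publish step, and the cloud provider's replication, and the release identifier is an assumed way of detecting changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Bucket:
    """In-memory stand-in for a regional object storage bucket."""
    release_id: Optional[str] = None
    rules: Dict[str, str] = field(default_factory=dict)


@dataclass
class Agent:
    """Stand-in for an Agent that polls only its local bucket."""
    bucket: Bucket
    loaded_release: Optional[str] = None
    rules: Dict[str, str] = field(default_factory=dict)

    def poll(self) -> bool:
        """Reload rules if the local bucket holds a release not yet loaded."""
        current = self.bucket.release_id
        if current is None or current == self.loaded_release:
            return False
        self.rules = dict(self.bucket.rules)
        self.loaded_release = current
        return True


def publish(primary: Bucket, release_id: str, rules: Dict[str, str]) -> None:
    """Step 1: the BRMS writes a release to the primary bucket."""
    primary.release_id = release_id
    primary.rules = dict(rules)


def replicate(primary: Bucket, secondary: Bucket) -> None:
    """Step 2: stand-in for native cloud replication to the secondary region."""
    secondary.release_id = primary.release_id
    secondary.rules = dict(primary.rules)
```

In this model the secondary region's Agent sees the new release only after replication has run, which is why both regions end up with the same rules without a BRMS in each region.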

Recovery procedures

Agent failure

Traffic automatically routes to healthy replicas via your load balancer. Replace failed instances and investigate root cause from logs.
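As an illustration of what the load balancer does here, the sketch below routes requests round-robin across replicas that pass their health check. The replica names and the `health` mapping are hypothetical; real load balancers implement this for you.

```python
from typing import Dict, List


def healthy_replicas(health: Dict[str, bool]) -> List[str]:
    """Return the replicas whose latest health check passed."""
    return [name for name, ok in health.items() if ok]


def route(request_number: int, health: Dict[str, bool]) -> str:
    """Pick a healthy replica round-robin; fail only if none is healthy."""
    targets = healthy_replicas(health)
    if not targets:
        raise RuntimeError("no healthy Agent replicas")
    return targets[request_number % len(targets)]
```

With at least two replicas, losing one leaves a non-empty target list, so requests continue to be served while the failed instance is replaced.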

Storage failure

Agents continue serving with rules already in memory. If using cross-region replication, update Agent configuration to use the replica bucket. Restore primary storage when possible.
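When deciding which bucket to point the Agent at, one simple approach is to walk an ordered list of candidates and take the first that responds. The sketch below assumes a hypothetical `is_reachable` check and made-up bucket names; it does not use any GoRules configuration API.

```python
from typing import Callable, Iterable, Optional


def pick_bucket(
    candidates: Iterable[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable bucket in priority order, or None."""
    for name in candidates:
        if is_reachable(name):
            return name
    return None
```

Listing the primary bucket first means the same check moves the Agent back to primary storage once it has been restored.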

BRMS failure

Rule execution continues unaffected. Redeploy BRMS containers and verify database connectivity. Users can resume authoring once restored.

Recovery objectives

| Metric | Typical target |
| --- | --- |
| RTO (Recovery Time Objective) | Near-zero with multi-replica Agents |
| RPO (Recovery Point Objective) | 0 with storage replication |
With replicated storage and multi-replica Agents, most failures are handled automatically without downtime.