Disaster recovery

GoRules separates rule management (BRMS) from rule execution (Agent/SDK). This decoupled architecture simplifies disaster recovery — your production workloads continue even if the BRMS is temporarily unavailable.

Recovery priorities

Component	Impact if unavailable	Priority
Execution layer (Agent or SDK)	Applications can’t evaluate rules	High
Object Storage	Agents can’t load new rules (existing rules remain in memory)	High
BRMS	Can’t author or publish new rules	Medium
PostgreSQL	BRMS unavailable	Medium

Management layer (BRMS)

Because the management layer is decoupled from execution, the BRMS does not require maximum availability. A straightforward setup is sufficient:

Horizontal scaling: Run at least 2 BRMS replicas behind a load balancer for failover.

Database HA: Use managed PostgreSQL with high availability enabled:

AWS Aurora PostgreSQL (Multi-AZ)
Azure Database for PostgreSQL Flexible Server (HA mode)
Google Cloud SQL (High availability)

Backups: Configure automated backups with point-in-time recovery. This is standard in modern managed database services and protects against data corruption or accidental deletion.

If BRMS becomes unavailable, rule execution continues uninterrupted. Users cannot author or publish new rules until BRMS is restored, but all existing rules remain operational.

Execution layer

The execution layer requires high availability since it handles live traffic.

Agent deployment

When using the Agent, the only external dependency is object storage, which cloud providers design for high availability across multiple availability zones.

Run at least 2 Agent replicas in production with health checks.
Spread replicas across availability zones when possible.

Agent resilience: Once rules are loaded into memory, the Agent continues serving requests even if object storage becomes temporarily unavailable. Rules are not evicted automatically on storage failure.
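The resilience behavior described here can be pictured as a "last known good" cache. The following is a minimal sketch of that idea, not the Agent's actual implementation: the `RuleCache` class and the `fetch` callable are hypothetical names standing in for whatever reads rules from object storage.

```python
from typing import Callable, Dict, Optional


class RuleCache:
    """Keeps the last successfully loaded rules in memory (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[], Dict[str, dict]]) -> None:
        # `fetch` is an assumed loader that reads all rules from object storage.
        self._fetch = fetch
        self._rules: Optional[Dict[str, dict]] = None

    def refresh(self) -> bool:
        """Try to reload rules; on failure keep whatever is already in memory."""
        try:
            self._rules = self._fetch()
            return True
        except OSError:
            # Storage is unreachable: do not evict the rules already loaded.
            return False

    def get(self, name: str) -> dict:
        """Return a loaded rule, or fail if nothing was ever loaded."""
        if self._rules is None:
            raise RuntimeError("no rules loaded yet")
        return self._rules[name]
```

A failed `refresh()` leaves the previous rules in place, so evaluation keeps working through a storage outage. Only a replica that has never completed a first load has nothing to serve.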

SDK deployment

When using the SDK with bundled rules, there are no external dependencies — your application is self-contained. High availability depends entirely on how you deploy your service. Run with horizontal scaling and standard HA patterns for your platform.
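As an illustration of what "bundled rules" can look like in practice, the sketch below reads rule files shipped inside the application into memory at startup. The directory layout and the `load_bundled_rules` helper are assumptions made for this example; how the loaded content is handed to the SDK is not shown.

```python
import json
from pathlib import Path
from typing import Dict


def load_bundled_rules(rules_dir: Path) -> Dict[str, dict]:
    """Read every *.json rule file shipped with the application into memory.

    The file stem (for example "pricing" for pricing.json) becomes the rule name.
    """
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(rules_dir.glob("*.json"))
    }
```

Because the files travel with the build artifact, every replica starts with the same rules and makes no network call to load them.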

Cross-region availability

For extreme availability requirements, you can deploy across multiple regions without running BRMS in every region.

How it works:

BRMS runs in a single region and publishes releases to object storage
Object storage replicates to a secondary region (using native cloud replication)
Agents in each region poll their local storage bucket
When BRMS publishes a release, both regions receive the same rules automatically

This approach provides regional failover for rule execution while keeping the management layer simple.
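The flow above can be modeled with a small in-memory simulation. This is only a sketch under stated assumptions: `Bucket`, `publish`, `replicate`, and `Agent` are hypothetical stand-ins, the key `releases/latest` is made up, and real cloud replication is normally asynchronous rather than an explicit call.

```python
from typing import Dict, Optional


class Bucket:
    """In-memory stand-in for an object storage bucket in one region."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.objects: Dict[str, bytes] = {}


def publish(primary: Bucket, key: str, release: bytes) -> None:
    """BRMS writes a release to the primary bucket only."""
    primary.objects[key] = release


def replicate(source: Bucket, target: Bucket) -> None:
    """Stand-in for native cloud replication from primary to secondary."""
    target.objects.update(source.objects)


class Agent:
    """Polls only the bucket in its own region."""

    def __init__(self, bucket: Bucket) -> None:
        self.bucket = bucket
        self.loaded: Optional[bytes] = None

    def poll(self, key: str) -> None:
        """Load the release from the local bucket if one is present."""
        release = self.bucket.objects.get(key)
        if release is not None:
            self.loaded = release
```

In this model the secondary-region Agent sees a release only after replication has copied it, which is why replication lag determines how quickly both regions converge on the same rules.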

Recovery procedures

Agent failure

Traffic automatically routes to healthy replicas via your load balancer. Replace failed instances and investigate root cause from logs.
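To make the routing behavior concrete, here is a minimal sketch of round-robin selection that skips replicas failing a health check. It illustrates the general pattern only; a real load balancer implements this for you, and the `RoundRobin` class and `is_healthy` callable are hypothetical names.

```python
from typing import Callable, Iterable, List


class RoundRobin:
    """Rotates through replicas, skipping those that fail the health check."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self._replicas: List[str] = list(replicas)
        self._next = 0

    def pick(self, is_healthy: Callable[[str], bool]) -> str:
        """Return the next healthy replica, trying each replica at most once."""
        for _ in range(len(self._replicas)):
            replica = self._replicas[self._next]
            self._next = (self._next + 1) % len(self._replicas)
            if is_healthy(replica):
                return replica
        raise RuntimeError("no healthy replicas")
```

With two replicas and one marked unhealthy, every call to `pick` returns the surviving replica, which is why a minimum of 2 replicas lets traffic continue while the failed instance is replaced.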

Storage failure

Agents continue serving with rules already in memory. If using cross-region replication, update Agent configuration to use the replica bucket. Restore primary storage when possible.

BRMS failure

Rule execution continues unaffected. Redeploy BRMS containers and verify database connectivity. Users can resume authoring once restored.

Recovery objectives

Metric	Typical target
RTO (Recovery Time Objective)	Near-zero with multi-replica Agents
RPO (Recovery Point Objective)	0 with storage replication

With replicated storage and multi-replica Agents, most failures are handled automatically without downtime.
