Multi-Region Deployment
InfraSage can be deployed across multiple regions or clouds. This page covers three patterns: independent per-region deployments, a hub-and-spoke federation, and active-active with cross-region replication.
When You Need Multi-Region
- Data residency: Different regions have different regulatory constraints (EU, India, US). You need telemetry to stay within each region's legal boundary.
- Latency: Ingestion latency to a central region is unacceptable for services in distant regions.
- High availability: You want InfraSage to survive a full region failure.
- Scale: A single region cannot handle your total event volume.
Pattern 1: Independent Per-Region Deployments (Recommended)
Each region runs a fully independent InfraSage stack. No telemetry crosses region boundaries. Best for data residency requirements.
     EU-WEST-1 VPC               AP-SOUTH-1 VPC
┌──────────────────────┐    ┌──────────────────────┐
│ InfraSage (EU)       │    │ InfraSage (India)    │
│ - Ingestion Gateway  │    │ - Ingestion Gateway  │
│ - Telemetry Operator │    │ - Telemetry Operator │
│ - AIops Engine       │    │ - AIops Engine       │
│ - ClickHouse (EU)    │    │ - ClickHouse (IN)    │
│ - Kafka              │    │ - Kafka              │
└──────────────────────┘    └──────────────────────┘
           │                           │
           ▼                           ▼
    EU services only          India services only
Setup
Deploy the Helm chart once per region. Use separate values.yaml files:
# values-eu.yaml
ingestionGateway:
  replicaCount: 3
  env:
    TENANT_ID: acme-eu
    REGION: eu-west-1
clickhouse:
  persistence:
    storageClass: gp3-eu-west-1
    size: 500Gi
helm install infrasage ./infrasage-chart \
  -n infrasage \
  -f values-eu.yaml \
  --kube-context eu-west-1-cluster

helm install infrasage ./infrasage-chart \
  -n infrasage \
  -f values-india.yaml \
  --kube-context ap-south-1-cluster
Each deployment gets its own API keys and tenant configuration. Your services in each region point to their local Ingestion Gateway.
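When the number of regions grows, the per-region install can be driven from a small script. The following is a minimal sketch, not part of InfraSage: the `REGIONS` mapping and both function names are hypothetical, and only the EU and India entries shown above are assumed. It renders the same `helm install` commands rather than running them.

```python
import shlex

# Hypothetical mapping of region label -> (values file, kube context).
# The two entries mirror the helm commands shown above.
REGIONS = {
    "eu": ("values-eu.yaml", "eu-west-1-cluster"),
    "india": ("values-india.yaml", "ap-south-1-cluster"),
}


def helm_install_command(values_file: str, kube_context: str) -> list[str]:
    """Build the argv for one regional `helm install`."""
    return [
        "helm", "install", "infrasage", "./infrasage-chart",
        "-n", "infrasage",
        "-f", values_file,
        "--kube-context", kube_context,
    ]


def render_commands(regions: dict[str, tuple[str, str]]) -> list[str]:
    """Return one shell-quoted command line per region, in insertion order."""
    return [
        shlex.join(helm_install_command(values_file, kube_context))
        for values_file, kube_context in regions.values()
    ]


if __name__ == "__main__":
    # Print only; to execute, pass each argv to subprocess.run(..., check=True).
    for line in render_commands(REGIONS):
        print(line)
```

Keeping the region list in one place makes it harder to deploy one region with another region's values file.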
Trade-offs
| ✅ Complete data residency | ❌ No unified cross-region view |
| ✅ Region failure is isolated | ❌ Separate admin per region |
| ✅ Simplest to operate | ❌ Anomalies are not correlated cross-region |
Pattern 2: Hub-and-Spoke Federation
Regional InfraSage instances handle ingestion and local anomaly detection. A central "hub" instance aggregates alerts and provides a unified view. Only alert metadata (no raw telemetry) is forwarded to the hub.
                   HUB (Central)
                 ┌───────────────┐
                 │ AIops Engine  │
                 │ Admin UI      │
                 │ (alerts only) │
                 └───────┬───────┘
                         │ webhook (alerts only)
        ┌────────────────┴────────────────┐
        │                                 │
┌───────┴──────┐                 ┌────────┴─────┐
│ Spoke: EU    │                 │ Spoke: India │
│ Full stack   │                 │ Full stack   │
└──────────────┘                 └──────────────┘
Hub Configuration
The hub receives anomaly webhooks from each spoke's Alertmanager integration:
# Hub alertmanager config (receives from spokes)
receivers:
  - name: infrasage-hub
    webhook_configs:
      - url: http://infrasage-aiops.hub.internal/api/v1/alerts/webhook
        send_resolved: true
Each spoke's AIops Engine is configured to forward alerts:
# Spoke environment variables
ALERT_FORWARD_URL=http://infrasage-hub.central.internal/api/v1/alerts/webhook
ALERT_FORWARD_INCLUDE_METADATA=true
ALERT_FORWARD_INCLUDE_RAW_TELEMETRY=false # Raw telemetry stays in region
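To illustrate what the two `ALERT_FORWARD_INCLUDE_*` flags imply, here is a rough sketch of a forwarding step that drops sections of an alert before it leaves the region. It is not InfraSage's implementation: the alert schema is not documented on this page, so the field names `metadata` and `raw_telemetry` and both function names are assumptions.

```python
import json
import urllib.request

# Hypothetical field names: the real alert schema is not documented here.
METADATA_FIELD = "metadata"
RAW_TELEMETRY_FIELD = "raw_telemetry"


def build_forward_payload(alert: dict, include_metadata: bool, include_raw: bool) -> dict:
    """Return a copy of the alert with the disabled sections removed."""
    payload = dict(alert)  # shallow copy so the caller's alert is not modified
    if not include_metadata:
        payload.pop(METADATA_FIELD, None)
    if not include_raw:
        payload.pop(RAW_TELEMETRY_FIELD, None)
    return payload


def build_forward_request(payload: dict, url: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the hub webhook."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With `include_metadata=True` and `include_raw=False`, matching the spoke settings above, the payload keeps its metadata and loses any raw samples before the request is built.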
Trade-offs
| ✅ Unified alert view across regions | ❌ More complex to configure |
| ✅ Cross-region incident correlation | ❌ Hub is a single point of failure for the unified view |
| ✅ Raw telemetry stays in region | ❌ Alert metadata crosses region boundary |
Pattern 3: Active-Active with ClickHouse Replication
For HA scenarios where InfraSage itself must survive a region failure, run replicated ClickHouse across regions, with ZooKeeper or ClickHouse Keeper providing coordination.
This pattern is operationally complex and requires ClickHouse expertise. Only use it if your SLA requires InfraSage itself to have cross-region HA. For most teams, Pattern 1 with a per-region stack is simpler and sufficient.
ClickHouse Replication Topology
     Region A                        Region B
┌────────────────┐              ┌────────────────┐
│ ClickHouse     │◄──replicate─►│ ClickHouse     │
│ (shard 1,      │              │ (shard 1,      │
│ replica 1)     │              │ replica 2)     │
└────────────────┘              └────────────────┘
        │                              │
        └──────────── reads ───────────┘
                (either region)
Key ClickHouse config for cross-region replication:
<!-- config.xml -->
<remote_servers>
  <infrasage_cluster>
    <shard>
      <replica>
        <host>clickhouse-a.region-a.internal</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>clickhouse-b.region-b.internal</host>
        <port>9000</port>
      </replica>
    </shard>
  </infrasage_cluster>
</remote_servers>
Set in InfraSage:
CLICKHOUSE_CLUSTER=infrasage_cluster
CLICKHOUSE_REPLICATION_ENABLED=true
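In ClickHouse, data is replicated only for tables that use a `Replicated*` engine, and each replica needs `shard` and `replica` macros defined in its server config. InfraSage presumably manages its own schema when replication is enabled, so the sketch below is only an illustration of what such a table definition looks like. The function name, database, table, and columns are all hypothetical.

```python
def replicated_table_ddl(database: str, table: str, cluster: str) -> str:
    """Render a CREATE TABLE statement for a ReplicatedMergeTree table.

    {shard} and {replica} are left as literal ClickHouse macros; each server
    substitutes its own values from the <macros> section of its config.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} ON CLUSTER {cluster}\n"
        "(\n"
        "    ts DateTime,\n"       # hypothetical columns, for illustration only
        "    service String,\n"
        "    value Float64\n"
        ")\n"
        "ENGINE = ReplicatedMergeTree("
        f"'/clickhouse/tables/{{shard}}/{database}/{table}', '{{replica}}')\n"
        "ORDER BY (service, ts)"
    )


if __name__ == "__main__":
    print(replicated_table_ddl("infrasage", "events_example", "infrasage_cluster"))
```

Both replicas share the same coordination path for a given shard, which is how ClickHouse knows they hold the same data.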
Kubernetes Multi-Cluster Considerations
Service Discovery
If your services span clusters, point each service to the Ingestion Gateway in its own cluster. Do not route telemetry cross-cluster for ingestion.
# Service config pointing to local gateway
INFRASAGE_ENDPOINT=http://infrasage-gateway.infrasage.svc.cluster.local:8080
Unified Kubeconfig for Operations
Use a merged kubeconfig for operating multiple clusters:
KUBECONFIG=~/.kube/eu-cluster:~/.kube/india-cluster kubectl config get-contexts
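The same merge can be done from an operations script. This is a small sketch with a hypothetical helper name, assuming the two kubeconfig paths from the command above. `kubectl` reads `KUBECONFIG` as a list of paths joined with the operating system's path-list separator.

```python
import os
import subprocess
from pathlib import Path


def merged_kubeconfig(paths: list[str]) -> str:
    """Join kubeconfig paths into one KUBECONFIG value, expanding `~`."""
    return os.pathsep.join(str(Path(p).expanduser()) for p in paths)


if __name__ == "__main__":
    env = dict(
        os.environ,
        KUBECONFIG=merged_kubeconfig(["~/.kube/eu-cluster", "~/.kube/india-cluster"]),
    )
    # Lists the contexts from both files; requires kubectl on PATH.
    subprocess.run(["kubectl", "config", "get-contexts"], env=env, check=True)
```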
Cross-Cluster Runbooks
Runbooks can target pods in a specific cluster. Set KUBERNETES_KUBECONFIG on each AIops Engine instance to the kubeconfig of the cluster it should act on:
# In the EU InfraSage deployment
KUBERNETES_KUBECONFIG=/etc/kubeconfig/eu-cluster.yaml
KUBERNETES_NAMESPACE_SCOPE=infra,payments,auth
Cost Considerations
Running multiple full InfraSage stacks multiplies infrastructure costs. To reduce cost:
- Use the smallscale profile for lower-volume regions
- Disable ML Engine replicas in regions where predictive capacity is not needed
- Share the Grafana instance across regions (point to multiple Prometheus remotes)
See Cost Optimization for detailed guidance.