Deployment Options

InfraSage can be deployed three ways. All options run the same code; they differ in orchestration and scale.


Comparison

                    Docker Compose            Kubernetes                 K3S (EC2)
Best for            Local dev, small teams    Production, auto-scaling   Single-node production
Setup time          5 minutes                 30 minutes                 20 minutes
High availability   No                        Yes                        Partial
Auto-scaling        No                        Yes (HPA)                  Yes (HPA)
Managed infra       No                        Requires cluster           Requires EC2

Docker Compose

The default. All services run in Docker containers on a single host, connected by a private bridge network.

# Start
docker-compose up -d

# Stop (keep data)
docker-compose down

# Stop (delete all data)
docker-compose down -v

# Tail logs
docker-compose logs -f ingestion-gateway

# Restart a single service
docker-compose restart aiops-engine

Port mappings

Host Port → Container Port → Service
8080 → 8080 → Ingestion Gateway (HTTP)
9090 → 9090 → Ingestion Gateway (Prometheus metrics)
8081 → 8081 → Telemetry Operator (HTTP)
9091 → 9091 → Telemetry Operator (metrics)
9092 → 9092 → Redpanda (Kafka)
9093 → 9093 → AIOps Engine (Alertmanager webhook)
9999 → 9090 → Prometheus UI
3000 → 3000 → Grafana
8123 → 8123 → ClickHouse (HTTP)
9000 → 9000 → ClickHouse (native)
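Because every service binds a fixed host port, a port that is already in use on the host will stop the corresponding container from starting. The following is a minimal pre-flight sketch, not part of InfraSage itself: the function names `port_open` and `busy_ports` are hypothetical, and the check assumes bash (it relies on bash's `/dev/tcp` redirection).

```shell
#!/usr/bin/env bash
# Host ports taken from the port-mappings table.
INFRASAGE_PORTS="8080 9090 8081 9091 9092 9093 9999 3000 8123 9000"

# port_open HOST PORT: succeeds if something accepts TCP connections there.
# Relies on bash's /dev/tcp redirection (assumption: bash, not plain sh).
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# busy_ports: print each InfraSage host port that is already in use locally.
busy_ports() {
  local port
  for port in $INFRASAGE_PORTS; do
    if port_open 127.0.0.1 "$port"; then
      echo "$port"
    fi
  done
}
```

Running `busy_ports` before `docker-compose up -d` and getting no output suggests none of the listed ports is taken.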

Kubernetes

Manifests structure

deployments/kubernetes/
├── namespace.yaml
├── clickhouse/
│ ├── statefulset.yaml
│ ├── service.yaml
│ └── pvc.yaml
├── redpanda/
│ ├── statefulset.yaml
│ └── service.yaml
├── ingestion-gateway/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── hpa.yaml
├── telemetry-operator/
│ ├── deployment.yaml
│ └── service.yaml
├── aiops-engine/
│ ├── deployment.yaml
│ └── service.yaml
├── prometheus/
│ ├── configmap.yaml
│ └── deployment.yaml
└── grafana/
  ├── deployment.yaml
  └── service.yaml
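Before deploying from a partial checkout or a fork, it can help to confirm that the manifests listed in the tree are all present. This is a hedged sketch only: `missing_manifests` is a hypothetical helper, and the file list is copied from the tree above, so it will need updating if the repository layout changes.

```shell
# Manifests expected under deployments/kubernetes/, per the tree above.
EXPECTED_MANIFESTS="
namespace.yaml
clickhouse/statefulset.yaml
clickhouse/service.yaml
clickhouse/pvc.yaml
redpanda/statefulset.yaml
redpanda/service.yaml
ingestion-gateway/deployment.yaml
ingestion-gateway/service.yaml
ingestion-gateway/hpa.yaml
telemetry-operator/deployment.yaml
telemetry-operator/service.yaml
aiops-engine/deployment.yaml
aiops-engine/service.yaml
prometheus/configmap.yaml
prometheus/deployment.yaml
grafana/deployment.yaml
grafana/service.yaml
"

# missing_manifests DIR: print every expected manifest that is absent in DIR.
missing_manifests() {
  local rel
  for rel in $EXPECTED_MANIFESTS; do
    [ -f "$1/$rel" ] || echo "$rel"
  done
}
```

For example, `missing_manifests deployments/kubernetes` prints nothing when the directory matches the tree.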

Deploy

# Create namespace and secrets
kubectl create namespace infrasage

kubectl create secret generic llm-secrets \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-YOUR_KEY \
  --from-literal=SLACK_WEBHOOK_URL=https://hooks.slack.com/... \
  -n infrasage

kubectl create secret generic clickhouse-secret \
  --from-literal=password=SECURE_PASSWORD \
  -n infrasage

# Deploy everything (-R recurses into the per-service subdirectories)
kubectl apply -R -f deployments/kubernetes/

# Check status
kubectl get pods -n infrasage -w
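Watching pods with `-w` never exits on its own, which is awkward in scripts and CI. As an alternative, the sketch below waits on each workload with `kubectl rollout status`. The helper name `wait_for_infrasage` is hypothetical, and the workload names are an assumption: they are taken to match the manifest directory names, which the actual manifests may not do.

```shell
# wait_for_infrasage [NAMESPACE] [TIMEOUT]: block until each workload has
# rolled out, or return non-zero on the first one that does not.
wait_for_infrasage() {
  local ns="${1:-infrasage}"
  local limit="${2:-300s}"
  local workload
  # Assumption: workload names match the manifest directory names.
  for workload in \
      statefulset/clickhouse statefulset/redpanda \
      deployment/ingestion-gateway deployment/telemetry-operator \
      deployment/aiops-engine deployment/prometheus deployment/grafana; do
    kubectl rollout status "$workload" -n "$ns" --timeout="$limit" || return 1
  done
}
```

Calling `wait_for_infrasage infrasage 120s` after `kubectl apply` gives a single exit status that a deploy script can act on.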

Scaling

# Manual scale
kubectl scale deployment ingestion-gateway -n infrasage --replicas=5

# Horizontal Pod Autoscaler (defined in ingestion-gateway/hpa.yaml)
kubectl get hpa -n infrasage
# NAME MINPODS MAXPODS REPLICAS CPU
# ingestion-gateway 2 10 3 45%
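The contents of `hpa.yaml` are not shown here, so the following is only a sketch of what an equivalent `autoscaling/v2` manifest could look like. The minimum of 2 and maximum of 10 replicas come from the sample output above; the 70% CPU target and the helper name `render_hpa` are assumptions.

```shell
# render_hpa MIN MAX CPU_PERCENT: print an autoscaling/v2 HPA manifest for the
# ingestion-gateway Deployment. The shipped hpa.yaml may differ.
render_hpa() {
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingestion-gateway
  namespace: infrasage
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingestion-gateway
  minReplicas: $1
  maxReplicas: $2
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: $3
EOF
}
```

`render_hpa 2 10 70 | kubectl apply -f -` would apply it. CPU-based autoscaling only works if the target pods declare CPU requests and the cluster runs a metrics server.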

Accessing services locally (port-forward)

kubectl port-forward svc/prometheus 9999:9090 -n infrasage &
kubectl port-forward svc/grafana 3000:3000 -n infrasage &
kubectl port-forward svc/ingestion-gateway 8080:8080 -n infrasage &

K3S on AWS EC2

Launch EC2

# Recommended AMI: Amazon Linux 2023
# Instance type: t3.xlarge (dev) or m5.2xlarge (production)
# EBS: 100 GB gp3
# Security group: open 22, 80, 443, 3000, 8080-8081, 9000, 9092-9093, 9999
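If you manage the security group from the AWS CLI, the port list above can be turned into ingress rules mechanically. This sketch only prints the commands so they can be reviewed first. The helper name `sg_ingress_commands`, the group ID, and the CIDR used in the usage line are placeholders, not values from InfraSage.

```shell
# Ports from the security-group line above; ranges use FROM-TO.
SG_PORTS="22 80 443 3000 8080-8081 9000 9092-9093 9999"

# sg_ingress_commands GROUP_ID CIDR: print one AWS CLI call per port or range.
# Nothing is executed here; review the output before running it.
sg_ingress_commands() {
  local port
  for port in $SG_PORTS; do
    echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $port --cidr $2"
  done
}
```

For example, `sg_ingress_commands sg-0123456789abcdef0 203.0.113.0/24` prints eight commands. Passing a narrow CIDR for the administrative ports, rather than opening them to the whole internet, is usually the safer choice.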

Install K3S

# Install K3S
curl -sfL https://get.k3s.io | sh -
sudo systemctl enable k3s --now

# Configure kubectl (create ~/.kube first; it does not exist on a fresh instance)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown "$USER" ~/.kube/config
chmod 600 ~/.kube/config

# Verify
kubectl get nodes

Deploy InfraSage

git clone https://github.com/infrasage/infrasage.git
cd infrasage

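kubectl create namespace infrasage
kubectl create secret generic llm-secrets \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-YOUR_KEY \
  -n infrasage

# Same ClickHouse secret as in the Kubernetes section
kubectl create secret generic clickhouse-secret \
  --from-literal=password=SECURE_PASSWORD \
  -n infrasage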

# -R recurses into the per-service subdirectories
kubectl apply -R -f deployments/kubernetes/

# Watch until all pods are running
kubectl get pods -n infrasage -w

Set up a load balancer / ingress

# K3S includes Traefik by default
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: infrasage-ingress
  namespace: infrasage
spec:
  rules:
    - host: infrasage.mycompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ingestion-gateway
                port:
                  number: 8080
EOF

Security Hardening (All Options)

Before going to production:

  • Change CLICKHOUSE_PASSWORD from the default infrasage-dev
  • Change Grafana admin password from admin
  • Rotate ANTHROPIC_API_KEY quarterly
  • Enable TLS on ClickHouse connections
  • Set up Kubernetes NetworkPolicies to restrict inter-service communication
  • Use a secrets manager (AWS Secrets Manager, HashiCorp Vault) instead of .env files
  • Enable ClickHouse access logging
  • Set up regular ClickHouse backups
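As an illustration of the NetworkPolicy item, the sketch below prints a policy that admits traffic to ClickHouse only from selected InfraSage pods on its two ports (8123 and 9000). It rests on assumptions: the `app` labels and the list of client services are hypothetical and must be checked against the actual manifests, and `render_clickhouse_netpol` is not an InfraSage command.

```shell
# render_clickhouse_netpol: print a NetworkPolicy restricting ClickHouse ingress.
# Assumption: pods carry an "app" label equal to their service name.
render_clickhouse_netpol() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: clickhouse-ingress
  namespace: infrasage
spec:
  podSelector:
    matchLabels:
      app: clickhouse
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [ingestion-gateway, telemetry-operator, aiops-engine, grafana]
      ports:
        - protocol: TCP
          port: 8123
        - protocol: TCP
          port: 9000
EOF
}
```

`render_clickhouse_netpol | kubectl apply -f -` would apply it. Whether the policy is enforced depends on the cluster's network plugin supporting NetworkPolicy.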

See the Security section for full guidance.