Slack
InfraSage sends rich alert notifications to Slack and supports interactive approval flows for runbook execution.
Configuration
Incoming Webhook (Simple Alerts)
The fastest setup — no bot required:
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
SLACK_CHANNEL=#ops-alerts
Create a webhook at api.slack.com/apps → Incoming Webhooks.
Slack Bot (Interactive Approvals)
For approval flows (Approve/Reject buttons), you need a Slack App with a bot token:
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_SIGNING_SECRET=your-signing-secret
- Create an app at api.slack.com/apps
- Add OAuth scopes:
chat:write,chat:write.public,reactions:write
- Enable Interactivity and set the request URL:
https://your-infrasage-host:9093/api/v1/slack/interactions
- Install the app to your workspace
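The signing secret exists so that the interactions endpoint can confirm a request really came from Slack. The sketch below follows Slack's documented v0 signing scheme (HMAC-SHA256 over `v0:<timestamp>:<raw body>`, compared against the `X-Slack-Signature` header). How InfraSage performs this check internally is not stated here, so the function name `verify_slack_signature` and the 5-minute replay window are assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, now: Optional[float] = None,
                           max_age_seconds: int = 300) -> bool:
    """Return True if the request body was signed by Slack with this secret.

    timestamp: value of the X-Slack-Request-Timestamp header
    signature: value of the X-Slack-Signature header (starts with "v0=")
    body:      the raw, unparsed request body
    """
    now = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    # Reject stale requests to limit replay attacks (window is an assumption).
    if abs(now - ts) > max_age_seconds:
        return False
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The check must run on the raw request bytes, before any form or JSON parsing, because re-encoding the body can change it and invalidate the signature.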
Alert Notifications
When an anomaly is detected, InfraSage posts a rich Block Kit message:
🚨 Anomaly Detected — payment-api
────────────────────────────────
Service: payment-api (production)
Metric: cpu_usage_percent
Score: 0.93 (Critical)
Detected at: 2026-04-10 12:00:05 UTC
Root Cause (92% confidence):
CPU saturation from DB connection pool exhaustion.
Blast radius: user-service, checkout-service
Suggested Actions:
1. Scale payment-api to 5 pods
2. Restart pods to clear memory pressure
3. Review DB connection pool configuration
Historical Match: Similar incident on March 14 — resolved in 12 min by scaling
[🔍 View in InfraSage] [📋 Create Jira Ticket] [🚑 Page On-Call]
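A message like the one above could be expressed as a Block Kit payload roughly as follows. This is a minimal sketch, not InfraSage's actual message builder: the function name `build_anomaly_message` and its parameters are hypothetical, and only the `header` and `section` block types from Slack's Block Kit are used.

```python
import json


def build_anomaly_message(service, environment, metric, score, severity,
                          root_cause, actions):
    """Build a Slack message payload (Block Kit) describing one anomaly."""
    action_lines = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return {
        # Fallback text for notifications and clients that cannot render blocks.
        "text": f"Anomaly detected: {service}",
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": f"Anomaly Detected: {service}"}},
            {"type": "section", "fields": [
                {"type": "mrkdwn", "text": f"*Service:*\n{service} ({environment})"},
                {"type": "mrkdwn", "text": f"*Metric:*\n{metric}"},
                {"type": "mrkdwn", "text": f"*Score:*\n{score:.2f} ({severity})"},
            ]},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*Root Cause:*\n{root_cause}"}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*Suggested Actions:*\n{action_lines}"}},
        ],
    }


if __name__ == "__main__":
    message = build_anomaly_message(
        "payment-api", "production", "cpu_usage_percent", 0.93, "Critical",
        "CPU saturation from DB connection pool exhaustion.",
        ["Scale payment-api to 5 pods", "Review DB connection pool configuration"],
    )
    print(json.dumps(message, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST to the incoming webhook URL.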
Runbook Approval Flow
When a runbook step requires human approval, InfraSage sends an interactive message:
⚡ Runbook Approval Required
────────────────────────────────
Runbook: scale-out-payment-api
Service: payment-api
Step 2/3: Scale deployment to 5 replicas
Namespace: production
[✅ Approve] [❌ Reject]
⏱ Timeout in 10 minutes (then: halt)
When the engineer clicks Approve, InfraSage:
- Records the approver identity in the audit log
- Proceeds with the runbook step
- Posts a result message in the same thread
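To illustrate the steps above: when a button is clicked, Slack sends a form-encoded POST whose `payload` field holds a JSON `block_actions` object. The sketch below pulls out the fields such a handler would need. The `action_id` value `runbook_approve` and the function name `parse_approval` are hypothetical; the payload field names (`user.id`, `channel.id`, `message.ts`, `actions`) follow Slack's interaction payload format.

```python
import json
from urllib.parse import parse_qs


def parse_approval(body: bytes) -> dict:
    """Extract the decision, approver, and thread info from an interaction request body."""
    form = parse_qs(body.decode("utf-8"))
    payload = json.loads(form["payload"][0])
    action = payload["actions"][0]
    return {
        # "runbook_approve" is an assumed action_id, not a documented InfraSage value.
        "approved": action["action_id"] == "runbook_approve",
        # The button value can carry an opaque reference to the runbook step.
        "step_ref": action.get("value"),
        # Approver identity, e.g. for the audit log.
        "approver_id": payload["user"]["id"],
        # Channel and timestamp of the original message, used to reply in its thread.
        "channel": payload["channel"]["id"],
        "thread_ts": payload["message"]["ts"],
    }
```

Passing `thread_ts` together with `channel` to Slack's `chat.postMessage` method places the result message in the same thread as the approval request.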
Notification Rules
Control which anomalies trigger Slack alerts:
# Only alert on high/critical anomalies
SLACK_MIN_ANOMALY_SCORE=0.6
# Alert on specific services only
SLACK_SERVICE_FILTER=payment-api,checkout-service,auth-service
# Quiet hours (no alerts, still log to ClickHouse)
SLACK_QUIET_HOURS_START=23:00
SLACK_QUIET_HOURS_END=07:00
SLACK_QUIET_HOURS_TZ=America/New_York
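The way these three rules might combine can be sketched as below. This is an illustration under assumptions, not InfraSage's implementation: it assumes the score threshold is inclusive, that an empty service filter means all services, and that the quiet-hours window includes its start time and excludes its end time. Note that the 23:00 to 07:00 window wraps past midnight, which the comparison has to handle.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo


def in_quiet_hours(local: time, start: time, end: time) -> bool:
    """True if `local` is in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= local < end
    return local >= start or local < end


def passes_filters(score: float, service: str, min_score: float, services: set) -> bool:
    """Score threshold plus optional service allow-list (empty set = all services)."""
    return score >= min_score and (not services or service in services)


def should_notify(score: float, service: str, now: datetime, env: dict) -> bool:
    """Decide whether to post to Slack; `now` must be timezone-aware if quiet hours are set."""
    min_score = float(env.get("SLACK_MIN_ANOMALY_SCORE", "0"))
    services = {s.strip() for s in env.get("SLACK_SERVICE_FILTER", "").split(",") if s.strip()}
    if not passes_filters(score, service, min_score, services):
        return False
    start, end = env.get("SLACK_QUIET_HOURS_START"), env.get("SLACK_QUIET_HOURS_END")
    if not (start and end):
        return True
    # Evaluate the window in the configured timezone, not the server's.
    tz = ZoneInfo(env.get("SLACK_QUIET_HOURS_TZ", "UTC"))
    local = now.astimezone(tz).time()
    return not in_quiet_hours(local, time.fromisoformat(start), time.fromisoformat(end))
```

In practice `env` would be `os.environ`, and `now` would be `datetime.now(timezone.utc)`.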
Incident Resolution Feedback
After resolving an incident, engineers can submit the resolution directly from Slack (requires the Slack bot configured above):
/infrasage resolve anom-7f3d "Scaled from 3 to 5 pods. Memory leak in v2.3.1 DB pool — upgrading to v2.3.2."
InfraSage stores the resolution in its incident memory to improve root cause analysis (RCA) of future incidents.
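For a slash command, Slack delivers everything after the command name in a `text` field, so the example above would arrive as `resolve anom-7f3d "Scaled from 3 to 5 pods. ..."`. A minimal sketch of splitting that text into an anomaly ID and a resolution note is shown here. The function name `parse_resolve_command` and the exact argument grammar are assumptions, since the command's accepted syntax is only shown by example.

```python
import shlex


def parse_resolve_command(text: str):
    """Parse 'resolve <anomaly-id> "<resolution>"' into (anomaly_id, resolution)."""
    # shlex honors the double quotes, so the resolution stays one argument.
    parts = shlex.split(text)
    if len(parts) < 3 or parts[0] != "resolve":
        raise ValueError('usage: resolve <anomaly-id> "<resolution text>"')
    # Tolerate an unquoted resolution by joining the remaining words.
    return parts[1], " ".join(parts[2:])
```

An unbalanced quote makes `shlex.split` raise `ValueError`, so a handler can catch that one exception type and reply with the usage message.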
Verification
# Test Slack webhook
curl -X POST "$SLACK_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{"text": "InfraSage Slack integration is working!"}'