
Slack

InfraSage sends rich alert notifications to Slack and supports interactive approval flows for runbook execution.


Configuration

Incoming Webhook (Simple Alerts)

The fastest setup, with no bot required:

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
SLACK_CHANNEL=#ops-alerts

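Create a webhook at api.slack.com/apps (select your app, then Incoming Webhooks). Slack app webhooks post to the channel selected when the webhook is created, so choose the same channel you set in SLACK_CHANNEL.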

Slack Bot (Interactive Approvals)

For approval flows (Approve/Reject buttons), you need a Slack App with a bot token:

SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_SIGNING_SECRET=your-signing-secret
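  1. Create an app at api.slack.com/apps
  2. Under OAuth & Permissions, add the bot token scopes: chat:write, chat:write.public, reactions:write
  3. Under Interactivity & Shortcuts, enable Interactivity and set the Request URL (it must be reachable from Slack over HTTPS):
    https://your-infrasage-host:9093/api/v1/slack/interactions
  4. Install the app to your workspace, then copy the Bot User OAuth Token (xoxb-...) into SLACK_BOT_TOKEN and the Signing Secret (Basic Information → App Credentials) into SLACK_SIGNING_SECRET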
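Slack signs every interaction request it sends to the Request URL, and SLACK_SIGNING_SECRET is what lets the receiver check that signature. In Slack's documented v0 scheme, the signature is an HMAC-SHA256, keyed with the signing secret, over the string v0:<X-Slack-Request-Timestamp>:<raw request body>, hex-encoded and prefixed with v0=, and it must match the X-Slack-Signature header. The bash sketch below reproduces that check with openssl, which can help when debugging a mismatched secret. The helper names are hypothetical, and this is not InfraSage's own implementation.

```bash
# Sketch: Slack's v0 request-signature check, using openssl.
# slack_signature and verify_slack_request are hypothetical helper names.

# Print "v0=<hex HMAC-SHA256>" for a request timestamp and raw body.
slack_signature() {
  local secret="$1" timestamp="$2" body="$3"
  printf 'v0:%s:%s' "$timestamp" "$body" \
    | openssl dgst -sha256 -hmac "$secret" \
    | awk '{ print "v0=" $NF }'   # last field of openssl's output is the hex digest
}

# Return 0 if the X-Slack-Signature value matches and the timestamp is
# within 5 minutes of now (Slack's recommended replay window).
verify_slack_request() {
  local secret="$1" timestamp="$2" body="$3" header_sig="$4"
  local now="${5:-$(date +%s)}"   # optional fixed "now", useful for testing
  local age=$(( now - timestamp ))
  if (( age < 0 )); then age=$(( -age )); fi
  if (( age > 300 )); then
    return 1
  fi
  local expected
  expected="$(slack_signature "$secret" "$timestamp" "$body")"
  # Plain string comparison; a production verifier should compare in constant time.
  [[ "$expected" == "$header_sig" ]]
}
```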

Alert Notifications

When an anomaly is detected, InfraSage posts a rich Block Kit message:

🚨 Anomaly Detected — payment-api
────────────────────────────────
Service: payment-api (production)
Metric: cpu_usage_percent
Score: 0.93 (Critical)
Detected at: 2026-04-10 12:00:05 UTC

Root Cause (92% confidence):
CPU saturation from DB connection pool exhaustion.
Blast radius: user-service, checkout-service

Suggested Actions:
1. Scale payment-api to 5 pods
2. Restart pods to clear memory pressure
3. Review DB connection pool configuration

Historical Match: Similar incident on March 14 — resolved in 12 min by scaling

[🔍 View in InfraSage] [📋 Create Jira Ticket] [🚑 Page On-Call]
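To preview roughly how a message like this renders, or to test Block Kit in your own channel, a payload with a header block and a section of fields can be sent to the incoming webhook. The bash sketch below builds an approximation of the first part of the layout above. It is an assumption, not InfraSage's actual payload, and build_alert_payload is a hypothetical helper.

```bash
# Sketch: an approximate Block Kit payload for an anomaly alert.
# build_alert_payload is hypothetical; inputs must not contain '"' or '\'.
build_alert_payload() {
  local service="$1" env="$2" metric="$3" score="$4" severity="$5"
  # The top-level "text" is the fallback Slack uses for notifications.
  # In an unquoted heredoc, \n stays literal, which is JSON's newline escape.
  cat <<EOF
{
  "text": "Anomaly Detected: ${service}",
  "blocks": [
    {"type": "header",
     "text": {"type": "plain_text", "text": "Anomaly Detected: ${service}"}},
    {"type": "section",
     "fields": [
       {"type": "mrkdwn", "text": "*Service:*\n${service} (${env})"},
       {"type": "mrkdwn", "text": "*Metric:*\n${metric}"},
       {"type": "mrkdwn", "text": "*Score:*\n${score} (${severity})"}
     ]}
  ]
}
EOF
}

# Usage (sends to the webhook configured above):
#   build_alert_payload payment-api production cpu_usage_percent 0.93 Critical \
#     | curl -sS -X POST -H 'Content-Type: application/json' --data-binary @- "$SLACK_WEBHOOK_URL"
```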

Runbook Approval Flow

When a runbook step requires human approval, InfraSage sends an interactive message:

⚡ Runbook Approval Required
────────────────────────────────
Runbook: scale-out-payment-api
Service: payment-api
Step 2/3: Scale deployment to 5 replicas
Namespace: production

[✅ Approve] [❌ Reject]

⏱ Timeout in 10 minutes (then: halt)

When the engineer clicks Approve, InfraSage:

  1. Records the approver identity in the audit log
  2. Proceeds with the runbook step
  3. Posts a result message in the same thread
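An approval-gated step behaves like a small state machine: it waits for a click until the timeout, then applies the configured timeout action (halt in the example above). The bash sketch below models that decision and the audit record. The names next_step and audit_line are hypothetical, the 600-second default mirrors the 10-minute timeout shown, and because this page does not describe what happens after a rejection, the sketch only reports "rejected".

```bash
# Hypothetical model of one approval-gated runbook step.
# next_step ACTION ELAPSED_SECONDS [TIMEOUT_SECONDS] [ON_TIMEOUT]
#   ACTION: "approve", "reject", or "" if nobody has clicked yet.
next_step() {
  local action="$1" elapsed="$2" timeout="${3:-600}" on_timeout="${4:-halt}"
  # The timeout is checked first, so a click arriving after the deadline is ignored.
  if (( elapsed >= timeout )); then
    echo "$on_timeout"
    return
  fi
  case "$action" in
    approve) echo "proceed" ;;
    reject)  echo "rejected" ;;
    *)       echo "waiting" ;;
  esac
}

# Hypothetical audit-log line recording who approved which step (UTC time).
audit_line() {
  local approver="$1" runbook="$2" step="$3"
  printf '%s approver=%s runbook=%s step=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$approver" "$runbook" "$step"
}
```

Checking the timeout before the click means a late Approve cannot revive a step that has already timed out. Whether InfraSage applies the same rule is an assumption.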

Notification Rules

Control which anomalies trigger Slack alerts:

# Only alert on high/critical anomalies
SLACK_MIN_ANOMALY_SCORE=0.6

# Alert on specific services only
SLACK_SERVICE_FILTER=payment-api,checkout-service,auth-service

# Quiet hours (no alerts, still log to ClickHouse)
SLACK_QUIET_HOURS_START=23:00
SLACK_QUIET_HOURS_END=07:00
SLACK_QUIET_HOURS_TZ=America/New_York
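The three rules can be read as successive filters. The bash sketch below combines them under assumed semantics: an anomaly is posted only if its score is at least SLACK_MIN_ANOMALY_SCORE, its service appears in the comma-separated SLACK_SERVICE_FILTER (an empty filter allows every service), and the current time in SLACK_QUIET_HOURS_TZ lies outside the quiet window, which may wrap past midnight as in the 23:00 to 07:00 example. The function names are hypothetical, and InfraSage's actual evaluation order is not documented here.

```bash
# Sketch: combine the notification rules above (assumed semantics).
# to_minutes, in_quiet_hours and should_alert are hypothetical helpers.

# Convert "HH:MM" to minutes since midnight (10# avoids octal parsing of "08").
to_minutes() {
  local t="$1"
  echo $(( 10#${t%%:*} * 60 + 10#${t##*:} ))
}

# Return 0 if NOW falls in the quiet window [START, END); the window may wrap midnight.
in_quiet_hours() {
  local now start end
  now=$(to_minutes "$1"); start=$(to_minutes "$2"); end=$(to_minutes "$3")
  if (( start <= end )); then
    (( now >= start && now < end ))   # same-day window
  else
    (( now >= start || now < end ))   # e.g. 23:00 to 07:00
  fi
}

# should_alert SCORE SERVICE [HH:MM]; returns 0 if a Slack alert should be sent.
should_alert() {
  local score="$1" service="$2"
  local now="${3:-$(TZ="${SLACK_QUIET_HOURS_TZ:-UTC}" date +%H:%M)}"
  # 1. Minimum score (awk does the floating-point comparison).
  awk -v s="$score" -v m="${SLACK_MIN_ANOMALY_SCORE:-0}" \
    'BEGIN { exit !((s + 0) >= (m + 0)) }' || return 1
  # 2. Service allow-list (no spaces around commas assumed).
  if [[ -n "${SLACK_SERVICE_FILTER:-}" && ",${SLACK_SERVICE_FILTER}," != *",${service},"* ]]; then
    return 1
  fi
  # 3. Quiet hours, only when both ends are configured.
  if [[ -n "${SLACK_QUIET_HOURS_START:-}" && -n "${SLACK_QUIET_HOURS_END:-}" ]] &&
     in_quiet_hours "$now" "$SLACK_QUIET_HOURS_START" "$SLACK_QUIET_HOURS_END"; then
    return 1
  fi
  return 0
}
```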

Incident Resolution Feedback

After resolving an incident, engineers can submit the resolution directly from Slack (if the Slack bot is configured):

/infrasage resolve anom-7f3d "Scaled from 3 to 5 pods. Memory leak in v2.3.1 DB pool — upgrading to v2.3.2."

This feeds the resolution into InfraSage's incident memory, improving root cause analysis (RCA) for similar incidents in the future.
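Slack delivers everything typed after the command name in the text field of the slash-command request. The bash sketch below splits that text into the anomaly ID and the quoted resolution note, assuming the syntax shown above. parse_resolve is a hypothetical helper, not part of InfraSage.

```bash
# Hypothetical parser for the text of: /infrasage resolve <anomaly-id> "<note>"
parse_resolve() {
  local text="$1"
  local re='^resolve[[:space:]]+([^[:space:]]+)[[:space:]]+"(.*)"$'
  if [[ "$text" =~ $re ]]; then
    # BASH_REMATCH[1] is the anomaly ID, BASH_REMATCH[2] the note without quotes.
    printf 'anomaly_id=%s\nnote=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo 'usage: /infrasage resolve <anomaly-id> "<resolution note>"' >&2
    return 1
  fi
}
```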


Verification

# Test the Slack webhook (Slack replies with "ok" on success)
curl -X POST "$SLACK_WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{"text": "InfraSage Slack integration is working!"}'
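The webhook test does not exercise the bot token used for approvals. Slack's auth.test Web API method returns JSON containing "ok": true when a token is valid, so a bash sketch like the following can check SLACK_BOT_TOKEN. check_bot_token and slack_ok are hypothetical helper names.

```bash
# Return 0 if a Slack Web API JSON response reports "ok": true.
slack_ok() {
  local re='"ok"[[:space:]]*:[[:space:]]*true'
  [[ "$1" =~ $re ]]
}

# Call Slack's auth.test method with the bot token and report the result.
check_bot_token() {
  if [[ -z "${SLACK_BOT_TOKEN:-}" ]]; then
    echo "SLACK_BOT_TOKEN is not set" >&2
    return 1
  fi
  local response
  response="$(curl -sS -X POST -H "Authorization: Bearer ${SLACK_BOT_TOKEN}" \
    https://slack.com/api/auth.test)" || return 1
  if slack_ok "$response"; then
    echo "Bot token OK"
  else
    echo "auth.test failed: $response" >&2
    return 1
  fi
}
```

Run check_bot_token after exporting SLACK_BOT_TOKEN; an "invalid_auth" error in the printed response usually means the token was copied incorrectly or the app was not installed.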