
PagerDuty

InfraSage integrates bidirectionally with PagerDuty: it creates incidents when anomalies are detected, routes them to the correct on-call team, and syncs resolution status back.


Configuration

PAGERDUTY_API_TOKEN=your-pagerduty-api-token
PAGERDUTY_SERVICE_KEY=your-integration-key # Events API v2 key

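Get your API token from PagerDuty → Integrations → API Access Keys. The integration key comes from the Events API v2 integration added to the target PagerDuty service.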


Incident Lifecycle

1. InfraSage creates a PagerDuty incident

When an anomaly triggers RCA and the severity is high or critical:

{
  "routing_key": "$PAGERDUTY_SERVICE_KEY",
  "event_action": "trigger",
  "dedup_key": "infrasage-anom-7f3d",
  "payload": {
    "summary": "CPU anomaly on payment-api: 87% CPU (score: 0.93)",
    "severity": "critical",
    "source": "infrasage",
    "component": "payment-api",
    "group": "production",
    "custom_details": {
      "anomaly_id": "anom-7f3d",
      "root_cause": "CPU saturation from DB connection pool exhaustion",
      "blast_radius": ["user-service", "checkout-service"],
      "suggested_actions": ["Scale to 5 pods", "Restart pods"],
      "rca_confidence": 0.92,
      "infrasage_url": "https://infrasage.mycompany.com/incidents/anom-7f3d"
    }
  }
}
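As a rough sketch of how an event like the one above could be built and sent, the snippet below posts a trigger body to the Events API v2 enqueue endpoint using only the Python standard library. The helper names build_trigger_event and send_event are hypothetical and not part of InfraSage, and the dedup_key format is an assumption inferred from the resolve example on this page.

```python
import json
import urllib.request

# PagerDuty Events API v2 endpoint for trigger, acknowledge, and resolve events.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, anomaly_id, summary, severity, component, details):
    """Build an Events API v2 trigger body (hypothetical helper)."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Assumed key format, chosen so a later resolve event can reference it.
        "dedup_key": f"infrasage-{anomaly_id}",
        "payload": {
            "summary": summary,
            "severity": severity,  # Events API v2 accepts: critical, error, warning, info
            "source": "infrasage",
            "component": component,
            "custom_details": {"anomaly_id": anomaly_id, **details},
        },
    }


def send_event(event, timeout=10):
    """POST the event as JSON and return the parsed response body."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would pass the value of PAGERDUTY_SERVICE_KEY as routing_key and hand the result of build_trigger_event to send_event. Keeping the builder separate from the network call makes the event body easy to inspect or test without contacting PagerDuty.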

2. PagerDuty routes to on-call

PagerDuty uses your configured escalation policies to notify the on-call engineer. No additional configuration in InfraSage is required.

3. Resolution sync

When a runbook resolves an incident, InfraSage sends a PagerDuty resolve event:

{
  "routing_key": "$PAGERDUTY_SERVICE_KEY",
  "event_action": "resolve",
  "dedup_key": "infrasage-anom-7f3d"
}
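PagerDuty matches a resolve event to an open alert by its dedup_key, so the key sent here has to be the same one used when the incident was triggered. A minimal sketch of that pairing follows. The "infrasage-" prefix plus the anomaly ID is an assumption read off the example values on this page, and both function names are hypothetical.

```python
def dedup_key_for(anomaly_id):
    """Derive the dedup_key for an anomaly (assumed format: 'infrasage-<anomaly_id>')."""
    return f"infrasage-{anomaly_id}"


def build_resolve_event(routing_key, anomaly_id):
    """Build an Events API v2 resolve body for a previously triggered anomaly."""
    return {
        "routing_key": routing_key,
        "event_action": "resolve",
        # Must equal the dedup_key of the trigger event, or nothing is resolved.
        "dedup_key": dedup_key_for(anomaly_id),
    }
```

Deriving the key from the anomaly ID in one place means the trigger and resolve paths cannot drift apart.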

When an incident is resolved in PagerDuty, InfraSage receives the webhook and marks the anomaly as resolved in ClickHouse.


On-Call Routing

InfraSage uses PagerDuty's service mapping to route different alert types to different teams:

# Map InfraSage service IDs to PagerDuty service keys
PAGERDUTY_ROUTING_RULES='[
  {"service_pattern": "payment-*", "service_key": "payments-team-key"},
  {"service_pattern": "infra-*", "service_key": "platform-team-key"},
  {"service_pattern": "*", "service_key": "default-key"}
]'
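To illustrate how rules like these might be evaluated, here is a small sketch that parses the JSON and picks a service key by glob matching. It assumes rules are checked in order and the first match wins, which is why the catch-all "*" pattern sits last. The page does not state the actual matching semantics, and resolve_service_key is a hypothetical name.

```python
import fnmatch
import json

# Same rules as the PAGERDUTY_ROUTING_RULES example, parsed from JSON.
EXAMPLE_RULES = json.loads("""[
  {"service_pattern": "payment-*", "service_key": "payments-team-key"},
  {"service_pattern": "infra-*", "service_key": "platform-team-key"},
  {"service_pattern": "*", "service_key": "default-key"}
]""")


def resolve_service_key(service_id, rules):
    """Return the service_key of the first rule whose pattern matches (assumed semantics)."""
    for rule in rules:
        # fnmatchcase does case-sensitive shell-style matching ('*' matches any run of characters).
        if fnmatch.fnmatchcase(service_id, rule["service_pattern"]):
            return rule["service_key"]
    return None  # no rule matched and no catch-all was configured
```

Under these assumptions, "payment-api" resolves to the payments team key, while a service such as "user-service" falls through to the default key.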

Severity Mapping

| InfraSage Anomaly Score | PagerDuty Severity |
|-------------------------|--------------------|
| 0.4–0.6                 | warning            |
| 0.6–0.8                 | error              |
| 0.8–1.0                 | critical           |
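The mapping can be expressed as a short threshold function. Two points in this sketch are assumptions, since the table does not settle them: a score that falls exactly on a shared boundary (0.6 or 0.8) is given the higher severity, and scores below 0.4 map to no severity at all. The function name is hypothetical.

```python
def pagerduty_severity(score):
    """Map an anomaly score in [0, 1] to a PagerDuty severity, or None below 0.4."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"anomaly score out of range: {score}")
    if score >= 0.8:
        return "critical"
    if score >= 0.6:  # assumption: boundary values take the higher severity
        return "error"
    if score >= 0.4:
        return "warning"
    return None  # assumption: scores under 0.4 are not mapped
```

For the example incident above, a score of 0.93 lands in the critical band.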

Bidirectional Webhook

Configure PagerDuty to send webhooks back to InfraSage:

  1. In PagerDuty: Services → Service → Extensions → Generic Webhook (v3)
  2. Webhook URL: https://your-infrasage-host:9093/api/v1/pagerduty/webhook
  3. Enable events: incident.acknowledged, incident.resolved, incident.reassigned

InfraSage uses these webhooks to:

  • Update the incident status in ClickHouse
  • Record the acknowledging engineer in the audit log
  • Trigger incident memory storage when resolved with a note
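A receiving endpoint could turn each webhook into the actions listed above roughly as follows. This sketch only reads fields from the v3 webhook envelope (event.event_type, event.data.id, event.agent) and returns a plan. The action names and the plan_actions function are hypothetical, and detecting whether a resolution carried a note is left to the caller because this page does not describe how InfraSage does it.

```python
import json


def plan_actions(raw_body):
    """Parse a PagerDuty v3 webhook body and list the follow-up actions (hypothetical names)."""
    event = json.loads(raw_body)["event"]
    event_type = event["event_type"]
    # The agent (who performed the action) may be absent or null for automated changes.
    agent = (event.get("agent") or {}).get("summary")

    # Every subscribed event updates the stored incident status.
    actions = ["update_status"]
    if event_type == "incident.acknowledged":
        actions.append("record_acknowledger")
    elif event_type == "incident.resolved":
        # Whether a note exists must be checked separately before storing memory.
        actions.append("store_incident_memory_if_note")

    return {
        "incident_id": event["data"]["id"],
        "event_type": event_type,
        "agent": agent,
        "actions": actions,
    }
```

Returning a plan instead of performing the writes directly keeps the parsing logic testable without a database or audit log.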

Verification

# Check PagerDuty incidents created by InfraSage
# (--globoff stops curl from interpreting the [] in the URL as a glob range)
curl --globoff -H "Authorization: Token token=$PAGERDUTY_API_TOKEN" \
  "https://api.pagerduty.com/incidents?service_ids[]=YOUR_SERVICE_ID&statuses[]=triggered"