Self-Observability
infrasagent exposes its own operational metrics at the admin API's /metrics endpoint in Prometheus text format. This lets you monitor the agent's health, throughput, and error rate using your existing observability stack.
Metrics
Throughput
| Metric | Labels | Description |
|---|---|---|
| infrasage_agent_records_received_total | source, signal | Total records accepted by each source |
| infrasage_agent_records_exported_total | sink, signal | Total records successfully exported by each sink |
Errors
| Metric | Labels | Description |
|---|---|---|
| infrasage_agent_export_errors_total | sink | Total failed export attempts |
Latency
| Metric | Labels | Description |
|---|---|---|
| infrasage_agent_export_duration_seconds | sink | Histogram of time spent in each export call |
Queue
| Metric | Labels | Description |
|---|---|---|
| infrasage_agent_queue_depth | component | Current number of records waiting in each channel |
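If you want to read these metrics from a script rather than from Prometheus, the text format is simple enough to parse by hand. The following is a minimal sketch, not a full parser: `parse_samples` and `sum_by` are hypothetical helper names, escaped characters in label values are left as-is, and timestamps are ignored.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text format: name{labels} value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a # HELP / # TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        yield name, labels, float(raw_value)


def sum_by(text, metric, label):
    """Sum all samples of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)
```

For example, `sum_by(body, "infrasage_agent_records_exported_total", "sink")` collapses the `signal` label and returns one cumulative total per sink, where `body` is the text returned by `/metrics`.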
Querying the Metrics
# Raw exported-record counters (cumulative totals; use rate() in Prometheus for throughput)
curl -s http://localhost:8080/metrics | grep records_exported
# Cumulative error count per sink
curl -s http://localhost:8080/metrics | grep export_errors
# p99 export latency (PromQL; requires Prometheus scraping the agent)
histogram_quantile(0.99,
  rate(infrasage_agent_export_duration_seconds_bucket[5m])
)
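To see what `histogram_quantile` does with the `_bucket` series, here is a simplified sketch of the linear interpolation it applies to classic (cumulative) histograms. It is an illustration under assumptions, not Prometheus's implementation: the function takes already-computed bucket counts or rates, and it ignores native histograms and most edge cases.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must include the +Inf bucket, as Prometheus histograms do.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Rank falls beyond the last finite bucket: report that
                # bucket's upper bound, since nothing finer is known.
                return buckets[i - 1][0] if i > 0 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

Because the result is interpolated inside a bucket, its precision depends on how closely the bucket boundaries bracket the latencies you care about.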
Admin API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Returns ok when the agent is running |
| /ready | GET | Returns ok when the pipeline is started and all sinks are connected |
| /metrics | GET | Prometheus metrics for the agent itself |
| /topology | GET | JSON representation of the active pipeline DAG |
| /reload | POST | Hot-reload the config file (requires API key if api_keys is set) |
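The difference between `/health` and `/ready` matters for startup and rollout checks: an agent can be running but not yet ready. The sketch below classifies an agent from those two endpoints. It assumes that "returns ok" means an HTTP 200 response whose body is the text `ok`, which this page does not spell out; `probe` and `agent_state` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=2.0):
    """Return True if GET base_url + path answers 200 with the body "ok"."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and body.strip().lower() == "ok"
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and non-2xx replies (HTTPError)
        # all count as "not ok".
        return False


def agent_state(base_url, probe_fn=probe):
    """Classify the agent as 'down', 'starting', or 'ready'."""
    if not probe_fn(base_url, "/health"):
        return "down"
    return "ready" if probe_fn(base_url, "/ready") else "starting"
```

Calling `agent_state("http://localhost:8080")` against a local agent returns one of the three states; `probe_fn` is injectable so the classification can be tested without a running agent.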
/topology Response
{
  "sources": ["otlp_in", "host_metrics"],
  "processors": ["k8s_enrich", "batch_main"],
  "sinks": ["infrasage"]
}
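A response of this shape can be checked after a deploy or reload to confirm that the expected components are active. The following sketch assumes the response contains exactly the three list-of-strings fields shown above; `Topology`, `parse_topology`, and `missing_sinks` are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    sources: tuple
    processors: tuple
    sinks: tuple


def parse_topology(payload):
    """Parse a /topology JSON body, rejecting missing or malformed fields."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("/topology response must be a JSON object")
    fields = {}
    for key in ("sources", "processors", "sinks"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"/topology field {key!r} must be a list of strings")
        fields[key] = tuple(value)
    return Topology(**fields)


def missing_sinks(topology, expected):
    """Return the expected sink names that are absent from the topology."""
    return sorted(set(expected) - set(topology.sinks))
```

With the sample response above, `missing_sinks(parse_topology(body), ["infrasage"])` returns an empty list, which is the condition a post-deploy check would assert.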
Securing the Admin API
api:
  listen: "0.0.0.0:8080"
  api_keys: ["${ADMIN_API_KEY}"]  # required for /reload; /health and /metrics are public
  read_only: true                 # disallow /reload entirely, even with a valid key
  tls:
    cert_file: /etc/ssl/infrasagent/cert.pem
    key_file: /etc/ssl/infrasagent/key.pem
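When /reload is enabled (read_only is not set to true), send the key in either an X-API-Key header or an Authorization: Bearer header. Use https:// in the URL when tls is configured:
curl -H "X-API-Key: ${ADMIN_API_KEY}" \
  -X POST http://localhost:8080/reload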
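The same request can be built from a script. This is a minimal sketch using only the Python standard library; `build_reload_request` is a hypothetical helper, and it covers both header styles named above.

```python
import urllib.request


def build_reload_request(base_url, api_key, use_bearer=False):
    """Build a POST /reload request carrying the admin API key."""
    if use_bearer:
        headers = {"Authorization": f"Bearer {api_key}"}
    else:
        headers = {"X-API-Key": api_key}
    return urllib.request.Request(
        base_url.rstrip("/") + "/reload",
        data=b"",  # empty body; the reload itself takes no payload here
        headers=headers,
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=5)` and read the key from the environment (for example `os.environ["ADMIN_API_KEY"]`) rather than hard-coding it.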
Scraping with Prometheus
Add a scrape job to your Prometheus config:
scrape_configs:
  - job_name: infrasagent
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
Or use the Prometheus Operator PodMonitor:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: infrasagent
  namespace: infrasage
spec:
  selector:
    matchLabels:
      app: infrasagent
  podMetricsEndpoints:
    - port: admin
      path: /metrics
      interval: 15s
Recommended Alerts
# Alert when a sink is consistently failing
- alert: InfraSageAgentExportErrors
  expr: rate(infrasage_agent_export_errors_total[5m]) > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "infrasagent export errors on sink {{ $labels.sink }}"
# Alert when export latency is high
- alert: InfraSageAgentSlowExport
  expr: |
    histogram_quantile(0.95,
      rate(infrasage_agent_export_duration_seconds_bucket[5m])
    ) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "infrasagent p95 export latency > 5s on sink {{ $labels.sink }}"
# Alert when queue depth stays above 5000 records
- alert: InfraSageAgentQueueBackpressure
  expr: infrasage_agent_queue_depth > 5000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "infrasagent queue depth high on {{ $labels.component }}"
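When tuning the error-rate threshold, it helps to translate it into counts: a rate of 0.1 per second corresponds to about 30 failed exports over a 5-minute window. The sketch below shows the arithmetic on raw counter samples. It is a simplification of what `rate()` does (no extrapolation to the window edges), and `per_second_rate` is a hypothetical helper name.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example an agent
    restart), so the post-reset value counts as new increase.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

Feeding it three samples taken 150 seconds apart, with values 100, 110, and 130, gives 30 errors over 300 seconds, which sits exactly at the 0.1 threshold and so does not fire the `> 0.1` condition.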