Self-Observability

infrasagent exposes its own operational metrics at the admin API's /metrics endpoint in Prometheus text format. This lets you monitor the agent's health, throughput, and error rate using your existing observability stack.

Metrics

Throughput

Metric	Labels	Description
`infrasage_agent_records_received_total`	`source`, `signal`	Total records accepted by each source
`infrasage_agent_records_exported_total`	`sink`, `signal`	Total records successfully exported by each sink

Errors

Metric	Labels	Description
`infrasage_agent_export_errors_total`	`sink`	Total failed export attempts

Latency

Metric	Labels	Description
`infrasage_agent_export_duration_seconds`	`sink`	Histogram of time spent in each export call
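
Because the export duration is exposed as a histogram, percentiles are estimated from cumulative bucket counts rather than read directly. The sketch below illustrates the linear interpolation that PromQL's histogram_quantile() applies to classic histograms. The bucket boundaries and counts are hypothetical sample values, not the agent's actual bucket layout.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    The bucket list must include the +Inf bucket, as Prometheus histograms do.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2:
        raise ValueError("need at least one finite bucket plus +Inf")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for one sink (upper bound in seconds, count).
SAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (5.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, SAMPLE_BUCKETS)
p99 = estimate_quantile(0.99, SAMPLE_BUCKETS)
```

The estimate can only be as precise as the bucket boundaries allow: a reported p99 is a point inside the bucket that contains the 99th-percentile observation, not an exact measurement.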

Queue

Metric	Labels	Description
`infrasage_agent_queue_depth`	`component`	Current number of records waiting in each channel

Querying the Metrics

# Exported-record counters (cumulative totals; apply rate() in Prometheus to get a rate)
curl -s http://localhost:8080/metrics | grep records_exported

# Error count per sink
curl -s http://localhost:8080/metrics | grep export_errors

# Export latency percentiles (PromQL, not shell; requires Prometheus scraping the agent)
histogram_quantile(0.99,
  rate(infrasage_agent_export_duration_seconds_bucket[5m])
)
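
For scripted checks that go beyond grep, the scraped text can be parsed and aggregated per label. The following is a minimal sketch of a parser for simple Prometheus text-format sample lines. The sample payload is hypothetical and only illustrates the metric names and labels listed above; the function name sum_by_label is an illustration, not part of the agent.

```python
import re
from collections import defaultdict

# Matches `name{labels} value` or `name value`; trailing timestamps are ignored.
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_label(text, metric, label):
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if not match or match.group(1) != metric:
            continue
        labels = dict(_LABEL_PAIR.findall(match.group(2) or ""))
        totals[labels.get(label, "")] += float(match.group(3))
    return dict(totals)


# Hypothetical scrape output, for illustration only.
SAMPLE_SCRAPE = """\
# TYPE infrasage_agent_records_exported_total counter
infrasage_agent_records_exported_total{sink="infrasage",signal="metrics"} 1200
infrasage_agent_records_exported_total{sink="infrasage",signal="logs"} 300
# TYPE infrasage_agent_export_errors_total counter
infrasage_agent_export_errors_total{sink="infrasage"} 4
"""

exported = sum_by_label(SAMPLE_SCRAPE, "infrasage_agent_records_exported_total", "sink")
errors = sum_by_label(SAMPLE_SCRAPE, "infrasage_agent_export_errors_total", "sink")
```

Against a live agent, the text would come from the /metrics endpoint, for example via urllib.request.urlopen("http://localhost:8080/metrics").read().decode().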

Admin API Endpoints

Endpoint	Method	Description
`/health`	GET	Returns `ok` when the agent is running
`/ready`	GET	Returns `ok` once the pipeline has started and all sinks are connected
`/metrics`	GET	Prometheus metrics for the agent itself
`/topology`	GET	JSON representation of the active pipeline DAG
`/reload`	POST	Hot-reload the config file (requires API key if `api_keys` is set)
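
A deployment script can poll /ready before sending traffic to the agent. The sketch below assumes that a not-yet-ready agent either refuses the connection, returns a non-200 status, or returns a body other than ok; the source only documents the ok case, so treat the failure handling as an assumption. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_get(url, timeout_s=2.0):
    """Return (status, body) for a GET, or (None, "") if the agent is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError):
        return None, ""


def wait_until_ready(base_url, attempts=30, delay_s=1.0, get=http_get, sleep=time.sleep):
    """Poll <base_url>/ready until it returns `ok` or attempts run out."""
    url = base_url.rstrip("/") + "/ready"
    for attempt in range(attempts):
        status, body = get(url)
        if status == 200 and body.strip() == "ok":
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # back off before the next probe
    return False
```

Typical use would be wait_until_ready("http://localhost:8080"). The get and sleep parameters exist only so the loop can be exercised without a running agent.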

`/topology` Response

{
  "sources": ["otlp_in", "host_metrics"],
  "processors": ["k8s_enrich", "batch_main"],
  "sinks": ["infrasage"]
}
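
One use for this response is verifying after a reload that the components you expect are actually part of the active pipeline. A minimal sketch, assuming the response has exactly the shape shown above; the s3_archive sink in the example is hypothetical.

```python
import json


def missing_components(topology_json, expected):
    """Return expected component names absent from the active topology, by kind."""
    active = json.loads(topology_json)
    missing = {}
    for kind, names in expected.items():
        absent = sorted(set(names) - set(active.get(kind, [])))
        if absent:
            missing[kind] = absent
    return missing


# The /topology response shown above.
RESPONSE = """
{
  "sources": ["otlp_in", "host_metrics"],
  "processors": ["k8s_enrich", "batch_main"],
  "sinks": ["infrasage"]
}
"""

# `s3_archive` is a hypothetical sink that the config was expected to add.
EXPECTED = {"sources": ["otlp_in"], "sinks": ["infrasage", "s3_archive"]}
gaps = missing_components(RESPONSE, EXPECTED)
```

An empty result means every expected component is active.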

Securing the Admin API

api:
  listen: "0.0.0.0:8080"
  api_keys: ["${ADMIN_API_KEY}"]  # required for /reload; /health and /metrics are public
  read_only: true                 # optional: disallow /reload entirely, even with a valid key
  tls:
    cert_file: /etc/ssl/infrasagent/cert.pem
    key_file: /etc/ssl/infrasagent/key.pem

Send the key in either the X-API-Key header or as an Authorization: Bearer token:

# Use https:// instead of http:// if tls is configured
curl -H "X-API-Key: ${ADMIN_API_KEY}" \
     -X POST http://localhost:8080/reload
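
The same request can be built from a script. This sketch only constructs the request with either of the two header styles named above; the helper name build_reload_request is hypothetical, and the format of the /reload response is not documented here, so the example does not interpret it.

```python
import urllib.request


def build_reload_request(base_url, api_key, use_bearer=False):
    """Build a POST /reload request carrying the admin API key."""
    if use_bearer:
        headers = {"Authorization": f"Bearer {api_key}"}
    else:
        headers = {"X-API-Key": api_key}
    return urllib.request.Request(
        base_url.rstrip("/") + "/reload",
        data=b"",  # empty body; urllib sends Content-Length: 0
        headers=headers,
        method="POST",
    )


request = build_reload_request("http://localhost:8080", "example-key")
```

Passing the result to urllib.request.urlopen(request) sends it. Read the key from the environment (for example os.environ["ADMIN_API_KEY"]) rather than hard-coding it as the example does.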

Scraping with Prometheus

Add a scrape job to your Prometheus config:

scrape_configs:
  - job_name: infrasagent
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: /metrics
    scrape_interval: 15s

Or use the Prometheus Operator PodMonitor:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: infrasagent
  namespace: infrasage
spec:
  selector:
    matchLabels:
      app: infrasagent
  podMetricsEndpoints:
    - port: admin        # must match the name of the container port that serves the admin API
      path: /metrics
      interval: 15s

Recommended Alerts

# Alert when a sink is consistently failing
- alert: InfraSageAgentExportErrors
  expr: rate(infrasage_agent_export_errors_total[5m]) > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "infrasagent export errors on sink {{ $labels.sink }}"

# Alert when export latency is high
- alert: InfraSageAgentSlowExport
  expr: |
    histogram_quantile(0.95,
      rate(infrasage_agent_export_duration_seconds_bucket[5m])
    ) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "infrasagent p95 export latency > 5s on sink {{ $labels.sink }}"

# Alert when queue depth stays above a fixed threshold
- alert: InfraSageAgentQueueBackpressure
  expr: infrasage_agent_queue_depth > 5000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "infrasagent queue depth high on {{ $labels.component }}"
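
To judge whether the error threshold suits your traffic, it helps to see what a per-second rate of 0.1 means: sustained over the 5-minute window, it corresponds to more than 30 failed exports. The sketch below computes a simplified per-second rate from counter samples. It handles counter resets the way rate() does but omits the window-boundary extrapolation Prometheus applies, and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase across (timestamp_s, counter_value) samples.

    Returns None when there are too few samples or no elapsed time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (for example, an agent restart),
        # so the new value counts as the whole increase for that step.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes of infrasage_agent_export_errors_total over 5 minutes.
steady = simple_rate([(0, 100), (150, 115), (300, 130)])
after_restart = simple_rate([(0, 100), (150, 110), (300, 5)])
```

Here steady works out to 30 errors over 300 seconds, which sits exactly at the 0.1 threshold and therefore does not trigger the alert, since the expression uses a strict greater-than comparison.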
