Telemetry Types

InfraSage supports six telemetry types. Each type determines what fields are required and how the data is stored and analyzed.


Type Reference

Type     value required  body required  trace_id supported  Typical Use
metric   Yes             No             No                  CPU %, latency, throughput, error rate
log      No              Yes            No                  Application log lines
trace    Yes             No             Yes                 Distributed tracing, span durations
event    No              Yes            No                  Kubernetes events, deployments, alerts
profile  Yes             Yes            No                  CPU/memory profiling snapshots
slo      Yes             No             No                  SLI measurements for SLO tracking

If type is omitted, it defaults to metric.
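
The rules in the Type Reference table can be checked on the client before a record is sent. The following is a minimal sketch, not part of InfraSage itself: the names REQUIRED_FIELDS and missing_fields are hypothetical, and it only assumes that a field counts as present when its key exists in the record.

```python
# Hypothetical client-side check; the rules mirror the Type Reference table.
REQUIRED_FIELDS = {
    "metric": {"value"},
    "log": {"body"},
    "trace": {"value"},
    "event": {"body"},
    "profile": {"value", "body"},
    "slo": {"value"},
}
DEFAULT_TYPE = "metric"  # applied when "type" is omitted


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent from a record, sorted by name."""
    record_type = record.get("type", DEFAULT_TYPE)
    if record_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown telemetry type: {record_type!r}")
    return sorted(f for f in REQUIRED_FIELDS[record_type] if f not in record)
```

A record with no type is treated as a metric, so missing_fields({"service_id": "api-gateway"}) reports that value is missing.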


metric

Numeric time-series data. The most common type. Used for infrastructure metrics, application KPIs, and custom business metrics.

{
  "service_id": "api-gateway",
  "type": "metric",
  "metric_name": "request_latency_p99_ms",
  "value": 342.1,
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "region": "us-east-1",
    "method": "POST",
    "path": "/api/v1/checkout"
  }
}

Anomaly detection is active for metric records. The Watchdog monitors each (service_id, metric_name) pair independently, maintaining a sliding window of historical values to compute Z-scores.
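
To illustrate the idea of a per-pair sliding window, here is a rough sketch of Z-score tracking keyed by (service_id, metric_name). It is not the Watchdog's actual implementation: the class name ZScoreTracker, the window size of 60, and the choice of population standard deviation are all assumptions made for the example.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW_SIZE = 60  # assumed window length; the real value is not documented here


class ZScoreTracker:
    """Keeps one sliding window per (service_id, metric_name) pair."""

    def __init__(self, window_size: int = WINDOW_SIZE):
        self._windows = defaultdict(lambda: deque(maxlen=window_size))

    def observe(self, service_id: str, metric_name: str, value: float):
        """Score value against the pair's history, then add it to the window.

        Returns None when there is too little history or no variance.
        """
        window = self._windows[(service_id, metric_name)]
        z_score = None
        if len(window) >= 2:
            sigma = pstdev(window)
            if sigma > 0:
                z_score = (value - mean(window)) / sigma
        window.append(value)
        return z_score
```

Because each pair has its own window, a latency spike on one service does not shift the baseline of another.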


log

Textual log entries. The log text is stored in body; no numeric value is required.

{
  "service_id": "auth-service",
  "type": "log",
  "body": "Login failed: invalid credentials for user u-12345",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "level": "warn",
    "user_id": "u-12345",
    "ip": "203.0.113.42"
  }
}

Logs are stored in ClickHouse and can be queried via the Admin UI or directly through ClickHouse SQL. They are also correlated with metrics and traces during RCA.


trace

Distributed trace spans. Requires a trace_id to link spans across services.

{
  "service_id": "checkout-service",
  "type": "trace",
  "metric_name": "checkout.handle_payment",
  "value": 0.342,
  "trace_id": "abc123def4567890",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "span_id": "span-001",
    "parent_span_id": "span-root",
    "status": "ok"
  }
}

Trace data is stored in infrasage_exemplars (high-cardinality store) and linked back to metric anomalies for root cause analysis.
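
As an illustration of how a shared trace_id links spans from different services, the sketch below groups trace records by that field. The helper name group_spans_by_trace is hypothetical, and the sort assumes every timestamp uses the same UTC ISO 8601 format as the samples, so plain string order matches time order.

```python
from collections import defaultdict


def group_spans_by_trace(records: list) -> dict:
    """Map each trace_id to its trace records, ordered by timestamp."""
    traces = defaultdict(list)
    for record in records:
        # Only trace records that actually carry a trace_id can be linked.
        if record.get("type") == "trace" and "trace_id" in record:
            traces[record["trace_id"]].append(record)
    for spans in traces.values():
        spans.sort(key=lambda span: span["timestamp"])
    return dict(traces)
```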


event

Discrete occurrences without a continuous numeric value. Used for deployments, Kubernetes events, feature flag changes.

{
  "service_id": "k8s-cluster",
  "type": "event",
  "body": "Pod checkout-api-7f9d4b crashed with OOMKilled",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "kind": "Pod",
    "namespace": "production",
    "reason": "OOMKilled",
    "count": "3"
  }
}

Events are correlated with anomalies during RCA. For example, if a pod crash event precedes a latency spike, Claude surfaces this in its root-cause explanation.
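
The "event precedes anomaly" relationship can be pictured as a simple time-window filter. This is only a sketch of the concept, not InfraSage's correlation logic: the function events_before and the 10-minute lookback are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def events_before(anomaly_ts: str, records: list,
                  lookback: timedelta = timedelta(minutes=10)) -> list:
    """Return event records that occurred at most `lookback` before the anomaly."""
    anomaly_time = parse_ts(anomaly_ts)
    return [
        record for record in records
        if record.get("type") == "event"
        and timedelta(0) <= anomaly_time - parse_ts(record["timestamp"]) <= lookback
    ]
```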


profile

Profiling snapshots with both a numeric summary and raw body data.

{
  "service_id": "payment-service",
  "type": "profile",
  "metric_name": "cpu_flame_graph_sample_count",
  "value": 4200,
  "body": "... base64-encoded pprof data ...",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "profile_type": "cpu",
    "duration_ms": "5000"
  }
}
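
A profile record needs both fields, so a client has to base64-encode the raw profile bytes for body and supply a numeric summary as value. The sketch below shows one way to assemble such a record. The helper build_profile_record and its parameters are hypothetical, and the caller is assumed to already know the sample count.

```python
import base64


def build_profile_record(service_id: str, pprof_bytes: bytes, sample_count: int,
                         timestamp: str, duration_ms: int = 5000) -> dict:
    """Assemble a CPU profile record shaped like the sample payload."""
    return {
        "service_id": service_id,
        "type": "profile",
        "metric_name": "cpu_flame_graph_sample_count",
        "value": sample_count,
        # JSON cannot carry raw bytes, so the profile is base64-encoded text.
        "body": base64.b64encode(pprof_bytes).decode("ascii"),
        "timestamp": timestamp,
        "attributes": {"profile_type": "cpu", "duration_ms": str(duration_ms)},
    }
```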

slo

Service Level Objective measurements. Track SLI compliance over time.

{
  "service_id": "api-gateway",
  "type": "slo",
  "metric_name": "availability_percent",
  "value": 99.97,
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "slo_name": "API Availability",
    "target": "99.9",
    "window": "30d"
  }
}

SLO records are stored in infrasage_slo and tracked separately from general metrics to avoid polluting anomaly detection baselines with intentional threshold measurements.
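
One common way to read an SLO record is in terms of error budget: the share of allowed unavailability that has already been used. The sketch below applies that arithmetic to a record shaped like the sample. It is an illustration only; the function error_budget_consumed is hypothetical, and it assumes value and target are both percentages, with target stored as a string attribute.

```python
def error_budget_consumed(record: dict) -> float:
    """Fraction of the error budget used: (100 - measured) / (100 - target)."""
    measured = float(record["value"])
    target = float(record["attributes"]["target"])  # attribute values are strings
    allowed = 100.0 - target
    if allowed <= 0:
        raise ValueError("target must be below 100 percent")
    return (100.0 - measured) / allowed
```

For the sample record, 99.97 measured against a 99.9 target gives 0.03 / 0.1, or roughly 30% of the budget consumed.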


Custom Attributes

All telemetry types support arbitrary key-value attributes. These are stored as a JSON column in ClickHouse and indexed for fast filtering.

Best practices:

  • Use dot-separated namespaces: aws.region, k8s.namespace, app.version
  • Keep cardinality reasonable: avoid user IDs or request IDs in attribute keys
  • Attributes are included in RCA context sent to Claude
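
The namespacing recommendation above can be checked mechanically. The following sketch flags attribute keys that are not dot-separated. The helper non_namespaced_keys is hypothetical, and the pattern (lowercase letters, digits, and underscores in each segment) is an assumption about what a well-formed key looks like, not a rule enforced by InfraSage.

```python
import re

# Assumed shape: two or more segments joined by dots, e.g. "aws.region".
NAMESPACED_KEY = re.compile(r"^[a-z0-9_]+(\.[a-z0-9_]+)+$")


def non_namespaced_keys(attributes: dict) -> list:
    """Return the attribute keys that lack a dot-separated namespace, sorted."""
    return sorted(key for key in attributes if not NAMESPACED_KEY.match(key))
```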

Listing Supported Types via API

curl http://localhost:8080/api/v1/telemetry-types

Response:

{
  "types": ["metric", "log", "trace", "event", "profile", "slo"],
  "default": "metric"
}
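
The same endpoint can be called from a script and its response used to resolve a record's type. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the base URL is the localhost address from the curl example.

```python
import json
from urllib.request import urlopen


def fetch_telemetry_types(base_url: str = "http://localhost:8080") -> dict:
    """GET /api/v1/telemetry-types and return the decoded JSON body."""
    with urlopen(f"{base_url}/api/v1/telemetry-types", timeout=5) as response:
        return json.load(response)


def resolve_type(record: dict, type_info: dict) -> str:
    """Return the record's type, falling back to the server-reported default."""
    record_type = record.get("type", type_info["default"])
    if record_type not in type_info["types"]:
        raise ValueError(f"unsupported telemetry type: {record_type!r}")
    return record_type
```

Fetching the list once at startup and passing it to resolve_type avoids hard-coding the six type names in the client.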