Telemetry Types
InfraSage supports six telemetry types. Each type determines which fields are required and how the data is stored and analyzed.
Type Reference
| Type | value required | body required | trace_id supported | Typical Use |
|---|---|---|---|---|
| metric | Yes | No | No | CPU %, latency, throughput, error rate |
| log | No | Yes | No | Application log lines |
| trace | Yes | No | Yes | Distributed tracing, span durations |
| event | No | Yes | No | Kubernetes events, deployments, alerts |
| profile | Yes | Yes | No | CPU/memory profiling snapshots |
| slo | Yes | No | No | SLI measurements for SLO tracking |
If type is omitted, it defaults to metric.
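The rules in the table can be checked client-side before a record is sent. The following is a minimal sketch, not InfraSage's own validator: `REQUIRED_FIELDS` and `validate_record` are hypothetical names, and it assumes a required field must be present and non-null. The `trace_id` requirement for trace records is taken from the trace section of this page.

```python
# Hypothetical client-side check mirroring the Type Reference table.
REQUIRED_FIELDS = {
    "metric": ("value",),
    "log": ("body",),
    "trace": ("value", "trace_id"),
    "event": ("body",),
    "profile": ("value", "body"),
    "slo": ("value",),
}

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    # An omitted type defaults to "metric".
    record_type = record.get("type", "metric")
    if record_type not in REQUIRED_FIELDS:
        return [f"unknown type: {record_type!r}"]
    return [
        f"{record_type} requires {field!r}"
        for field in REQUIRED_FIELDS[record_type]
        if record.get(field) is None
    ]
```

A record with no `type` is validated as a metric, so `validate_record({"service_id": "api-gateway", "value": 1.0})` returns an empty list.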
metric
Numeric time-series data. The most common type. Used for infrastructure metrics, application KPIs, and custom business metrics.
{
  "service_id": "api-gateway",
  "type": "metric",
  "metric_name": "request_latency_p99_ms",
  "value": 342.1,
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "region": "us-east-1",
    "method": "POST",
    "path": "/api/v1/checkout"
  }
}
Anomaly detection is active for metric records. The Watchdog monitors each (service_id, metric_name) pair independently, maintaining a sliding window of historical values to compute Z-scores.
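To illustrate the idea of a per-pair sliding window, here is a minimal sketch of Z-score computation. It is not the Watchdog's implementation: the window length, the use of population standard deviation, and the names `WINDOW_SIZE`, `windows`, and `z_score` are all assumptions made for the example.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW_SIZE = 60  # assumed window length, not an InfraSage default

# One independent window per (service_id, metric_name) pair.
windows = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))

def z_score(service_id: str, metric_name: str, value: float):
    """Score value against the pair's history, then add it to the window.

    Returns None while there is too little history or no variance.
    """
    history = windows[(service_id, metric_name)]
    score = None
    if len(history) >= 2:
        sigma = pstdev(history)
        if sigma > 0:
            score = (value - mean(history)) / sigma
    history.append(value)  # deque drops the oldest value once full
    return score
```

Because the window is keyed by the pair, a latency spike on one service does not shift the baseline of another service that reports the same metric name.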
log
Textual log entries. The message is stored in body; no numeric value is required.
{
  "service_id": "auth-service",
  "type": "log",
  "body": "Login failed: invalid credentials for user u-12345",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "level": "warn",
    "user_id": "u-12345",
    "ip": "203.0.113.42"
  }
}
Logs are stored in ClickHouse and can be queried via the Admin UI or directly through ClickHouse SQL. They are also correlated with metrics and traces during root cause analysis (RCA).
trace
Distributed trace spans. Requires a trace_id to link spans across services.
{
  "service_id": "checkout-service",
  "type": "trace",
  "metric_name": "checkout.handle_payment",
  "value": 0.342,
  "trace_id": "abc123def4567890",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "span_id": "span-001",
    "parent_span_id": "span-root",
    "status": "ok"
  }
}
Trace data is stored in infrasage_exemplars (high-cardinality store) and linked back to metric anomalies for root cause analysis.
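The role of trace_id as the link between spans can be sketched as a simple grouping step. This is an illustration only: `group_spans` is a hypothetical helper, and it operates on records shaped like the example above rather than on the infrasage_exemplars store.

```python
from collections import defaultdict

def group_spans(records: list[dict]) -> dict[str, list[dict]]:
    """Group trace records by trace_id, ignoring other telemetry types."""
    traces = defaultdict(list)
    for record in records:
        if record.get("type") == "trace" and "trace_id" in record:
            traces[record["trace_id"]].append(record)
    return dict(traces)
```

Spans emitted by different services end up in the same list as long as they carry the same trace_id.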
event
Discrete occurrences without a continuous numeric value. Used for deployments, Kubernetes events, and feature flag changes.
{
  "service_id": "k8s-cluster",
  "type": "event",
  "body": "Pod checkout-api-7f9d4b crashed with OOMKilled",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "kind": "Pod",
    "namespace": "production",
    "reason": "OOMKilled",
    "count": "3"
  }
}
Events are correlated with anomalies during RCA. If a pod crash event precedes a latency spike, Claude surfaces this in its root-cause explanation.
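One way to picture "precedes" is a lookback window ending at the anomaly's timestamp. The sketch below assumes a fixed 10-minute window and uses hypothetical helper names (`parse_ts`, `events_before`); the correlation logic InfraSage actually uses is not described here.

```python
from datetime import datetime, timedelta

def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def events_before(anomaly_ts: str, events: list[dict],
                  lookback_minutes: int = 10) -> list[dict]:
    """Return events timestamped within the lookback window before the anomaly."""
    end = parse_ts(anomaly_ts)
    start = end - timedelta(minutes=lookback_minutes)
    return [e for e in events if start <= parse_ts(e["timestamp"]) <= end]
```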
profile
Profiling snapshots with both a numeric summary and raw body data.
{
  "service_id": "payment-service",
  "type": "profile",
  "metric_name": "cpu_flame_graph_sample_count",
  "value": 4200,
  "body": "... base64-encoded pprof data ...",
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "profile_type": "cpu",
    "duration_ms": "5000"
  }
}
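Since the body carries base64-encoded pprof data, a client needs to encode the raw profile bytes before building the record. A minimal sketch, assuming standard base64 and a hypothetical `build_profile_record` helper; the field names follow the example above.

```python
import base64

def build_profile_record(service_id: str, raw_pprof: bytes,
                         sample_count: int, timestamp: str) -> dict:
    """Assemble a profile record with the pprof bytes base64-encoded into body."""
    return {
        "service_id": service_id,
        "type": "profile",
        "metric_name": "cpu_flame_graph_sample_count",
        "value": sample_count,
        # JSON cannot carry raw bytes, so encode them as ASCII text.
        "body": base64.b64encode(raw_pprof).decode("ascii"),
        "timestamp": timestamp,
        "attributes": {"profile_type": "cpu"},
    }
```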
slo
Service Level Objective measurements. Track SLI compliance over time.
{
  "service_id": "api-gateway",
  "type": "slo",
  "metric_name": "availability_percent",
  "value": 99.97,
  "timestamp": "2026-04-10T12:00:00Z",
  "attributes": {
    "slo_name": "API Availability",
    "target": "99.9",
    "window": "30d"
  }
}
SLO records are stored in infrasage_slo and tracked separately from general metrics to avoid polluting anomaly detection baselines with intentional threshold measurements.
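As a worked example of reading an SLO record, the sketch below compares the measured value with the target and derives the share of the error budget used. This is standard SLO arithmetic applied to the example record, not a description of what InfraSage computes; `slo_status` is a hypothetical name, and both numbers are assumed to be percentages.

```python
def slo_status(record: dict) -> dict:
    """Compare a percentage SLI against its target and report budget usage."""
    # Attribute values are strings in the example record, so convert first.
    target = float(record["attributes"]["target"])
    measured = float(record["value"])
    allowed = 100.0 - target   # error budget, in percentage points
    used = 100.0 - measured    # observed shortfall, in percentage points
    return {
        "meets_target": measured >= target,
        "budget_used": used / allowed if allowed > 0 else None,
    }
```

For the record above (value 99.97 against a target of 99.9), the shortfall of 0.03 points is roughly 30% of the 0.1-point budget.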
Custom Attributes
All telemetry types support arbitrary key-value attributes. These are stored as a JSON column in ClickHouse and indexed for fast filtering.
Best practices:
- Use dot-separated namespaces: aws.region, k8s.namespace, app.version
- Keep cardinality reasonable: avoid user IDs or request IDs in attribute keys
- Attributes are included in RCA context sent to Claude
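The namespacing recommendation can be enforced with a small lint step before sending. The pattern below is an assumption (lowercase segments separated by dots), and `non_namespaced_keys` is a hypothetical helper; InfraSage itself accepts arbitrary keys.

```python
import re

# Assumed convention: two or more lowercase segments joined by dots.
NAMESPACED_KEY = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")

def non_namespaced_keys(attributes: dict) -> list[str]:
    """Return attribute keys that do not follow the dot-separated convention."""
    return sorted(k for k in attributes if not NAMESPACED_KEY.match(k))
```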
Listing Supported Types via API
curl http://localhost:8080/api/v1/telemetry-types
{
  "types": ["metric", "log", "trace", "event", "profile", "slo"],
  "default": "metric"
}
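The same endpoint can be called from code. The sketch below uses only the Python standard library and assumes the response shape shown above; `parse_types` and `fetch_types` are hypothetical helper names, and the base URL is the local address from the curl example.

```python
import json
from urllib.request import urlopen

def parse_types(body: bytes) -> tuple[list[str], str]:
    """Extract the supported types and the default from the response body."""
    payload = json.loads(body)
    return payload["types"], payload["default"]

def fetch_types(base_url: str = "http://localhost:8080") -> tuple[list[str], str]:
    """Call the telemetry-types endpoint and parse its JSON response."""
    with urlopen(f"{base_url}/api/v1/telemetry-types", timeout=5) as resp:
        return parse_types(resp.read())
```

Fetching the list at startup lets a client reject unknown types locally instead of hard-coding the six names.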