Processors

Processors sit between sources and sinks. They transform, enrich, filter, batch, and sample telemetry. Multiple processors can be chained: each one lists its upstream sources or processors under `sources`.

`batch` — Batching

Buffers records and flushes downstream on a time or size threshold. Always place a batch processor before sinks to reduce the number of export requests.

processors:
  batch_main:
    type: batch
    timeout: 5s
    max_size: 10000
    sources: [otlp_in, host_metrics]

Option	Default	Description
`timeout`	`5s`	Flush if this much time passes since the last flush
`max_size`	`10000`	Flush when the batch reaches this many records
`sources`	required	Upstream source or processor names
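
The flush rule described above (whichever of `timeout` or `max_size` is reached first) can be sketched in a few lines of Python. This is only an illustration of the idea, not the agent's implementation: the `Batcher` class, its `tick` method, and the injectable `clock` are hypothetical names introduced here, and the sketch assumes an empty buffer is never flushed.

```python
import time
from typing import Any, Callable, List


class Batcher:
    """Illustrative buffer that flushes on a size or time threshold."""

    def __init__(self, flush: Callable[[List[Any]], None],
                 timeout_s: float = 5.0, max_size: int = 10000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush          # called with each completed batch
        self._timeout_s = timeout_s  # mirrors the `timeout` option
        self._max_size = max_size    # mirrors the `max_size` option
        self._clock = clock          # injectable so the logic can be tested
        self._buf: List[Any] = []
        self._last_flush = clock()

    def add(self, record: Any) -> None:
        """Buffer one record; flush immediately if the size threshold is hit."""
        self._buf.append(record)
        if len(self._buf) >= self._max_size:
            self._do_flush()

    def tick(self) -> None:
        """Call periodically; flushes a non-empty buffer once the timeout elapses."""
        if self._buf and self._clock() - self._last_flush >= self._timeout_s:
            self._do_flush()

    def _do_flush(self) -> None:
        batch, self._buf = self._buf, []
        self._last_flush = self._clock()
        self._flush(batch)
```

A larger `max_size` means fewer export requests at the cost of more buffered memory, while a shorter `timeout` bounds how long a record can wait when traffic is low.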

`attributes` — Attribute Mutations

Add, remove, rename, update, or hash attributes on log records, metric data points, and trace spans.

processors:
  add_env:
    type: attributes
    sources: [otlp_in]
    actions:
      - action: insert
        key: deployment.environment
        value: "production"

      - action: upsert
        key: host.name
        value: "${HOSTNAME}"

      - action: delete
        key: http.request.header.authorization

      - action: hash
        key: user.email

      - action: rename
        key: old_key
        new_key: new_key

Action	Description
`insert`	Add the key only if it does not exist
`upsert`	Add or overwrite the key
`delete`	Remove the key
`hash`	Replace the value with its SHA-256 hex digest
`rename`	Rename a key while preserving its value
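
To make the difference between the actions concrete, here is a minimal Python sketch that applies the same action list to a plain dictionary of attributes. The function name `apply_actions` is hypothetical, and two behaviors are assumptions rather than documented facts: values are hashed as UTF-8 strings, and `hash` and `rename` are no-ops when the key is absent.

```python
import hashlib
from typing import Any, Dict, List


def apply_actions(attrs: Dict[str, Any],
                  actions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of attrs with the actions applied in order."""
    out = dict(attrs)
    for a in actions:
        kind, key = a["action"], a["key"]
        if kind == "insert":
            # Only adds the key when it is not already present.
            out.setdefault(key, a["value"])
        elif kind == "upsert":
            # Adds the key or overwrites the existing value.
            out[key] = a["value"]
        elif kind == "delete":
            out.pop(key, None)
        elif kind == "hash":
            # Assumption: the value is stringified and hashed as UTF-8.
            if key in out:
                raw = str(out[key]).encode("utf-8")
                out[key] = hashlib.sha256(raw).hexdigest()
        elif kind == "rename":
            # Assumption: renaming a missing key does nothing.
            if key in out:
                out[a["new_key"]] = out.pop(key)
        else:
            raise ValueError(f"unknown action: {kind}")
    return out
```

Note that with `insert`, a record that already carries `deployment.environment: "staging"` keeps that value, whereas `upsert` would replace it.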

`filter` — Record Filtering

Keep or drop records based on severity, attribute, or body conditions. Records that do not match the configured conditions are dropped.

processors:
  drop_debug:
    type: filter
    sources: [add_env]
    logs:
      severity_number:
        min: 9       # keep INFO (9) and above; drop TRACE (1-4) and DEBUG (5-8)

  production_only:
    type: filter
    sources: [otlp_in]
    logs:
      attributes:
        deployment.environment: "production"

Log filter options:

Option	Description
`severity_number.min`	Minimum OTLP severity number (1=TRACE, 5=DEBUG, 9=INFO, 13=WARN, 17=ERROR, 21=FATAL)
`severity_number.max`	Maximum OTLP severity number
`attributes`	Key-value map; all conditions must match
`body_regex`	Keep records whose body matches this regex
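
The keep-or-drop decision for a log record can be sketched as a single predicate. This is an illustration under stated assumptions, not the agent's code: the function `keep_log` and the dictionary record shape are hypothetical, the sketch assumes that all configured options must pass together, and it assumes `body_regex` matches anywhere in the body rather than requiring a full match.

```python
import re
from typing import Any, Dict, Optional


def keep_log(record: Dict[str, Any], *,
             severity_min: Optional[int] = None,
             severity_max: Optional[int] = None,
             attributes: Optional[Dict[str, Any]] = None,
             body_regex: Optional[str] = None) -> bool:
    """Return True if the record should be kept, False if it is dropped."""
    sev = record.get("severity_number", 0)
    if severity_min is not None and sev < severity_min:
        return False
    if severity_max is not None and sev > severity_max:
        return False
    # Every configured attribute condition must match.
    rec_attrs = record.get("attributes", {})
    for key, expected in (attributes or {}).items():
        if rec_attrs.get(key) != expected:
            return False
    # Assumption: a partial (search) match on the body is sufficient.
    if body_regex is not None:
        if not re.search(body_regex, str(record.get("body", ""))):
            return False
    return True
```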

Metric filter options:

Option	Description
`metric_names`	List of metric name regexes to keep
`attributes`	Key-value attribute conditions

`transform` — CEL Expressions

Mutate fields using Common Expression Language (CEL) expressions. Each statement block accepts an optional `where` clause so that its mutations apply only to matching records.

processors:
  normalize:
    type: transform
    sources: [otlp_in]
    log_statements:
      - context: log
        where: 'severity_number < 9'
        statements:
          - 'set(severity_text, "DEBUG")'

      - context: log
        statements:
          - 'set(attributes["http.url"], redact_url(attributes["http.url"]))'

    metric_statements:
      - context: datapoint
        where: 'metric.name == "http.server.duration"'
        statements:
          - 'set(value_double, value_double / 1000.0)'  # ms → seconds

Statements are evaluated in order. When a block's `where` clause evaluates to false for a record, the statements in that block are skipped for that record.
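
The evaluation order can be pictured with a small Python sketch in which plain callables stand in for CEL expressions. Everything here is hypothetical scaffolding (`run_blocks`, the dictionary block shape, the helper functions); it only illustrates that blocks run in sequence and that a false `where` skips that block alone.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
# A block is {"where": optional predicate, "statements": list of mutators}.
Block = Dict[str, Any]


def run_blocks(record: Record, blocks: List[Block]) -> Record:
    """Apply each block's statements in order, honoring its where predicate."""
    for block in blocks:
        where: Optional[Callable[[Record], bool]] = block.get("where")
        if where is not None and not where(record):
            continue  # skip only this block; later blocks still run
        for statement in block["statements"]:
            statement(record)
    return record


def below_info(record: Record) -> bool:
    return record["severity_number"] < 9


def label_debug(record: Record) -> None:
    record["severity_text"] = "DEBUG"


def mark_seen(record: Record) -> None:
    record["attributes"]["seen"] = True


EXAMPLE_BLOCKS: List[Block] = [
    {"where": below_info, "statements": [label_debug]},
    {"statements": [mark_seen]},  # no where clause: always applies
]
```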

`sampling` — Trace Sampling

Reduces trace volume before export. Supports head-based policies (the decision is made up front, without waiting for the rest of the trace) and tail-based policies (the decision is made after the full trace is complete).

processors:
  trace_sample:
    type: sampling
    sources: [otlp_in]

    # Tail-based: buffer complete traces before deciding
    decision_wait: 10s      # how long to wait for all spans
    max_traces: 100000      # max traces in buffer before eviction

    policies:
      - name: always_errors
        type: error           # keep any trace with an error span

      - name: slow_requests
        type: latency
        threshold_ms: 500     # keep traces slower than 500ms

      - name: baseline
        type: probabilistic
        sampling_percentage: 5  # keep 5% of everything else

Policy type	Description
`always_sample`	Keep all traces (useful for testing)
`error`	Keep traces where any span has status = Error
`latency`	Keep traces with root-span duration > `threshold_ms`
`probabilistic`	Keep `sampling_percentage`% based on trace ID hash

Policies are evaluated in order. The first matching policy wins.
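
First-match evaluation can be sketched as a loop over the policy list. The code below is an illustration, not the agent's implementation: `matches`, `first_match`, and the dictionary trace shape are hypothetical, and the probabilistic branch assumes a SHA-256 hash of the trace ID mapped onto 10,000 buckets, which is one common way to get a deterministic per-trace decision.

```python
import hashlib
from typing import Any, Dict, List, Optional

Policy = Dict[str, Any]
Trace = Dict[str, Any]  # {"trace_id", "root_duration_ms", "spans": [{"status"}]}


def matches(policy: Policy, trace: Trace) -> bool:
    """Return True if the policy would keep this trace."""
    kind = policy["type"]
    if kind == "always_sample":
        return True
    if kind == "error":
        return any(s.get("status") == "Error" for s in trace["spans"])
    if kind == "latency":
        return trace["root_duration_ms"] > policy["threshold_ms"]
    if kind == "probabilistic":
        # Assumption: hash the trace ID so every span of a trace agrees.
        digest = hashlib.sha256(trace["trace_id"].encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 10000
        return bucket < policy["sampling_percentage"] * 100
    raise ValueError(f"unknown policy type: {kind}")


def first_match(policies: List[Policy], trace: Trace) -> Optional[str]:
    """Return the name of the first policy that keeps the trace, else None."""
    for policy in policies:
        if matches(policy, trace):
            return policy["name"]
    return None
```

Because the first match wins, broad policies such as `probabilistic` are usually placed last so that they act as a baseline for traces no earlier policy kept.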

`k8s_attributes` — Kubernetes Metadata

Injects Kubernetes pod, node, and namespace metadata into logs, metrics, and traces. Reads from the Kubernetes API using the agent's service account.

processors:
  k8s_enrich:
    type: k8s_attributes
    sources: [otlp_in, app_logs]
    extract:
      - k8s.pod.name
      - k8s.namespace.name
      - k8s.node.name
      - k8s.deployment.name
      - k8s.container.name

Option	Default	Description
`extract`	all five attributes shown in the example above	Which attributes to inject
`kubeconfig`	in-cluster	Path to kubeconfig (leave empty when running in-cluster)

The processor uses the pod IP of the record's originating process to look up pod metadata from the Kubernetes API. Required RBAC: get/list/watch on pods and namespaces.

`aggregate` — Metric Pre-Aggregation

Reduces metric cardinality by aggregating high-cardinality label sets before export. Useful for keeping cost down when labels like user_id or request_id are attached.

processors:
  agg:
    type: aggregate
    sources: [otlp_in]
    metrics:
      - name: "http.server.request.count"
        drop_attributes: [user_id, request_id]
        aggregation: sum
        interval: 60s

Option	Description
`name`	Metric name (regex supported)
`drop_attributes`	Attributes to remove before aggregating
`aggregation`	`sum`, `min`, `max`, `last`
`interval`	Aggregation window
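
The cardinality reduction comes from collapsing data points that differ only in the dropped attributes. A minimal Python sketch of the `sum` case for a single interval follows; `aggregate_sum` and the `(attributes, value)` tuple shape are hypothetical names chosen for illustration, and windowing by `interval` is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[Dict[str, str], float]      # (attributes, value)
LabelKey = Tuple[Tuple[str, str], ...]    # sorted remaining attributes


def aggregate_sum(points: Iterable[Point],
                  drop_attributes: List[str]) -> Dict[LabelKey, float]:
    """Sum values across points that share the same remaining attributes."""
    drop = set(drop_attributes)
    totals: Dict[LabelKey, float] = defaultdict(float)
    for attrs, value in points:
        # Remove high-cardinality attributes, then group by what is left.
        key = tuple(sorted((k, v) for k, v in attrs.items() if k not in drop))
        totals[key] += value
    return dict(totals)
```

In this sketch, three request-count points for `method=GET` with three different `user_id` values become a single `method=GET` series whose value is their sum.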

`deduplicate` — Deduplication

Drops duplicate records within a rolling time window using a content fingerprint. The fingerprint covers the body (for logs), metric name + labels (for metrics), or span name + trace ID (for traces).

processors:
  dedup:
    type: deduplicate
    sources: [syslog_in]
    window: 30s
    max_entries: 100000

Option	Default	Description
`window`	`30s`	How far back to remember fingerprints
`max_entries`	`100000`	Max fingerprints to retain (LRU eviction)
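
The interaction between `window` and `max_entries` can be sketched with an ordered dictionary acting as an LRU cache of fingerprints. This is an illustration for log bodies only, with hypothetical names (`Deduplicator`, `is_duplicate`) and two assumptions: the fingerprint is a SHA-256 of the body, and a duplicate does not extend the window started by the first sighting.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Illustrative fingerprint cache with a time window and LRU eviction."""

    def __init__(self, window_s: float = 30.0, max_entries: int = 100000) -> None:
        self._window_s = window_s
        self._max_entries = max_entries
        self._seen: "OrderedDict[str, float]" = OrderedDict()  # fingerprint -> first-seen time

    def is_duplicate(self, body: str, now: float) -> bool:
        """Return True if the same body was seen within the window."""
        fp = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(fp)
        if first_seen is not None and now - first_seen < self._window_s:
            self._seen.move_to_end(fp)  # mark as recently used
            return True
        # New fingerprint, or the previous sighting has aged out of the window.
        self._seen[fp] = now
        self._seen.move_to_end(fp)
        while len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the least recently used
        return False
```

If `max_entries` is too small for the traffic within one `window`, fingerprints are evicted early and some duplicates pass through, so the two options are best sized together.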

`rate_limit` — Rate Limiting

Applies a token-bucket rate limit, with a separate bucket for each distinct combination of `group_by` attribute values. Records that exceed the limit are dropped.

processors:
  limit_logs:
    type: rate_limit
    sources: [syslog_in]
    rate: 1000      # records per second
    burst: 5000     # burst capacity
    group_by: [service.name]  # separate bucket per service

Option	Default	Description
`rate`	required	Sustained records per second
`burst`	`rate * 5`	Maximum burst capacity
`group_by`	`[]`	Attributes to partition buckets by
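
For readers unfamiliar with token buckets, the following Python sketch shows how `rate`, `burst`, and `group_by` relate. It is a generic token-bucket illustration rather than the agent's code: `TokenBucketLimiter` and `allow` are hypothetical names, each record is assumed to cost one token, and new buckets are assumed to start full.

```python
from typing import Dict, Hashable, Optional, Tuple


class TokenBucketLimiter:
    """Illustrative per-group token bucket."""

    def __init__(self, rate: float, burst: Optional[float] = None) -> None:
        self._rate = rate  # tokens added per second
        # Mirrors the documented default of rate * 5.
        self._burst = burst if burst is not None else rate * 5
        # group key -> (tokens available, time of last update)
        self._buckets: Dict[Hashable, Tuple[float, float]] = {}

    def allow(self, group: Hashable, now: float) -> bool:
        """Return True if a record for this group may pass at time `now`."""
        tokens, last = self._buckets.get(group, (self._burst, now))
        # Refill for the elapsed time, capped at the burst capacity.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[group] = (tokens, now)
        return allowed
```

With `group_by: [service.name]`, the group key would be the record's `service.name` value, so one noisy service exhausts only its own bucket.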

`geoip` — Geo-IP Enrichment

Looks up IP addresses in a MaxMind GeoLite2 or GeoIP2 database and injects location attributes.

processors:
  geo:
    type: geoip
    sources: [otlp_in]
    database: "/etc/infrasagent/GeoLite2-City.mmdb"
    ip_attribute: "client.address"   # attribute containing the IP
    target_prefix: "client.geo"      # prefix for injected attributes

With the configuration above, the injected attributes are `client.geo.country_iso_code`, `client.geo.city_name`, `client.geo.latitude`, and `client.geo.longitude`.

Option	Default	Description
`database`	required	Path to MaxMind `.mmdb` file
`ip_attribute`	`client.address`	Attribute name holding the IP
`target_prefix`	`geo`	Prefix for injected attributes
