Brainstorm: OTel Network Monitoring — eBPF Event Stream
Problem & Topic
Design center: SIEM-style visibility for clawker agents. Today's monitoring stack is built like a dev-debug observability stack (Grafana + Loki + Prom + Jaeger). Users running clawker actually need a SIEM-grade surface to answer "which of my 5 running agents is compromised?" and "why did this agent burn 10M tokens overnight?" — security + cost forensics over all agent-side telemetry, not just network logs.
Two architectural problems with the existing pipeline:
- Attribution is in the wrong layer. Agent identity must be resolved at the source-event layer, but logs come from Envoy/CoreDNS, and neither has agent-identity scope. Promtail then tries to bolt agent labels on via relabel rules with no access to Docker labels. The agent: and project: labels in promtail-config.yaml.tmpl are vestigial from this attempt and are always blank.
- Backing store is wrong for the workload. Cross-source security drill-down (filter network events by agent+verdict AND join with overseer-event firewall changes AND Claude-Code tool-call events, all on shared agent/project dimensions) is search-engine work, not time-series-log work. Two prior attempts at making Loki serve this UX failed.
Scope: two coupled workstreams.
- New BPF egress event stream — this is one new source feeding the SIEM surface. Schema, BPF mechanics, unification with Envoy/CoreDNS.
- Monitor stack replacement — entire stack pivots to OpenTelemetry-universal-ingest + OpenSearch. Receives all existing + new sources (see Ingest Sources Inventory section). New egress stream is the forcing function; the pivot benefits every other source.
Real requirement
- All clawker-side telemetry queryable from one UI, with shared global filters. Not "the network panel" + "the metrics panel" + "the logs panel" living in separate stacks.
- Per-agent attribution at the source for every event, every source — not joined post-hoc. Same string values everywhere (D17 generalises beyond egress).
- Time-series stream of events, filterable in the dashboard UI. Humans filter freely: failures only, single container, single proto, single domain, custom combinations.
- Network-source specific: cover every L4 transport (TCP, UDP, raw sockets) and whatever rides on top of it (SSH, FTP, QUIC, ...). Include bypass-mode traffic (today: forensic black hole).
- Dual filter surface (applies to every panel, not just network):
  - Dashboard global filters (top-of-dashboard variable selectors: $project, $agent, etc.) narrow every panel across every source.
  - Per-panel filters work on top of the global selection.
- Get data in — user owns the UI. Backing store responsibility: receive, store, make queryable. Cost dashboards / panel layouts / saved searches are user-configurable, not part of this design.
- Pure visibility. No baseline learning, no alerts, no classifier. No metric counters in OS — metrics stay in Prom per D23; events go to OS.
- No prompt content stored. Low-level event signal only. Claude-Code prompt-text capture (OTEL_LOG_USER_PROMPTS) stays off: GB of natural-language text adds no forensic value.
Ingest Sources Inventory
The OS layer must accept these sources today plus accommodate the planned next one. Source-shape diversity is the design constraint that rules out a single polymorphic mega-index.
| Source | Status | Signal type | Examples | SIEM value |
|---|---|---|---|---|
| CP app logs (clawker-cp) | Exists (file) | Structured log records (zerolog) | event=agent_dialer_unavailable, dialer errors, registry mismatches | CP misbehaving ≡ security concern. Operator must see degraded paths. |
| CP overseer events | Exists (in-process bus) | Domain events from internal/controlplane/overseer/ | Firewall rule changes, bypass activation, agent lifecycle, registry events, session connected/disconnected, trust-attestation outcomes | Audit trail. "Who toggled bypass at 3am?" Today these are in-process only. |
| Claude Code telemetry | Exists (already OTLP-emitting) | Logs/events (records) + metrics (separate path) | API requests, API errors, tool decisions, tool results, cost & token metrics | "What did the agent decide to do?" Per-tool-call audit. Cost data alongside security data for correlated drill-down. Metrics → Prom (D23). Logs/events → OS (new). |
| BPF egress events | New (this brainstorm) | L3/L4 verdicts from cgroup hooks | connect4/sendmsg4/recvmsg4 verdicts: ALLOWED/DENIED/BYPASSED, dst_ip/dst_port/l4_proto, domain-hash → domain via P7 reverse map | Every outbound network attempt visible — including bypass-mode (forensic black hole today). |
| Envoy access logs | New (this brainstorm) | L7-over-MITM HTTP/TCP access records | HTTP method/path/response_code, TLS SNI, response_flags, upstream timing; emitted natively as OTLP via envoy.access_loggers.open_telemetry (D31) | "What HTTP API did the agent actually hit?" Path-level forensics for TLS-terminated traffic. |
| CoreDNS query logs | New (this brainstorm) | Per-query DNS records | client_ip, qname, qtype, rcode, answer set; emitted via log plugin stdout → collector filelog receiver (D32) | "What did the agent try to resolve?" DNS-side audit independent of BPF/Envoy. |
| Sys exec call events | Planned (future scope) | eBPF-derived process events | execve / fork / exit, command line + cgroup attribution | "What did the agent actually run inside the container?" Pairs with egress for full agent-behavior picture. NOT a current source-list entry — design only. |
Cross-source invariants:
- Every doc carries @timestamp, plus agent and project whenever the source has them in scope (D27), as a common attribution skeleton. Value-alignment per D17 generalises: same string values across all sources.
- Every source reaches OS through the central otel-collector (D22): OTLP where the source can emit it, the collector's filelog receiver for CoreDNS (D32). No direct-to-OS writes.
- Metrics stay in Prom (D23). Logs and events go to OS. No metric-as-document anti-pattern.
- Each source = its own data stream under the clawker-* prefix; the OSD pattern clawker-* queries across them.
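
The attribution skeleton and the D27 "emit what is in scope, omit the rest" rules could be sketched in Go roughly as follows. The Attribution type, its field names, and the Attrs helper are hypothetical illustrations, not code from the clawker repository.

```go
package main

import "fmt"

// Attribution is a hypothetical holder for the shared attribution skeleton.
// Empty strings mean "not in scope for this source or event".
type Attribution struct {
	Agent       string // raw dev.clawker.agent label value
	Project     string // raw dev.clawker.project label value
	ContainerID string // set only for container-scoped events
}

// Attrs returns only the fields that are in scope, so a source without
// container context (for example CP startup) emits no attribution keys at
// all instead of emitting blank ones.
func (a Attribution) Attrs() map[string]string {
	out := make(map[string]string, 3)
	if a.Agent != "" {
		out["agent"] = a.Agent
	}
	if a.Project != "" {
		out["project"] = a.Project
	}
	if a.ContainerID != "" {
		out["container_id"] = a.ContainerID
	}
	return out
}

func main() {
	scoped := Attribution{Agent: "dev", Project: "myapp", ContainerID: "abc123"}
	fmt.Println(scoped.Attrs())

	startup := Attribution{} // no container context: nothing to attach
	fmt.Println(len(startup.Attrs()))
}
```

Omitting absent fields, rather than writing empty strings, keeps "field missing" distinguishable from "field present but blank" when filtering in OSD.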
Current BPF Surface
Seven cgroup programs in bpf/clawker.c: connect4, sendmsg4, recvmsg4, connect6, sendmsg6, recvmsg6, sock_create. Every decision point already invokes metric_inc() — same call sites become ringbuf emission points.
Existing pinned maps
| Map | Key | Value | Used by event-stream change? |
|---|---|---|---|
| container_map | cgroup_id | container_config | presence gate for enforcement (no change) |
| bypass_map | cgroup_id | u8 flag | no change; bypass still counted as ACTION_BYPASS |
| dns_cache | IP | {domain_hash, expire_ts} | reader walks this to build hash→domain reverse map |
| route_map | {domain_hash, dst_port} | {envoy_port} | no change |
| metrics_map | {cgroup_id, hash, port, action} | counter | stays for break-glass ebpf-manager dump |
| events_ringbuf | (none) | egress_event records | NEW: this change adds it |
Confirmed Decisions
Only items the user has explicitly approved. Order is logical, not chronological.
- D1: Primary signal. BPF event-stream replaces the Envoy/Promtail-only pipeline. Visibility/forensic system, not metrics. Destination is OpenSearch per D20.
- D2: Attribution at source. Agent identity resolved at the source-event layer (BPF emits cheap kernel-scope key; CP enriches with Docker labels). Not post-hoc relabel joins.
- D3: Verdict enum. ALLOWED | DENIED | BYPASSED. BYPASSED added (bypass-mode forensics is the headline win); ERROR dropped (upstream failures are data carried in response fields, not verdict).
- D4: Single polymorphic event. One schema covers every record. No multiple event_kinds with separate shapes. Hubble's Flow proto pattern as reference shape.
- D5: L4 + L7 as fields, not discriminators. L4 transport (tcp/udp) and L7 application protocol (https, dns, ssh, ...) are fields on the polymorphic event. Never used as event-kind discriminators. L7 protocol is a free-form string (open set), not a closed enum.
- D6: L7-specific fields per protocol. Each known L7 protocol has its own planned field set (HTTP-family: method/path/response_code; DNS: qname/qtype/rcode; opaque: absent). Detailed mapping is a proposal (see P4).
- D7: One entry per logical roundtrip. No lifecycle events. No connect-attempt-only records. One event = one logical network interaction.
- D8: Security logs only. No debug noise, no diagnostic records, no detection/baselining/alerting layer. Pure visibility for forensic investigation.
- D9: Comprehensive port/protocol coverage. BPF cgroup hooks catch every outbound network request: any port, any L4 protocol, including L7s clawker doesn't recognize. Nothing escapes the event stream.
- D10: Cilium/Hubble pattern as reference. Production-proven L4+L7 unification: BPF for L3/L4 + Envoy-ALS-style reader for L7-over-TLS + central enriching reader. Don't invent. BPF-only L7-over-TLS visibility would require Pixie-scale uprobe infrastructure (per-binary DWARF, libssl/BoringSSL/GoTLS/Python-ssl/… symbol hooks). Out of scope.
- D11: Keep Envoy MITM. Envoy stays for enforcement (path_rules) and L7 forensic data over TLS. Not the primary observability source; it is one of three inputs (BPF, Envoy, CoreDNS) that feed the unified stream.
- D12: Drop the “fix Envoy/Promtail attribution” path. Architectural mismatch: Promtail can't see Docker labels. Sunk cost. The Envoy logs themselves are fine; the ingestion path was wrong.
- D13: Response bodies NOT captured. Only response_code + bytes. Body capture is a different feature (privacy + perf cost over TLS-terminated content).
- D14: Dual filter surface. Two filter dimensions, both required:
  - Low-cardinality global filters drive dashboard variables ($agent, $project, ...) and narrow every panel.
  - High-cardinality refinement filters (dst_host, dst_ip, qname, dst_port, response_code, ...) work per-panel on top of the global selection.
- D15: Docs followup. The docs/firewall.mdx architecture diagram is chronologically imprecise (BPF chronology elided). Update alongside this feature. Not blocking; separate PR.
- D17: Cross-system value-alignment contract. When a source emits agent and project, the string values MUST equal the values Claude Code's existing Prometheus telemetry uses for the same names, sourced from the agent container's dev.clawker.agent and dev.clawker.project Docker labels (the existing authoritative source per CLAUDE.md “labels (dev.clawker.*) authoritative for filtering”). D14's dual-filter requirement says picking $agent/$project in the dashboard global filters must narrow every panel. Variables source from Prom (existing); panels query OS. If the two sources disagree on the value strings, the dashboard dropdown selects a value that no event document carries, and panels go blank. No synthesis: never use cgroup name, never use AgentFullName's clawker.<project>.<agent> form, never anything derived. Sources that have access to the labels (CP-side readers, Claude Code's launch env) emit the raw Docker-label values directly.
- D20: Replace entire monitor stack with OpenSearch + OpenSearch Dashboards.
Loki cannot serve the security-event drill-down UX (path-level filter, response_code filter, agent+project drill-down, dst_host across one or all agents). Two prior attempts failed. OpenSearch's index/search model is built for this workload — high-cardinality fields are first-class, drill-down across multiple structured fields is the design center. Replacing only the new-stream path while leaving Grafana+Loki for existing data fragments observability across two UIs; replacing the entire backend is the only way to consolidate around a query model that actually serves the needs.
Stack delta (target endgame: 6 containers → 4):
  - Drop: loki, grafana, promtail, jaeger
  - Keep: otel-collector (universal ingest; D22), prometheus (metrics store; D23)
  - Add: opensearch, opensearch-dashboards

internal/monitor/templates/grafana-dashboard.json (~30 panels) gets ported to OSD. Mechanical work, real volume.
- D21: Drop Jaeger. Nothing currently emits traces; nothing currently consumes traces. The container is in the stack today but unused. Future tracing needs (if they emerge) can be served by OpenSearch's built-in Trace Analytics, with no separate Jaeger container required.
- D22: OTel as the universal ingest path. Every source of telemetry goes through otel-collector: Claude Code (already does), clawker-cp, CP-side egress readers, and Envoy access logs emit OTLP directly; CoreDNS query logs enter through the collector's filelog receiver (D32). The collector exports to OpenSearch. No more Promtail tail-and-relabel. No more Prom scrape (except Prom's own metrics pipeline per D23). Source-side attribution at emission (D2) is preserved through OTLP resource attributes / record attributes. A single, source-attributed ingest path with one destination is the architectural fix the prior pipeline couldn't deliver. Attribution at emission was always the right answer; previously it didn't survive the relabel pipeline. OTLP carries structured attributes natively, end-to-end.
- D23: Keep Prometheus for metrics (unconditional). Prometheus stays in the stack. Metrics path (Claude Code → OTLP → otel-collector → Prom) is unchanged from today.
Prom is purpose-built for time-series metrics. For the volume and shape of Claude Code's emitted metrics (counters, gauges, histograms over agent / project / session_id dimensions, plus rate / sum aggregations), it is materially more performant than treating metric datapoints as documents in OpenSearch. Events and logs go to OS where the search-engine model is right; metrics stay in Prom where the TSDB model is right.
Hard constraint: this is the ONLY component of the old monitor stack that survives the D20 sweep. Loki, Grafana, Promtail, Jaeger do not get resurrected under any circumstance. If a gap surfaces during implementation, the fix lives in the OS or collector layer, never by re-adding stack components.
  - Source → Prom routing. Anything metrics-shaped goes to Prom, not OS. Today this is mostly Claude Code's emitted metrics, routed via OTEL_* env vars set at container launch.
  - otel-collector must include Prom exporter wiring. Existing config already does this; preserve it through the D20 rewrite (don't accidentally drop the Prom export branch when stripping Loki/Jaeger).
  - OSD must include Prometheus datasource plugin. Otherwise metric panels can't query Prom from the new UI. Open implementation question: OSD's Prometheus datasource plugin maturity for the existing PromQL panel shapes (rate / sum / counter increments / unwrap). Verify during dashboard port; if it falls short, the answer is “adapt the panels” not “re-add Grafana.”
- D24: SIEM design center, not SRE/dev-debug. The new stack serves operators investigating what their agents are doing (security + cost forensics), not application performance debugging. Every design call (index strategy, mapping mode, field naming, retention) is judged against the SIEM use case, not against “is this nice for SRE dashboards.” D14 (dual filter surface), D17 (cross-system value-alignment), and source coverage all flow from this. Treating it like an SRE observability backend produces narrow, source-specific design (e.g. an “egress index”). Treating it like a SIEM forces cross-source thinking: every doc carries shared attribution; index family naming permits multi-source queries; storage shape favors filter/drill-down over time-series rollup. Whenever a sub-decision is being framed, sanity-check: would a security operator investigating “is agent X compromised / why did agent Y burn $50 of tokens overnight” care? If not, the framing is too narrow.
- D25: OS is the universal events/logs destination for ALL clawker telemetry. Every source listed in Ingest Sources Inventory (CP app logs, CP overseer events, Claude Code logs/events, network egress, future sys exec events) lands in OS as queryable documents. The OS-layer design must accommodate the inventory; the new BPF egress stream is one source among several. D22 already locked OTel as universal ingest. D25 closes the loop on the destination side: the same destination receives all OTel-shipped sources. Anything less reverts to the fragmented SRE-stack problem the pivot exists to fix. Any OS-side proposal (index strategy, mapping mode, schema, retention) must be evaluated against the full source inventory, not just the egress stream. A proposal that only works for egress and breaks Claude Code log ingestion is wrong by construction.
- D26: Per-source data streams under the clawker-* prefix. Each source = its own data stream (clawker-egress, clawker-claude, clawker-cp, clawker-overseer, future clawker-exec). Not one polymorphic mega-index. Source shapes differ enough that a single polymorphic index would either (a) carry hundreds of mostly-absent fields per doc, with the attendant mapping-explosion risk, or (b) collapse to a lossy lowest-common-denominator schema. Per-source data streams keep each schema clean. OSD's index pattern clawker-* cross-queries them for SIEM workflows; OSD selector variables resolve $agent/$project uniformly thanks to the common attribution skeleton (D27). Index template + ISM rollover policy defined per source. Naming follows OS data-stream convention: backing indices auto-named .ds-clawker-<source>-<gen>. New sources added by registering new data streams, not by mutating existing ones.
- D27: At-source attribution rules (no collector enrichment). Three simple rules govern what attribution fields a source emits:
  - If the source has agent + project in scope, it MUST include them in the JSON record. No matter what.
  - If the source's event is about (or scoped to) a specific container, it MUST include container_id in the JSON record.
  - If the source has neither, it emits with whatever context it has and that's fine. Not every doc needs every field.

  CP app logs: log.With("project", ...).With("agent", ...) in scope-aware places; missing call sites are fixed where they exist, not papered over downstream. BPF reader uses P13 cache to attach agent/project at OTLP-emit time (this is at-source, not collector-side, even though the cache is in CP-the-process: CP is the BPF reader). Envoy and CoreDNS readers are CP-side too, same rule. Sources without container context (CP startup, CP shutdown, dialer-init) just omit; not a bug.
- D28: No prompt content stored. Claude Code's OTEL_LOG_USER_PROMPTS stays off (default). User-prompt text and assistant-response text are NOT captured in OS. Tool decisions, API request/error metadata, token counts, model identifiers ARE captured. Prompt content = GB of natural-language text with near-zero forensic value. Real signal is at the structural layer (which tool got called, with what cost, against what API endpoint, with what verdict). Storing prompts also raises privacy / data-handling complications that visibility doesn't need. Two-layer enforcement: (a) upstream: Claude Code launched with OTEL_LOG_USER_PROMPTS=0; (b) belt-and-suspenders: OTTL filter processor in collector drops any log record where the body contains free-form prompt content, regardless of upstream config. Layer (b) protects against future Claude Code versions changing default behavior.
- D29: Per-source independent OTLP emission (no composer). Each source emits its own OTLP log records, independently, at flow termination. No composer service buffers per-flow observations or merges across sources. Cross-source pivoting (e.g. “the BPF flow doc and the Envoy access-log doc for the same connection”) happens at OSD query time via shared correlation fields, not at ingest time via a goroutine.
An earlier draft of this decision specified a composer with per-flow buffers + a timing window across sources. That design is brittle: it requires cross-source clock alignment, a buffering/eviction policy for never-arriving observations, race-free handling of out-of-order source emissions, and a custom-Envoy-build correlation key (socket_cookie) to avoid 5-tuple ambiguity that doesn't apply to clawker's deployment shape anyway. Each source already emits one complete record per flow at flow end; making OSD do the join is the simpler, less-failure-modes-per-LOC design. Aligns with the "get data in, user owns the UI" principle (D24).
  - Per-source emission rule: each source emits ONE OTLP log record per flow AT flow termination (response received / TCP_CLOSE / DNS response / etc.). No partial / open-ended events.
  - Shared correlation contract: every record carries (at minimum) @timestamp, container_id when container-scoped, agent + project when in scope (D27), and the flow's 5-tuple (src_ip, src_port, dst_ip, dst_port, l4_proto).
  - Query-time joins in OSD: investigators pivot by filtering on shared fields. "Show me all observations of agent X at 14:32" returns interleaved records from BPF / Envoy / CoreDNS / Claude Code. Same 5-tuple across BPF and Envoy = same flow.
  - Domain attribution stays denormalized at BPF-emit time via the P7 dns_cache reverse map: the BPF egress event document carries dst_host directly, so the "what domain did this connect resolve to" question doesn't require a join.
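
For the "same 5-tuple = same flow" pivot to work at query time, each source has to render the tuple identically. A minimal Go sketch of one way to canonicalise it is shown below; FlowTuple, NewFlowTuple, Key, and the key format are hypothetical and only illustrate the normalisation concerns (IPv4-mapped IPv6 addresses, protocol casing).

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// FlowTuple holds the 5-tuple named in the shared correlation contract.
type FlowTuple struct {
	SrcIP   netip.Addr
	SrcPort uint16
	DstIP   netip.Addr
	DstPort uint16
	L4Proto string
}

// NewFlowTuple parses textual addresses, which is how most sources see them.
func NewFlowTuple(src string, srcPort uint16, dst string, dstPort uint16, proto string) (FlowTuple, error) {
	s, err := netip.ParseAddr(src)
	if err != nil {
		return FlowTuple{}, fmt.Errorf("src_ip: %w", err)
	}
	d, err := netip.ParseAddr(dst)
	if err != nil {
		return FlowTuple{}, fmt.Errorf("dst_ip: %w", err)
	}
	return FlowTuple{SrcIP: s, SrcPort: srcPort, DstIP: d, DstPort: dstPort, L4Proto: proto}, nil
}

// Key renders the tuple in one canonical form. Unmap collapses IPv4-mapped
// IPv6 addresses (::ffff:a.b.c.d) to plain IPv4, and the protocol name is
// lower-cased, so two sources describing the same flow produce equal keys.
func (t FlowTuple) Key() string {
	return fmt.Sprintf("%s|%s|%d|%s|%d",
		strings.ToLower(t.L4Proto),
		t.SrcIP.Unmap(), t.SrcPort,
		t.DstIP.Unmap(), t.DstPort)
}

func main() {
	bpf, _ := NewFlowTuple("172.18.0.5", 40000, "93.184.216.34", 443, "tcp")
	envoy, _ := NewFlowTuple("::ffff:172.18.0.5", 40000, "93.184.216.34", 443, "TCP")
	fmt.Println(bpf.Key() == envoy.Key())
}
```

Whether a derived key like this is stored as its own field or the five fields are simply filtered on individually is a dashboard-side choice; the point of the sketch is only that value formatting has to agree across sources.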
- D30: BPF ringbuf reader lives in clawker-cp as a thin goroutine, not a new package. The BPF userspace reader (drain ringbuf → attribute via existing P13 label cache → emit OTLP) is small enough to live as a sibling alongside ebpf-manager: internal/controlplane/firewall/ebpf/eventreader.go or similar, depending on import-cycle considerations. NOT a standalone netlogger package: the original justification (it owns ALS server + dnstap consumer + composer) no longer exists per D29/D31/D32. An earlier draft built an entire internal/controlplane/netlogger/ package to hold three consumers + composer + label cache + OTLP emitter. With Envoy emitting OTLP natively (D31) and CoreDNS handled at the collector layer (D32), the only userspace component for the egress stream is the BPF ringbuf reader. A package boundary for one consumer is over-organisation; place it next to ebpf-manager, which already owns the BPF-side primitives. The reader is a goroutine started during CP boot. Standard CP no-panic discipline: a deferred recover() wrapper, a structured event=egress_reader_unavailable log on degradation, never panic() / log.Fatal(). Uses the existing dockerevents-subscribed cgroup-id↔container-id label cache (P13) for attribution. Builds OTLP log records using the OTel Go SDK and ships to the in-cluster otel-collector. Subsystem failure degrades only the egress stream; CP and the enforcement plane stay up.
- D31: Envoy → otel-collector via native envoy.access_loggers.open_telemetry. Envoy is configured with the upstream OpenTelemetry access logger extension (status: stable). It emits OTLP log records directly to the otel-collector. No ALS, no UDS gRPC server, no custom code on the CP side. Envoy already speaks OTLP natively for access logs via an upstream-stable extension. Standing up our own gRPC ALS server to receive stock HTTPAccessLogEntry messages, then immediately re-marshaling them into OTLP, would be a pointless extra hop with its own connection-management failure surface. Using Envoy's native exporter eliminates an entire piece of CP-side infrastructure. Envoy YAML configures access_log: { name: envoy.access_loggers.open_telemetry, typed_config: { grpc_service: { ... otel-collector endpoint ... }, body: { ... }, attributes: { ... command operators: method, path, response_code, downstream_remote, upstream_remote, response_flags, TLS SNI, timing ... } } }. Field shape is templated via Envoy's command operator language. otel-collector receives via standard OTLP gRPC receiver. Buffering / reconnect handled by Envoy's exporter, same semantics as ALS.
- D32: CoreDNS → otel-collector via log plugin stdout + filelog receiver. CoreDNS's upstream log plugin is configured to emit JSON-formatted per-query records to stdout. Docker captures stdout to its container log file. The otel-collector's filelog receiver tails that file, parses the JSON, and ships as OTLP log records. No custom CoreDNS plugin. No dnstap. No UDS. No fstrm decoder. No miekg/dns in our codepath. Writing a CoreDNS OTLP-logs plugin OR a dnstap consumer adds a custom-code surface that earns us nothing the upstream log plugin doesn't already provide. The log plugin is stock, ships with every CoreDNS, and supports a configurable format string with all the per-query fields needed (client_ip, qname, qtype, rcode, response size, timing). Container stdout + collector filelog receiver is the most idiomatic OTel ingestion pattern there is: nothing custom to ship, debug, or maintain. Corefile: log { class all format "{json}" } (or equivalent format directive producing structured JSON per query). otel-collector pipeline: filelog receiver tailing the CoreDNS container's Docker log file → json_parser operator → resource / transform processors as needed for attribute normalization → OTLP exporter to the central collector destination. Attribution: dnsbpf plugin already populates the cgroup-aware dns_cache; client_ip in the log record maps to container_id via the same dockerevents-subscribed cache used by the BPF reader (P13). DNS-side observability is now entirely outside clawker-CP-the-process and lives in CoreDNS config + collector config.
- D33: OS exporter mapping mode = ss4o. The OpenSearch otel-collector exporter is configured with mapping.mode: ss4o. OTLP log records land as Simple-Schema-for-Observability documents: timestamp, body, severityText/Number, resource.attributes.*, attributes.* nested as the OTLP record carries them. No flattening, no body-replacement. ss4o is the exporter's stable default. flatten_attributes and bodymap are both marked unstable; bodymap additionally requires every source to construct a Map body, fighting the OTLP idiom (source-side attributes are the natural place for event fields). Nested paths in OSD filters (attributes.dst_host vs dst_host) are a dashboard-ergonomics concern, not a correctness one: saved-query/visual-builder presets hide the depth. ss4o also gives free integration with OpenSearch Observability plugin features that auto-recognize the schema. Configure opensearchexporter in the otel-collector pipeline with mapping: { mode: ss4o }. Verify the exporter version pinned in our collector image is the stable-mode revision (the ss4o mode itself is stable; the surrounding exporter component remains alpha for logs/traces, so treat the exporter version pin as a hard requirement, not a floating tag).
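
The no-panic discipline named in D30 could look roughly like the Go sketch below. One detail is worth stating explicitly: a bare `defer recover()` statement does not stop a panic, because recover only takes effect when called directly inside a deferred function. The runGuarded helper and its name are hypothetical, and the standard library logger stands in for the CP's structured logger.

```go
package main

import (
	"fmt"
	"log"
)

// runGuarded runs a reader loop and converts a panic into an error, so the
// caller can log the degradation and keep the rest of the process alive.
func runGuarded(name string, loop func() error) (err error) {
	defer func() {
		// recover must be called directly in this deferred closure.
		if r := recover(); r != nil {
			err = fmt.Errorf("%s panicked: %v", name, r)
		}
	}()
	return loop()
}

func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		err := runGuarded("egress_reader", func() error {
			// Stand-in for the ringbuf drain loop hitting a bug.
			panic("malformed ringbuf record")
		})
		if err != nil {
			// Degrade only the egress stream; never log.Fatal here.
			log.Printf("event=egress_reader_unavailable error=%q", err)
		}
	}()
	<-done
	fmt.Println("control plane still running")
}
```

A real reader would likely add restart-with-backoff around runGuarded; that policy is not specified in this document and is left out of the sketch.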
Proposals Pending Your Call
Schema-upstream proposals (P3–P10) survive the OS pivot — they describe what BPF emits, independent of where it lands. Storage-side proposals (index/mapping design, OSD panel set, otel-collector pipeline config) are not yet enumerated here — will be added as new proposals when those workstreams are walked.
Schema specifics
- P3: L4 metadata field list (always present): ts, dst_host, dst_ip, dst_port, bytes_in, bytes_out, duration_ms. Zero values for DENIED (no traffic flowed). Alternatives: collapse bytes_in/bytes_out into a single total; include src_ip / src_port; include close_reason for opaque sessions; include pre-redirect dst_ip_raw vs post-redirect; carry IPv6 fields separately or coalesce. Each adds metadata cost vs forensic value.
- P4: L7 record field layout per l7_proto (Hubble's oneof equivalent, flattened):

  | l7_proto | L4 | L7 record fields | Emitter |
  |---|---|---|---|
  | https | tcp | method, path, request_host, response_code | Envoy HTTP/TLS chain |
  | http | tcp | method, path, request_host, response_code | Envoy HTTP chain |
  | websocket | tcp | method=GET, path, request_host, response_code=101 (one record per upgrade; no frame-level) | Envoy HTTP (Upgrade) |
  | grpc | tcp | method, path=/Service/Method, request_host, response_code, grpc_status | Envoy HTTP |
  | dns | udp | qname, qtype, response_ips, rcode, ttl | CoreDNS dnsbpf plugin |
  | ssh, ftp, postgres, redis, mqtt, smtp, ... | tcp | L7 record absent (opaque) | Envoy tcp_proxy |
  | unknown | tcp/udp | L7 record absent | BPF direct |

  Alternatives: closed enum of supported L7 protocols only (rejects unknown); fewer fields per HTTP-family (drop request_host since it's in dst_host); add response_flags from Envoy as a separate metadata bag; add per-protocol decoders (Postgres queries, Redis commands), which is large scope.
- P5: BPF event struct (C layout):

  ```c
  struct egress_event {
      __u64 ts_ns;        // bpf_ktime_get_ns()
      __u64 cgroup_id;    // attribution key
      __u32 domain_hash;  // 0 = no DNS resolution
      __u32 dst_ip;       // pre-redirect destination, network byte order
      __u16 dst_port;     // host byte order
      __u8  action;       // 0=allow 1=deny 2=bypass
      __u8  flags;        // bit0=bypass-pass-through, bit1=IPv6
      __u8  l4_proto;     // SOCK_STREAM/DGRAM/RAW
      __u8  _pad[3];      // pads the record to 32 bytes
  };
  ```

  Alternatives: include IPv6 address inline (16-byte field, doubles record size); carry socket_cookie for sock_ops correlation (requires P10); include process PID; include connect-syscall result (errno); use TLV layout for forward-compat. Trade record size vs verifier complexity vs reader simplicity.
- P6: Ring buffer size: 1 MB. Bigger = more headroom under burst; smaller = less wasted kernel memory. Cilium uses 16 MB perf-buf default. 1 MB is a guess based on event size (32 bytes) × ~30k events of headroom.
- P7: Domain resolution mechanism: domain_hash → reverse map of dns_cache. Reader periodically scans dns_cache (max 16384 entries) and builds a hash → domain reverse index. Direct-IP connects: domain="". Alternatives: BPF stores domain string directly in event (variable-length, complicates verifier); reader does live dns_cache lookup on every event (cost per event); BPF passes dns_cache entry pointer (lifetime hazard); attach a separate name-collision map keyed by hash.
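
On the reader side, decoding the P5 record and applying the P7 reverse map might look like the following Go sketch. It assumes a little-endian host (true for x86-64 and arm64), takes the field offsets from the struct as proposed in P5, and uses hypothetical names (EgressEvent, DecodeEgressEvent, ResolveDomain, RecordsPerRingbuf). The capacity helper accounts for the 8-byte header the kernel prepends to every ring buffer record, which makes the usable headroom of a 1 MiB buffer somewhat lower than the raw 32-byte estimate.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

const (
	egressEventSize  = 32 // sizeof(struct egress_event) as proposed in P5
	ringbufHeaderLen = 8  // per-record header added by the BPF ring buffer
)

// EgressEvent mirrors the proposed C struct, minus padding.
type EgressEvent struct {
	TsNs       uint64
	CgroupID   uint64
	DomainHash uint32
	DstIP      netip.Addr
	DstPort    uint16
	Action     uint8
	Flags      uint8
	L4Proto    uint8
}

// DecodeEgressEvent parses one raw ringbuf sample.
func DecodeEgressEvent(b []byte) (EgressEvent, error) {
	if len(b) < egressEventSize {
		return EgressEvent{}, errors.New("short egress_event record")
	}
	le := binary.LittleEndian // assumption: little-endian host
	var ip [4]byte
	copy(ip[:], b[20:24]) // network byte order: bytes already read a.b.c.d
	return EgressEvent{
		TsNs:       le.Uint64(b[0:8]),
		CgroupID:   le.Uint64(b[8:16]),
		DomainHash: le.Uint32(b[16:20]),
		DstIP:      netip.AddrFrom4(ip),
		DstPort:    le.Uint16(b[24:26]), // host byte order per P5
		Action:     b[26],
		Flags:      b[27],
		L4Proto:    b[28],
	}, nil
}

// ResolveDomain applies the P7 rule: hash 0 or an unknown hash yields "".
func ResolveDomain(reverse map[uint32]string, hash uint32) string {
	if hash == 0 {
		return ""
	}
	return reverse[hash]
}

// RecordsPerRingbuf estimates how many events fit before the buffer wraps.
func RecordsPerRingbuf(sizeBytes int) int {
	return sizeBytes / (ringbufHeaderLen + egressEventSize)
}

func main() {
	raw := make([]byte, egressEventSize)
	binary.LittleEndian.PutUint64(raw[8:16], 42)
	binary.LittleEndian.PutUint32(raw[16:20], 7)
	copy(raw[20:24], []byte{93, 184, 216, 34})
	binary.LittleEndian.PutUint16(raw[24:26], 443)

	ev, err := DecodeEgressEvent(raw)
	if err != nil {
		panic(err)
	}
	reverse := map[uint32]string{7: "example.com"} // illustrative data only
	fmt.Println(ev.DstIP, ev.DstPort, ResolveDomain(reverse, ev.DomainHash))
	fmt.Println(RecordsPerRingbuf(1 << 20))
}
```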
BPF emit policy & flow lifecycle
P8 and P9 are superseded by D29 (per-source independent emission, no composer). BPF emits one event per flow at flow termination, with no emit-time partitioning; cross-source correlation happens at OSD query time, not in a netlogger composition step.
- P10: Sock_ops conn-state tracking for ALL TCP flows (DENIED + ALLOWED + BYPASSED). New sock_ops BPF program watching BPF_SOCK_OPS_STATE_CB for TCP_CLOSE. Socket-cookie-keyed state map stamped at connect4; on close reads bytes_sent / bytes_received / duration from bpf_sock_ops; submits a single ringbuf event with full roundtrip data. Under D29 this is the BPF-side emission trigger: TCP_CLOSE is the point at which the BPF record for the flow is complete and can be emitted; correlation with Envoy and CoreDNS records happens at OSD query time. Alternatives: skip bytes/duration for non-Envoy-observed flows (the record exists but says only “reached X” with no “how much”); userspace post-hoc read from /proc/net/* (cost, race with close); kprobe on tcp_close instead of sock_ops (older kernel support, less clean). UDP / connectionless protocols need a time-based flush (no TCP_CLOSE); details deferred to implementation.
Operational concerns
- P13 Dual-indexed label cache. Single dockerevents-subscribed cache, two indexes: cgroup_id → labels for BPF reader, container_ip → labels for Envoy + CoreDNS readers. Both invalidated by the same die/destroy event. Alternatives: two separate caches with independent invalidation (more code, easier to reason about ownership); per-source caches that each subscribe to dockerevents (3 subscribers); lookup-on-demand via docker inspect (cost, no caching at all); CP-owned label registry that both readers query via in-process interface.
- P14 Per-cgroup userspace rate limit on the reader. Prevents one misbehaving agent from monopolizing the ringbuf. Alternatives: per-agent global token bucket (not per-cgroup); kernel-side rate-limit in BPF (more complex, costs map space); no rate limit (rely on ringbuf size + downstream ingestion limits); rate limit on the OTLP-emit side.
- P15 Drop instrumentation (kernel ringbuf drops + reader parse failures surface to operator). Specific surface (Prom gauge, OS event, log line, or some combination) is the open question. Alternatives: Prom counter clawker_egress_events_dropped_total + warn log on nonzero (Tetragon convention); log only (grep-able but no graph); OS event document with record_type=drop_notification (lives alongside the events it accounts for); separate counters per drop reason (kernel vs parse vs rate-limit).
- P16 metrics_map stays as break-glass debug surface for ebpf-manager dump CLI. Not exported as Prom counters in this change. Alternatives: remove the map (one less BPF resource, also one less recovery tool); keep AND export as Prom (back to graph-heavy world the user just rejected); replace with a per-event log toggle (verbose mode flag).
- P17 l7_proto=unknown for BPF-only records (DENIED at BPF, BYPASSED-with-no-rule, ALLOW-direct that bypasses Envoy). Alternatives: l7_proto=bpf (says emitter, not protocol; honest); l7_proto="" empty/null (annoying for selectors); populate via port-based heuristic (port 22 → ssh even when BPF emits it; risk: lying to the user); omit field entirely (breaks invariant that l7_proto is always populated).
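The dual-indexed cache in P13 could be sketched roughly as follows. This is an illustration under stated assumptions, not the CP implementation: the type and method names (labelCache, onStart, onDie) are hypothetical, and the wiring to the dockerevents bus is left out. Both indexes point at a container ID, so a late die event for an old container cannot evict a newer container that reused the same cgroup_id or IP.

```go
package main

import "sync"

// labels is the attribution pair stamped on every record.
type labels struct {
	Agent   string
	Project string
}

// containerEntry remembers both index keys so one die/destroy event can
// invalidate them together.
type containerEntry struct {
	cgroupID uint64
	ip       string
	labels   labels
}

// labelCache is a single store with two indexes (all names hypothetical).
type labelCache struct {
	mu          sync.RWMutex
	byContainer map[string]containerEntry // container ID -> keys and labels
	byCgroup    map[uint64]string         // cgroup_id -> container ID (BPF reader)
	byIP        map[string]string         // container_ip -> container ID (Envoy + CoreDNS readers)
}

func newLabelCache() *labelCache {
	return &labelCache{
		byContainer: make(map[string]containerEntry),
		byCgroup:    make(map[uint64]string),
		byIP:        make(map[string]string),
	}
}

// onStart would be called from a container start event.
func (c *labelCache) onStart(containerID string, cgroupID uint64, ip string, l labels) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.removeLocked(containerID) // drop stale keys if the same ID is re-registered
	c.byContainer[containerID] = containerEntry{cgroupID: cgroupID, ip: ip, labels: l}
	c.byCgroup[cgroupID] = containerID
	c.byIP[ip] = containerID
}

// onDie would be called from the die/destroy event; it clears both indexes.
func (c *labelCache) onDie(containerID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.removeLocked(containerID)
}

func (c *labelCache) removeLocked(containerID string) {
	e, ok := c.byContainer[containerID]
	if !ok {
		return
	}
	// Only delete index slots that still belong to this container.
	if c.byCgroup[e.cgroupID] == containerID {
		delete(c.byCgroup, e.cgroupID)
	}
	if c.byIP[e.ip] == containerID {
		delete(c.byIP, e.ip)
	}
	delete(c.byContainer, containerID)
}

func (c *labelCache) lookupCgroup(id uint64) (labels, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.byContainer[c.byCgroup[id]]
	return e.labels, ok
}

func (c *labelCache) lookupIP(ip string) (labels, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.byContainer[c.byIP[ip]]
	return e.labels, ok
}
```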
Verdict source-mapping
- P18 Per-emitter verdict mapping rules:
  - BPF: action=ALLOW direct-egress → ALLOWED; action=DENY → DENIED (bytes=0, dur=0); action=BYPASS → BYPASSED.
  - Envoy HTTP/TLS: forwarded (incl. 5xx and upstream failures) → ALLOWED; path_rules deny match (Envoy 403) → DENIED. Envoy never sees BYPASSED.
  - Envoy tcp_proxy: forwarded (any close reason) → ALLOWED; deny listener / reset → DENIED.
  - CoreDNS: query forwarded and answered (incl. SERVFAIL/REFUSED from upstream) → ALLOWED; query matched no allowed zone, CoreDNS returns NXDOMAIN as enforcement → DENIED.
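The P18 rules can be written down as a handful of small mapping functions, sketched here in Go. Function names and input shapes are hypothetical; in particular, how each reader derives the boolean inputs (for example, telling a path_rules 403 apart from an upstream 403) is not specified in this document and is assumed to happen before these functions are called.

```go
package main

type verdict string

const (
	verdictAllowed  verdict = "ALLOWED"
	verdictDenied   verdict = "DENIED"
	verdictBypassed verdict = "BYPASSED"
)

// bpfVerdict maps the BPF action to the shared verdict vocabulary.
// ok is false for an action this sketch does not know about.
func bpfVerdict(action string) (v verdict, ok bool) {
	switch action {
	case "ALLOW":
		return verdictAllowed, true
	case "DENY":
		return verdictDenied, true
	case "BYPASS":
		return verdictBypassed, true
	}
	return "", false
}

// envoyHTTPVerdict: anything Envoy forwarded is ALLOWED, including 5xx and
// upstream failures. Only a path_rules deny match is DENIED.
func envoyHTTPVerdict(pathRuleDenied bool) verdict {
	if pathRuleDenied {
		return verdictDenied
	}
	return verdictAllowed
}

// envoyTCPVerdict: forwarded with any close reason is ALLOWED; the deny
// listener / reset path is DENIED.
func envoyTCPVerdict(denyListenerOrReset bool) verdict {
	if denyListenerOrReset {
		return verdictDenied
	}
	return verdictAllowed
}

// coreDNSVerdict: a query that matched an allowed zone and was forwarded is
// ALLOWED even if upstream answered SERVFAIL/REFUSED; no allowed zone is DENIED.
func coreDNSVerdict(matchedAllowedZone bool) verdict {
	if matchedAllowedZone {
		return verdictAllowed
	}
	return verdictDenied
}
```

Only the BPF mapping can produce BYPASSED, which matches the rule that Envoy never sees bypassed traffic.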
Conclusions & Insights
- SIEM, not SRE. The OS layer's design center is security + cost forensics across all clawker-side telemetry. Treating it as an SRE observability backend produced narrow framing (single “egress index”). The right framing is multi-source: OS receives CP logs, CP overseer events, Claude Code logs/events, network egress, future sys-exec events, all queryable with shared agent/project filters.
- Architecture. The Envoy/Promtail attribution struggle was architectural, not configuration. Identity must be resolved in scope. BPF cgroup_id is in scope inside BPF; CP's reader has the Docker label map in scope. Two layers, both have the info they need without cross-process gymnastics. The same architectural principle generalises across all sources: attribute at the emitter, not via post-hoc relabel.
- Per-source data streams beat polymorphic mega-index. A single OS index with conditional fields across CP logs + overseer events + Claude Code records + network events would carry hundreds of mostly-absent fields and risk the mapping-explosion anti-pattern. Per-source data streams under clawker-* keep each schema clean while OSD's clawker-* index pattern cross-queries them for SIEM workflows.
- Get data in, user owns the UI. The OS layer's job is to store and make queryable. Cost-attribution panels, saved searches, dashboard layouts are user-owned. The design must not bake in assumptions about what panels people will build.
- Bypass forensic black hole fixed for free with eBPF: BPF programs stay attached during bypass and still emit ACTION_BYPASS events. Today bypass = no Envoy logs = blind. Justifies the project on its own.
- Events > counters for THIS UX. Investigation/filter/transform is what a search-engine UI does. Counters are for graphs and alerts. Metrics stay in Prom (D23); events go to OS.
- L4 vs L7 semantics. “Response code” means different things at different layers. L4 outcome (RST/timeout/refused) for everything; HTTP response code only through MITM Envoy. Acceptable since volume isn't the signal; failure-vs-success is.
- DNS visibility constraint. BPF cgroup hook fires first (sendmsg4 to UDP:53) but sees envelope only. Query content (qname, qtype, response, client_ip) is visible only in CoreDNS userspace via the existing dnsbpf plugin. Whether CoreDNS owns DNS records exclusively (and BPF skips DNS-bound emits) is P8 / P9.
- Universal proto coverage. Envoy is TCP-only; eBPF at connect4 sees TCP + UDP + raw sockets (IPv6 and dual-stack sockets go through the matching connect6 hook). SSH/FTP/QUIC/reverse-shell-on-port-4444 all visible without per-proto config.
Gotchas & Risks
- BPF verifier complexity. New ringbuf map + emit helper means a verifier re-run. Keep event struct flat (no pointers, fixed size). Reserve/submit pattern is well-trodden; should pass on first try.
- Event flood from misbehaving agent. Tight-loop connect attempts can fill ringbuf. Mitigations are inter-dependent proposals: ringbuf size (P6), drop instrumentation (P15), per-cgroup userspace rate limit (P14). Multi-layer mitigation is the principle; specific mechanism is a proposal.
- Drop instrumentation MUST exist day 1. Silent drops = security false confidence. Ringbuf drop counter (kernel) + reader parse-fail counter must be surfaced; specific surface is P15.
- CP no-panic invariant applies to reader. Goroutine in CP (PID 1). defer recover(). On unrecoverable error, log event=egress_monitor_unavailable and degrade; do not crash CP. eBPF stays attached.
- cgroup_id → agent lookup cost. Docker inspect per event = too expensive. Cache cgroup_id → {agent, project} with TTL or invalidation on docker-die event. Use existing dockerevents bus.
- cgroup_id reuse. Kernel may reuse cgroup_id after container dies. Cache must invalidate on die-event or a new container could inherit old labels.
- Domain resolution from BPF events. dns_cache is keyed by IP, not domain_hash. Some reverse-lookup mechanism is required to surface human-readable domain in records. Mechanism is P7. Direct-IP connects produce no domain; the event must not be dropped on that account.
- Pre-existing: metric_inc silent drop on full metrics_map. Uses BPF_NOEXIST; new tuples after 16384-entry max are silently dropped. Worth filing as separate issue (switch to BPF_MAP_TYPE_LRU_PERCPU_HASH, or scraper deletes dead-cgroup entries). Not part of this change.
- Docs followup: docs/firewall.mdx architecture diagram is chronologically imprecise. Lines ~19–31 show Agent — DNS query —> CoreDNS with eBPF shown as a passive map store. In reality the BPF sendmsg4 hook fires in-syscall before the DNS packet reaches CoreDNS, and BPF is the chronological first-touch on every egress. Diagram conflates “who decides what” with “what fires first”. Must be updated alongside this feature. Not blocking implementation; capture in same docs PR.
Open Items & Questions
- Envoy OTLP access logger field coverage (D31). Verify that envoy.access_loggers.open_telemetry exposes every field we need via command operators: downstream_remote_address, upstream_remote_address, response_flags, TLS SNI/cipher, timing breakdown (start_time, duration, time_to_first_upstream_byte, …). If any field is unreachable, that's a real gap to plan for.
- CoreDNS log plugin format directive (D32). Confirm the format string supports emitting all needed per-query fields as JSON: client_ip, client_port, qname, qtype, rcode, response_size, query_time, response_time. If the format directive can't emit JSON natively, the fallback is a regex-parser operator in the filelog receiver pipeline.
- UDP / connectionless flow lifecycle for BPF egress events. P10 covers TCP via sock_ops TCP_CLOSE. UDP has no close signal, so it needs a time-based flush window or a "first sendmsg = flow start, no further sendmsg or recvmsg within N ms = flow end" heuristic. Specifics deferred to implementation. Less load-bearing than under the composer model since each source emits independently and OSD can pivot via 5-tuple.
- Event-struct field set: should pre-redirect dst_ip be captured separately from post-redirect (forensic value: what the agent “actually tried” vs what Envoy received)? Folds into P3 / P5.
- Ring buffer type: BPF_MAP_TYPE_RINGBUF single global vs per-CPU perf buffer. Ringbuf needs kernel ≥ 5.8; perf-buf is older-kernel-friendly, but the trade-off is single-reader cost vs lock-free parallel. Folds into P5 / P6.
- OS index/mapping design: mapping mode locked as D33 (ss4o). Index-naming pattern and dynamic-mapping policy NOT locked.
- Cross-source attribution-placement. Within ss4o, agent/project belong in resource.attributes (when the emitting process IS the agent, e.g. Claude Code in agent container) or attributes (when CP emits records ABOUT an agent; CP-the-process is not itself an agent). Cross-source filter ergonomics push toward a single consistent path.
- Routing-key naming: whether the ss4o exporter routes via a clawker-namespaced attribute or an existing OTel semconv attribute is open.
- Whether CP zerolog and overseer events are two data streams or one (both emitted by clawker-cp process; distinct in inventory on shape grounds).
Unknowns
- Existing “Egress Traffic” panel layout (IDs 54–58) — what carries over to OSD, what gets re-cut.
- Kernel-floor reality: clawker's actual support matrix. Ringbuf requires kernel ≥ 5.8. Verify.
- Whether the dockerevents bus is already a subscriber-friendly surface for the cgroup cache invalidator.
- OSD Prometheus datasource plugin maturity for the ~30 existing PromQL panel shapes (rate / sum / counter increments / unwrap). Verify during dashboard port.
Next Steps
- OS index / mapping design for the egress stream. Index naming, field types, retention. Will surface as a new proposal block.
- otel-collector pipeline config rewrite. Drop otlphttp/loki and otlp/jaeger exporters; add OpenSearch exporter for events/logs. Keep prometheus exporter and metrics pipeline (D23). Preserve existing resource/agent + resource/cp + transform/metrics processors (provenance-stamping invariants are storage-independent).
- Envoy and CoreDNS access-log ingest path. How those access logs reach otel-collector; new proposals when walked.
- OSD dashboard migration plan. Mechanical port of the existing ~30 panels. Includes OSD Prometheus datasource verification for the PromQL panels (if gaps, adapt panels; do not re-add Grafana, per D23).
- After design lock: produce a Phase-2 implementation plan as a separate exercise. Phasing comes after design is locked, not during.
- Separately filed: pre-existing metrics_map silent-drop bug; docs/firewall.mdx chronology fix (D15). Both noted, neither part of this design.