_wire_sanitize
Producer-side wire-format invariant — printable ASCII, length-capped.
WIRE_SPEC(safe-string): keep in sync with
terok_clearance/src/terok_clearance/wire/sanitize.py: same rule,
same character class, same length cap. grep WIRE_SPEC finds
every copy across the producer/consumer boundary; clearance owns
the canonical version (it's the wire-format consumer).
The threat model: container processes can craft DNS names, hostnames, or annotation bytes that flow through the shield's watchers and the NFLOG reader straight onto the hub socket. A consumer downstream (notification daemon, terminal TUI, audit listener) sees those bytes verbatim unless someone trims them along the way.
Sanitising at every emit point (here, in _hub_events, and in the
reader resource) is belt-and-braces: clearance applies the same rule
on the receive side, so a regression on either side still leaves the
contract enforced. Producer-side sanitisation specifically protects
the container-out path, which is the primary attack surface;
consumer-side sanitisation catches every other event source.
Rule (single, simple):
- Printable ASCII ([\x20, \x7E]) passes through unchanged.
- Anything else (control bytes, non-ASCII, RTLO/LRO bidi overrides, the lot) collapses to a single space, position-preserving.
- Strings longer than max_len are truncated with a trailing ... ASCII marker.
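The character-class half of the rule can be sketched as below. This is an illustration only, not the module's actual code: `replace_unsafe` is a hypothetical helper name, and the sketch assumes "position-preserving" means each rejected character becomes exactly one space, so the string length does not change.

```python
def replace_unsafe(text: str) -> str:
    # Hypothetical helper, not part of the documented API.
    # Keep characters in the printable ASCII range 0x20..0x7E;
    # swap every other character for one space (one-for-one, so
    # positions and overall length are preserved).
    return "".join(ch if "\x20" <= ch <= "\x7e" else " " for ch in text)


# Control bytes and a right-to-left override (U+202E) both become spaces.
print(repr(replace_unsafe("ok\tname\n")))   # 'ok name '
print(repr(replace_unsafe("a\u202eb")))     # 'a b'
```

Replacing one-for-one rather than deleting keeps offsets stable, which fits the "position-preserving" wording in the rule.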
Stdlib-only by design — no external imports — so this module sits in
the foundation tach layer alongside _hub_events.
DEFAULT_MAX_LEN = 256
module-attribute
sanitize(value, *, max_len=DEFAULT_MAX_LEN)
Coerce value to printable ASCII, capped at max_len characters.
When max_len is shorter than the truncation marker, the marker is
clipped so the post-condition len(result) <= max_len always
holds.
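One way the marker clipping could work is sketched below. The helper name `truncate`, the `MARKER` constant, and the choice to count the marker inside the `max_len` budget are assumptions made for illustration; the text above only guarantees the post-condition `len(result) <= max_len`.

```python
MARKER = "..."  # assumed spelling of the ASCII truncation marker


def truncate(text: str, max_len: int) -> str:
    # Hypothetical helper, not part of the documented API.
    if len(text) <= max_len:
        return text  # already within the cap, nothing to do
    if max_len <= len(MARKER):
        # No room for the full marker plus content: clip the marker itself.
        return MARKER[: max(max_len, 0)]
    # Assumption: the marker is counted inside the max_len budget.
    return text[: max_len - len(MARKER)] + MARKER


# The post-condition holds for every small cap, including 0.
for cap in range(8):
    assert len(truncate("abcdefghij", cap)) <= cap

print(truncate("abcdefghij", 6))  # abc...
print(truncate("abcdefghij", 2))  # ..
```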