install

Install + uninstall the OCI hook that spawns the supervisor.

Single-root layout: scripts, ballast, and the JSON descriptor all live under state_root() (which honours the operator's paths.root config). containers.conf is patched to list state_root() / "hooks" in hooks_dir so podman scans the canonical terok-owned directory rather than the default ~/.config/containers/oci/hooks.d/.

Files written:

  • <state_root>/hooks/supervisor_hook.py
  • <state_root>/hooks/_supervisor_state.py
  • <state_root>/hooks/terok-sandbox-supervisor-createRuntime.json
  • <state_root>/hooks/terok-sandbox-supervisor-poststop.json
    (one OCI hook descriptor per stage: podman/crun reuse the same hook.args for every stage listed in a single descriptor, so each stage gets its own JSON; both match on the terok.sandbox.sidecar annotation)
  • <state_root>/supervisor_wrapper.py (templated — embeds the resolved terok-sandbox argv)
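
As a rough illustration of the per-stage descriptors listed above, the sketch below renders one JSON document per stage using the hook schema that podman documents in oci-hooks(5). The function name `render_descriptor`, the example path under `/srv/terok`, and the exact annotation regex are assumptions made for this sketch; the real `_render_hook_descriptor` may differ.

```python
import json
import re
from pathlib import Path

ANNOTATION = "terok.sandbox.sidecar"
STAGES = ("createRuntime", "poststop")


def render_descriptor(hook_script: Path, stage: str) -> str:
    """Render a single-stage OCI hook descriptor (hypothetical helper)."""
    descriptor = {
        "version": "1.0.0",
        "hook": {
            "path": str(hook_script),
            # args[0] is argv[0]; the stage travels as argv[1] so the
            # script can tell which stage invoked it.
            "args": [hook_script.name, stage],
        },
        # Fire only for containers carrying the sidecar annotation.
        # Keys and values are regular expressions in this schema.
        "when": {"annotations": {f"^{re.escape(ANNOTATION)}$": ".*"}},
        "stages": [stage],
    }
    return json.dumps(descriptor, indent=2) + "\n"


# Example path only; the real location is derived from state_root().
hook = Path("/srv/terok/hooks/supervisor_hook.py")
rendered = {stage: render_descriptor(hook, stage) for stage in STAGES}
```

Because each descriptor lists exactly one stage, the stage name can ride along in `hook.args` without ambiguity.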

The supervisor flow is annotation-driven from here on: the launch path emits --annotation terok.sandbox.sidecar=<abspath> and the hook reads the sidecar at that path. No $XDG_* discovery, no stamp files, no parallel root resolution.
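
The annotation handshake described above could look roughly like the following. This is a sketch under two assumptions: the runtime includes the container's annotations in the state JSON it writes to the hook's stdin (as the OCI runtime specification describes), and the helper names `annotation_args` and `sidecar_from_state` are invented here for illustration, not taken from terok-sandbox.

```python
import json
from pathlib import Path
from typing import Optional

ANNOTATION = "terok.sandbox.sidecar"


def annotation_args(sidecar: Path) -> list[str]:
    """Launch side: build the podman flag that points at the sidecar."""
    if not sidecar.is_absolute():
        raise ValueError(f"sidecar path must be absolute: {sidecar}")
    return ["--annotation", f"{ANNOTATION}={sidecar}"]


def sidecar_from_state(state_json: str) -> Optional[Path]:
    """Hook side: pull the sidecar path out of the OCI state document."""
    annotations = json.loads(state_json).get("annotations") or {}
    value = annotations.get(ANNOTATION)
    return Path(value) if value else None
```

In a real hook the state document would be read from `sys.stdin`; passing it in as a string keeps the sketch easy to exercise in isolation.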

install_supervisor_hooks(*, hooks_dir=None)

Lay down hook scripts, wrapper, and the OCI descriptor.

hooks_dir: override for tests. It changes only where the stage descriptor JSON files are written and which directory is registered in containers.conf; the scripts and ballast always go to state_root() / "hooks", which is also the default for hooks_dir. In the default case scripts, ballast, and descriptors share one terok-owned directory, so a teardown is a clean rm -rf.

Idempotent — every file write overwrites silently, and the descriptor JSON gets re-rendered each time so a moved install location is picked up on the next terok-sandbox setup.
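
The "overwrite silently" behaviour can be sketched as below. The private helper `_copy_executable` is not shown on this page, so `copy_executable` here is a guess at its shape (copy, then mark executable), not the actual implementation; the 0o755 mode is likewise an assumption.

```python
import shutil
from pathlib import Path


def copy_executable(src: Path, dst: Path) -> None:
    """Copy *src* over *dst* and mark it executable (hypothetical helper)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)  # replaces an existing file without complaint
    dst.chmod(0o755)  # assumed mode: rwx for the owner, r-x for everyone else
```

Calling it twice with the same arguments leaves the same bytes and the same mode on disk, which is what makes a repeated setup run safe.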

Source code in src/terok_sandbox/supervisor/install.py
def install_supervisor_hooks(*, hooks_dir: Path | None = None) -> None:
    """Lay down hook scripts, wrapper, and the OCI descriptor.

    *hooks_dir* — override for tests; defaults to
    ``state_root() / "hooks"``, where the role scripts already live.
    Scripts + ballast + descriptor share one terok-owned directory so
    a teardown is a clean ``rm -rf``.  ``containers.conf`` is patched
    to register that path.

    Idempotent — every file write overwrites silently, and the
    descriptor JSON gets re-rendered each time so a moved install
    location is picked up on the next ``terok-sandbox setup``.
    """
    install_root = state_root()
    hooks_install_dir = install_root / "hooks"
    hooks_install_dir.mkdir(parents=True, exist_ok=True)

    pkg_resources = Path(__file__).resolve().parent.parent / "resources"
    pkg_hooks = pkg_resources / "hooks"

    _copy_executable(pkg_hooks / _HOOK_SCRIPT_NAME, hooks_install_dir / _HOOK_SCRIPT_NAME)
    _copy_executable(pkg_hooks / _BALLAST_NAME, hooks_install_dir / _BALLAST_NAME)

    sandbox_argv = _resolve_sandbox_argv()
    _render_wrapper(
        src=pkg_resources / _WRAPPER_NAME,
        dst=install_root / _WRAPPER_NAME,
        sandbox_argv=sandbox_argv,
    )

    descriptor_dir = hooks_dir or hooks_install_dir
    descriptor_dir.mkdir(parents=True, exist_ok=True)
    # One JSON descriptor per stage — podman/crun reuse the same
    # ``hook.args`` for every stage in a single descriptor's ``stages``
    # list (no per-stage argv injection), so we'd lose the stage signal
    # otherwise.  The hook script self-dispatches on ``argv[1]``.
    for stage in _HOOK_STAGES:
        (descriptor_dir / _descriptor_name(stage)).write_text(
            _render_hook_descriptor(hooks_install_dir / _HOOK_SCRIPT_NAME, stage=stage),
            encoding="utf-8",
        )
    ensure_user_hooks_dir_configured(descriptor_dir)

uninstall_supervisor_hooks(*, hooks_dir=None)

Remove every file install_supervisor_hooks writes.

Idempotent — missing files are tolerated. Does not touch per-container state (sidecar/, logs/, pids/ under the state root) — those are sweep-able with a separate operator command if needed.

Source code in src/terok_sandbox/supervisor/install.py
def uninstall_supervisor_hooks(*, hooks_dir: Path | None = None) -> None:
    """Remove every file [`install_supervisor_hooks`][terok_sandbox.supervisor.install.install_supervisor_hooks] writes.

    Idempotent — missing files are tolerated.  Does **not** touch
    per-container state (``sidecar/``, ``logs/``, ``pids/`` under the
    state root) — those are sweep-able with a separate operator
    command if needed.
    """
    install_root = state_root()
    paths = [
        Path("hooks") / _HOOK_SCRIPT_NAME,
        Path("hooks") / _BALLAST_NAME,
        Path(_WRAPPER_NAME),
    ]
    paths.extend(Path("hooks") / _descriptor_name(stage) for stage in _HOOK_STAGES)
    for relative in paths:
        (install_root / relative).unlink(missing_ok=True)
    if hooks_dir is not None:
        for stage in _HOOK_STAGES:
            (hooks_dir / _descriptor_name(stage)).unlink(missing_ok=True)

kill_all_supervisors()

SIGKILL every live host-side supervisor process; return one row per PID file.

Iterates <state_root>/pids/supervisor-*.pid. For each file: read the PID, SIGKILL if alive, then unlink the stale file. Each returned row is (container_id, error_or_None); None means the process is no longer there, whether we killed it or it had already exited.

Designed for the panic path: the OCI poststop reap does a graceful SIGTERM → poll → SIGKILL dance for a normal container stop; panic skips straight to SIGKILL because the whole point is to deny the supervisor any more cycles to answer socket calls from a misbehaving container.
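
To make the contrast between the two paths concrete, here is a minimal sketch of a graceful stop next to a panic stop. The function names, the default grace period, and the returned labels are all illustrative assumptions; the actual poststop reap and `_kill_one_supervisor` are not shown on this page.

```python
import os
import signal
import time


def is_alive(pid: int) -> bool:
    """True if *pid* still exists (signal 0 probes without delivering)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    # Note: an unreaped zombie child still counts as alive here.
    return True


def graceful_stop(pid: int, *, grace: float = 2.0, interval: float = 0.05) -> str:
    """SIGTERM, poll up to *grace* seconds, then escalate to SIGKILL."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "gone"
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"  # exited within the grace period
        time.sleep(interval)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"  # exited between the last poll and the escalation
    return "SIGKILL"


def panic_stop(pid: int) -> str:
    """Skip the grace period entirely: SIGKILL immediately."""
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "gone"
    return "SIGKILL"
```

The panic variant gives the target no chance to run cleanup code, which is the point: it cannot service another request after the signal is delivered.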

The PID-recycle check is intentional but tight: the file name carries the container ID, and /proc/<pid>/cmdline is read and checked for the wrapper path before signalling, so a stale PID that has been recycled into an unrelated process is not signalled.
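
A check of this kind might be written as follows. It is only a sketch: the helper names are hypothetical, and whether the real code requires an exact argv entry or a looser substring match is not visible from this page, so the exact-match rule below is an assumption.

```python
from pathlib import Path


def cmdline_has_arg(raw: bytes, wanted: str) -> bool:
    """True if *wanted* is one of the NUL-separated argv entries in *raw*."""
    args = [part.decode("utf-8", "replace") for part in raw.split(b"\0") if part]
    return wanted in args


def pid_runs_wrapper(pid: int, wrapper_path: str) -> bool:
    """Read /proc/<pid>/cmdline (Linux) and look for the wrapper path."""
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    except OSError:
        return False  # process gone or unreadable: do not signal it
    return cmdline_has_arg(raw, wrapper_path)
```

Treating an unreadable cmdline as "not ours" errs on the side of leaving an unknown process alone.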

Source code in src/terok_sandbox/supervisor/install.py
def kill_all_supervisors() -> list[tuple[str, str | None]]:
    """SIGKILL every live host-side supervisor process; return one row per PID file.

    Iterates ``<state_root>/pids/supervisor-*.pid``.  For each file:
    read the PID, ``SIGKILL`` if alive, then unlink the stale file.
    Each returned row is ``(container_id, error_or_None)`` — ``None``
    means the process is no longer there, whether we killed it or it
    had already exited.

    Designed for the panic path: the OCI ``poststop`` reap does a
    graceful ``SIGTERM`` → poll → ``SIGKILL`` dance for a normal
    container stop; panic skips straight to ``SIGKILL`` because the
    whole point is to deny the supervisor any more cycles to answer
    socket calls from a misbehaving container.

    PID-recycle check is intentional but tight: the file name carries
    the container ID, so a stale PID that's been recycled into an
    unrelated process can still be matched by reading
    ``/proc/<pid>/cmdline`` for the wrapper path before signalling.
    """
    results: list[tuple[str, str | None]] = []
    pids_dir = state_root() / _PIDS_DIR_NAME
    if not pids_dir.is_dir():
        return results
    wrapper_path = str(state_root() / _WRAPPER_NAME)
    for pid_file in sorted(pids_dir.glob(_PID_GLOB)):
        container_id = pid_file.stem.removeprefix("supervisor-")
        results.append((container_id, _kill_one_supervisor(pid_file, wrapper_path, container_id)))
    return results